Hmm :( Even for an Active/Passive configuration? I'm guessing we will
need to do something with Pacemaker in the meantime?
On Wed, Aug 9, 2017 at 12:37 PM, Jason Dillaman wrote:
> I can probably say that it won't work out-of-the-gate for Hyper-V
> since it most likely
Hi Jason,
Thank you so much for all of the information. This really provides some
good insight into the integration of iSCSI with LIO. Let's hope the kernel
folks can work fast, haha
Sam
On Wed, Aug 9, 2017 at 12:48 PM, Jason Dillaman wrote:
> Yeah -- the issue is that if
Curious if there is a way I could see, in near real time, the IO
patterns for an fs. For instance, what files are currently being
read/written and the block sizes. I suspect this is a big ask. The only
thing I know of that can provide that level of detail for a filesystem is
dtrace with zfs.
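On Linux, the closest thing I know of is the bcc/eBPF toolset; a rough
sketch, assuming the bcc tools are installed under /usr/share/bcc/tools
(paths and package names vary by distro):

  $ sudo /usr/share/bcc/tools/filetop -C 5   # per-file read/write ops and throughput, refreshed every 5s
  $ sudo /usr/share/bcc/tools/biosnoop       # per-IO trace at the block layer, with size and latency

filetop answers "which files are hot right now", while biosnoop shows the
actual IO sizes hitting the device.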
Thank you for the comment.
I understand what you mean.
When one osd goes down, that osd's PGs are spread across the whole ceph
cluster, so each node can run one backfill/recovery per osd and the
cluster shows many backfills/recoveries.
On the other side, when one osd comes up, the osd needs to copy
osd_max_backfills is a setting per osd. With that set to 1, each osd will
only be involved in a single backfill/recovery at a time. However, the
cluster as a whole will run as many backfills as it can while each osd is
only involved in one.
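If you want to apply that limit at runtime, a sketch of the usual way (the
value here is illustrative):

  $ ceph tell osd.* injectargs '--osd_max_backfills 1'
  $ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep osd_max_backfills

The first command pushes the setting to every running osd; the second
confirms it took effect on one osd via its admin socket.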
On Wed, Aug 9, 2017 at 10:58 PM 하현
Hi,
I'm trying to deploy OpenStack with OpenStack Kolla. With Kolla I can easily
deploy most OpenStack components and Ceph as containers. I wonder if there are
any reliability or performance issues with containers/Docker?
Thank you!
Xu Yun
Hello,
ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
We had a few problems related to the simple operation of replacing a
failed OSD, and some clarification would be appreciated. It is not
very simple to observe what specifically happened (the timeline was
gathered from half a
Hi,
I recently had an mds outage because the mds suicided due to "dne in the mds
map".
I've asked about this here before, and I know it happens because the monitors
took this mds out of the mds map even though it was alive.
The weird thing is that there were no network-related issues happening at the
time, which
Yeah -- the issue is that if nodeA is the active path and Windows
issues some PRs (SCSI persistent reservations), then if nodeA fails and
nodeB is promoted to the active path, those PRs won't exist and Windows
will balk and fail the device. I've seen some posts online with folks
writing custom pacemaker resource scripts to try to
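For what it's worth, those custom scripts are usually OCF resource agents,
which are plain shell scripts. A bare skeleton of the shape they take
(pr_sync_from_peer is a hypothetical helper standing in for whatever would
actually propagate the PR state; this is a sketch, not a working agent):

  #!/bin/sh
  # Skeleton OCF resource agent for an iSCSI target with PR handoff.
  . ${OCF_ROOT}/lib/heartbeat/ocf-shellfuncs

  case $1 in
    start)
      pr_sync_from_peer || exit $OCF_ERR_GENERIC  # hypothetical: restore PR state before going active
      # ...then bring the target portal up on this node
      exit $OCF_SUCCESS
      ;;
    stop)
      # ...tear the portal down so the peer can take over
      exit $OCF_SUCCESS
      ;;
    monitor)
      # ...return $OCF_SUCCESS or $OCF_NOT_RUNNING from a real health check
      exit $OCF_NOT_RUNNING
      ;;
    meta-data)
      # the OCF XML metadata would go here
      exit $OCF_SUCCESS
      ;;
  esac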
Hi Sam,
Pacemaker will take care of HA failover but you will need to propagate
the PR data yourself.
If you are interested in a solution that works out of the box with
Windows, have a look at PetaSAN
www.petasan.org
It works well with MS hyper-v/storage spaces/Scale Out File Server.
Cheers
I just hit this too, and found it was fixed in master, so generated a
backport issue & PR:
http://tracker.ceph.com/issues/20966
https://github.com/ceph/ceph/pull/16952
--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail : robb...@gentoo.org
GnuPG FP :
On Wed, Aug 9, 2017 at 11:42 PM, Timothy Wolgemuth
wrote:
> Here is the output:
>
> [ceph-deploy@ceph01 my-cluster]$ sudo /usr/bin/ceph --connect-timeout=25
> --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-ceph01/keyring
> auth get client.admin
> 2017-08-09
Hi ceph experts.
I'm confused about setting a limit on osd max backfills.
Recovery occurs when an osd goes down, and the same when an osd comes up.
I want to limit backfills to 1.
So I set the config as below.
# ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show|egrep
Here is the output:
[ceph-deploy@ceph01 my-cluster]$ sudo /usr/bin/ceph --connect-timeout=25
--cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-ceph01/keyring
auth get client.admin
2017-08-09 09:07:00.519683 7f389700 0 -- :/1582396262 >>
192.168.100.11:6789/0 pipe(0x7efffc0617c0
Hello David,
On Wed, Aug 9, 2017 at 3:08 PM, David Turner wrote:
> Where exactly in the timeline did the io error happen?
The timeline was included in the email, at hour:min:sec resolution. I
spared the millisecs since they don't really change things.
> If the primary
>
Hi David,
thanks for your feedback.
With that in mind, I did rm a 15TB RBD pool about an hour or so before this
happened.
I wouldn't think it would be related, because there was nothing
different going on after I removed it. Not even high system load.
But considering what you said, I
Where exactly in the timeline did the io error happen? If the primary
osd was dead, but not marked down in the cluster yet, then the cluster
would sit there and expect that osd to respond. If this definitely
happened after the primary osd was marked down, then it's a different story.
I'm
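One way to pin the io error against when the cluster marked the osd down
is to grep the cluster log on a mon for that osd's failure/boot events; a
sketch, assuming the default log path and osd.12 as a stand-in:

  $ grep 'osd.12 ' /var/log/ceph/ceph.log | egrep 'failed|boot|down'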
This is for osd.0 (more below)
bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000
checksum at blob offset 0x0, got 0x1a128a93, expected 0x90407f75, device
location [0x5826c1~1000], logical extent 0x0~1000
bluestore(/var/lib/ceph/osd/ceph-12) _verify_csum bad crc32c/0x1000
I just want to point out that there are many different types of network
issues that don't involve entire networks. A bad nic, a bad/loose cable, a
service on a server restarting or modifying the network stack, etc.
That said, there are other things that can prevent an mds service, or any
service, from
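To rule a couple of those out quickly, a sketch (the interface name is
illustrative):

  $ ip -s link show eth0             # RX/TX error and drop counters
  $ ethtool -S eth0 | grep -i err    # nic-level error counters, where supported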