Re: [ceph-users] Iscsi configuration

2017-08-09 Thread Samuel Soulard
Hmm :( Even for an Active/Passive configuration? I'm guessing we will need to do something with Pacemaker in the meantime? On Wed, Aug 9, 2017 at 12:37 PM, Jason Dillaman wrote: > I can probably say that it won't work out-of-the-gate for Hyper-V > since it most likely

Re: [ceph-users] Iscsi configuration

2017-08-09 Thread Samuel Soulard
Hi Jason, Thank you so much for all of the information. This really provides some good insight into the integration of iSCSI with LIO. Let's hope the kernel folks can work fast, haha. Sam On Wed, Aug 9, 2017 at 12:48 PM, Jason Dillaman wrote: > Yeah -- the issue is that if

[ceph-users] Cephfs IO monitoring

2017-08-09 Thread Brady Deetz
Curious if there is a way I could see, in near real time, the IO patterns for an fs. For instance, what files are currently being read/written and the block sizes. I suspect this is a big ask. The only thing I know of that can provide that level of detail for a filesystem is dtrace with zfs.

Re: [ceph-users] osd backfills and recovery limit issue

2017-08-09 Thread Hyun Ha
Thank you for the comment. I understand what you mean. When one OSD goes down, that OSD has many PGs spread across the whole Ceph cluster's nodes, so each node can have one backfill/recovery per OSD and the Ceph cluster shows many backfills/recoveries. On the other side, when one OSD comes up, the OSD needs to copy

Re: [ceph-users] osd backfills and recovery limit issue

2017-08-09 Thread David Turner
osd_max_backfills is a per-OSD setting. With that set to 1, each OSD will only be involved in a single backfill/recovery at the same time. However, the cluster as a whole will have as many backfills as it can while each OSD is only involved in one each. On Wed, Aug 9, 2017 at 10:58 PM 하현
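As an illustration of that per-OSD limit (a generic sketch, not commands taken from this thread; it assumes a running cluster with an admin keyring and that a temporary runtime change via injectargs is acceptable):

  # check the current value on one OSD via its admin socket
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config get osd_max_backfills

  # raise the limit on all OSDs at runtime; this does not survive an OSD
  # restart unless osd_max_backfills is also set in ceph.conf
  ceph tell osd.* injectargs '--osd-max-backfills 2'

A higher value lets more PGs backfill per OSD concurrently, at the cost of more client-visible latency during recovery.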

[ceph-users] Container deployment

2017-08-09 Thread 徐蕴
Hi, I’m trying to deploy OpenStack with OpenStack Kolla. With Kolla I can easily deploy most OpenStack components and Ceph as containers. I wonder if there is any reliability or performance issue with containers/Docker? Thank you! Xu Yun

[ceph-users] IO Error reaching client when primary osd get funky but secondaries are ok

2017-08-09 Thread Peter Gervai
Hello, ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af) We had a few problems related to the simple operation of replacing a failed OSD, and some clarification would be appreciated. It is not very simple to observe what specifically happened (the timeline was gathered from half a

[ceph-users] lease_timeout - new election

2017-08-09 Thread Webert de Souza Lima
Hi, I recently had an mds outage because the mds suicided due to "dne in the mds map". I've asked about it here before and I know that happens because the monitors took this mds out of the mds map even though it was alive. The weird thing is that there were no network-related issues happening at the time, which

Re: [ceph-users] Iscsi configuration

2017-08-09 Thread Jason Dillaman
Yeah -- the issue is that if nodeA is the active path and Windows issues some PRs (persistent reservations), then if nodeA fails and nodeB is promoted to the active path, those PRs won't exist and Windows will balk and fail the device. I've seen some posts online w/ folks writing custom pacemaker resource scripts to try to

Re: [ceph-users] Iscsi configuration

2017-08-09 Thread Maged Mokhtar
Hi Sam, Pacemaker will take care of HA failover but you will need to propagate the PR data yourself. If you are interested in a solution that works out of the box with Windows, have a look at PetaSAN (www.petasan.org). It works well with MS Hyper-V / Storage Spaces / Scale-Out File Server. Cheers

Re: [ceph-users] jewel - radosgw-admin bucket limit check broken?

2017-08-09 Thread Robin H. Johnson
I just hit this too, and found it was fixed in master, so I generated a backport issue & PR: http://tracker.ceph.com/issues/20966 https://github.com/ceph/ceph/pull/16952 -- Robin Hugh Johnson Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer E-Mail : robb...@gentoo.org GnuPG FP :

Re: [ceph-users] New install error

2017-08-09 Thread Brad Hubbard
On Wed, Aug 9, 2017 at 11:42 PM, Timothy Wolgemuth wrote: > Here is the output: > > [ceph-deploy@ceph01 my-cluster]$ sudo /usr/bin/ceph --connect-timeout=25 > --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-ceph01/keyring > auth get client.admin > 2017-08-09

[ceph-users] osd backfills and recovery limit issue

2017-08-09 Thread 하현
Hi Ceph experts. I am confused about setting a limit on osd max backfills. When an OSD goes down, recovery occurs, and the same thing happens when an OSD comes back up. I want to limit backfills to 1, so I set the config as below. # ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show|egrep
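For reference, the admin-socket check above, with a hypothetical egrep pattern (the original filter is truncated in the archive), would typically look like this:

  # dump the OSD's running config and filter for the recovery-related limits
  # (the pattern is illustrative; filter for whichever options you care about)
  ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | egrep 'osd_max_backfills|osd_recovery_max_active'

With the limit set to 1, the filtered output should include a line like "osd_max_backfills": "1".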

Re: [ceph-users] New install error

2017-08-09 Thread Timothy Wolgemuth
Here is the output: [ceph-deploy@ceph01 my-cluster]$ sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-ceph01/keyring auth get client.admin 2017-08-09 09:07:00.519683 7f389700 0 -- :/1582396262 >> 192.168.100.11:6789/0 pipe(0x7efffc0617c0

Re: [ceph-users] IO Error reaching client when primary osd get funky but secondaries are ok

2017-08-09 Thread Peter Gervai
Hello David, On Wed, Aug 9, 2017 at 3:08 PM, David Turner wrote: > Where exactly in the timeline did the IO error happen? The timeline was included in the email, at hour:min:sec resolution. I left out the milliseconds since they don't really change anything. > If the primary >

Re: [ceph-users] lease_timeout - new election

2017-08-09 Thread Webert de Souza Lima
Hi David, thanks for your feedback. With that in mind, I did rm a 15TB RBD pool about an hour or so before this happened. I wouldn't think it would be related to this because there was nothing different going on after I removed it. Not even high system load. But considering what you said, I

Re: [ceph-users] IO Error reaching client when primary osd get funky but secondaries are ok

2017-08-09 Thread David Turner
Where exactly in the timeline did the IO error happen? If the primary osd was dead, but not marked down in the cluster yet, then the cluster would sit there and expect that osd to respond. If this definitely happened after the primary osd was marked down, then it's a different story. I'm

Re: [ceph-users] Pg inconsistent / export_files error -5

2017-08-09 Thread Marc Roos
This is for osd.0 (more below) bluestore(/var/lib/ceph/osd/ceph-0) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x1a128a93, expected 0x90407f75, device location [0x5826c1~1000], logical extent 0x0~1000 bluestore(/var/lib/ceph/osd/ceph-12) _verify_csum bad crc32c/0x1000
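For anyone chasing the same "pg inconsistent" symptom, a generic way to see which objects a scrub flagged and to trigger a repair (a general-purpose sketch, not the commands used in this thread):

  # list the objects that deep scrub found inconsistent in a PG
  # (2.1f is a hypothetical PG id; substitute the PG reported as inconsistent)
  rados list-inconsistent-obj 2.1f --format=json-pretty

  # once you understand which replica is bad, ask the primary to repair the PG
  ceph pg repair 2.1f

Checksum mismatches like the _verify_csum errors above may show up in that per-object inconsistency report.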

Re: [ceph-users] lease_timeout - new election

2017-08-09 Thread David Turner
I just want to point out that there are many different types of network issues that don't involve entire networks. A bad NIC, a bad/loose cable, a service on a server restarting or modifying the network stack, etc. That said, there are other things that can prevent an mds service, or any service, from