Re: [ceph-users] rbd iscsi gateway question
On Mon, 2017-04-10 at 12:13 -0500, Mike Christie wrote:
> > LIO-TCMU+librbd-iscsi [1] [2] looks really promising and seems to be the
> > way to go. It would be great if somebody has insight about the maturity
> > of the project. Is it ready for testing purposes?
>
> It is not mature yet. You can do IO to an rbd image, but it currently
> does a queue depth of only 1.
>
> We are in the process of merging patches from a couple of branches to add
> rbd aio support, failover/failback across gateways, perf improvements,
> and lots of bug fixes. With them, Linux works well, and we are working
> on a couple of Windows bugs.
>
> For ESX, we are hoping to be ready around the end of summer. You should
> not use ESX with tcmu/tcmu-runner right now, because several commands
> are not implemented or are implemented incorrectly for ESX.

Thanks Mike, much appreciated. Any pointers/URL to stay informed about the progress?

Cheers,
Cédric

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rbd iscsi gateway question
On 04/10/2017 01:21 PM, Timofey Titovets wrote:
> JFYI: Today we got a totally stable Ceph + ESXi setup "without hacks", and
> it passed stress tests.
>
> 1. Don't try to pass RBD directly to LIO; that setup is unstable.
> 2. Instead, use Qemu + KVM (I use Proxmox to create the VM).
> 3. Attach the RBD to the VM as a VIRTIO-SCSI disk (it must be exported by
> target_core_iblock).

I think you avoid the hung command problem because LIO uses the local/initiator side SCSI layer to send commands to the virtio-scsi device, which has timeouts similar to ESX. They will time out and fire the virtio-scsi error handler, and commands will not just hang.

I think you can now do something similar with Ilya's patch and use krbd directly with target_core_iblock:
https://www.spinics.net/lists/ceph-devel/msg35618.html

> 4. Make a LIO target in the VM.
> 4.1 Sync the initiator (ESXi) and target (LIO) options (best to change the
> target options).
> 4.2 You can enable almost all VAAI (also emulate_tpu=1, emulate_tpws=1).
> 4.3 For performance reasons, use noop on the RBD disk in the VM and set
> is_nonrot=1 (disables the ESXi scheduler).
> 5. ESXi is "stupid" and has a problem with CAS on LIO (and some other
> storage vendors; google for info), so for stable operation without LUN
> disconnects, set VMFS3.UseATSForHBOnVMFS5 to zero on all ESXi hosts that
> use the LUN.
> 6. Don't try to make the target HA (not tested, but I think you will hit
> problems with VMFS); you must do something like VM HA for that.

Yes, the problem is for HA, where commands need to be cleaned up before they are retried through different GWs/paths, so one command is not racing with the retry and new commands.

> This setup was tested with the latest ESXi and VMFS6.
>
> Thanks.
Re: [ceph-users] rbd iscsi gateway question
JFYI: Today we got a totally stable Ceph + ESXi setup "without hacks", and it passed stress tests.

1. Don't try to pass RBD directly to LIO; that setup is unstable.
2. Instead, use Qemu + KVM (I use Proxmox to create the VM).
3. Attach the RBD to the VM as a VIRTIO-SCSI disk (it must be exported by target_core_iblock).
4. Make a LIO target in the VM.
4.1 Sync the initiator (ESXi) and target (LIO) options (best to change the target options).
4.2 You can enable almost all VAAI (also emulate_tpu=1, emulate_tpws=1).
4.3 For performance reasons, use noop on the RBD disk in the VM and set is_nonrot=1 (disables the ESXi scheduler).
5. ESXi is "stupid" and has a problem with CAS on LIO (and some other storage vendors; google for info), so for stable operation without LUN disconnects, set VMFS3.UseATSForHBOnVMFS5 to zero on all ESXi hosts that use the LUN.
6. Don't try to make the target HA (not tested, but I think you will hit problems with VMFS); you must do something like VM HA for that.

This setup was tested with the latest ESXi and VMFS6.

Thanks.
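For reference, the recipe above maps roughly onto the commands below. This is a hedged sketch, not from the original mail: the backstore name, IQN, and device path are placeholders, and targetcli defaults can differ between versions.

```shell
# Inside the KVM guest, where the RBD-backed virtio-scsi disk is e.g. /dev/sda.

# Step 4.3: noop elevator on the RBD disk inside the VM
echo noop > /sys/block/sda/queue/scheduler

# Step 4: iblock backstore over the disk, exported via a LIO iSCSI target
targetcli /backstores/block create name=rbdvm dev=/dev/sda
targetcli /iscsi create iqn.2017-04.org.example:rbdvm
targetcli /iscsi/iqn.2017-04.org.example:rbdvm/tpg1/luns create /backstores/block/rbdvm

# Steps 4.2/4.3: enable thin-provisioning VAAI primitives and mark non-rotational
targetcli /backstores/block/rbdvm set attribute emulate_tpu=1 emulate_tpws=1 is_nonrot=1
targetcli saveconfig

# Step 5, on EVERY ESXi host that uses the LUN:
# disable ATS for VMFS heartbeats to avoid the CAS/LUN-disconnect problem
esxcli system settings advanced set -o /VMFS3/UseATSForHBOnVMFS5 -i 0
```

Note the ATS heartbeat setting is host-wide on ESXi, so it affects all VMFS5 datastores on that host, not just the Ceph-backed LUN.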
Re: [ceph-users] rbd iscsi gateway question
On 04/06/2017 08:46 AM, David Disseldorp wrote:
> On Thu, 6 Apr 2017 14:27:01 +0100, Nick Fisk wrote:
> ...
>>> I'm not too sure what you're referring to WRT the spiral of death, but we did
>>> patch some LIO issues encountered when a command was aborted while
>>> outstanding at the LIO backstore layer.
>>> These specific fixes are carried in the mainline kernel, and can be tested
>>> using the AbortTaskSimpleAsync libiscsi test.
>>
>> Awesome, glad this has finally been fixed. Death spiral was referring to
>> when using it with ESXi: both the initiator and target effectively hang
>> forever, and if you didn't catch it soon enough, sometimes you end up having
>> to kill all VMs and reboot hosts.
>
> Sounds like it could be the same thing. Stale iSCSI sessions remain
> around which block subsequent login attempts.
>
>> Do you know what kernel version these changes would have first gone into? I
>> thought I looked back into this last summer and it was still showing the
>> same behavior.
>
> The fix I was referring to is:
>
> commit 5e2c956b8aa24d4f33ff7afef92d409eed164746
> Author: Nicholas Bellinger
> Date: Wed May 25 12:25:04 2016 -0700
>
>     target: Fix missing complete during ABORT_TASK + CMD_T_FABRIC_STOP
>
> It's carried in v4.8+ and was also flagged for 3.14+ stable inclusion,
> so it should be present in many distro kernels by now. That said, there
> have been many other changes in this area.

I think we can still hit the issue with this patch. The general problem is handling commands that are going to take longer than the initiator side's error handler. ESX will end up marking the VM/storage as failed, and the user has to manually intervene. It is similar to Linux, where a /dev/sdX is marked offline and the user then has to manually online it and restart the layers above it. So we should root cause the reason for commands taking so long.

If it is just a normal case, then to handle this issue in a more generic way for all initiators, Nick suggested implementing a target-side timeout:
https://www.spinics.net/lists/target-devel/msg14780.html

In tcmu-runner we could then abort/kill the command based on a timer and fail the command before the ESX timers fire. The difficult part is of course aborting a running rbd command.

Note that you can currently set the tcmu timeout:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/target/target_core_user.c?id=7d7a743543905a8297dce53b36e793e5307da5d7
discussed in that thread, and you will avoid the problem, but there is no code to stop the running command in tcmu-runner, so it would not be safe in some setups.
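The tcmu timeout Mike mentions is exposed through the target configfs tree once a user-space (tcmu) backstore exists. A sketch only: the device name `rbd0` is a placeholder, and the exact attribute location is an assumption that may vary by kernel version; check your kernel's target_core_user documentation.

```shell
# Assuming a tcmu backstore named rbd0 was already created via targetcli
# (backstores/user:rbd). Set the per-device command timeout in seconds;
# 0 leaves commands waiting indefinitely (the old behavior).
echo 30 > /sys/kernel/config/target/core/user_0/rbd0/attrib/cmd_time_out
```

As noted in the mail, setting this without command-abort support in tcmu-runner only fails the command on the kernel side; the rbd request may still be running, so it is not safe in all setups.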
Re: [ceph-users] rbd iscsi gateway question
On 04/06/2017 03:22 AM, yipik...@gmail.com wrote:
> On 06/04/2017 09:42, Nick Fisk wrote:
>>
>> I assume Brady is referring to the death spiral LIO gets into with
>> some initiators, including vmware, if an IO takes longer than about
>> 10s. I haven't heard of anything, and can't see any changes, so I
>> would assume this issue still remains.
>>
>> I would look at either SCST or NFS for now.
>>
> LIO-TCMU+librbd-iscsi [1] [2] looks really promising and seems to be the
> way to go. It would be great if somebody has insight about the maturity
> of the project. Is it ready for testing purposes?

It is not mature yet. You can do IO to an rbd image, but it currently does a queue depth of only 1.

We are in the process of merging patches from a couple of branches to add rbd aio support, failover/failback across gateways, perf improvements, and lots of bug fixes. With them, Linux works well, and we are working on a couple of Windows bugs.

For ESX, we are hoping to be ready around the end of summer. You should not use ESX with tcmu/tcmu-runner right now, because several commands are not implemented or are implemented incorrectly for ESX.
Re: [ceph-users] rbd iscsi gateway question
On Thu, 6 Apr 2017 14:27:01 +0100, Nick Fisk wrote:
...
>> I'm not too sure what you're referring to WRT the spiral of death, but we did
>> patch some LIO issues encountered when a command was aborted while
>> outstanding at the LIO backstore layer.
>> These specific fixes are carried in the mainline kernel, and can be tested
>> using the AbortTaskSimpleAsync libiscsi test.
>
> Awesome, glad this has finally been fixed. Death spiral was referring to when
> using it with ESXi: both the initiator and target effectively hang forever,
> and if you didn't catch it soon enough, sometimes you end up having to kill
> all VMs and reboot hosts.

Sounds like it could be the same thing. Stale iSCSI sessions remain around which block subsequent login attempts.

> Do you know what kernel version these changes would have first gone into? I
> thought I looked back into this last summer and it was still showing the same
> behavior.

The fix I was referring to is:

commit 5e2c956b8aa24d4f33ff7afef92d409eed164746
Author: Nicholas Bellinger
Date: Wed May 25 12:25:04 2016 -0700

    target: Fix missing complete during ABORT_TASK + CMD_T_FABRIC_STOP

It's carried in v4.8+ and was also flagged for 3.14+ stable inclusion, so it should be present in many distro kernels by now. That said, there have been many other changes in this area.

Cheers, David
Re: [ceph-users] rbd iscsi gateway question
We were beta till early Feb, so we are relatively young. If there are issues/bugs, we'd certainly be interested to know through our forum. Note that with us you can always use the CLI and bypass the UI, and it will be straight Ceph/LIO commands if you wish.

From: Brady Deetz
Sent: Thursday, April 06, 2017 3:21 PM
To: ceph-users
Subject: Re: [ceph-users] rbd iscsi gateway question

I appreciate everybody's responses here. I remember the announcement of PetaSAN a while back on here and some concerns about it. Is anybody using it in production yet?

On Apr 5, 2017 9:58 PM, "Brady Deetz" <bde...@gmail.com> wrote:

I apologize if this is a duplicate of something recent, but I'm not finding much. Does the issue still exist where dropping an OSD results in a LUN's I/O hanging?

I'm attempting to determine if I have to move off of VMWare in order to safely use Ceph as my VM storage.
Re: [ceph-users] rbd iscsi gateway question
> -----Original Message-----
> From: David Disseldorp [mailto:dd...@suse.de]
> Sent: 06 April 2017 14:06
> To: Nick Fisk <n...@fisk.me.uk>
> Cc: 'Maged Mokhtar' <mmokh...@petasan.org>; 'Brady Deetz'
> <bde...@gmail.com>; 'ceph-users' <ceph-us...@ceph.com>
> Subject: Re: [ceph-users] rbd iscsi gateway question
>
> Hi,
>
> On Thu, 6 Apr 2017 13:31:00 +0100, Nick Fisk wrote:
>
>>> I believe there was a request to include it in the mainline kernel, but it
>>> did not happen, probably waiting for the TCMU solution, which will be a
>>> better/cleaner design.
>
> Indeed, we're proceeding with TCMU as a future upstream-acceptable
> implementation.
>
>> Yes, should have mentioned this: if you are using the SUSE kernel, they
>> have a fix for this spiral of death problem.
>
> I'm not too sure what you're referring to WRT the spiral of death, but we did
> patch some LIO issues encountered when a command was aborted while
> outstanding at the LIO backstore layer.
> These specific fixes are carried in the mainline kernel, and can be tested
> using the AbortTaskSimpleAsync libiscsi test.

Awesome, glad this has finally been fixed. Death spiral was referring to when using it with ESXi: both the initiator and target effectively hang forever, and if you didn't catch it soon enough, sometimes you end up having to kill all VMs and reboot hosts.

Do you know what kernel version these changes would have first gone into? I thought I looked back into this last summer and it was still showing the same behavior.
Re: [ceph-users] rbd iscsi gateway question
I appreciate everybody's responses here. I remember the announcement of PetaSAN a while back on here and some concerns about it. Is anybody using it in production yet?

On Apr 5, 2017 9:58 PM, "Brady Deetz" wrote:

> I apologize if this is a duplicate of something recent, but I'm not
> finding much. Does the issue still exist where dropping an OSD results in a
> LUN's I/O hanging?
>
> I'm attempting to determine if I have to move off of VMWare in order to
> safely use Ceph as my VM storage.
Re: [ceph-users] rbd iscsi gateway question
Hi,

On Thu, 6 Apr 2017 13:31:00 +0100, Nick Fisk wrote:

>> I believe there was a request to include it in the mainline kernel, but it
>> did not happen, probably waiting for the TCMU solution, which will be a
>> better/cleaner design.

Indeed, we're proceeding with TCMU as a future upstream-acceptable implementation.

> Yes, should have mentioned this: if you are using the SUSE kernel, they have
> a fix for this spiral of death problem.

I'm not too sure what you're referring to WRT the spiral of death, but we did patch some LIO issues encountered when a command was aborted while outstanding at the LIO backstore layer. These specific fixes are carried in the mainline kernel, and can be tested using the AbortTaskSimpleAsync libiscsi test.

Cheers, David
Re: [ceph-users] rbd iscsi gateway question
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Maged Mokhtar
> Sent: 06 April 2017 12:21
> To: Brady Deetz <bde...@gmail.com>; ceph-users <ceph-us...@ceph.com>
> Subject: Re: [ceph-users] rbd iscsi gateway question
>
> The io hang (it is actually a pause, not a hang) is caused by Ceph only in
> the case of a simultaneous failure of 2 hosts or 2 OSDs on separate hosts.
> A single host/OSD being out will not cause this. In the PetaSAN project
> (www.petasan.org) we use LIO/krbd. We have done a lot of tests on VMWare;
> in case of io failure, the io will block for approx 30s on the VMWare ESX
> (the default timeout, but it can be configured) and then resume on the
> other MPIO path.
>
> We are using a custom LIO/kernel upstreamed from SLE 12, used in their
> enterprise storage offering; it supports a direct rbd backstore. I believe
> there was a request to include it in the mainline kernel, but it did not
> happen, probably waiting for the TCMU solution, which will be a
> better/cleaner design.

Yes, should have mentioned this: if you are using the SUSE kernel, they have a fix for this spiral of death problem. Any other distribution or vanilla kernel will hang if a Ceph IO takes longer than about 5-10s. It's the path-failure bit which is the problem: LIO tries to abort the IO, but RBD doesn't support this yet.
Re: [ceph-users] rbd iscsi gateway question
The io hang (it is actually a pause, not a hang) is caused by Ceph only in the case of a simultaneous failure of 2 hosts or 2 OSDs on separate hosts. A single host/OSD being out will not cause this. In the PetaSAN project (www.petasan.org) we use LIO/krbd. We have done a lot of tests on VMWare; in case of io failure, the io will block for approx 30s on the VMWare ESX (the default timeout, but it can be configured) and then resume on the other MPIO path.

We are using a custom LIO/kernel upstreamed from SLE 12, used in their enterprise storage offering; it supports a direct rbd backstore. I believe there was a request to include it in the mainline kernel, but it did not happen, probably waiting for the TCMU solution, which will be a better/cleaner design.

Cheers /maged
Re: [ceph-users] rbd iscsi gateway question
> On 6 Apr 2017, at 08:42, Nick Fisk wrote:
>
> I assume Brady is referring to the death spiral LIO gets into with some
> initiators, including vmware, if an IO takes longer than about 10s.

We have occasionally seen this issue with vmware+LIO, almost always when upgrading OSD nodes. Didn't realise it was a known issue! Apart from that, though, we've found LIO generally to be far more performant and stable (especially in our multipathing setup), so we would like to stick with it if possible.

I'm wondering, are there any additional steps we should be taking to minimise the risk of LIO timeouts during upgrades? At the moment, we set the cluster to "noout", stop the node's services, upgrade the packages and reboot. For instance, is there a way to drain connections from clients to a particular node before shutting down its OSDs?

Thanks, Oliver.
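The per-node upgrade procedure Oliver describes can be sketched roughly as below. This is only an illustration of the steps from the mail; the package manager invocation and systemd unit names are assumptions that depend on the distribution and Ceph release.

```shell
# Keep CRUSH from marking this node's OSDs "out" and triggering rebalancing
# (and the resulting client IO load) while the node is down
ceph osd set noout

# Stop Ceph daemons on this node, upgrade, and reboot
systemctl stop ceph-osd.target
apt-get update && apt-get install -y ceph   # or yum/zypper, per distro
reboot

# Once the node is back and its OSDs have rejoined the cluster:
ceph osd unset noout
ceph -s   # confirm HEALTH_OK / recovery finished before touching the next node
```

Note this only limits rebalancing traffic; while the node's OSDs are down, IO to PGs they serve can still stall until peers take over, which is exactly the window where the LIO timeout can bite.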
Re: [ceph-users] rbd iscsi gateway question
On 06/04/2017 09:42, Nick Fisk wrote: > > I assume Brady is referring to the death spiral LIO gets into with > some initiators, including vmware, if an IO takes longer than about > 10s. I haven’t heard of anything, and can’t see any changes, so I > would assume this issue still remains. > > > > I would look at either SCST or NFS for now. > LIO-TCMU+librbd-iscsi [1] [2] looks really promising and seams to be the way to go. It would be great if somebody as insight about the maturity of the project, is it ready for testing purposes ? Cheers Cédric [1] https://ceph.com/planet/ceph-rbd-and-iscsi/ [2] https://github.com/open-iscsi/tcmu-runner > > > > *From:*ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On > Behalf Of *Adrian Saul > *Sent:* 06 April 2017 05:32 > *To:* Brady Deetz <bde...@gmail.com>; ceph-users <ceph-us...@ceph.com> > *Subject:* Re: [ceph-users] rbd iscsi gateway question > > > > > > I am not sure if there is a hard and fast rule you are after, but > pretty much anything that would cause ceph transactions to be blocked > (flapping OSD, network loss, hung host) has the potential to block RBD > IO which would cause your iSCSI LUNs to become unresponsive for that > period. > > > > For the most part though, once that condition clears things keep > working, so its not like a hang where you need to reboot to clear it. > Some situations we have hit with our setup: > > > > * Failed OSDs (dead disks) – no issues > * Cluster rebalancing – ok if throttled back to keep service times down > * Network packet loss (bad fibre) – painful, broken communication > everywhere, caused a krbd hang needing a reboot > * RBD Snapshot deletion – disk latency through roof, cluster > unresponsive for minutes at a time, won’t do again. 
> > > > > > > > *From:*ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On > Behalf Of *Brady Deetz > *Sent:* Thursday, 6 April 2017 12:58 PM > *To:* ceph-users > *Subject:* [ceph-users] rbd iscsi gateway question > > > > I apologize if this is a duplicate of something recent, but I'm not > finding much. Does the issue still exist where dropping an OSD results > in a LUN's I/O hanging? > > > > I'm attempting to determine if I have to move off of VMWare in order > to safely use Ceph as my VM storage. > > Confidentiality: This email and any attachments are confidential and > may be subject to copyright, legal or some other professional > privilege. They are intended solely for the attention and use of the > named addressee(s). They may only be copied, distributed or disclosed > with the consent of the copyright owner. If you have received this > email by mistake or by breach of the confidentiality clause, please > notify the sender immediately by return email and delete or destroy > all copies of the email. Any confidentiality, privilege or copyright > is not waived or lost because this email has been sent to you by mistake. > > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] rbd iscsi gateway question
In my case I am using SCST, so that is what my experience is based on. For our VMware we are using NFS, but for Hyper-V and Solaris we are using iSCSI.

There is actually some work done to make a userland SCST, which could be interesting for making an scst_librbd integration that bypasses the need for krbd.

From: Nick Fisk [mailto:n...@fisk.me.uk]
Sent: Thursday, 6 April 2017 5:43 PM
To: Adrian Saul; 'Brady Deetz'; 'ceph-users'
Subject: RE: [ceph-users] rbd iscsi gateway question

I assume Brady is referring to the death spiral LIO gets into with some initiators, including vmware, if an IO takes longer than about 10s. I haven't heard of anything, and can't see any changes, so I would assume this issue still remains.

I would look at either SCST or NFS for now.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Adrian Saul
Sent: 06 April 2017 05:32
To: Brady Deetz <bde...@gmail.com>; ceph-users <ceph-us...@ceph.com>
Subject: Re: [ceph-users] rbd iscsi gateway question

I am not sure if there is a hard and fast rule you are after, but pretty much anything that would cause ceph transactions to be blocked (flapping OSD, network loss, hung host) has the potential to block RBD IO, which would cause your iSCSI LUNs to become unresponsive for that period.

For the most part, though, once that condition clears things keep working, so it's not like a hang where you need to reboot to clear it. Some situations we have hit with our setup:

- Failed OSDs (dead disks) – no issues
- Cluster rebalancing – OK if throttled back to keep service times down
- Network packet loss (bad fibre) – painful, broken communication everywhere, caused a krbd hang needing a reboot
- RBD snapshot deletion – disk latency through the roof, cluster unresponsive for minutes at a time, won't do again.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Brady Deetz
Sent: Thursday, 6 April 2017 12:58 PM
To: ceph-users
Subject: [ceph-users] rbd iscsi gateway question

I apologize if this is a duplicate of something recent, but I'm not finding much. Does the issue still exist where dropping an OSD results in a LUN's I/O hanging?

I'm attempting to determine if I have to move off of VMWare in order to safely use Ceph as my VM storage.
Re: [ceph-users] rbd iscsi gateway question
I assume Brady is referring to the death spiral LIO gets into with some initiators, including vmware, if an IO takes longer than about 10s. I haven't heard of anything, and can't see any changes, so I would assume this issue still remains.

I would look at either SCST or NFS for now.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Adrian Saul
Sent: 06 April 2017 05:32
To: Brady Deetz <bde...@gmail.com>; ceph-users <ceph-us...@ceph.com>
Subject: Re: [ceph-users] rbd iscsi gateway question

I am not sure if there is a hard and fast rule you are after, but pretty much anything that would cause ceph transactions to be blocked (flapping OSD, network loss, hung host) has the potential to block RBD IO, which would cause your iSCSI LUNs to become unresponsive for that period.

For the most part, though, once that condition clears things keep working, so it's not like a hang where you need to reboot to clear it. Some situations we have hit with our setup:

* Failed OSDs (dead disks) – no issues
* Cluster rebalancing – OK if throttled back to keep service times down
* Network packet loss (bad fibre) – painful, broken communication everywhere, caused a krbd hang needing a reboot
* RBD snapshot deletion – disk latency through the roof, cluster unresponsive for minutes at a time, won't do again.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Brady Deetz
Sent: Thursday, 6 April 2017 12:58 PM
To: ceph-users
Subject: [ceph-users] rbd iscsi gateway question

I apologize if this is a duplicate of something recent, but I'm not finding much. Does the issue still exist where dropping an OSD results in a LUN's I/O hanging?

I'm attempting to determine if I have to move off of VMWare in order to safely use Ceph as my VM storage.
Re: [ceph-users] rbd iscsi gateway question
I am not sure if there is a hard and fast rule you are after, but pretty much anything that would cause ceph transactions to be blocked (flapping OSD, network loss, hung host) has the potential to block RBD IO, which would cause your iSCSI LUNs to become unresponsive for that period.

For the most part, though, once that condition clears things keep working, so it's not like a hang where you need to reboot to clear it. Some situations we have hit with our setup:

- Failed OSDs (dead disks) – no issues
- Cluster rebalancing – OK if throttled back to keep service times down
- Network packet loss (bad fibre) – painful, broken communication everywhere, caused a krbd hang needing a reboot
- RBD snapshot deletion – disk latency through the roof, cluster unresponsive for minutes at a time, won't do again.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Brady Deetz
Sent: Thursday, 6 April 2017 12:58 PM
To: ceph-users
Subject: [ceph-users] rbd iscsi gateway question

I apologize if this is a duplicate of something recent, but I'm not finding much. Does the issue still exist where dropping an OSD results in a LUN's I/O hanging?

I'm attempting to determine if I have to move off of VMWare in order to safely use Ceph as my VM storage.
[ceph-users] rbd iscsi gateway question
I apologize if this is a duplicate of something recent, but I'm not finding much. Does the issue still exist where dropping an OSD results in a LUN's I/O hanging?

I'm attempting to determine if I have to move off of VMWare in order to safely use Ceph as my VM storage.