Re: [Lsf] It's time to put together the schedule
On 2/25/2015 12:36 PM, Mike Christie wrote:

On 02/22/2015 09:25 PM, Mike Christie wrote:

I just hit a bug in the userspace code. Will send that later.

Hey Sagi,

Attached is the userspace patch, user-mq6.patch. It is made over 0001-iscsid-make-sure-actor-is-delated-before-reschedulin.patch (the patch to fix that double schedule bug you guys found). I am also attaching an updated kernel patch. It has some fixes for logout and iscsi_tcp mq setup.

To use the patches, just set the new iscsid.conf setting node.session.queue_ids. It is just a string of ints:

    node.session.queue_ids = 1 2 4 8

that get passed in to the kernel. For each id, iscsid will create a session and have the LLD map whatever they want to that id value.

Login is the same:

    iscsiadm -m node -T yourtarget -p ip --login

However, after you log in you have to manually scan:

    iscsiadm -m session --rescan

For logout, you currently have to make sure you log out all the sessions, so use:

    iscsiadm -m node -T yourtarget -p ip --logout

or

    iscsiadm -m session --logout

If you just pass in a specific session id like here:

    iscsiadm -m session -r SID --logout

then that will wait for all the other sessions in the group to be logged out before completing the task. I did this because I was not yet sure how to handle dynamic hctx updates in the kernel.

For the LLD implementation, I hooked in iscsi_tcp to the session/group creation code. Like I said before, I was not sure what every driver/fw/hw was going to map to, so the queue id that is getting passed into the session/connection/ep creation functions is really generic and you can map it to whatever you like right now.

For ib_iser, you should look at iscsi_tcp.c's create_session_grp and destroy_session_grp callouts to see how to allocate the host in a backward compatible way.

I'll do that.

Software iscsi/iser is doing a host per session still, then doing a session_grp per host and multiple sessions per group.
HW iscsi offload will continue to do a host per some hw/fw resource, then it can have multiple groups and multiple sessions per group.

I am passing in the queue_id to bind to in every object callout (ep_connect, conn_create, session_create), because I was not sure at what point all the drivers needed to bind or set up their mappings. So pick whichever makes sense and let me know.

I have not had time to break this into a proper patchset. Was not ready to send as an RFC set. There is debugging and // comments in places, but feel free to give me any feedback.

That's not a problem, we will get it ready for submission together...

If you did get my other mails/patches a while back then make sure you are using the new userspace patches/tools in this mail with the updated kernel patch in this mail. I have not yet added kernel/user compat code, so you will hit hangs/crashes if you mix and match.

Thanks Mike, I'm working with upstream both in user and kernel.

Couple of quick first comments:

- Passing a list in node.session.queue_ids indeed allows the user a degree of freedom, but it might be overkill. We should allow giving a range type of queue_ids and the default should be a range [0-default_nr_queues]. On a side note, I suspect we will pretty soon find out that this linear assignment will be the only useful setting and leave only an nr_queues setting.

- In the single queue case, we need to pass the kernel a default WILDCARD_QUEUE_ID so drivers can spread the completion contexts of each session as they do today (and don't introduce a performance regression).

- About the queue_id. sw_tcp will need it for the TX/RX threads' CPU binding, so it is used as a conn attribute. iser (and I assume other offloads as well) will need it for MSI-X vector assignments, so it is used as an endpoint attribute. The session is completely logical, thus it should not hold a queue_id assignment (IMO at least).
- The session group should allocate session_map to only hold the number of sessions it was passed (not nr_cpu_ids with possible holes). The session selection is based on the mq mapping, thus it should be a 1:1 mapping to hctx. So it basically boils down to:

    idx = sc->request->q->mq_map[sc->request->mq_ctx->cpu];
    session = grp->session_map[idx];

  and when we properly use block layer tagging:

    tag = blk_mq_unique_tag(sc->request);
    idx = blk_mq_unique_tag_to_hwq(tag);
    session = grp->session_map[idx];

  So I guess my point is, we should not assign a queue_id to a session; the ep/conn queue_id was used at establishment for context assignment.

- About shared tags. For SCSI commands and TMFs we don't have a problem since we are guaranteed ITTs are unique. I wonder how we will allocate a unique ITT for iSCSI-specific tasks (LOGIN, TEXT, LOGOUT, NOOP_OUT). My implementation did it per-session, so I reserved a range of ITTs for iSCSI-specific commands (in a kfifo). I wonder how we can do that for multiple sessions. We need some kind of tag allocator
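
The session selection described above can be illustrated with a small userspace C sketch. It is not the kernel patch: struct session, struct session_grp, grp_alloc, grp_select_session and grp_free are hypothetical stand-ins, and mq_map is modelled as a plain array that maps a submitting CPU to a hardware queue index. The only point it shows is that session_map is sized to the number of sessions (one per hardware queue), not to nr_cpu_ids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an iSCSI session; it holds no queue_id. */
struct session {
    int id;
};

/* Hypothetical session group: one session per hardware queue. */
struct session_grp {
    unsigned int nr_sessions;     /* number of sessions passed at creation */
    struct session **session_map; /* dense, indexed by hw queue index */
};

/* Allocate a map with exactly nr_sessions slots (no per-CPU holes). */
static struct session_grp *grp_alloc(unsigned int nr_sessions)
{
    struct session_grp *grp = malloc(sizeof(*grp));

    if (!grp)
        return NULL;
    grp->session_map = calloc(nr_sessions, sizeof(*grp->session_map));
    if (!grp->session_map) {
        free(grp);
        return NULL;
    }
    grp->nr_sessions = nr_sessions;
    return grp;
}

/* Pick the session for a command submitted on 'cpu'.
 * mq_map[cpu] plays the role of the block layer's CPU-to-hctx map. */
static struct session *grp_select_session(const struct session_grp *grp,
                                          const unsigned int *mq_map,
                                          unsigned int cpu)
{
    unsigned int idx = mq_map[cpu];

    assert(idx < grp->nr_sessions);
    return grp->session_map[idx];
}

static void grp_free(struct session_grp *grp)
{
    if (grp) {
        free(grp->session_map);
        free(grp);
    }
}
```

With four CPUs mapped onto two hardware queues, for example, the map has two entries and every CPU resolves to one of them, so no slot is left unused.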
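
For the open question about ITTs for iSCSI-specific tasks, one possible shape is a single reserved-range pool owned by the session group and shared by all of its sessions. The sketch below is only an assumption about how that could look, not what either author implemented: it is plain userspace C, the names mgmt_itt_pool, mgmt_itt_get and mgmt_itt_put are hypothetical, the range base and size are made up, and a real version would need locking (the thread mentions a kfifo for the per-session case).

```c
#include <assert.h>
#include <stdint.h>

#define MGMT_ITT_BASE  0xF000u /* hypothetical start of the reserved range */
#define MGMT_ITT_COUNT 8u      /* hypothetical number of reserved ITTs */

/* FIFO ring of free ITTs, shared by every session in one group. */
struct mgmt_itt_pool {
    uint32_t ring[MGMT_ITT_COUNT];
    unsigned int head;  /* index of the next free ITT to hand out */
    unsigned int count; /* number of free ITTs currently in the ring */
};

/* Fill the ring with the whole reserved range. */
static void mgmt_itt_pool_init(struct mgmt_itt_pool *p)
{
    for (unsigned int i = 0; i < MGMT_ITT_COUNT; i++)
        p->ring[i] = MGMT_ITT_BASE + i;
    p->head = 0;
    p->count = MGMT_ITT_COUNT;
}

/* Take a free ITT. Returns 0 on success, -1 if the pool is empty. */
static int mgmt_itt_get(struct mgmt_itt_pool *p, uint32_t *itt)
{
    if (p->count == 0)
        return -1;
    *itt = p->ring[p->head];
    p->head = (p->head + 1) % MGMT_ITT_COUNT;
    p->count--;
    return 0;
}

/* Return an ITT to the pool once its task has completed. */
static void mgmt_itt_put(struct mgmt_itt_pool *p, uint32_t itt)
{
    unsigned int tail;

    assert(p->count < MGMT_ITT_COUNT);
    tail = (p->head + p->count) % MGMT_ITT_COUNT;
    p->ring[tail] = itt;
    p->count++;
}
```

Because every session draws from the same pool, two sessions in a group cannot hold the same management ITT at the same time, at the cost of a shared structure on the LOGIN/TEXT/LOGOUT/NOOP_OUT path.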
Re: [PATCHv2 0/3] Fix issues resulting from actor rewrite
On Thu, Feb 12, 2015 at 11:30:10PM -0600, Mike Christie wrote:

On 02/12/2015 06:33 PM, Chris Leech wrote:

It looks like the communication with iscsiuio has a similar case where a polling function reschedules itself, following up with a patch to fix the delay there.

Have you been able to hit that iscsiuio reschedule code path? There was a bug where we could call actor_timer multiple times on an undeleted timer. I made the attached patch (made over github tree).

Sorry for the delay on this. I hadn't hit that case when I sent the patch; I was purely looking for the same pattern.

I resorted to a forced scenario, disabling iscsiuio and modifying uip_broadcast to return ISCSI_ERR_AGAIN on connection failure, to test the EAGAIN handling. That did reproduce the high CPU load from immediate polling; after my patch the retry was in 1 second intervals as expected.

I'll take a look at your update to address the double actor_timer call to make sure it at least behaves well in the same case.

- Chris

--
You received this message because you are subscribed to the Google Groups open-iscsi group.
To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.
Re: [PATCHv2 0/3] Fix issues resulting from actor rewrite
On Wed, Feb 25, 2015 at 01:38:52PM -0800, Chris Leech wrote:

On Thu, Feb 12, 2015 at 11:30:10PM -0600, Mike Christie wrote:

On 02/12/2015 06:33 PM, Chris Leech wrote:

It looks like the communication with iscsiuio has a similar case where a polling function reschedules itself, following up with a patch to fix the delay there.

Have you been able to hit that iscsiuio reschedule code path? There was a bug where we could call actor_timer multiple times on an undeleted timer. I made the attached patch (made over github tree).

Sorry for the delay on this. I hadn't hit that case when I sent the patch; I was purely looking for the same pattern.

I resorted to a forced scenario, disabling iscsiuio and modifying uip_broadcast to return ISCSI_ERR_AGAIN on connection failure, to test the EAGAIN handling. That did reproduce the high CPU load from immediate polling; after my patch the retry was in 1 second intervals as expected.

I'll take a look at your update to address the double actor_timer call to make sure it at least behaves well in the same case.

OK, if I combine this (unified login retry instead of separate UIO polling + mod_timer changes) with the previous patches to fix the delay, it retries in 1 second intervals on iscsiuio errors.

- Chris
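
The behaviour discussed in this thread (retry after a fixed delay instead of polling again immediately, and never queue the same timer twice) can be sketched in userspace C. This is not the iscsid actor code: struct retry_timer, retry_timer_mod, retry_timer_poll and on_try_again are hypothetical names, and time is modelled as a plain seconds counter passed in by the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-shot timer driven by an explicit clock value. */
struct retry_timer {
    bool pending;          /* true while the timer is armed */
    unsigned long expires; /* absolute time, in seconds, at which it fires */
    unsigned int fired;    /* how many times the timer has fired */
};

/* Arm or re-arm the timer. Re-arming an already pending timer only moves
 * its expiry, so it can never end up scheduled twice. */
static void retry_timer_mod(struct retry_timer *t, unsigned long now,
                            unsigned long delay)
{
    t->expires = now + delay;
    t->pending = true;
}

/* Fire the timer if it is armed and due. Returns true if it fired. */
static bool retry_timer_poll(struct retry_timer *t, unsigned long now)
{
    if (!t->pending || now < t->expires)
        return false;
    t->pending = false;
    t->fired++;
    return true;
}

/* Called when a polling step reports a "try again" style error:
 * schedule the next attempt one second out rather than immediately. */
static void on_try_again(struct retry_timer *t, unsigned long now)
{
    retry_timer_mod(t, now, 1);
}
```

Calling on_try_again twice at the same instant leaves one pending timer that fires once, one second later, which is the behaviour the combined patches are reported to produce.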
Re: Changes to iSCSI device are not consistent across network
Hello,

Unless you have a cluster file system in place, what you are seeing is expected. Each node believes it owns that volume exclusively. There's nothing in the iSCSI or SCSI protocols to address this. A write from one node doesn't tell the other node to update its cached image of that disk. Without a file system to handle that process there's no workaround.

Regards,
Don

On Feb 25, 2015 8:21 PM, m0pt0pmat...@gmail.com wrote:

Hey guys,

Forgive me, but I'm super new to this. I have two CentOS 7 nodes. I'm using LIO to export a sparse file over iSCSI. The sparse file was created as a LIO FILEIO with write-back disabled (write-through). In targetcli, I created a LUN on my iSCSI frontend. I formatted the sparse file to have an EXT4 filesystem.

On both the target node and the initiator node, I can initiate an iSCSI session (iscsiadm -m node --login), mount the device, and read and write to it.

However, changes to the device are not consistent across the network until I log out of the iSCSI session (iscsiadm -m node --logout). Both nodes have to log out: the first logout writes the changes, and the second one refreshes them.

Somewhere, caching is occurring, but I'm not sure where.

Just in case you're curious, my use case is to have multiple nodes write to the same remote disk (or file) in parallel.

Any direction or advice would be great. Thank you.

-Matt
Changes to iSCSI device are not consistent across network
Hey guys,

Forgive me, but I'm super new to this. I have two CentOS 7 nodes. I'm using LIO to export a sparse file over iSCSI. The sparse file was created as a LIO FILEIO with write-back disabled (write-through). In targetcli, I created a LUN on my iSCSI frontend. I formatted the sparse file to have an EXT4 filesystem.

On both the target node and the initiator node, I can initiate an iSCSI session (iscsiadm -m node --login), mount the device, and read and write to it.

However, changes to the device are not consistent across the network until I log out of the iSCSI session (iscsiadm -m node --logout). Both nodes have to log out: the first logout writes the changes, and the second one refreshes them.

Somewhere, caching is occurring, but I'm not sure where.

Just in case you're curious, my use case is to have multiple nodes write to the same remote disk (or file) in parallel.

Any direction or advice would be great. Thank you.

-Matt