Re: [Lsf] It's time to put together the schedule

2015-02-25 Thread Sagi Grimberg

On 2/25/2015 12:36 PM, Mike Christie wrote:

On 02/22/2015 09:25 PM, Mike Christie wrote:

I just hit a bug in the userspace code. Will send that later.


Hey Sagi,

Attached is the userspace patch, user-mq6.patch. It is made over
0001-iscsid-make-sure-actor-is-delated-before-reschedulin.patch (the
patch to fix that double schedule bug you guys found).

I am also attaching an updated kernel patch. It has some fixes for logout
and iscsi_tcp mq setup.

To use the patches, just set the new iscsid.conf setting
node.session.queue_ids. It is just a string of ints:

node.session.queue_ids = 1 2 4 8

that gets passed in to the kernel. For each id, iscsid will create a
session and have the LLD map whatever it wants to that id value. Login
is the same:

iscsiadm -m node -T yourtarget -p ip --login

However, after you log in you have to scan manually:

iscsiadm -m session --rescan


For logout, you currently have to make sure you log out of all the sessions,
so use:

iscsiadm -m node -T yourtarget -p ip --logout

or

iscsiadm -m session --logout

If you just pass in a specific session id like here:

iscsiadm -m session -r SID --logout

then that will wait for all the other sessions in the group to be logged
out before completing the task. I did this because I was not yet sure
how to handle dynamic hctx updates in the kernel.


For the LLD implementation, I hooked in iscsi_tcp to the session/group
creation code. Like I said before, I was not sure what every
driver/fw/hw was going to map to, so the queue id that is getting passed
into the session/connection/ep creation functions is really generic and
you can map it to whatever you like right now.

For ib_iser, you should look at iscsi_tcp.c's create_session_grp and
destroy_session_grp callouts to see how to allocate the host in a
backward compatible way.


I'll do that.


Software iscsi/iser is still doing a host per session,
then doing a session_grp per host and multiple sessions per
group. HW iscsi offload will continue to do a host per some hw/fw
resource, then it can have multiple groups and multiple sessions per group.

I am passing in the queue_id to bind to in every object callout
(ep_connect, conn_create, session_create), because I was not sure at
what point each driver needs to bind or set up its mappings. So pick
whichever makes sense and let me know.

I have not had time to break this into a proper patchset, so it was not
ready to send as an RFC set. There is debugging code and // comments in
places, but feel free to give me any feedback.


That's not a problem, we will get it ready for submission together...



If you did get my other mails/patches a while back then make sure you
are using the new userspace patches/tools in this mail with the updated
kernel patch in this mail. I have not yet added kernel/user compat code,
so you will hit hangs/crashes if you mix and match.



Thanks Mike, I'm working with upstream both in user and kernel.

Couple of quick first comments:

- Passing a list in node.session.queue_ids indeed allows the user
  a degree of freedom, but it might be an overshoot. We should allow
  giving a range type of queue_ids and the default should be a range
  [0-default_nr_queues].
  On a side note, I suspect we will pretty soon find out that this
  linear assignment is the only useful setting, and keep only an
  nr_queues setting.

- In the single queue case, we need to pass the kernel a default
  WILDCARD_QUEUE_ID so drivers can spread the completion contexts of
  each session as they do today (and don't introduce a performance
  regression).

- About the queue_id. sw_tcp will need it for the TX/RX threads
  CPU binding, so it is used as a conn attribute. iser (and I assume
  other offloads as well) will need it for MSIX vector assignments, so
  it is used as an endpoint attribute. The session is completely
  logical, thus it should not hold a queue_id assignment (IMO at least).

- The session group should allocate session_map to only hold the number
  of sessions it was passed with (not nr_cpu_ids with possible holes).
  The session selection is based on the mq mapping, thus it should be a
  1:1 mapping to hctx. So it basically boils down to:

  idx = sc->request->q->mq_map[sc->request->mq_ctx->cpu];
  session = grp->session_map[idx];

  and when we properly use block layer tagging:

  tag = blk_mq_unique_tag(sc->request);
  idx = blk_mq_unique_tag_to_hwq(tag);
  session = grp->session_map[idx];

  So I guess my point is, we should not assign a queue_id to a session,
  the ep/conn queue_id was used at establishment for context assignment.

- About shared tags. So for scsi commands and TMFs we don't have a
  problem since we are guaranteed ITTs are unique. I wonder how we will
  allocate a unique ITT for iscsi specific tasks (LOGIN, TEXT, LOGOUT,
  NOOP_OUT). My implementation did it per-session, so I reserved a range
  of ITTs for iscsi specific commands (in a kfifo). I wonder how we can
  do that for multiple sessions. We need some kind of tag allocator 
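
To make the session selection comment above concrete, here is a small
userspace model of the lookup. It is only a sketch: struct session,
struct session_grp, make_unique_tag and grp_pick_session are hypothetical
names for illustration, not the patch's actual code. The 16-bit split
mirrors how the block layer packs the hw queue index into the upper bits
of a unique tag (BLK_MQ_UNIQUE_TAG_BITS), and session_map is sized to the
number of sessions passed in rather than nr_cpu_ids.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same split as the kernel's BLK_MQ_UNIQUE_TAG_BITS: hw queue index in
 * the upper 16 bits, per-queue tag in the lower 16 bits. */
#define UNIQUE_TAG_BITS 16
#define UNIQUE_TAG_MASK ((1u << UNIQUE_TAG_BITS) - 1)

/* Hypothetical stand-ins for the real session/group objects. */
struct session {
    int sid;
};

struct session_grp {
    unsigned int nr_sessions;      /* exactly the number passed in */
    struct session **session_map;  /* one entry per hw queue, no holes */
};

/* Userspace equivalent of what blk_mq_unique_tag() produces. */
static uint32_t make_unique_tag(unsigned int hwq, unsigned int tag)
{
    return ((uint32_t)hwq << UNIQUE_TAG_BITS) | (tag & UNIQUE_TAG_MASK);
}

/* Userspace equivalent of blk_mq_unique_tag_to_hwq(). */
static unsigned int unique_tag_to_hwq(uint32_t unique_tag)
{
    return unique_tag >> UNIQUE_TAG_BITS;
}

/* 1:1 mapping from hw queue index to session; NULL if out of range. */
static struct session *grp_pick_session(const struct session_grp *grp,
                                        uint32_t unique_tag)
{
    unsigned int idx;

    assert(grp != NULL);
    idx = unique_tag_to_hwq(unique_tag);
    if (idx >= grp->nr_sessions)
        return NULL;
    return grp->session_map[idx];
}
```

With this layout no queue_id has to be stored on the session itself: the
hw queue index recovered from the tag is the only key needed.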

Re: [PATCHv2 0/3] Fix issues resulting from actor rewrite

2015-02-25 Thread Chris Leech
On Thu, Feb 12, 2015 at 11:30:10PM -0600, Mike Christie wrote:
 On 02/12/2015 06:33 PM, Chris Leech wrote:
  It looks like the communication with iscsiuio has a similar case where
  a polling function reschedules itself, following up with a patch to fix
  the delay there.
 
 Have you been able to hit that iscsiuio reschedule code path? There was
 a bug where we could call actor_timer multiple times on undeleted timer.
 I made the attached patch (made over github tree).

Sorry for the delay on this.  I hadn't hit that case when I sent the
patch, I was purely looking for the same pattern.  I resorted to a
forced scenario, disabling iscsiuio and modifying uip_broadcast to
return ISCSI_ERR_AGAIN on connection failure, to test the EAGAIN
handling.  That did reproduce the high CPU load from immediate polling;
after my patch the retry was in 1 second intervals as expected.
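
The behaviour described above, a fixed delay between retries instead of
an immediate reschedule, can be modelled roughly as below. This is a toy
simulation under my own assumptions: poll_status, next_poll_delay and
run_until_done are hypothetical names, and none of this is iscsid's actor
API or the actual patch.

```c
#include <assert.h>

/* Hypothetical result of one polling attempt. */
enum poll_status { POLL_DONE, POLL_AGAIN };

#define RETRY_DELAY_SECS 1

/* Seconds to wait before polling again, or -1 when no retry is needed.
 * Returning 0 here for POLL_AGAIN would model the busy-polling bug. */
static int next_poll_delay(enum poll_status status)
{
    if (status == POLL_AGAIN)
        return RETRY_DELAY_SECS;
    return -1;
}

/* Simulate a poller that fails a given number of times before it
 * succeeds. Returns the number of polls made and stores the simulated
 * elapsed time in *elapsed_secs. */
static int run_until_done(int failures_before_success, int *elapsed_secs)
{
    int polls = 0;

    assert(elapsed_secs != 0);
    *elapsed_secs = 0;
    for (;;) {
        enum poll_status st =
            (polls < failures_before_success) ? POLL_AGAIN : POLL_DONE;
        int delay;

        polls++;
        delay = next_poll_delay(st);
        if (delay < 0)
            break;
        *elapsed_secs += delay;  /* a real daemon would arm a timer */
    }
    return polls;
}
```

In this model three failures followed by a success take four polls spread
over three seconds, rather than four polls back to back.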

I'll take a look at your update to address the double actor_timer call
to make sure it at least behaves well in the same case.

- Chris

-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.


Re: [PATCHv2 0/3] Fix issues resulting from actor rewrite

2015-02-25 Thread Chris Leech
On Wed, Feb 25, 2015 at 01:38:52PM -0800, Chris Leech wrote:
 On Thu, Feb 12, 2015 at 11:30:10PM -0600, Mike Christie wrote:
  On 02/12/2015 06:33 PM, Chris Leech wrote:
   It looks like the communication with iscsiuio has a similar case where
   a polling function reschedules itself, following up with a patch to fix
   the delay there.
  
  Have you been able to hit that iscsiuio reschedule code path? There was
  a bug where we could call actor_timer multiple times on undeleted timer.
  I made the attached patch (made over github tree).
 
 Sorry for the delay on this.  I hadn't hit that case when I sent the
 patch, I was purely looking for the same pattern.  I resorted to a
 forced scenario, disabling iscsiuio and modifying uip_broadcast to
 return ISCSI_ERR_AGAIN on connection failure, to test the EAGAIN
 handling.  That did reproduce the high CPU load from immediate polling;
 after my patch the retry was in 1 second intervals as expected.
 
 I'll take a look at your update to address the double actor_timer call
 to make sure it at least behaves well in the same case.

OK, if I combine this (unified login retry instead of separate UIO
polling + mod_timer changes) with the previous patches to fix the delay,
it retries in 1 sec intervals on iscsiuio errors.

- Chris


Re: Changes to iSCSI device are not consistent across network

2015-02-25 Thread Donald Williams
Hello,

Unless you have a cluster file system in place, what you are seeing is
expected. Each node believes it owns that volume exclusively. There's
nothing in the iSCSI or SCSI protocol to address this. A write from one node
doesn't tell the other node to update its cached image of that disk.
Without a cluster file system to handle that process there's no workaround.
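
As a rough illustration of the above, here is a toy model of two nodes
that each keep a private cache of one shared block. It is a sketch under
simplifying assumptions, not how the kernel's caching actually works:
struct shared_disk, struct node and the node_* helpers are made-up names,
and "logout" here simply stands for whatever flushes and drops a node's
cached copy.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 16

/* The one block both nodes believe they own exclusively. */
struct shared_disk {
    char block[BLOCK_SIZE];
};

/* Hypothetical node with a private cache of that block. */
struct node {
    struct shared_disk *disk;
    char cache[BLOCK_SIZE];
    int cached;  /* cache holds a copy of the block */
    int dirty;   /* cache has changes not yet on disk */
};

/* Writes land in the local cache only; the other node is never told. */
static void node_write(struct node *n, const char *data)
{
    strncpy(n->cache, data, BLOCK_SIZE - 1);
    n->cache[BLOCK_SIZE - 1] = '\0';
    n->cached = 1;
    n->dirty = 1;
}

/* Reads are served from the cache once it is populated. */
static const char *node_read(struct node *n)
{
    if (!n->cached) {
        memcpy(n->cache, n->disk->block, BLOCK_SIZE);
        n->cached = 1;
    }
    return n->cache;
}

/* Flush dirty data to the disk and drop the cached copy. */
static void node_logout(struct node *n)
{
    if (n->dirty)
        memcpy(n->disk->block, n->cache, BLOCK_SIZE);
    n->dirty = 0;
    n->cached = 0;
}
```

In this model a write on one node stays invisible to the other until the
writer flushes and the reader drops its stale copy, which matches the
"first logout writes, second logout refreshes" behaviour you describe.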

Regards,

Don
On Feb 25, 2015 8:21 PM, m0pt0pmat...@gmail.com wrote:

 Hey guys,

 Forgive me, but I'm super new to this.

 I have two CentOS 7 nodes. I'm using LIO to export a sparse file over
 iSCSI.

 The sparse file was created as a LIO FILEIO with write-back disabled
 (write-through).
 In targetcli, I created a LUN on my iSCSI frontend.

 I formatted the sparse file to have an EXT4 filesystem.

 On both the target node and the initiator node, I can initiate an iSCSI
 session (iscsiadm -m node --login), mount the device, and read and write to
 it.

 However, changes to the device are not consistent across the network until
 I log out of the iSCSI session (iscsiadm -m node --logout). Both nodes have
 to log out: the first logout writes the changes, and the second one
 refreshes them.

 Somewhere, caching is occurring, but I'm not sure where.

 Just in case you're curious, my use case is to have multiple nodes write
 to the same remote disk (or file) in parallel.

 Any direction or advice would be great. Thank you.

 -Matt


Changes to iSCSI device are not consistent across network

2015-02-25 Thread m0pt0pmatt17
Hey guys,

Forgive me, but I'm super new to this.

I have two CentOS 7 nodes. I'm using LIO to export a sparse file over iSCSI.

The sparse file was created as a LIO FILEIO with write-back disabled
(write-through).
In targetcli, I created a LUN on my iSCSI frontend.

I formatted the sparse file to have an EXT4 filesystem.

On both the target node and the initiator node, I can initiate an iSCSI
session (iscsiadm -m node --login), mount the device, and read and write to
it.

However, changes to the device are not consistent across the network until
I log out of the iSCSI session (iscsiadm -m node --logout). Both nodes have
to log out: the first logout writes the changes, and the second one
refreshes them.

Somewhere, caching is occurring, but I'm not sure where.

Just in case you're curious, my use case is to have multiple nodes write to 
the same remote disk (or file) in parallel. 

Any direction or advice would be great. Thank you.

-Matt
