On Friday, November 21, 2014 8:06:55 PM UTC-8, Mike Christie wrote:

On Nov 18, 2014, at 3:35 PM, Lee Duncan <leeman...@gmail.com <javascript:>> 
wrote: 

> The following patch fixes a problem where the CPU becomes compute bound 
> when rediscovering targets, when there are hundreds of sessions. 
> 
> When his occurs, most of the time is spent in the function 
> iscsi_sysfs_for_each_session(). This function does a scandir(), 
> sorted alphabetically, to get a list of sessions, then scans 
> that list looking for a match. When there are hundreds of sesions 
> this can take forever. 
> 
> This patch saves the current session and then ensures that this 
> session sorted to the front of the list. Testing shows that 
> CPU usage goes from near 100% to near 0% when running cable 
> plug tests with hundreds of sessions. 

When/how is iscsi_sysfs_for_each_session getting run during this test? 

Is it iscsid during its relogin/eh handling or something else like a script 
running iscsiadm commands? 



The CPU-bound problem occurs when running a disconnect test.
With over a hundred sessions cables are randomly unplugged and
plugged back in. So it is relogin/eh handling.


Here is a stack track from interrupting the daemon when it's CPU-bound.
(Apologies for the wrap):


(gdb) where
#0 0x00007fb795d6fb76 in strcmp () from /lib64/libc.so.6
#1 0x000000000041c173 in sysfs_attr_get_value (devpath=<value optimized 
out>, attr_name=<value optimized out>) at sysfs.c:378
#2 0x000000000041c4cd in sysfs_get_value (id=0x7fff524e0ae0 "host88", 
subsys=0x454452 "iscsi_host", param=0x4543da "port_state") at sysfs.c:562
#3 0x000000000041c661 in sysfs_get_str (id=0x1145fc20 
"/class/iscsi_host/host12956/netdev", subsys=0x7fff524e0514 
"/class/iscsi_host/host88/port_state", param=0x2f <Address 0x2f out of 
bounds>,
value=0x0, value_size=-4) at sysfs.c:613
#4 0x000000000042049a in iscsi_sysfs_read_iface (iface=0x17301dc0, 
host_no=88, session=0x9593233 "session79", iface_kern_id=0x0) at 
iscsi_sysfs.c:769
#5 0x0000000000421e3a in iscsi_sysfs_get_sessioninfo_by_id 
(info=0x17301db0, session=0x9593233 "session79") at iscsi_sysfs.c:1154
#6 0x000000000042207a in iscsi_sysfs_for_each_session (data=0x16a8e120, 
nr_found=0x7fff524e0e44, fn=0x4042f0 <iscsi_match_session>) at 
iscsi_sysfs.c:1185
#7 0x000000000043569b in iscsi_check_for_running_session (rec=0x1145fc20) 
at session_mgmt.c:469
#8 0x0000000000436726 in update_sessions (new_rec_list=0x7fff524e0ef0, 
targetname=0x0, iname=0x2f <Address 0x2f out of bounds>) at discoveryd.c:176
#9 0x0000000000436b54 in __do_st_disc_and_login (drec=<value optimized 
out>) at discoveryd.c:1051
#10 do_st_disc_and_login (drec=<value optimized out>) at discoveryd.c:1067
#11 0x0000000000436100 in fork_disc (def_iname=0x0, drec=0x7fff524e0f90, 
poll_inval=30, do_disc_and_login=0x436a30 <do_st_disc_and_login>) at 
discoveryd.c:210
#12 0x00000000004361d5 in st_start (data=<value optimized out>, 
drec=0x7fff524e0f90) at discoveryd.c:1082
#13 0x000000000041b6b8 in idbm_for_each_drec (type=0, config_root=<value 
optimized out>, data=0x0, fn=0x436180 <st_start>) at idbm.c:1388
#14 0x0000000000435171 in main (argc=<value optimized out>, argv=<value 
optimized out>) at iscsid.c:524

Looked over the code a bit more since I had some time, and it looks like 
this is
one of the two places where the code calls iscsi_sysfs_for_each_session()
when it's really just looking for one session.

And it does this by:

- getting a list of all sessions (sorted, which isn't
  needed but doesn't seem to hurt). They seem
  to be sorted in alphabetical session-ID order

- for each entry, populate it's sysfs properties, then
  see if this session matches 4 key values:

  - session ID should match (numerical compare)
  - target name should match (string compare)
  - iSCSI address should match (string + number)
  - port no should match (numerical), and
  - interface should match (string x2)

These tests are done in short-circuit fashion, so if the
session ID does not match, then the other tests
are not done, even though we retrieved all the
/sysfs data.

And in our problem case, we have a  particular session
ID we are looking for. But that session could easily be
sorted at the back of the list (before my patch),
if it's a newer session. And that would make *lots* of
/sysfs IOs that were not needed.

The reason the submitted fix works is that I work around
this unneeded gathering of /sysfs data by sorting the
session we care about to the front of the list.

Another perhaps cleaner re-write would be to
create a new function that can search efficiently for
a session by session ID, not gathering the other
/sysfs data until needed, i.e. if and only if the
session matches. We know that session
IDs are unique, so it could search the session list
and return just the one session we are looking for,
or FAILURE if it cannot be found.

I will work on a patch -- it's starting to sound
interesting to me.  :)

[Apologies for smiley faces -- too much time on FB.]

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at http://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

Reply via email to