On Friday, November 21, 2014 8:06:55 PM UTC-8, Mike Christie wrote: On Nov 18, 2014, at 3:35 PM, Lee Duncan <leeman...@gmail.com <javascript:>> wrote:
> The following patch fixes a problem where the CPU becomes compute bound > when rediscovering targets, when there are hundreds of sessions. > > When his occurs, most of the time is spent in the function > iscsi_sysfs_for_each_session(). This function does a scandir(), > sorted alphabetically, to get a list of sessions, then scans > that list looking for a match. When there are hundreds of sesions > this can take forever. > > This patch saves the current session and then ensures that this > session sorted to the front of the list. Testing shows that > CPU usage goes from near 100% to near 0% when running cable > plug tests with hundreds of sessions. When/how is iscsi_sysfs_for_each_session getting run during this test? Is it iscsid during its relogin/eh handling or something else like a script running iscsiadm commands? The CPU-bound problem occurs when running a disconnect test. With over a hundred sessions cables are randomly unplugged and plugged back in. So it is relogin/eh handling. Here is a stack track from interrupting the daemon when it's CPU-bound. (Apologies for the wrap): (gdb) where #0 0x00007fb795d6fb76 in strcmp () from /lib64/libc.so.6 #1 0x000000000041c173 in sysfs_attr_get_value (devpath=<value optimized out>, attr_name=<value optimized out>) at sysfs.c:378 #2 0x000000000041c4cd in sysfs_get_value (id=0x7fff524e0ae0 "host88", subsys=0x454452 "iscsi_host", param=0x4543da "port_state") at sysfs.c:562 #3 0x000000000041c661 in sysfs_get_str (id=0x1145fc20 "/class/iscsi_host/host12956/netdev", subsys=0x7fff524e0514 "/class/iscsi_host/host88/port_state", param=0x2f <Address 0x2f out of bounds>, value=0x0, value_size=-4) at sysfs.c:613 #4 0x000000000042049a in iscsi_sysfs_read_iface (iface=0x17301dc0, host_no=88, session=0x9593233 "session79", iface_kern_id=0x0) at iscsi_sysfs.c:769 #5 0x0000000000421e3a in iscsi_sysfs_get_sessioninfo_by_id (info=0x17301db0, session=0x9593233 "session79") at iscsi_sysfs.c:1154 #6 0x000000000042207a in iscsi_sysfs_for_each_session (data=0x16a8e120, nr_found=0x7fff524e0e44, fn=0x4042f0 <iscsi_match_session>) at iscsi_sysfs.c:1185 #7 0x000000000043569b in iscsi_check_for_running_session (rec=0x1145fc20) at session_mgmt.c:469 #8 0x0000000000436726 in update_sessions (new_rec_list=0x7fff524e0ef0, targetname=0x0, iname=0x2f <Address 0x2f out of bounds>) at discoveryd.c:176 #9 0x0000000000436b54 in __do_st_disc_and_login (drec=<value optimized out>) at discoveryd.c:1051 #10 do_st_disc_and_login (drec=<value optimized out>) at discoveryd.c:1067 #11 0x0000000000436100 in fork_disc (def_iname=0x0, drec=0x7fff524e0f90, poll_inval=30, do_disc_and_login=0x436a30 <do_st_disc_and_login>) at discoveryd.c:210 #12 0x00000000004361d5 in st_start (data=<value optimized out>, drec=0x7fff524e0f90) at discoveryd.c:1082 #13 0x000000000041b6b8 in idbm_for_each_drec (type=0, config_root=<value optimized out>, data=0x0, fn=0x436180 <st_start>) at idbm.c:1388 #14 0x0000000000435171 in main (argc=<value optimized out>, argv=<value optimized out>) at iscsid.c:524 Looked over the code a bit more since I had some time, and it looks like this is one of the two places where the code calls iscsi_sysfs_for_each_session() when it's really just looking for one session. And it does this by: - getting a list of all sessions (sorted, which isn't needed but doesn't seem to hurt). They seem to be sorted in alphabetical session-ID order - for each entry, populate it's sysfs properties, then see if this session matches 4 key values: - session ID should match (numerical compare) - target name should match (string compare) - iSCSI address should match (string + number) - port no should match (numerical), and - interface should match (string x2) These tests are done in short-circuit fashion, so if the session ID does not match, then the other tests are not done, even though we retrieved all the /sysfs data. And in our problem case, we have a particular session ID we are looking for. But that session could easily be sorted at the back of the list (before my patch), if it's a newer session. And that would make *lots* of /sysfs IOs that were not needed. The reason the submitted fix works is that I work around this unneeded gathering of /sysfs data by sorting the session we care about to the front of the list. Another perhaps cleaner re-write would be to create a new function that can search efficiently for a session by session ID, not gathering the other /sysfs data until needed, i.e. if and only if the session matches. We know that session IDs are unique, so it could search the session list and return just the one session we are looking for, or FAILURE if it cannot be found. I will work on a patch -- it's starting to sound interesting to me. :) [Apologies for smiley faces -- too much time on FB.] -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscr...@googlegroups.com. To post to this group, send email to open-iscsi@googlegroups.com. Visit this group at http://groups.google.com/group/open-iscsi. For more options, visit https://groups.google.com/d/optout.