On Thursday, July 28, 2016 at 2:16:12 AM UTC-7, Frank Fegert wrote: > > Hello all, >
Hi Frank. > > disclaimer: i'm not a programmer, so the following might be utterly > and completely wrong ;-) > > TL;DR: i'm getting segfaults from iscsiuio upon any target login. > Specifically this happens in iscsiuio/src/unix/nic_nl.c. Debugging > this lead me to believe this is a case of trying to unlock a not > locked mutex. A quick and dirty hack which works around this is > available here: > > https://github.com/frank-fegert/open-iscsi/commit/9f770f9eb0f302d146d455f1d68648e2d0172eb6 > > > There is probably room for a proper fix (e.g. counter on the number of > locks?), which considers the semantics of the whole code. > I looked at your suggested fix, and I think it's almost correct. The caller should indeed have the mutex locked when calling pthread_cond_wait. It's probably just luck this hasn't blown up before. But I think it just needs one tweak. You need to call pthread_mutex_unlock() in the error case, i.e. in the "rc != 0" case, before calling LOG_ERR(). Can you submit a patch or pull request updated with this change? > > > The longer explanation: > > My setup involves 6 Dell M630 hosts (host1, host{5,6,7,8,9}), all with > BCM57810 iSOEs. BCM57810 firmware, software (Debian 8) and targets are > exactly the same on all hosts. I'm using the Debian open-iscsi package, > which is rebuild in order to include iscsiuio and has the changes up to > Git commit 0fa43f29 - but excluding c6d1117b and 76832662 (externalization > of the open-isns code) - backported. The only difference is, host1 has > Intel E5 v3 CPUs, host{5,6,7,8,9} have Intel E5 v4 CPUs. > > On host1 everything works fine, iscsiuio runs as expected, access to > targets is working flawlessly. > On host{5,6,7,8,9} i'm getting segfaults like the one in the example > shown below, while trying to log in to any target. > > Searching for "__lll_unlock_elision" in conjunction with "pthread_*" > led me to the following resources: > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800574 > https://lwn.net/Articles/534758/ > > which are pointing towards a CPU (Broadwell and Skylake) specific pro- > blem when not carefully using mutexes. The general opinion from there > and other related bug reports seems to be, that the application be- > haviour should be changed, in order to fix such an issue. > > Tracking this issue further down i ended up in the function "nl_process_ > handle_thread" in iscsiuio/src/unix/nic_nl.c and specifically here: > > 486 rc = pthread_cond_wait(&nic->nl_process_cond, > 487 &nic->nl_process_mutex); > > From the pthread_cond_wait manpage: > The pthread_cond_timedwait() and pthread_cond_wait() functions shall > block on a condition variable. They shall be called with mutex locked > by the calling thread or undefined behavior results. > > On the first pass of the loop, this constraint seems to be true. At > the end of the loop at: > > 499 pthread_mutex_unlock(&nic->nl_process_mutex); > > the mutex is then unlocked. Thus - if i understand the code right - > the above constraint is no longer met on the subsequent passes of the > loop. On Intel E5 v3 this seemed to be tolerated and without any impact. > But on Intel E5 v4 (and other CPUs implementing HLE and RTM) this IMHO > causes the observed segfault. > > Could someone more familiar with mutex handling in phtreads and/or the > semantics of the iscsiuio code please take a look at this? I'd be inter- > ested if my analysis is correct and whether my quick'n'dirty fix has > any major side-effects. And - of course - what a proper fix for the ob- > served segfault would look like ;-) > > Thanks & best regards, > > Frank > > > host5:~# gdb /sbin/iscsiuio > GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1 > Copyright (C) 2014 Free Software Foundation, Inc. > License GPLv3+: GNU GPL version 3 or later < > http://gnu.org/licenses/gpl.html> > This is free software: you are free to change and redistribute it. > There is NO WARRANTY, to the extent permitted by law. Type "show copying" > and "show warranty" for details. > This GDB was configured as "x86_64-linux-gnu". > Type "show configuration" for configuration details. > For bug reporting instructions, please see: > <http://www.gnu.org/software/gdb/bugs/>. > Find the GDB manual and other documentation resources online at: > <http://www.gnu.org/software/gdb/documentation/>. > For help, type "help". > Type "apropos word" to search for commands related to "word"... > Reading symbols from /sbin/iscsiuio...(no debugging symbols found)...done. > > (gdb) run -d 4 -f > Starting program: /sbin/iscsiuio -d 4 -f > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". > INFO [Wed Jul 27 10:01:45 2016]Initialize logger using log file: > /var/log/iscsiuio.log > INFO [Wed Jul 27 10:01:45 2016]Started iSCSI uio stack: Ver 0.7.8.2 > INFO [Wed Jul 27 10:01:45 2016]Build date: Fri Jul 22 15:40:04 CEST 2016 > INFO [Wed Jul 27 10:01:45 2016]Debug mode enabled > INFO [Wed Jul 27 10:01:45 2016]Running on sysname: 'Linux', release: > '3.16.0-4-amd64', version '#1 SMP Debian 3.16.7-ckt25-2+deb8u3 > (2016-07-02)' machine: 'x86_64' > DBG [Wed Jul 27 10:01:45 2016]Loaded nic library 'bnx2' Version: > '0.7.8.2' build on Fri Jul 22 15:40:04 CEST 2016' > DBG [Wed Jul 27 10:01:45 2016]Added 'bnx2' nic library > DBG [Wed Jul 27 10:01:45 2016]Loaded nic library 'bnx2x' Version: > '0.7.8.2' build on Fri Jul 22 15:40:04 CEST 2016' > DBG [Wed Jul 27 10:01:45 2016]Added 'bnx2x' nic library > [New Thread 0x7ffff760f700 (LWP 4942)] > INFO [Wed Jul 27 10:01:45 2016]signal handling thread ready > INFO [Wed Jul 27 10:01:45 2016]nic_utils Found host[11]: host11 > INFO [Wed Jul 27 10:01:45 2016]Done capturing > /sys/class/iscsi_host/host11/netdev > INFO [Wed Jul 27 10:01:45 2016]Done capturing > /sys/class/iscsi_host/host11/netdev > INFO [Wed Jul 27 10:01:45 2016]nic_utils looking for uio device for eth3 > WARN [Wed Jul 27 10:01:45 2016]Could not find assoicate uio device with > eth3 > ERR [Wed Jul 27 10:01:45 2016]nic_utils Could not determine UIO name for > eth3 > INFO [Wed Jul 27 10:01:45 2016]nic_utils Found host[12]: host12 > INFO [Wed Jul 27 10:01:45 2016]Done capturing > /sys/class/iscsi_host/host12/netdev > INFO [Wed Jul 27 10:01:45 2016]Done capturing > /sys/class/iscsi_host/host12/netdev > INFO [Wed Jul 27 10:01:45 2016]nic_utils looking for uio device for eth2 > INFO [Wed Jul 27 10:01:45 2016]nic_utils eth2 associated with uio0 > INFO [Wed Jul 27 10:01:45 2016]nic_utils NIC not found creating an > instance for host_no: 12 eth2 > DBG [Wed Jul 27 10:01:45 2016]Could not increase process priority: > Success > [New Thread 0x7ffff6e0e700 (LWP 4943)] > DBG [Wed Jul 27 10:01:45 2016]iscsi_ipc Started iscsid listening thread > DBG [Wed Jul 27 10:01:45 2016]iscsi_ipc Waiting for iscsid command > INFO [Wed Jul 27 10:01:45 2016]NIC_NL Netlink to CNIC on pid 4938 is > ready > DBG [Wed Jul 27 10:01:57 2016]iscsi_ipc recv iscsid request: cmd: 1, > payload_len: 11720 > INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc Received request for 'eth2' to > set IP address: '10.0.1.62' VLAN: '0' > INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc Using netmask: 0.0.0.0 > INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2, using existing NIC > INFO [Wed Jul 27 10:01:57 2016]nic_utils looking for uio device for eth2 > INFO [Wed Jul 27 10:01:57 2016]nic_utils eth2 associated with uio0 > INFO [Wed Jul 27 10:01:57 2016]Done capturing /sys/class/uio/uio0/name > INFO [Wed Jul 27 10:01:57 2016]nic_utils eth2: Verified uio name > bnx2x_cnic with library bnx2x > INFO [Wed Jul 27 10:01:57 2016]eth2: found NIC with library 'bnx2x' > INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2 library set using > transport_name bnx2i > INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: requesting configuration > using static IP address > DBG [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2 couldn't find interface > with ip_type: 0x2 creating it > INFO [Wed Jul 27 10:01:57 2016]nic eth2: Added nic interface for VLAN: 0, > protocol: 2 > INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: created network interface > [New Thread 0x7ffff660d700 (LWP 4947)] > WARN [Wed Jul 27 10:01:57 2016]nic_utils eth2: device already disabled: > flag: 0x1088 state: 0x1 > INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: configuring using static > IP IPv4 address :10.0.1.62 > INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc netmask: 255.255.255.0 > [New Thread 0x7ffff5e0c700 (LWP 4948)] > INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc ISCSID_UIP_IPC_GET_IFACE: > command: 1 name: bnx2i.d0:43:1e:51:98:53, netdev: eth2 ipaddr: 10.0.1.62 > vlan: 0 transport_name:bnx2i > INFO [Wed Jul 27 10:01:57 2016]nic_utils eth2: spinning up thread for nic > DBG [Wed Jul 27 10:01:57 2016]iscsi_ipc Waiting for iscsid command > [New Thread 0x7ffff560b700 (LWP 4949)] > DBG [Wed Jul 27 10:01:57 2016]nic eth2: Waiting to be enabled > INFO [Wed Jul 27 10:01:57 2016]Created nic thread: eth2 > INFO [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: started NIC enable thread > state: 0x1 > DBG [Wed Jul 27 10:01:57 2016]nic eth2: is now enabled > INFO [Wed Jul 27 10:01:57 2016]bnx2x eth2: bnx2x driver using version > 1.78.19 > ERR [Wed Jul 27 10:01:58 2016]bnx2x /dev/uio0: uio device has been > brought up via pid: 4938 on fd: 7 > INFO [Wed Jul 27 10:01:58 2016]Done capturing /sys/class/uio/uio0/name > INFO [Wed Jul 27 10:01:58 2016]bnx2x eth2: Verified is a cnic_uio device > DBG [Wed Jul 27 10:01:58 2016]bnx2x eth2: using rx ring size: 15, rx > buffer size: 1024 > INFO [Wed Jul 27 10:01:58 2016]Done capturing /sys/class/uio/uio0/event > DBG [Wed Jul 27 10:01:58 2016]bnx2x Chip ID: 168e1000 > INFO [Wed Jul 27 10:01:58 2016]nic_id eth2: is found at 03:00.00 > INFO [Wed Jul 27 10:01:58 2016]bnx2x eth2: func 0x0, pfid 0x0, client_id > 0x88, cid 0x1 > DBG [Wed Jul 27 10:01:58 2016]bnx2x eth2: mode = 0x100 > INFO [Wed Jul 27 10:01:58 2016]bnx2x eth2: Using mac address: > d0:43:1e:51:98:53 > INFO [Wed Jul 27 10:01:58 2016]eth2: bnx2x initialized > INFO [Wed Jul 27 10:01:58 2016]nic eth2: Initialized ip stack: VLAN: 0 > INFO [Wed Jul 27 10:01:58 2016]nic eth2: mac: d0:43:1e:51:98:53 > INFO [Wed Jul 27 10:01:58 2016]nic eth2: Using IP address: 10.0.1.62 > INFO [Wed Jul 27 10:01:58 2016]nic eth2: Using netmask: 255.255.255.0 > INFO [Wed Jul 27 10:01:58 2016]nic eth2: enabled vlan 0 protocol: 2 > INFO [Wed Jul 27 10:01:58 2016]nic eth2: entering main nic loop > DBG [Wed Jul 27 10:01:58 2016]nic_utils eth2: device enabled > [Thread 0x7ffff5e0c700 (LWP 4948) exited] > DBG [Wed Jul 27 10:01:59 2016]iscsi_ipc recv iscsid request: cmd: 1, > payload_len: 11720 > INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc Received request for 'eth2' to > set IP address: '10.0.1.62' VLAN: '0' > INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc Using netmask: 0.0.0.0 > INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2, using existing NIC > INFO [Wed Jul 27 10:01:59 2016]nic_utils looking for uio device for eth2 > INFO [Wed Jul 27 10:01:59 2016]nic_utils eth2 associated with uio0 > INFO [Wed Jul 27 10:01:59 2016]eth2: Have NIC library 'bnx2x' > INFO [Wed Jul 27 10:01:59 2016]Done capturing /sys/class/uio/uio0/name > INFO [Wed Jul 27 10:01:59 2016]nic_utils eth2: Verified uio name > bnx2x_cnic with library bnx2x > INFO [Wed Jul 27 10:01:59 2016]eth2: found NIC with library 'bnx2x' > INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2 library set using > transport_name bnx2i > INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: requesting configuration > using static IP address > INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: using existing network > interface > INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: IP configuration didn't > change using 0x2 > INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: NIC already enabled flags: > 0x1084 state: 0x4 > > INFO [Wed Jul 27 10:01:59 2016]iscsi_ipc ISCSID_UIP_IPC_GET_IFACE: > command: 1 name: bnx2i.d0:43:1e:51:98:53, netdev: eth2 ipaddr: 10.0.1.62 > vlan: 0 transport_name:bnx2i > DBG [Wed Jul 27 10:01:59 2016]iscsi_ipc Waiting for iscsid command > INFO [Wed Jul 27 10:02:00 2016]NIC_NL Received path_req for host 12 > INFO [Wed Jul 27 10:02:00 2016]Done capturing > /sys/class/iscsi_host/host12/netdev > DBG [Wed Jul 27 10:02:00 2016]NIC_NL Pulled nl event > INFO [Wed Jul 27 10:02:00 2016]NIC_NL eth2: Processing 'path_req' > DBG [Wed Jul 27 10:02:00 2016]NIC_NL eth2: PATH_REQ with iface_num -1 > VLAN 32768 > DBG [Wed Jul 27 10:02:00 2016]CNIC eth2: Netlink message with VLAN ID: > 0, path MTU: 9000 minor: 0 ip_addr_len: 4 > DBG [Wed Jul 27 10:02:00 2016]CNIC eth2: src=10.0.1.62 > DBG [Wed Jul 27 10:02:00 2016]CNIC eth2: dst=10.0.1.2 > DBG [Wed Jul 27 10:02:00 2016]CNIC eth2: nm=255.255.255.0 > INFO [Wed Jul 27 10:02:00 2016]CNIC eth2: Didn't find IPv4: '10.0.1.2' in > ARP table > DBG [Wed Jul 27 10:02:00 2016]CNIC eth2: Sent cnic arp request for IP: > 10.0.1.2 > INFO [Wed Jul 27 10:02:00 2016]Found 10.0.1.2 at b0:83:fe:cc:57:bb > DBG [Wed Jul 27 10:02:00 2016]CNIC neighbor reply sent back to kernel > 10.0.1.62 at b0:83:fe:cc:57:bb with vlan 0 > INFO [Wed Jul 27 10:02:00 2016]NIC_NL eth2: 'path_req' operation finished > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7ffff660d700 (LWP 4947)] > __lll_unlock_elision (lock=0x55555577fd40, private=0) at > ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29 > 29 ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c: No such file > or directory. > > > (gdb) info threads > Id Target Id Frame > 6 Thread 0x7ffff560b700 (LWP 4949) "iscsiuio" 0x00007ffff76f1ae3 in > select () at ../sysdeps/unix/syscall-template.S:81 > * 4 Thread 0x7ffff660d700 (LWP 4947) "iscsiuio" __lll_unlock_elision > (lock=0x55555577fd40, private=0) at > ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29 > 3 Thread 0x7ffff6e0e700 (LWP 4943) "iscsiuio" 0x00007ffff79c9ccd in > accept () at ../sysdeps/unix/syscall-template.S:81 > 2 Thread 0x7ffff760f700 (LWP 4942) "iscsiuio" do_sigwait > (set=<optimized out>, sig=0x7ffff760eeac) at > ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:63 > > > 1 Thread 0x7ffff7fea700 (LWP 4938) "iscsiuio" 0x00007ffff79c9e9d in > recvmsg () at ../sysdeps/unix/syscall-template.S:81 > > > (gdb) thread apply all bt > > Thread 6 (Thread 0x7ffff560b700 (LWP 4949)): > #0 0x00007ffff76f1ae3 in select () at > ../sysdeps/unix/syscall-template.S:81 > #1 0x000055555555ac06 in ?? () > #2 0x000055555555d39e in nic_loop () > #3 0x00007ffff79c30a4 in start_thread (arg=0x7ffff560b700) at > pthread_create.c:309 > #4 0x00007ffff76f887d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 4 (Thread 0x7ffff660d700 (LWP 4947)): > #0 __lll_unlock_elision (lock=0x55555577fd40, private=0) at > ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29 > #1 0x00007ffff79c7007 in pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:94 > #2 0x000055555555e803 in nl_process_handle_thread () > #3 0x00007ffff79c30a4 in start_thread (arg=0x7ffff660d700) at > pthread_create.c:309 > #4 0x00007ffff76f887d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 3 (Thread 0x7ffff6e0e700 (LWP 4943)): > #0 0x00007ffff79c9ccd in accept () at > ../sysdeps/unix/syscall-template.S:81 > #1 0x00005555555641b0 in ?? () > #2 0x00007ffff79c30a4 in start_thread (arg=0x7ffff6e0e700) at > pthread_create.c:309 > #3 0x00007ffff76f887d in clone () at > ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 > > Thread 2 (Thread 0x7ffff760f700 (LWP 4942)): > #0 do_sigwait (set=<optimized out>, sig=0x7ffff760eeac) at > ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:63 > > > #1 0x00007ffff79ca693 in __sigwait (set=0x7ffff760eeb0, sig=0x0) at > ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:97 > > > #2 0x000055555555a49c in _start () > > Thread 1 (Thread 0x7ffff7fea700 (LWP 4938)): > #0 0x00007ffff79c9e9d in recvmsg () at > ../sysdeps/unix/syscall-template.S:81 > #1 0x000055555555e5e9 in ?? () > #2 0x000055555555eea8 in nic_nl_open () > #3 0x000055555555a1b8 in main () > -- You received this message because you are subscribed to the Google Groups "open-iscsi" group. To unsubscribe from this group and stop receiving emails from it, send an email to open-iscsi+unsubscr...@googlegroups.com. To post to this group, send email to open-iscsi@googlegroups.com. Visit this group at https://groups.google.com/group/open-iscsi. For more options, visit https://groups.google.com/d/optout.