On Thursday, July 28, 2016 at 2:16:12 AM UTC-7, Frank Fegert wrote:
>
> Hello all, 
>

Hi Frank. 

>
> disclaimer: i'm not a programmer, so the following might be utterly 
> and completely wrong ;-) 
>
> TL;DR: i'm getting segfaults from iscsiuio upon any target login. 
> Specifically this happens in iscsiuio/src/unix/nic_nl.c. Debugging 
> this lead me to believe this is a case of trying to unlock a not 
> locked mutex. A quick and dirty hack which works around this is 
> available here: 
>   
> https://github.com/frank-fegert/open-iscsi/commit/9f770f9eb0f302d146d455f1d68648e2d0172eb6
>  
>
> There is probably room for a proper fix (e.g. counter on the number of 
> locks?), which considers the semantics of the whole code. 
>

I looked at your suggested fix, and I think it's almost correct.

The caller should indeed have the mutex locked when calling
pthread_cond_wait. It's probably just luck this hasn't blown up before.

But I think it just needs one tweak. You need to call
pthread_mutex_unlock() in the error case, i.e. in the
"rc != 0" case, before calling LOG_ERR().

Can you submit a patch or pull request updated with this
change?

>
>
> The longer explanation: 
>
> My setup involves 6 Dell M630 hosts (host1, host{5,6,7,8,9}), all with 
> BCM57810 iSOEs. BCM57810 firmware, software (Debian 8) and targets are 
> exactly the same on all hosts. I'm using the Debian open-iscsi package, 
> which is rebuild in order to include iscsiuio and has the changes up to 
> Git commit 0fa43f29 - but excluding c6d1117b and 76832662 (externalization 
> of the open-isns code) - backported. The only difference is, host1 has 
> Intel E5 v3 CPUs, host{5,6,7,8,9} have Intel E5 v4 CPUs. 
>
> On host1 everything works fine, iscsiuio runs as expected, access to 
> targets is working flawlessly. 
> On host{5,6,7,8,9} i'm getting segfaults like the one in the example 
> shown below, while trying to log in to any target. 
>
> Searching for "__lll_unlock_elision" in conjunction with "pthread_*" 
> led me to the following resources: 
>   https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=800574 
>   https://lwn.net/Articles/534758/ 
>
> which are pointing towards a CPU (Broadwell and Skylake) specific pro- 
> blem when not carefully using mutexes. The general opinion from there 
> and other related bug reports seems to be, that the application be- 
> haviour should be changed, in order to fix such an issue. 
>
> Tracking this issue further down i ended up in the function "nl_process_ 
> handle_thread" in iscsiuio/src/unix/nic_nl.c and specifically here: 
>
> 486         rc = pthread_cond_wait(&nic->nl_process_cond, 
> 487                        &nic->nl_process_mutex); 
>
> From the pthread_cond_wait manpage: 
>   The pthread_cond_timedwait() and pthread_cond_wait() functions shall 
>   block on a condition variable. They shall be called with mutex locked 
>   by the calling thread or undefined behavior results. 
>
> On the first pass of the loop, this constraint seems to be true. At 
> the end of the loop at: 
>
> 499         pthread_mutex_unlock(&nic->nl_process_mutex); 
>
> the mutex is then unlocked. Thus - if i understand the code right - 
> the above constraint is no longer met on the subsequent passes of the 
> loop. On Intel E5 v3 this seemed to be tolerated and without any impact. 
> But on Intel E5 v4 (and other CPUs implementing HLE and RTM) this IMHO 
> causes the observed segfault. 
>
> Could someone more familiar with mutex handling in phtreads and/or the 
> semantics of the iscsiuio code please take a look at this? I'd be inter- 
> ested if my analysis is correct and whether my quick'n'dirty fix has 
> any major side-effects. And - of course - what a proper fix for the ob- 
> served segfault would look like ;-) 
>
> Thanks & best regards, 
>
>     Frank 
>
>
> host5:~# gdb /sbin/iscsiuio 
> GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1 
> Copyright (C) 2014 Free Software Foundation, Inc. 
> License GPLv3+: GNU GPL version 3 or later <
> http://gnu.org/licenses/gpl.html> 
> This is free software: you are free to change and redistribute it. 
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying" 
> and "show warranty" for details. 
> This GDB was configured as "x86_64-linux-gnu". 
> Type "show configuration" for configuration details. 
> For bug reporting instructions, please see: 
> <http://www.gnu.org/software/gdb/bugs/>. 
> Find the GDB manual and other documentation resources online at: 
> <http://www.gnu.org/software/gdb/documentation/>. 
> For help, type "help". 
> Type "apropos word" to search for commands related to "word"... 
> Reading symbols from /sbin/iscsiuio...(no debugging symbols found)...done. 
>
> (gdb) run -d 4 -f 
> Starting program: /sbin/iscsiuio -d 4 -f 
> [Thread debugging using libthread_db enabled] 
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 
> INFO  [Wed Jul 27 10:01:45 2016]Initialize logger using log file: 
> /var/log/iscsiuio.log 
> INFO  [Wed Jul 27 10:01:45 2016]Started iSCSI uio stack: Ver 0.7.8.2 
> INFO  [Wed Jul 27 10:01:45 2016]Build date: Fri Jul 22 15:40:04 CEST 2016 
> INFO  [Wed Jul 27 10:01:45 2016]Debug mode enabled 
> INFO  [Wed Jul 27 10:01:45 2016]Running on sysname: 'Linux', release: 
> '3.16.0-4-amd64', version '#1 SMP Debian 3.16.7-ckt25-2+deb8u3 
> (2016-07-02)' machine: 'x86_64' 
> DBG   [Wed Jul 27 10:01:45 2016]Loaded nic library 'bnx2' Version: 
> '0.7.8.2' build on Fri Jul 22 15:40:04 CEST 2016' 
> DBG   [Wed Jul 27 10:01:45 2016]Added 'bnx2' nic library 
> DBG   [Wed Jul 27 10:01:45 2016]Loaded nic library 'bnx2x' Version: 
> '0.7.8.2' build on Fri Jul 22 15:40:04 CEST 2016' 
> DBG   [Wed Jul 27 10:01:45 2016]Added 'bnx2x' nic library 
> [New Thread 0x7ffff760f700 (LWP 4942)] 
> INFO  [Wed Jul 27 10:01:45 2016]signal handling thread ready 
> INFO  [Wed Jul 27 10:01:45 2016]nic_utils Found host[11]: host11 
> INFO  [Wed Jul 27 10:01:45 2016]Done capturing 
> /sys/class/iscsi_host/host11/netdev 
> INFO  [Wed Jul 27 10:01:45 2016]Done capturing 
> /sys/class/iscsi_host/host11/netdev 
> INFO  [Wed Jul 27 10:01:45 2016]nic_utils looking for uio device for eth3 
> WARN  [Wed Jul 27 10:01:45 2016]Could not find assoicate uio device with 
> eth3 
> ERR   [Wed Jul 27 10:01:45 2016]nic_utils Could not determine UIO name for 
> eth3 
> INFO  [Wed Jul 27 10:01:45 2016]nic_utils Found host[12]: host12 
> INFO  [Wed Jul 27 10:01:45 2016]Done capturing 
> /sys/class/iscsi_host/host12/netdev 
> INFO  [Wed Jul 27 10:01:45 2016]Done capturing 
> /sys/class/iscsi_host/host12/netdev 
> INFO  [Wed Jul 27 10:01:45 2016]nic_utils looking for uio device for eth2 
> INFO  [Wed Jul 27 10:01:45 2016]nic_utils eth2 associated with uio0 
> INFO  [Wed Jul 27 10:01:45 2016]nic_utils NIC not found creating an 
> instance for host_no: 12 eth2 
> DBG   [Wed Jul 27 10:01:45 2016]Could not increase process priority: 
> Success 
> [New Thread 0x7ffff6e0e700 (LWP 4943)] 
> DBG   [Wed Jul 27 10:01:45 2016]iscsi_ipc Started iscsid listening thread 
> DBG   [Wed Jul 27 10:01:45 2016]iscsi_ipc Waiting for iscsid command 
> INFO  [Wed Jul 27 10:01:45 2016]NIC_NL Netlink to CNIC on pid 4938 is 
> ready 
> DBG   [Wed Jul 27 10:01:57 2016]iscsi_ipc recv iscsid request: cmd: 1, 
> payload_len: 11720 
> INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc Received request for 'eth2' to 
> set IP address: '10.0.1.62' VLAN: '0' 
> INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc Using netmask: 0.0.0.0 
> INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc  eth2, using existing NIC 
> INFO  [Wed Jul 27 10:01:57 2016]nic_utils looking for uio device for eth2 
> INFO  [Wed Jul 27 10:01:57 2016]nic_utils eth2 associated with uio0 
> INFO  [Wed Jul 27 10:01:57 2016]Done capturing /sys/class/uio/uio0/name 
> INFO  [Wed Jul 27 10:01:57 2016]nic_utils eth2: Verified uio name 
> bnx2x_cnic with library bnx2x 
> INFO  [Wed Jul 27 10:01:57 2016]eth2: found NIC with library 'bnx2x' 
> INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2 library set using 
> transport_name bnx2i 
> INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: requesting configuration 
> using static IP address 
> DBG   [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2 couldn't find interface 
> with ip_type: 0x2 creating it 
> INFO  [Wed Jul 27 10:01:57 2016]nic eth2: Added nic interface for VLAN: 0, 
> protocol: 2 
> INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: created network interface 
> [New Thread 0x7ffff660d700 (LWP 4947)] 
> WARN  [Wed Jul 27 10:01:57 2016]nic_utils eth2: device already disabled: 
> flag: 0x1088 state: 0x1 
> INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: configuring using static 
> IP IPv4 address :10.0.1.62 
> INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc  netmask: 255.255.255.0 
> [New Thread 0x7ffff5e0c700 (LWP 4948)] 
> INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc ISCSID_UIP_IPC_GET_IFACE: 
> command: 1 name: bnx2i.d0:43:1e:51:98:53, netdev: eth2 ipaddr: 10.0.1.62 
> vlan: 0 transport_name:bnx2i 
> INFO  [Wed Jul 27 10:01:57 2016]nic_utils eth2: spinning up thread for nic 
> DBG   [Wed Jul 27 10:01:57 2016]iscsi_ipc Waiting for iscsid command 
> [New Thread 0x7ffff560b700 (LWP 4949)] 
> DBG   [Wed Jul 27 10:01:57 2016]nic eth2: Waiting to be enabled 
> INFO  [Wed Jul 27 10:01:57 2016]Created nic thread: eth2 
> INFO  [Wed Jul 27 10:01:57 2016]iscsi_ipc eth2: started NIC enable thread 
> state: 0x1 
> DBG   [Wed Jul 27 10:01:57 2016]nic eth2: is now enabled 
> INFO  [Wed Jul 27 10:01:57 2016]bnx2x eth2: bnx2x driver using version 
> 1.78.19 
> ERR   [Wed Jul 27 10:01:58 2016]bnx2x /dev/uio0: uio device has been 
> brought up via pid: 4938 on fd: 7 
> INFO  [Wed Jul 27 10:01:58 2016]Done capturing /sys/class/uio/uio0/name 
> INFO  [Wed Jul 27 10:01:58 2016]bnx2x eth2: Verified is a cnic_uio device 
> DBG   [Wed Jul 27 10:01:58 2016]bnx2x eth2: using rx ring size: 15, rx 
> buffer size: 1024 
> INFO  [Wed Jul 27 10:01:58 2016]Done capturing /sys/class/uio/uio0/event 
> DBG   [Wed Jul 27 10:01:58 2016]bnx2x Chip ID: 168e1000 
> INFO  [Wed Jul 27 10:01:58 2016]nic_id eth2: is found at 03:00.00 
> INFO  [Wed Jul 27 10:01:58 2016]bnx2x eth2: func 0x0, pfid 0x0, client_id 
> 0x88, cid 0x1 
> DBG   [Wed Jul 27 10:01:58 2016]bnx2x eth2: mode = 0x100 
> INFO  [Wed Jul 27 10:01:58 2016]bnx2x eth2:  Using mac address: 
> d0:43:1e:51:98:53 
> INFO  [Wed Jul 27 10:01:58 2016]eth2: bnx2x initialized 
> INFO  [Wed Jul 27 10:01:58 2016]nic eth2: Initialized ip stack: VLAN: 0 
> INFO  [Wed Jul 27 10:01:58 2016]nic eth2: mac: d0:43:1e:51:98:53 
> INFO  [Wed Jul 27 10:01:58 2016]nic eth2: Using IP address: 10.0.1.62 
> INFO  [Wed Jul 27 10:01:58 2016]nic eth2: Using netmask: 255.255.255.0 
> INFO  [Wed Jul 27 10:01:58 2016]nic eth2: enabled vlan 0 protocol: 2 
> INFO  [Wed Jul 27 10:01:58 2016]nic eth2: entering main nic loop 
> DBG   [Wed Jul 27 10:01:58 2016]nic_utils eth2: device enabled 
> [Thread 0x7ffff5e0c700 (LWP 4948) exited] 
> DBG   [Wed Jul 27 10:01:59 2016]iscsi_ipc recv iscsid request: cmd: 1, 
> payload_len: 11720 
> INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc Received request for 'eth2' to 
> set IP address: '10.0.1.62' VLAN: '0' 
> INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc Using netmask: 0.0.0.0 
> INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc  eth2, using existing NIC 
> INFO  [Wed Jul 27 10:01:59 2016]nic_utils looking for uio device for eth2 
> INFO  [Wed Jul 27 10:01:59 2016]nic_utils eth2 associated with uio0 
> INFO  [Wed Jul 27 10:01:59 2016]eth2: Have NIC library 'bnx2x' 
> INFO  [Wed Jul 27 10:01:59 2016]Done capturing /sys/class/uio/uio0/name 
> INFO  [Wed Jul 27 10:01:59 2016]nic_utils eth2: Verified uio name 
> bnx2x_cnic with library bnx2x 
> INFO  [Wed Jul 27 10:01:59 2016]eth2: found NIC with library 'bnx2x' 
> INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2 library set using 
> transport_name bnx2i 
> INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: requesting configuration 
> using static IP address 
> INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: using existing network 
> interface 
> INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: IP configuration didn't 
> change using 0x2 
> INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc eth2: NIC already enabled flags: 
> 0x1084 state: 0x4 
>
> INFO  [Wed Jul 27 10:01:59 2016]iscsi_ipc ISCSID_UIP_IPC_GET_IFACE: 
> command: 1 name: bnx2i.d0:43:1e:51:98:53, netdev: eth2 ipaddr: 10.0.1.62 
> vlan: 0 transport_name:bnx2i 
> DBG   [Wed Jul 27 10:01:59 2016]iscsi_ipc Waiting for iscsid command 
> INFO  [Wed Jul 27 10:02:00 2016]NIC_NL Received path_req for host 12 
> INFO  [Wed Jul 27 10:02:00 2016]Done capturing 
> /sys/class/iscsi_host/host12/netdev 
> DBG   [Wed Jul 27 10:02:00 2016]NIC_NL Pulled nl event 
> INFO  [Wed Jul 27 10:02:00 2016]NIC_NL eth2: Processing 'path_req' 
> DBG   [Wed Jul 27 10:02:00 2016]NIC_NL eth2: PATH_REQ with iface_num -1 
> VLAN 32768 
> DBG   [Wed Jul 27 10:02:00 2016]CNIC eth2: Netlink message with VLAN ID: 
> 0, path MTU: 9000 minor: 0 ip_addr_len: 4 
> DBG   [Wed Jul 27 10:02:00 2016]CNIC eth2: src=10.0.1.62 
> DBG   [Wed Jul 27 10:02:00 2016]CNIC eth2: dst=10.0.1.2 
> DBG   [Wed Jul 27 10:02:00 2016]CNIC eth2: nm=255.255.255.0 
> INFO  [Wed Jul 27 10:02:00 2016]CNIC eth2: Didn't find IPv4: '10.0.1.2' in 
> ARP table 
> DBG   [Wed Jul 27 10:02:00 2016]CNIC eth2: Sent cnic arp request for IP: 
> 10.0.1.2 
> INFO  [Wed Jul 27 10:02:00 2016]Found 10.0.1.2 at b0:83:fe:cc:57:bb 
> DBG   [Wed Jul 27 10:02:00 2016]CNIC neighbor reply sent back to kernel 
> 10.0.1.62 at b0:83:fe:cc:57:bb with vlan 0 
> INFO  [Wed Jul 27 10:02:00 2016]NIC_NL eth2: 'path_req' operation finished 
>
> Program received signal SIGSEGV, Segmentation fault. 
> [Switching to Thread 0x7ffff660d700 (LWP 4947)] 
> __lll_unlock_elision (lock=0x55555577fd40, private=0) at 
> ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29 
> 29      ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c: No such file 
> or directory. 
>
>
> (gdb) info threads 
>   Id   Target Id         Frame 
>   6    Thread 0x7ffff560b700 (LWP 4949) "iscsiuio" 0x00007ffff76f1ae3 in 
> select () at ../sysdeps/unix/syscall-template.S:81 
> * 4    Thread 0x7ffff660d700 (LWP 4947) "iscsiuio" __lll_unlock_elision 
> (lock=0x55555577fd40, private=0) at 
> ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29 
>   3    Thread 0x7ffff6e0e700 (LWP 4943) "iscsiuio" 0x00007ffff79c9ccd in 
> accept () at ../sysdeps/unix/syscall-template.S:81 
>   2    Thread 0x7ffff760f700 (LWP 4942) "iscsiuio" do_sigwait 
> (set=<optimized out>, sig=0x7ffff760eeac) at 
> ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:63
>  
>
>   1    Thread 0x7ffff7fea700 (LWP 4938) "iscsiuio" 0x00007ffff79c9e9d in 
> recvmsg () at ../sysdeps/unix/syscall-template.S:81 
>
>
> (gdb) thread apply all bt 
>
> Thread 6 (Thread 0x7ffff560b700 (LWP 4949)): 
> #0  0x00007ffff76f1ae3 in select () at 
> ../sysdeps/unix/syscall-template.S:81 
> #1  0x000055555555ac06 in ?? () 
> #2  0x000055555555d39e in nic_loop () 
> #3  0x00007ffff79c30a4 in start_thread (arg=0x7ffff560b700) at 
> pthread_create.c:309 
> #4  0x00007ffff76f887d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 
>
> Thread 4 (Thread 0x7ffff660d700 (LWP 4947)): 
> #0  __lll_unlock_elision (lock=0x55555577fd40, private=0) at 
> ../nptl/sysdeps/unix/sysv/linux/x86/elision-unlock.c:29 
> #1  0x00007ffff79c7007 in pthread_cond_wait@@GLIBC_2.3.2 () at 
> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:94 
> #2  0x000055555555e803 in nl_process_handle_thread () 
> #3  0x00007ffff79c30a4 in start_thread (arg=0x7ffff660d700) at 
> pthread_create.c:309 
> #4  0x00007ffff76f887d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 
>
> Thread 3 (Thread 0x7ffff6e0e700 (LWP 4943)): 
> #0  0x00007ffff79c9ccd in accept () at 
> ../sysdeps/unix/syscall-template.S:81 
> #1  0x00005555555641b0 in ?? () 
> #2  0x00007ffff79c30a4 in start_thread (arg=0x7ffff6e0e700) at 
> pthread_create.c:309 
> #3  0x00007ffff76f887d in clone () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 
>
> Thread 2 (Thread 0x7ffff760f700 (LWP 4942)): 
> #0  do_sigwait (set=<optimized out>, sig=0x7ffff760eeac) at 
> ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:63
>  
>
> #1  0x00007ffff79ca693 in __sigwait (set=0x7ffff760eeb0, sig=0x0) at 
> ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:97
>  
>
> #2  0x000055555555a49c in _start () 
>
> Thread 1 (Thread 0x7ffff7fea700 (LWP 4938)): 
> #0  0x00007ffff79c9e9d in recvmsg () at 
> ../sysdeps/unix/syscall-template.S:81 
> #1  0x000055555555e5e9 in ?? () 
> #2  0x000055555555eea8 in nic_nl_open () 
> #3  0x000055555555a1b8 in main () 
>

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to open-iscsi+unsubscr...@googlegroups.com.
To post to this group, send email to open-iscsi@googlegroups.com.
Visit this group at https://groups.google.com/group/open-iscsi.
For more options, visit https://groups.google.com/d/optout.

Reply via email to