Re: [Open/iSCSI] Memory leak in repetitive --login/--logout with v2.0-870.1

2009-01-11 Thread Mike Christie

Nicholas A. Bellinger wrote:
 Greetings Mike, Hannes and Co,
 
 During some recent testing using the Open/iSCSI Initiator v2.0-870.1,
 against the LIO-Target v3.0 tree, I noticed that while running the
 following script:
 
 while [ 1 ]; do
   iscsiadm -m node -T $TARGETNAME -p $PORTAL --login
   iscsiadm -m node -T $TARGETNAME -p $PORTAL --logout
 done
 
 for an extended period of time, I started getting OOM failures on
 the VMs running Open/iSCSI.  Upon closer examination, this is what I
 found:
 
 Open-iSCSI Node 1
 
 Linux ubuntu 2.6.27.10 #2 SMP Tue Jan 6 18:33:00 PST 2009 i686 GNU/Linux
 
 Using open-iscsi-2.0-870.1:
 
 [78196.520214] scsi7981 : iSCSI Initiator over TCP/IP
 [78284.175307] scsi7982 : iSCSI Initiator over TCP/IP
 [78338.568656] scsi7983 : iSCSI Initiator over TCP/IP
 [78405.22] scsi7984 : iSCSI Initiator over TCP/IP
 


Hey, so are there any devices on the target? I do not see the normal 
type/size info we see when scsi disks are found. Just checking. That 
rules a lot of places out.

If there are disks but they are just not getting logged, could you 
remove them from the target so we can take some structs out of the mix?


 Output from slabtop:
 
     OBJS  ACTIVE   USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
  1037001 1036598   99%    0.03K   9177      113     36708K size-32
 
 -
 
 Open-iSCSI Node 2
 
 Linux opensuse 2.6.22.5-31-default #1 SMP 2007/09/21 22:29:00 UTC i686 i686 
 i386 GNU/Linux
 
 scsi7046 : iSCSI Initiator over TCP/IP
 scsi7047 : iSCSI Initiator over TCP/IP
 scsi7048 : iSCSI Initiator over TCP/IP
 scsi7049 : iSCSI Initiator over TCP/IP
 
 Output from slabtop:
 
     OBJS  ACTIVE   USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
   914057  913581   99%    0.03K   8089      113     32356K size-32
 
 -
 
 So it appears that memory is being leaked from the size-32 slab cache with
 each --login + --logout invocation.  I also tried the same test with the
 Open-iSCSI code shipping in Debian 4 and openSUSE 10.3, and these also
 suffer from the same issue.
 
 Also of interest is that running the following script for Discovery
 SendTargets *DOES NOT* reproduce the leak.
 
 while [ 1 ]; do
   iscsiadm -m discovery -t sendtargets -p $PORTAL
 done


The leak in the size-32 slab would be a kernel object, right? If so, the 
fact that the sendtargets test does not leak means this is a problem in 
the setup/teardown of the kernel session/connection structs. The 
sendtargets code is all in userspace, so it would not leak those objects.


I was out of the office sick last week, so let me catch up on work stuff 
and then I will try to send a patch. If you want, you could try sticking 
printks in the iscsi driver model object release functions to make sure 
they are getting called, but that gets nasty.
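
A less invasive check would be to watch the size-32 cache from userspace 
across login/logout cycles. A rough, untested sketch (the target name and 
portal are placeholders, and reading /proc/slabinfo needs root):

   #!/bin/sh
   # Print the active object count of the size-32 slab before and after
   # each login/logout cycle; a steady increase points at a kernel-side
   # leak in session/connection teardown.
   TARGETNAME=iqn.2003-01.org.example:target0   # placeholder
   PORTAL=192.168.1.1:3260                      # placeholder

   size32_objs() {
       # Column 2 of /proc/slabinfo is the active object count.
       awk '$1 == "size-32" { print $2 }' /proc/slabinfo
   }

   i=0
   while [ $i -lt 100 ]; do
       before=$(size32_objs)
       iscsiadm -m node -T $TARGETNAME -p $PORTAL --login  > /dev/null
       iscsiadm -m node -T $TARGETNAME -p $PORTAL --logout > /dev/null
       after=$(size32_objs)
       echo "cycle $i: size-32 active objects $before -> $after"
       i=$((i + 1))
   done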


 
 Please let me know if there is anything else I can do to help diagnose
 the issue.
 
 Many thanks for your most valuable of time,
 
 --nab
 
 
 





Re: connection, host resets, I/O errors eventually (DRBD, but not only)

2009-01-11 Thread Mike Christie

Tomasz Chmielewski wrote:
 Mike Christie schrieb:
 
 The scsi layer sets a timeout on each command. If it does not execute in 
 X seconds, it will run the iscsi eh.

 So you can increase the scsi command timeout:

 To modify the udev rule, open /etc/udev/rules.d/50-udev.rules and find the
 following lines:

 ACTION=="add", SUBSYSTEM=="scsi", SYSFS{type}=="0|7|14", \
  RUN+="/bin/sh -c 'echo 60 > /sys$$DEVPATH/timeout'"
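
As a side note, that udev rule only applies to devices added after the rule 
is in place. For a disk that is already attached you should, as far as I 
know, be able to set the same sysfs attribute directly (sdX below is just a 
placeholder for the iSCSI disk):

  # bump the SCSI command timeout to 240 seconds for an existing device
  echo 240 > /sys/block/sdX/device/timeout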


 and you probably want to decrease the number of outstanding commands by 
 setting node.session.cmds_max for that session. At 50 kB/s you might as 
 well set this to 1 command.
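
For what it's worth, the per-record change would look roughly like this 
(untested; target and portal are placeholders, and depending on the 
open-iscsi version the value may need to be a small power of two rather 
than 1):

  iscsiadm -m node -T $TARGETNAME -p $PORTAL -o update \
           -n node.session.cmds_max -v 1
  # log the session out and back in so the new value takes effect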
 
 This helps a bit, but after some time, something weird happens.
 
 
 I increased the timeout to 240 seconds.
 
 The data flows fine for some time, but after a couple of minutes, every
 program running on that initiator machine seems to freeze (i.e. ping
 stops pinging, top stops refreshing its data, they can't be
 interrupted / won't exit with ctrl+c).
 There is no traffic any more between the target and the initiator.
 
 The machine is still somewhat alive, as it replies to pings, responds to
 magic sysrq, and I can switch VTs (ctrl+alt+F1...).
 
 
 The machine has its root filesystem accessible via iSCSI (over a fast LAN,
 to a different target); could that somehow contribute to the problem? It
 runs a 2.6.22 kernel.
 Could there be some bad interaction when the initiator is connected to two
 targets with different IPs and the connection to one target is very slow?
 

There should not be. Each session/connection to the target is going to 
get its own threads for sending IO. The receiving is done in the network 
softirq, which cannot sleep or monopolize the CPU.

Did you set the queue limit lower too? If so, did you do it globally (set 
it in iscsid.conf and rediscover the targets) or did you run it for a 
specific session (run iscsiadm -m node -T target -p ip:port -o update 
-n ..)? Maybe if you did it globally, the lower queue depth is slowing 
IO execution and affecting the apps. This is probably not the case 
though. I have only seen things like a big database not liking its IO 
being slowed down, and I do not think other apps would notice the 
slowdown as long as IO completes.
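
To make the global route concrete, it would be roughly this (assuming the 
usual /etc/iscsi/iscsid.conf location; untested sketch):

  # in /etc/iscsi/iscsid.conf, set the default picked up at discovery time:
  #   node.session.cmds_max = 16
  # then rediscover so the node records are recreated with the new default:
  iscsiadm -m discovery -t sendtargets -p $PORTAL

A per-record override with iscsiadm -m node ... -o update only changes 
that one record.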

Or were there any iscsi or IO messages in the logs?
