Re: [Open/iSCSI] Memory leak in repetitive --login/--logout with v2.0-870.1
Nicholas A. Bellinger wrote:

> Greetings Mike, Hannes and Co,
>
> During some recent testing using the Open/iSCSI Initiator v2.0-870.1
> against the LIO-Target v3.0 tree, I noticed that while running the
> following script for an extended period of time:
>
>     while [ 1 ]; do
>         iscsiadm -m node -T $TARGETNAME -p $PORTAL --login
>         iscsiadm -m node -T $TARGETNAME -p $PORTAL --logout
>     done
>
> I started getting OOM failures on the VMs running Open/iSCSI. Upon
> closer examination, this is what I found:
>
> Open-iSCSI Node 1
> Linux ubuntu 2.6.27.10 #2 SMP Tue Jan 6 18:33:00 PST 2009 i686 GNU/Linux
> Using open-iscsi-2.0-870.1:
>
> [78196.520214] scsi7981 : iSCSI Initiator over TCP/IP
> [78284.175307] scsi7982 : iSCSI Initiator over TCP/IP
> [78338.568656] scsi7983 : iSCSI Initiator over TCP/IP
> [78405.22] scsi7984 : iSCSI Initiator over TCP/IP

Hey, so are there any devices on the target? I do not see the normal
type/size info we see when SCSI disks are found. Just checking; that
rules a lot of places out. If there are disks but they just are not
getting logged, could you remove them from the target so we can take
some structs out of the mix?

> Output from slabtop:
>
>   OBJS     ACTIVE   USE  OBJ SIZE  SLABS  OBJ/SLAB  CACHE SIZE  NAME
>   1037001  1036598  99%  0.03K     9177   113       36708K      size-32
>
> Open-iSCSI Node 2
> Linux opensuse 2.6.22.5-31-default #1 SMP 2007/09/21 22:29:00 UTC i686 i686 i386 GNU/Linux
>
> scsi7046 : iSCSI Initiator over TCP/IP
> scsi7047 : iSCSI Initiator over TCP/IP
> scsi7048 : iSCSI Initiator over TCP/IP
> scsi7049 : iSCSI Initiator over TCP/IP
>
> Output from slabtop:
>
>   OBJS     ACTIVE   USE  OBJ SIZE  SLABS  OBJ/SLAB  CACHE SIZE  NAME
>   914057   913581   99%  0.03K     8089   113       32356K      size-32
>
> So it appears that memory is getting leaked in the size-32 slab with
> each --login + --logout invocation. I also tried the same test with the
> shipping Open/iSCSI code in Debian v4 and OpenSUSE 10.3, and these also
> suffer from the same issue.
>
> Also of interest is that running the following script for Discovery
> SendTargets *DOES NOT* reproduce the leak:
>     while [ 1 ]; do
>         iscsiadm -m discovery -t sendtargets -p $PORTAL
>     done

The leak in the size-32 slab would be a kernel object, right? If so, the
sendtargets test not leaking means that this is a problem in the
session/connection kernel struct setup/destruction. The sendtargets code
is all in userspace, so it would not leak those objects.

I was out of the office sick last week, so let me catch up on work stuff
and then I will try to send a patch. If you want, you could try to stick
printks in the iscsi driver model object release functions to make sure
they are getting fired, but that gets nasty.

> Please let me know if there is anything else I can do to help diagnose
> the issue.
>
> Many thanks for your most valuable of time,
>
> --nab

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

You received this message because you are subscribed to the Google Groups
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
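[Archive note: the slab check discussed in the thread above, i.e. watching the size-32 cache grow across login/logout cycles, can be scripted. A minimal sketch, assuming a POSIX shell; the slab_objs helper is hypothetical, the second field of /proc/slabinfo is <active_objs>, and on modern kernels the general-purpose cache is named kmalloc-32 rather than size-32.]

```shell
#!/bin/sh
# Hypothetical helper: extract the <active_objs> count for a named cache
# from /proc/slabinfo-style input (field 1 = cache name, field 2 = active_objs).
slab_objs() {
    awk -v cache="$1" '$1 == cache { print $2 }'
}

# In use, one would sample before and after a batch of login/logout cycles:
#   before=$(slab_objs size-32 < /proc/slabinfo)
#   for i in $(seq 1 100); do
#       iscsiadm -m node -T "$TARGETNAME" -p "$PORTAL" --login
#       iscsiadm -m node -T "$TARGETNAME" -p "$PORTAL" --logout
#   done
#   after=$(slab_objs size-32 < /proc/slabinfo)
#   echo "size-32 grew by $((after - before)) objects over 100 cycles"

# Demo on a canned slabinfo line (active_objs matches Node 2 above):
slab_objs size-32 <<'EOF'
size-32 913581 914057 32 113 1 : tunables 120 60 8 : slabdata 8089 8089 0
EOF
# prints 913581
```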
Re: connection, host resets, I/O errors eventually (DRBD, but not only)
Tomasz Chmielewski wrote:

> Mike Christie schrieb:
>> The scsi layer sets a timeout on each command. If it does not execute
>> in X seconds, it will run the iscsi eh. So you can increase the scsi
>> command timeout. To modify the udev rule, open
>> /etc/udev/rules.d/50-udev.rules, and find the following lines:
>>
>>     ACTION=="add", SUBSYSTEM=="scsi", SYSFS{type}=="0|7|14", \
>>         RUN+="/bin/sh -c 'echo 60 > /sys$$DEVPATH/timeout'"
>>
>> And you probably want to decrease the number of outstanding commands
>> by setting node.session.cmds_max for that session. With 50 kB/s you
>> might as well set this to 1 command.
>
> This helps a bit, but after some time, something weird happens. I
> increased the timeout to 240 seconds. The data flows fine for some
> time, but after a couple of minutes, every program running on that
> initiator machine seems to freeze (i.e. ping stops pinging, top stops
> refreshing its data, they can't be interrupted / won't exit with
> ctrl+c). There is no traffic any more between the target and the
> initiator. The machine is still somewhat alive, as it replies to pings
> and responds to magic sysrq, and I can switch VTs (ctrl+alt+F1...).
>
> The machine has its root filesystem accessible via iSCSI (via fast
> LAN, to a different target), which can somehow contribute to the
> problem? It runs a 2.6.22 kernel. Some bad interaction if the
> initiator is connected to two targets with different IPs, and the
> connection to one target is very slow?

There should not be. Each session/connection to the target is going to
get its own threads for sending IO. The receiving is done in the network
softirq and cannot sleep or dominate CPU use.

Did you set the queue limit lower too? If so, did you do it globally
(set it in iscsid.conf and re-discover the targets), or did you run it
for a specific session (run iscsiadm -m node -T target -p ip:port -o
update -n ...)? Maybe if you did it globally, the lower queue depth is
slowing the IO execution and affecting the apps. This is probably not
the case, though.
I only know that things like a big database might not like its IO being
slowed down, and I do not think other apps would notice the slowdown as
long as IO completes. Or were there any iscsi or IO messages in the logs?
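[Archive note: the two adjustments Mike describes, raising the per-device SCSI command timeout via the udev rule and lowering the queue depth for just the slow session, could be applied roughly as sketched below. This is a sketch, not a definitive recipe: the 240-second value, the rule file path, and the $TARGETNAME/$PORTAL placeholders are assumptions, and the SYSFS{} match syntax belongs to the older udev shipped with these 2.6.2x-era distributions.]

```shell
# 1) Raise the SCSI command timeout for iSCSI disks. In
#    /etc/udev/rules.d/50-udev.rules, change the value the rule writes
#    (the doubled $$ is udev's escape for a literal $ in the shell command):
#
#    ACTION=="add", SUBSYSTEM=="scsi", SYSFS{type}=="0|7|14", \
#        RUN+="/bin/sh -c 'echo 240 > /sys$$DEVPATH/timeout'"
#
#    Devices that are already attached can be adjusted directly (as root):
for t in /sys/class/scsi_device/*/device/timeout; do
    echo 240 > "$t"
done

# 2) Lower the queue depth for one session only (a per-node override, so
#    other targets keep the default from iscsid.conf):
iscsiadm -m node -T "$TARGETNAME" -p "$PORTAL" \
    -o update -n node.session.cmds_max -v 1
# The new value takes effect on the next login to that node.
```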