Evan Broder wrote:
> Evan Broder wrote:
>> Ok - I'm running a few of my VMs, and set up some simple monitoring to
>> let me know if an error gets logged. I'll get back to the list if/when I
>> find anything.
>>
>> - Evan
> 
> I'm no longer seeing the "detected conn error", but I am still seeing
> the symptoms of the problems: on a multiple of 5 minutes past the hour,
> VMs become non-responsive for 30-60 seconds. The logs with all of the

Are you doing iSCSI root (root filesystem on an iSCSI disk)?

How do you know they become unresponsive? Do you have an ssh session to 
the VM that stops responding at that time, or do you have a console on 
the VM where you cannot run any commands during that window?

> debugging options turned on are a bit...overwhelming, but I've posted
> the logs from an incident today. The incident happened at 14:05; I've
> included the logs from 14:04-14:06 at
> http://web.mit.edu/broder/Public/iscsi/citadel-station-2008-12-12-14-05.bz2
> 


Looking at the log from

Dec 12 14:05:09 citadel-station kernel: [107789.172478] iscsi: done [sc ffff88012d96ca40 res 0 itt 0xf]

to

Dec 12 14:05:38 citadel-station kernel: [107938.402859] iscsi: aborting [sc ffff8801287c91c0 itt 0x7c]

the only IO we see is the target sending us a nop (iSCSI ping) and us 
responding. At 14:05:38 a SCSI command times out, so the SCSI layer 
starts its error handling: it sends an abort, which fails, and then an 
LU reset, which unjams us. The initiator then resends the commands it 
thought were lost. Note that if the LU reset had failed, we would 
eventually have seen the conn error message you saw before, so we are 
probably hitting the right code paths.

The initiator thinks these two commands:
Dec 12 14:05:38 citadel-station kernel: [107938.405876] iscsi: failing in progress sc ffff8801287c91c0 itt 0x7c
Dec 12 14:05:38 citadel-station kernel: [107938.405881] iscsi: failing in progress sc ffff8801274b6440 itt 0x5c

did not complete, but I am betting the target thinks it completed them, 
since the target seems fine and is still pinging us.


The iSCSI logs show we did not get the command responses, so either the 
target dropped a packet, the network dropped something, or the 
initiator dropped it during network processing.

Could you run this again, but capture an Ethereal/Wireshark trace at the 
same time? That way we can see whether the network layer is getting the 
packet.


Could you also run with the patch below? In the open-iscsi source tree, run

patch -p1 -i path-to-patch/remove-undef-tcp.patch

then rebuild with

make DEBUG_TCP=1 DEBUG_SCSI=1
make DEBUG_TCP=1 DEBUG_SCSI=1 install


One other question: is the iSCSI initiator running in the VM or in the 
host? Is this Xen or something else, like VMware? Could you take the 
Ethereal/Wireshark trace from wherever the initiator is running, so we 
can see what its network layer is receiving? If it is running in a guest 
and you can get traces from both the guest and the host, that would be 
nice too.

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to open-iscsi@googlegroups.com
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/open-iscsi
-~----------~----~----~----~------~----~------~--~---

diff --git a/kernel/iscsi_tcp.c b/kernel/iscsi_tcp.c
index d074146..69286d0 100644
--- a/kernel/iscsi_tcp.c
+++ b/kernel/iscsi_tcp.c
@@ -48,7 +48,6 @@ MODULE_AUTHOR("Dmitry Yusupov <dmitry_...@yahoo.com>, "
              "Alex Aizman <itn...@yahoo.com>");
 MODULE_DESCRIPTION("iSCSI/TCP data-path");
 MODULE_LICENSE("GPL");
-#undef DEBUG_TCP
 #define DEBUG_ASSERT
 
 #ifdef DEBUG_TCP
