Mike or others,

I would like to enable a small (yes, feel free to snicker here...) bit of 
debugging on one of our systems to try to capture tcp window sizes (so that I 
can attempt to trend some behaviors).  Mike, recall awhile ago you had 
mentioned that the following will help us gather more information:

  echo 1 > /sys/module/libiscsi2/parameters/debug_libiscsi
  echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
  echo 1 > /sys/module/iscsi_tcp/parameters/debug_iscsi_tcp

I believe that the only one that I really need to turn on to get more 
information about size of data being sent would be this one:

  echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp

as it gives me something like this when I run this command:
[root@oim61024006 ~]# echo 1 > 
/sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp; sleep 1; dd 
if=/dev/zero of=/var/ovs/mount/DCA44DBDD7944327A7945E96BF8F7CCC/joetest.dd.img 
bs=1K count=1; echo 0 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 3.7e-05 seconds, 27.7 MB/s
[root@oim61024006 ~]# 

That produced this in /var/log/messages:
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_task_init task 
deq [itt 0x1c6 imm 0 unsol 0]
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
copied 0 0 size 48 xmit
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
copied 0 48 size 48 xmit
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
total copied 48 total size 48
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_recv_skb in 560 
bytes
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_recv_skb skb 
deace200 ptr=dd84c800 avail=560
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
copied 0 0 size 48 recv
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_recv 
copying 48
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
copied 0 48 size 48 recv
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
total copied 48 total size 48
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_recv_skb segment 
done

---> and I'm thinking that this is actually where my dd starts going:
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_hdr_dissect 
opcode 0x25 ahslen 0 datalen 512
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_hdr_dissect 
iscsi_tcp_begin_data_in( offset=0, datalen=512)
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_recv_skb in 512 
bytes
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_recv_skb skb 
deace200 ptr=dd84c830 avail=512
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
copied 0 0 size 512 recv
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_recv 
copying 512
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
copied 0 512 size 512 recv
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
total copied 512 total size 512
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_recv_skb segment 
done
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_hdr_recv_prep 
(digest disabled)
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_recv_skb in 0 
bytes
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_recv_skb no more 
data avail. Consumed 0
--> and this is where it had finished?

Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
copied 0 0 size 0 xmit
Apr 28 15:26:07 oim61024006 kernel:  connection22:0: iscsi_tcp_segment_done 
total copied 0 total size 0

I'm assuming that these "size 48 recv" might be some keepalive thing (nop 
pings, dm-multipath readsector 0, etc..)?


Would someone on the list be able to help me understand what these messages 
mean?  

For example, in the first one above:

1)  "iscsi_tcp_hdr_dissect opcode 0x25 ahslen 0 datalen 512" 

   0x25 = I don't need to know exactly, but I'm assuming this is telling the 
target to get ready for a write
   ahslen = some type of send length?
   datalen 512 = amount that the initiator will be sending in this segment?

2) "iscsi_tcp_hdr_dissect iscsi_tcp_begin_data_in( offset=0, datalen=512)"

   basically tells disk to start a 0 and write 512bytes?
   
3) "iscsi_tcp_recv_skb in 512 bytes"

   skb = send kb's?   

etc...

But what I'm really trying to pull out of this information is trying to trend 
how often the initiator is trying to xmit data, but is not allowed to.   Maybe 
there is also a way (outside of just tcpdumping... or even with tcpdump, with a 
tweaked filter so I don't get gigs of pcaps) to understand what the tcp window 
size is currently being set to.

I would then try to parse it after the fact to put the data in rrd graphs.

If there are any sites that explain this in more detail, I would happy to dig.  
The only issue I have is that I'm not a programmer... I haven't reviewed or 
coded in C in about 10 years (and even then I was just learning it in school).  
So if the site is too in depth, I may still be asking questions... but 
regardless, I am more than happy to hit the ground running and at least do the 
due diligence of reviewing the info if someone can point me in the right 
direction.

Thanks in advance to anyone for the help... it is certainly appreciated!

Thanks,
Joe


On Apr 15, 2011, at 2:31 PM, Mike Christie wrote:

> On 04/14/2011 02:17 PM, Mike Christie wrote:
>> This message means we have sent 2592 bytes out of 4096. The 0 means that
>> we are trying to send the rest of the data, but we are not able to sent
>> anything (0 is the number of bytes we were able to send when we did
>> sendpage/sendmsg). So send the full logs so I can see what else is in
>> there.
>> 
> 
> Ok in your current logs we have
> 
> Apr 13 11:07:23 oim61024008 kernel:  connection3:0: ping timeout of 5 secs 
> expired, recv timeout 5, last rx 559699, last ping 560949, now 562199
> 
> 
> Then looking over session3/connection3:0 we have
> 
> connection3:0: iscsi_tcp_segment_done copied 2632 0 size 4096 xmit
> 
> from Apr 13 11:07:23  to the middle of Apr 13 11:07:13.
> 
> So for 10 seconds we are trying to do sendpage but it keeps returning 0. For 
> some reason the network layer is just not sending new IO. The network layer 
> is probably retransmitting and so the network layer runs out of writespace.
> 
> Did you by any chance also take a network trace (tcpdump or 
> ethereal/wireshark)? Could you send it?
> 
> Can you run a rhel kernel? I can add some debugging in to the kernel.

-- 
You received this message because you are subscribed to the Google Groups 
"open-iscsi" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.

Reply via email to