Following Vazer's initial post to the storage forum,
a number of emails were exchanged directly between Vazer,
Peter Dunlap and myself, which I want to summarise here...

Vazer did a great job collecting Ethernet packet capture trace files, 
enabling a comparison of the behaviour of the Comstar iscsi target,
with the Microsoft iscsi target and the NetApp iscsi target.

Vazer identified a couple of scenarios where the Comstar target failed.

Vazer emailed the capture files to Peter Dunlap & myself, and it was
interesting to analyse these files and see where things went wrong.

:--------:

First let's look at Vazer's "Not Working Scenario 2":
"The Physical Server is booted with the iSCSI Offload NIC enabled in BIOS."

Vazer was trying to boot Windows 2003 server from the LUN,
and it stalled saying "NTDETECT failed".

When I looked at the trace, I could see the iscsi boot started fine,
with many 'Read(10)'s with transfer lengths of 1, 2, or 8 blocks.
Then a point was reached where the initiator did a read with a
requested transfer length of 20 blocks,
at which point the Comstar target responded with scsi status code 0x28,
meaning 'Task Set Full'.

http://en.wikipedia.org/wiki/SCSI_Status_Code
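
For anyone reading a trace by hand, a small lookup of the SCSI status byte
can help. The sketch below lists the non-obsolete status values as I
understand them from the SCSI architecture model; the names SCSI_STATUS and
describe_status are purely illustrative and not part of any tool mentioned
in this thread.

```python
# Illustrative lookup table for the SCSI status byte (hypothetical helper).
SCSI_STATUS = {
    0x00: "GOOD",
    0x02: "CHECK CONDITION",
    0x04: "CONDITION MET",
    0x08: "BUSY",
    0x18: "RESERVATION CONFLICT",
    0x28: "TASK SET FULL",
    0x30: "ACA ACTIVE",
    0x40: "TASK ABORTED",
}


def describe_status(code: int) -> str:
    """Return a readable name for a SCSI status byte."""
    return SCSI_STATUS.get(code, f"unknown or obsolete status 0x{code:02x}")


# The status seen in the failing boot trace.
print(describe_status(0x28))
```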

Vazer tried the iscsi boot a few times and the trace files always
stopped in exactly the same place.

Peter Dunlap identified the problem as happening because
the NIC was negotiating a very small value for MaxBurstLength (0x2000).
The stall occurred when attempting a transfer of size 0x2800
(20 blocks, each of 512 bytes = 10240 decimal = 0x2800).
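
The arithmetic can be checked with a short sketch. It assumes, as I read
the iSCSI spec, that MaxBurstLength caps the data payload of a single
Data-In sequence, so a larger read has to be carried in more than one
sequence. The function burst_plan is a hypothetical helper for
illustration only; it is not how the Comstar target is implemented.

```python
BLOCK_SIZE = 512            # bytes per block, as in the trace
MAX_BURST_LENGTH = 0x2000   # value negotiated by the offload NIC


def burst_plan(blocks, block_size=BLOCK_SIZE, max_burst=MAX_BURST_LENGTH):
    """Return (total_bytes, list of per-sequence payload sizes)."""
    total = blocks * block_size
    bursts = []
    remaining = total
    while remaining > 0:
        chunk = min(remaining, max_burst)
        bursts.append(chunk)
        remaining -= chunk
    return total, bursts


# The 8-block reads fit in one sequence; the 20-block read does not.
print(burst_plan(8))    # (4096, [4096])
print(burst_plan(20))   # (10240, [8192, 2048])
```

So the 1, 2 and 8 block reads all stay under 0x2000, and the 20 block read
is the first one in the trace that exceeds it.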

This is the issue tracked by bug 6867945, for which a fix was recently putback.

:--------:

Now let's look at Vazer's "Not Working Scenario 1":
"Server booted with Windows 2008 with HyperV, but the MS iSCSI Initiator
connects to the target using the broadcom multifunction NIC with its offload
iSCSI capabilities. There is no LUN recognized by the Initiator.
With a NetApp iScsi target or Microsoft target this works fine."

In the trace file I could see the initiator sending 'Report LUNs'
three times, with allocation length of 16.
Each time it gets the same response from the comstar target,
'Status Good' and LUN List length 8, LUN 0.
(Which is valid, but the initiator does not seem to like it.)
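
To show why a 16 byte allocation length is enough here, the sketch below
builds and parses REPORT LUNS parameter data as I understand the layout:
a 4 byte big-endian LUN list length, 4 reserved bytes, then one 8 byte
entry per LUN. The function names are hypothetical, and the entries are
treated as opaque 8 byte values (LUN 0 is simply all zeros).

```python
import struct


def build_report_luns_data(lun_entries):
    """Build REPORT LUNS parameter data from a list of 8-byte LUN entries."""
    body = b"".join(lun_entries)
    # 4-byte LUN list length, then 4 reserved (zero) bytes.
    return struct.pack(">I4x", len(body)) + body


def parse_report_luns_data(data):
    """Return (lun_list_length, list of 8-byte entries present in data)."""
    (list_len,) = struct.unpack_from(">I", data, 0)
    available = min(list_len, len(data) - 8)
    entries = [data[8 + i:16 + i] for i in range(0, available - available % 8, 8)]
    return list_len, entries


# A target exposing only LUN 0, as in the Comstar trace.
data = build_report_luns_data([bytes(8)])
print(len(data), parse_report_luns_data(data))
```

With a single LUN the whole response is 8 + 8 = 16 bytes, which matches
the allocation length the initiator asked for.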

This is then followed by the initiator sending 'Inquiry LUN' three times,
all with EVPD zero, page zero, and allocation length 36.
Each time it gets the same response from the comstar target:
'Status Good'. (Which again should be ok.)
And at this stage it stalls.
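
For reference, this is roughly what that Inquiry command looks like on the
wire. The sketch assumes the usual 6 byte INQUIRY CDB layout (opcode 0x12,
EVPD in bit 0 of byte 1, page code in byte 2, allocation length in bytes
3-4, control in byte 5); build_inquiry_cdb is a made-up helper name.

```python
import struct

INQUIRY_OPCODE = 0x12


def build_inquiry_cdb(evpd=False, page_code=0, allocation_length=36):
    """Build a 6-byte INQUIRY CDB (illustrative sketch)."""
    return struct.pack(
        ">BBBHB",
        INQUIRY_OPCODE,
        0x01 if evpd else 0x00,   # EVPD bit
        page_code,
        allocation_length,        # big-endian, bytes 3-4
        0x00,                     # control byte
    )


# EVPD zero, page zero, allocation length 36, as seen in the trace.
print(build_inquiry_cdb().hex())   # 120000002400
```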

The equivalent traces for the Microsoft and NetApp targets show
the same response to 'Report LUNs', but then the initiator continues on OK.
The only difference I could spot was that they set the status flag 'S'
to say that status accompanies the Data-In pdu.
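
To make the 'S' flag concrete, here is a minimal sketch of decoding the
relevant bytes of a SCSI Data-In PDU header. It assumes the layout from the
iSCSI spec as I understand it: opcode 0x25 in byte 0, the F and S flags in
byte 1, and the SCSI status in byte 3 (only meaningful when S is set).
When S is not set, the status arrives later in a separate SCSI Response
PDU. The name decode_data_in is hypothetical, and this does not show that
the flag is the cause of the stall, only what the difference looks like.

```python
DATA_IN_OPCODE = 0x25
FLAG_F = 0x80   # final PDU of the sequence
FLAG_S = 0x01   # status is carried in this Data-In PDU


def decode_data_in(bhs: bytes) -> dict:
    """Decode the flag and status bytes of a 48-byte Data-In header."""
    if len(bhs) < 48:
        raise ValueError("basic header segment must be 48 bytes")
    if bhs[0] & 0x3F != DATA_IN_OPCODE:
        raise ValueError("not a SCSI Data-In PDU")
    flags = bhs[1]
    status_present = bool(flags & FLAG_S)
    return {
        "final": bool(flags & FLAG_F),
        "status_present": status_present,
        "status": bhs[3] if status_present else None,
    }


# Header with F and S set and status GOOD (0x00), as the other targets send.
with_status = bytes([0x25, FLAG_F | FLAG_S, 0x00, 0x00]) + bytes(44)
# Header with only F set; status would follow in a SCSI Response PDU.
without_status = bytes([0x25, FLAG_F, 0x00, 0x00]) + bytes(44)
print(decode_data_in(with_status))
print(decode_data_in(without_status))
```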

Peter Dunlap thought the cause of the problem may be the issues tracked by
bug 6818484, which was fixed in snv_113.

However, Vazer retried the scenario with snv_117 and still had the problem.
AFAIK the issue is unresolved.

Best Regards
Nigel Smith
-- 
This message posted from opensolaris.org
_______________________________________________
storage-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss