Nigel, Rick is currently on vacation and will return Monday, April 2nd. You can expect a response sometime next week. He will be buried in email, so it may take a while for him to get to this.

-Ken

On Mar 29, 2007, at 9:15 AM, Nigel Smith wrote:

Rafael, thanks for posting your mdb backtrace, which I think is quite interesting.

The warning you are seeing in '/var/adm/messages' on the
initiator, about 'received an unsupported opcode:0x41', is
exactly the same as the one that occurred on my PC, which caused a core dump.

Also, the stack backtrace from the core file you have
is similar to the problem I was seeing.
(I was using snv_54 at that time.)

Ok, let's compare the two sets of stack backtraces:

In your case, Rafael, the mdb trace shows:
(Deepest level shown first)

libc.so.1`_assert+0x6e(8075568, 8075558, 265)
t10_cmd_state_machine+0xcb(80954f8, 3)
trans_aioread+0x4c(80954f8, 809b5f0, 2000, 5000, 0, 80942b0)
raw_read+0x4a7(80954f8, 808fb98, 10)
raw_cmd+0x23(80954f8, 808fb98, 10)
lu_runner+0x79b(8092070)

In the case of my earlier core dump, I saw:

libc.so.1`_assert+0x6e(807675c, 807674c, 259)
t10_cmd_state_machine+0x25e(80d3950, 2)
t10_cmd_shoot_event+0x53(80d3950, 2)
trans_send_complete+0x6c(80d3950, 0)
spc_mselect_data+0x92(80d3950, 80b2ab0, 0, 80b2ab0, 20)
sbc_data+0x2b(80d3950, 80b2ab0, 0, 80b2ab0, 20)
trans_rqst_dataout+0x142(80d3950, 80b2ab0, 20, 0, 80b2ab0, 8069c98)
spc_mselect+0x54(80d3950, 80a1ad8, 10)
sbc_cmd+0x23(80d3950, 80a1ad8, 10)
lu_runner+0x79b(80afdd0)

Based on my (limited) understanding of the code, what is happening
is that the 't10_cmd_state_machine' routine is detecting a state transition
that should, in theory, never happen, so an 'assert' fails,
which aborts the program and creates the core dump.
(SMF then restarts 'iscsitgtd', and the initiator keeps looping as it
tries to log in again.)

In the case of my core dump, Rick McNeal, the author of the code, traced the
bug to the routine "spc_mselect_data()", at line 501 in
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/t10_spc.c
where he had a "break;" which should have been a "return;".

In your case, Rafael, maybe there is a problem in "trans_aioread()",
at line 1249 in
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/t10_sam.c
or in "raw_read()",
at line 282 in
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/t10_raw_if.c

Rafael, the other useful thing you could try is to use the 'truss' command.
See my post here:
http://mail.opensolaris.org/pipermail/storage-discuss/2007-February/000801.html

You can use 'truss' to trace the routines called by a running process - in this case 'iscsitgtd'. (The parameter '-p' specifies the process id, and '-o' specifies the output file name. The parameter '-u a.out' means trace user-level functions.)

In the truss output file, when iscsitgtd fails, you should see the "Assertion" message. The interesting part will be the lines leading up to that, which should show the routines being called in the iscsitgtd code. This should further help to trace
what is going wrong.
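Putting that together, something like the following should work. The truss invocation itself has to run on the Solaris host (truss is Solaris-specific, and the PID lookup and output path are just examples); the sample output lines below are invented purely to show how to pull out the calls leading up to the assertion - real truss output will look different:

```shell
# On the Solaris target host (run as root, while reproducing the login):
#
#   truss -u a.out -o /tmp/iscsitgtd.truss -p `pgrep iscsitgtd`
#
# Once the daemon has aborted, grep out the lines just before the
# assertion.  Fake sample data, for illustration only:
cat > /tmp/truss.sample <<'EOF'
-> raw_read(0x80954f8, 0x808fb98, 0x10)
 -> trans_aioread(0x80954f8, 0x809b5f0, 0x2000)
  -> t10_cmd_state_machine(0x80954f8, 0x3)
Assertion failed: ..., file t10_sam.c
EOF
grep -B 3 'Assertion' /tmp/truss.sample
```

The 'grep -B 3' prints the three lines before the match, which is where the routine calls of interest will be.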

Of course, what we really need is for Rick to jump in here and give his view.
I guess maybe he is busy or on holiday at the moment.

And of course, all the above is really just to help Rick find the bug and squash it. We would then need to ask him to release a new set of packages, like he did here: http://mail.opensolaris.org/pipermail/storage-discuss/2007-January/000748.html

Ok, I look forward to seeing the truss output.
Thanks
Nigel Smith
http://nwsmith.blogspot.com/

Oh, by the way, I have found some good links concerning 'mdb',
which I have posted here:
http://del.icio.us/nwsmith/solaris-mdb


This message posted from opensolaris.org
_______________________________________________
storage-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/storage-discuss

Ken Davis - Manager
Sun Microsystems
New Solaris Storage Group
work: 303.395.4168
cell: 720.837.5818



