Rafael, thanks for posting your mdb backtrace, which I think is
quite interesting.
The warning you are seeing in '/var/adm/messages' from the
initiator, 'received an unsupported opcode:0x41', is exactly
the same as the one I saw on my PC, which ended in a core dump.
The stack backtrace from your core file is also
similar to the one from the problem I was seeing.
(I was using snv_54 at that time.)
Ok, let's compare the two sets of stack backtraces.
In your case, Rafael, the mdb trace shows:
(Deepest level shown first)
libc.so.1`_assert+0x6e(8075568, 8075558, 265)
t10_cmd_state_machine+0xcb(80954f8, 3)
trans_aioread+0x4c(80954f8, 809b5f0, 2000, 5000, 0, 80942b0)
raw_read+0x4a7(80954f8, 808fb98, 10)
raw_cmd+0x23(80954f8, 808fb98, 10)
lu_runner+0x79b(8092070)
In the case of my earlier core dump, I saw:
libc.so.1`_assert+0x6e(807675c, 807674c, 259)
t10_cmd_state_machine+0x25e(80d3950, 2)
t10_cmd_shoot_event+0x53(80d3950, 2)
trans_send_complete+0x6c(80d3950, 0)
spc_mselect_data+0x92(80d3950, 80b2ab0, 0, 80b2ab0, 20)
sbc_data+0x2b(80d3950, 80b2ab0, 0, 80b2ab0, 20)
trans_rqst_dataout+0x142(80d3950, 80b2ab0, 20, 0, 80b2ab0, 8069c98)
spc_mselect+0x54(80d3950, 80a1ad8, 10)
sbc_cmd+0x23(80d3950, 80a1ad8, 10)
lu_runner+0x79b(80afdd0)
Based on my (limited) understanding of the code, what is happening
is that the 't10_cmd_state_machine' routine is detecting a state
transition that should, in theory, never happen, so it issues an
'assert', which aborts the program and creates the core dump.
(SMF then restarts 'iscsitgtd', and the loop repeats at the next
login attempt.)
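To make that failure mode concrete, here is a minimal sketch in C of a
table-driven state machine that asserts on an illegal transition. This
is NOT the real 't10_cmd_state_machine' code: the states, events, table
and function names below are all hypothetical, made up purely for
illustration.

```c
#include <assert.h>

/* Hypothetical states and events; the real daemon's names differ. */
typedef enum { ST_FREE, ST_IN_PROGRESS, ST_DONE, ST_INVALID } cmd_state_t;
typedef enum { EV_START, EV_COMPLETE, EV_RELEASE, EV_COUNT } cmd_event_t;

/* Transition table: row = current state, column = incoming event. */
static const cmd_state_t transitions[ST_INVALID][EV_COUNT] = {
    /* ST_FREE        */ { ST_IN_PROGRESS, ST_INVALID, ST_INVALID },
    /* ST_IN_PROGRESS */ { ST_INVALID,     ST_DONE,    ST_INVALID },
    /* ST_DONE        */ { ST_INVALID,     ST_INVALID, ST_FREE    },
};

/* Return the next state, or ST_INVALID if the transition is not allowed. */
static cmd_state_t lookup_transition(cmd_state_t s, cmd_event_t e)
{
    if (s >= ST_INVALID || e >= EV_COUNT)
        return ST_INVALID;
    return transitions[s][e];
}

/* Advance the command's state; an illegal transition trips the assert,
 * which calls abort() and so produces a core dump. */
static void cmd_state_machine(cmd_state_t *s, cmd_event_t e)
{
    cmd_state_t next = lookup_transition(*s, e);
    assert(next != ST_INVALID);
    *s = next;
}
```

In this sketch, delivering EV_COMPLETE twice to the same command would
trip the assert on the second call, because ST_DONE has no entry for
that event.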
Rick McNeal, the author of the code, traced the bug behind my core
dump to the routine "spc_mselect_data()", which is at line 501 in
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/t10_spc.c
where he had a "break;" which should have been a "return;".
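As an illustration of how a "break;" in place of a "return;" can cause
this kind of trouble, here is a small C sketch. It is only my guess at
the general pattern, not the actual code in t10_spc.c: the function
names, the counter and the page code value are hypothetical. The idea
is that "break" only leaves the switch, so the normal completion at the
end of the function runs a second time after the error path has already
completed the command.

```c
#include <assert.h>

/* Counts how many times a command was completed (hypothetical helper). */
static int completions;

static void send_complete(void)
{
    completions++;
}

/* Buggy variant: the error path completes the command, then "break"
 * only exits the switch, so the trailing send_complete() runs as well. */
static void handle_data_buggy(int page_code)
{
    switch (page_code) {
    case 0x08:              /* hypothetical supported page */
        break;              /* fall out to the normal completion below */
    default:
        send_complete();    /* report the error */
        break;              /* BUG: should leave the function */
    }
    send_complete();        /* normal completion */
}

/* Fixed variant: "return" leaves the function after the error path. */
static void handle_data_fixed(int page_code)
{
    switch (page_code) {
    case 0x08:
        break;
    default:
        send_complete();
        return;             /* command already completed, stop here */
    }
    send_complete();
}
```

With an unsupported page code the buggy variant completes the command
twice, and a second completion is the sort of event a state machine
would reject with an assert.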
In your case, Rafael, maybe there is a problem in "trans_aioread()",
which is at line 1249 in
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/t10_sam.c
or in "raw_read()", which is at line 282 in
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/iscsi/iscsitgtd/t10_raw_if.c
Rafael, the other useful thing you could try is the 'truss' command.
See my post here:
http://mail.opensolaris.org/pipermail/storage-discuss/2007-February/000801.html
You can use 'truss' to trace the routines called by a running
process, in this case 'iscsitgtd'. (The parameter -p specifies the
process id, -o specifies the output file name, and '-u a.out' means
trace 'user-level' functions.)
In the truss output file, when iscsitgtd fails you should see the
"A s s e r t i o n", and the interesting part will be the lines
leading up to that, which should show the routines being called in
the iscsitgtd code. This should further help to trace what is going
wrong.
Of course, what we really need is for Rick to jump in here and give
his view.
I guess maybe he is busy or on holiday at the moment.
And of course, all the above is really just to help Rick find the
bug and squash it.
We would then need to ask him to release a new set of packages,
like he did here:
http://mail.opensolaris.org/pipermail/storage-discuss/2007-January/000748.html
Ok, I look forward to seeing the truss output.
Thanks
Nigel Smith
http://nwsmith.blogspot.com/
Oh, by the way, I have found some good links concerning 'mdb',
which I have posted here:
http://del.icio.us/nwsmith/solaris-mdb
This message posted from opensolaris.org