Re: NFS hard semantics wanted: how to?

Mike Christie Wed, 22 Dec 2010 10:12:42 -0800

On 12/22/2010 05:57 AM, torn5 wrote:

Hello open-iscsi people
I am approaching iscsi, and I am currently doing some "reliability" tests.


In particular I would like to be able to reboot the target machine
without the initiators losing data.
Like NFS hard mounts.

If the target goes down:
1) I want the device to be frozen so that applications get stuck while
trying to access the device
2) and when the target comes up again, I want the in-flight commands to
be re-played back to the target so that no data is lost.

I was able to obtain part 1, by increasing the replacement_timeout to a
high enough value.

However it seems I cannot obtain part 2 because there are still errors
in the dmesg. I think this is due to the lost inflight commands (my
guess.... from what's written in the README).

These are the errors I see:
[31291.360009] EXT4-fs (sdd1): error count: 10
[31291.360013] EXT4-fs (sdd1): initial error at 1292972264: ext4_remount:3755
[31291.360015] EXT4-fs (sdd1): last error at 1292976117: ext4_put_super:719
They look harmful...


Firstly I don't understand why open-iscsi does not requeue inflight
commands by itself as soon as it blocks the device for connection lost.

This is what is done. When the connection problem (caused by target reset in your case) is detected, the iscsi layer blocks the devices. Then it fails IO that was running to the scsi layer which tells the scsi layer to requeue if it can (for tape and passthrough like sg io you could not retry but for disk commands you would retry up to 5 times). Because the devices are blocked all new IO and requeued IO then sits in the queue until we unblock the device queue.

If the replacement timeout expires before we can reconnect, the device is unblocked and everything in the device queue is failed and any new IO is failed.

If we reconnect within the replacement timeout period, we unblock the queue and we run IO that was requeued during the problem detection and then run the new IOs.
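
To watch this while the target is down, something like the following may help. It is only a sketch: it assumes the iSCSI transport class exposes the replacement timeout as /sys/class/iscsi_session/session*/recovery_tmo and that the SCSI device exposes its state as /sys/block/sdX/device/state. The SYSFS_ROOT variable and the function names are made up for this example so it can be pointed at a test tree.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override (not an open-iscsi setting); default is the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print "<session> <recovery_tmo in seconds>" for every iSCSI session found.
show_recovery_tmo() {
    for f in "$SYSFS_ROOT"/class/iscsi_session/session*/recovery_tmo; do
        # Skip the unexpanded glob when no session exists.
        [ -r "$f" ] || continue
        s=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$s" "$(cat "$f")"
    done
}

# Print the SCSI device state for a disk name such as sdd.
# While the session is in recovery the state is expected to read "blocked".
show_device_state() {
    f="$SYSFS_ROOT/block/$1/device/state"
    [ -r "$f" ] && cat "$f"
}
```

Running `show_recovery_tmo` and `show_device_state sdd` right after stopping the target should show whether the device is sitting blocked and how long it will stay that way before queued IO is failed.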

It seems the braindead obvious solution to me. Then, if the
replacement_timeout expires, all commands (inflight and queued) should
be failed together to the above layer. I don't understand why they


They don't.

should get a different treatment.


Secondly, I read in the docs that SCSI commands are retried 5 times.
Ok good! then I don't understand why ext4 still sees data loss. I was
doing cycles of
...
stop target service
wait 15 secs
start target service
wait 15 secs
...
(the initiator in the meanwhile is untarring tens of thousands of files
from a kernel tar in a forever loop)


In just 15 seconds I cannot believe the scsi commands could really fail
5 times, that would be a 3 seconds timeout, it's too low...

Could you send the rest of your /var/log/messages? It should have some scsi error code info and block layer error info.


Could you also turn on iscsi eh debugging

echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh

before running your test? That sends more logging info to /var/log/messages.
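
If the write fails silently in a test script it is easy to miss, so a guarded version may be handy. This is just a sketch; SYSFS_ROOT and the function name are made up for the example, and only the parameter path comes from the command above.

```shell
#!/bin/sh
# Turn on libiscsi error-handler debugging, reporting a failure instead of ignoring it.
# SYSFS_ROOT is a hypothetical override for trying this against a fake tree.
enable_eh_debug() {
    p="${SYSFS_ROOT:-/sys}/module/libiscsi/parameters/debug_libiscsi_eh"
    if [ -w "$p" ]; then
        echo 1 > "$p"
    else
        echo "cannot write $p (is libiscsi loaded, and are you root?)" >&2
        return 1
    fi
}
```

Call `enable_eh_debug || exit 1` at the top of the stop/start test loop so the run aborts if debugging could not be enabled.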


And also when the SCSI layer resubmits the command (second submission)
the device is blocked so the command should get stuck in the queue and
should stay there until connection is recovered (supposing a high enough
replacement_timeout) so the commands should not fail more than once.
Then why the errors?

I have even increased the /sys/block/sdX/device/timeout to a very high
value. That's the timeout for SCSI isn't it?

That timeout only monitors if a command has been sent to the driver and not completed within that timeout. When the problem is detected and we requeue IO, the timer is halted when it is requeued, then when IO is restarted the timer is reset.


