Janwar Dinata wrote:
> Hi Mike, 

Hey,

> I've been playing some more with the node.session.timeo.replacement_timeout = 
> x.
> A good x for me is at least 30s.
> 
> The test that I am doing is creating and deleting a target for at least 10 
> minutes while traffic is running at the same time.
> I can adjust the timeout value that will allow the test to run without 
> disktest reporting io errors on me.
> 
> Attached is the what initiator /var/log/messages that during both successful 
> and failing disktest.
> If everything on the messages file is as expected, then I think your last 
> patch fix the problem I had.
> 

Sorry about that. I am going to add what these common errors mean to the 
README so it is clear.

Dec  5 11:49:57 computeB014a kernel:  connection5:0: ping timeout of 15 
secs expired, last rx 86492733, last ping 86502733, now 86517751

This just means we sent a nop and did not get a response within 15 
seconds. It is expected if the target is gone and so is not failing the 
nop, or if you did something like a cable pull where nops are not going 
to reach the target.
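
As a rough check of the numbers in that message, here is a small Python 
sketch. It assumes the three counters are kernel jiffies and that this 
kernel runs at HZ=1000; neither is stated in the log, so treat the 
result as an estimate.

```python
# Counters copied from the "ping timeout" log line above.
last_rx, last_ping, now = 86492733, 86502733, 86517751

HZ = 1000  # assumption: jiffies per second on this kernel build


def ticks_to_secs(ticks, hz=HZ):
    """Convert a jiffies delta to seconds for the assumed HZ."""
    return ticks / hz


# Time the connection was idle before the nop was sent.
idle_before_ping = ticks_to_secs(last_ping - last_rx)
# Time we waited for the nop response before giving up.
wait_after_ping = ticks_to_secs(now - last_ping)

print(f"idle before nop was sent: {idle_before_ping:.3f}s")
print(f"waited for nop response:  {wait_after_ping:.3f}s")
```

Under those assumptions the wait comes out just over 15 seconds, which 
lines up with the "15 secs" in the message.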

Dec  5 11:49:57 computeB014a kernel:  connection5:0: detected conn error 
(1011)
Dec  5 11:49:58 computeB014a iscsid: Kernel reported iSCSI connection 
1:0 error (1011) state (3)

This is just an indication that the connection was failed as a result 
of the nop timing out.


Dec  5 11:50:05 computeB014a iscsid: connection1:0 is operational after 
recovery (1 attempts)

We logged back in.


Dec  5 11:55:50 computeB014a iscsid: Target requests logout within 1 
seconds for connection

Target sent AEN, and we sent logout.

Dec  5 11:55:53 computeB014a iscsid: connect failed (111)

We sent the logout and tried to relogin. We could not because target was 
not there.


Dec  5 11:56:09 computeB014a kernel:  session4: session recovery timed 
out after 20 secs


This might concern you. It means that we tried to reconnect for 20 
seconds (replacement_timeout) but could not log back in, so we are going 
to fail any IO that is running and any new IO. You would want a longer 
timeout to make sure IO that is running or could get sent during this 
time does not get failed.
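
A minimal sketch of raising the timeout for one node record, written as 
a Python helper that only builds the iscsiadm command line. The target 
name, portal and the value 120 are made-up placeholders, not values from 
your setup. A value changed this way is stored in the node record, so I 
would expect it to apply the next time the session logs in.

```python
import shlex


def replacement_timeout_cmd(target, portal, seconds):
    """Build the iscsiadm argv that updates replacement_timeout
    in the node record for one target/portal pair."""
    return [
        "iscsiadm", "-m", "node",
        "-T", target,
        "-p", portal,
        "-o", "update",
        "-n", "node.session.timeo.replacement_timeout",
        "-v", str(int(seconds)),
    ]


# Hypothetical target and portal, for illustration only.
cmd = replacement_timeout_cmd(
    "iqn.2008-12.com.example:target0", "192.0.2.10:3260", 120
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed 
to subprocess.run() if you want to script it.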


Dec  5 11:56:31 computeB014a iscsid: connection8:0 is operational after 
recovery (10 attempts)

Target came back and we logged back in.
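
To get a feel for how long a timeout this particular event would have 
needed, here is a small Python sketch that works from the syslog 
timestamps above. It assumes the outage starts at the "Target requests 
logout" line, which is my reading of the log and not something the 
kernel reports directly.

```python
from datetime import datetime

FMT = "%b %d %H:%M:%S"  # syslog timestamps carry no year


def parse(ts):
    """Parse a syslog timestamp, collapsing the padded day field."""
    return datetime.strptime(" ".join(ts.split()), FMT)


# Timestamps copied from the log lines quoted above.
logout_requested = parse("Dec  5 11:55:50")
recovery_timed_out = parse("Dec  5 11:56:09")
logged_back_in = parse("Dec  5 11:56:31")

# Total time the target was unreachable (assumed start: logout request).
outage = int((logged_back_in - logout_requested).total_seconds())
# Time between the recovery timeout firing and the successful relogin,
# during which running and new IO was being failed.
failed_window = int((logged_back_in - recovery_timed_out).total_seconds())

print(f"outage lasted about {outage}s")
print(f"IO was being failed for about {failed_window}s")
```

By that estimate the timeout would have had to be longer than the whole 
outage, not just the 20 seconds used here, to ride through this event 
without failing IO.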


Dec  5 11:56:35 computeB014a iscsid: iSCSI daemon with pid=21738 started!
Dec  5 11:56:35 computeB014a iscsid: iscsid shutting down.
Dec  5 11:56:35 computeB014a kernel: Loading iSCSI transport class v2.0-870.
Dec  5 11:56:35 computeB014a kernel: iscsi: registered transport (tcp)

Looks like you restarted the iscsi service. You normally do not want to 
do this while sessions are running and disks are in use. It will force 
any IO that is running or queued to fail.

Then in the log it looks like you deleted targets and restarted the 
iscsi service some more.


Dec  5 12:17:30 computeB014a iscsid: Kernel reported iSCSI connection 
8:0 error (1011) state (3)
Dec  5 12:17:49 computeB014a kernel:  session2: session recovery timed 
out after 20 secs
Dec  5 12:17:49 computeB014a kernel: sd 47:0:0:0: SCSI error: return 
code = 0x00010000

This is what you want to be concerned with. If the replacement/recovery 
timeout is too short you will get these errors, which mean that the IO 
is going to be failed. If you are using dm-multipath this is fine, but 
if you have an app like a FS using this disk directly this will cause 
errors.
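
For reference, a short Python sketch of how that return code breaks 
down. It assumes the usual Linux SCSI result layout (driver, host, msg 
and status bytes from high to low) and only lists a few host byte names, 
so it is not a complete decoder.

```python
# A few host byte values from the Linux SCSI layer (not a complete list).
HOST_BYTE_NAMES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
}


def decode_scsi_result(result):
    """Split a 32-bit SCSI result into its four byte fields."""
    return {
        "status": result & 0xFF,
        "msg": (result >> 8) & 0xFF,
        "host": (result >> 16) & 0xFF,
        "driver": (result >> 24) & 0xFF,
    }


fields = decode_scsi_result(0x00010000)  # value from the log line above
host_name = HOST_BYTE_NAMES.get(fields["host"], "unknown")
print(fields, host_name)
```

Only the host byte is set here, and it maps to DID_NO_CONNECT, which 
fits IO being failed because the session could not be re-established in 
time.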




> Thanks
> Janwar
> > 

