[Lustre-discuss] New lustre message
I don't know if this is a bad thing, I was doing a stress test of our new lustre install and managed to have a client kicked out with the following message on the OST that kicked it out:

Lustre: 6584:0:(ldlm_lib.c:760:target_handle_connect()) nobackup- OST: refuse reconnection from 749b3c01-4ac0- [EMAIL PROTECTED]@tcp to 0x0102f7cdc000; still busy with 6 active RPCs

Was this just a result of hammering the filesystem really hard? Both OSSes became CPU bound, so I would not be surprised if it was just too much. Pointers to any other common causes of this message (I never saw it with our old setup) would be great.

Thanks. The new install is working great, nice product.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
Re: [Lustre-discuss] New lustre message
On Thu, 2008-08-21 at 22:23 -0400, Brock Palen wrote:
> I don't know if this is a bad thing, I was doing a stress test of our new lustre install and managed to have a client kicked out with the following message on the OST that kicked it out:

To be clear, the message below is not a client being evicted but rather a client trying to reconnect after it has been evicted.

> Lustre: 6584:0:(ldlm_lib.c:760:target_handle_connect()) nobackup- OST: refuse reconnection from 749b3c01-4ac0- [EMAIL PROTECTED]@tcp to 0x0102f7cdc000; still busy with 6 active RPCs

The OSS is refusing to allow the client to reconnect, however, because it is still trying to finish the transactions the client had in progress when it was evicted.

> Was this just a result of hammering the filesystem really hard?

Could be, if the load was atypical and you have tuned your obd_timeout for a more typical load. Typically, until AT is in full swing, you need to tune for your worst case scenario.

b.
Re: [Lustre-discuss] New lustre message
On Aug 21, 2008, at 11:17 PM, Brian J. Murrell wrote:
> On Thu, 2008-08-21 at 22:23 -0400, Brock Palen wrote:
>> I don't know if this is a bad thing, I was doing a stress test of our new lustre install and managed to have a client kicked out with the following message on the OST that kicked it out:
>
> To be clear, the message below is not a client being evicted but rather a client trying to reconnect after it has been evicted.

Thanks, yes, this message appeared after the eviction notice.

>> Lustre: 6584:0:(ldlm_lib.c:760:target_handle_connect()) nobackup- OST: refuse reconnection from 749b3c01-4ac0- [EMAIL PROTECTED]@tcp to 0x0102f7cdc000; still busy with 6 active RPCs
>
> The OSS is refusing to allow the client to reconnect, however, because it is still trying to finish the transactions the client had in progress when it was evicted.

Good to know that it's just for 'that' client.

>> Was this just a result of hammering the filesystem really hard?
>
> Could be, if the load was atypical and you have tuned your obd_timeout for a more typical load. Typically, until AT is in full swing, you need to tune for your worst case scenario.
>
> b.
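
For anyone who wants to see how often an OSS logs this refusal and for which clients, the message quoted in this thread can be picked out of a server log with a short script. The following is only a rough sketch: the regular expression is derived from the single sample line above, so other Lustre releases may word the message differently, and the function names parse_refusal and summarize are made up for illustration rather than taken from any Lustre tool.

```python
import re
from collections import Counter

# Pattern based only on the one sample line quoted in this thread.
# Assumption: the client identifier runs from "from " up to " to 0x...".
REFUSE_RE = re.compile(
    r"refuse reconnection from (?P<client>.+?) "
    r"to (?P<export>0x[0-9a-fA-F]+); "
    r"still busy with (?P<rpcs>\d+) active RPCs"
)


def parse_refusal(line):
    """Return (client, active_rpcs) for a refusal line, or None otherwise."""
    match = REFUSE_RE.search(line)
    if match is None:
        return None
    return match.group("client"), int(match.group("rpcs"))


def summarize(lines):
    """Count refusals per client and record the largest RPC backlog seen."""
    refusals = Counter()
    worst_backlog = {}
    for line in lines:
        parsed = parse_refusal(line)
        if parsed is None:
            continue
        client, rpcs = parsed
        refusals[client] += 1
        worst_backlog[client] = max(rpcs, worst_backlog.get(client, 0))
    return refusals, worst_backlog
```

Passing the lines of a saved OSS log to summarize() gives a per-client count of refused reconnections and the highest "active RPCs" figure reported for each. A client that shows up repeatedly with a large backlog would be consistent with the explanation above, that the server is still working through requests from before the eviction.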