Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
Daniel,

In the future you might want to consider posting some entries or pieces of a log rather than the entire log file. =)

Was this from the OSS that you say was rebooting or from your MDS node? I would look at the log file of the OSS node(s) that contain OST0006 and OST0007 and see if there are any RAID errors. It might be a network problem as well.

Morning is coming and one of the developers will likely respond to this with more suggestions.

--Jeff

---mobile signature---
Jeff Johnson - Aeon Computing
jeff.john...@aeoncomputing.com
m: 619-204-9061

On Dec 20, 2010, at 23:13, Daniel Raj danielraj2...@gmail.com wrote:

> Dec 19 04:19:49 cluster kernel: Lustre: 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) dan3-OST0006: d957783f-e60b-07b0-2c86-ecfbc7eb57b6 reconnecting
> Dec 19 04:19:49 cluster kernel: Lustre: 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) Skipped 4 previous similar messages
> Dec 19 04:30:05 cluster kernel: Lustre: 23308:0:(ldlm_lib.c:575:target_handle_reconnect()) dan3-OST0006: d957783f-e60b-07b0-2c86-ecfbc7eb57b6 reconnecting
> Dec 19 04:30:05 cluster kernel: LustreError: 137-5: UUID 'cluster-ost7_UUID' is not available for connect (no target)
> Dec 19 04:30:05 cluster kernel: LustreError: 23290:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@8103fd722c00 x1355442914715019/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292713305 ref 1 fl Interpret:/0/0 rc -19/0

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
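As a rough illustration of posting pieces of a log rather than the whole file, here is a minimal Python sketch that keeps only LustreError entries and lines naming the targets of interest. The function name select_lines and the default target list are assumptions made for this example; the sample lines are taken from the log quoted above.

```python
def select_lines(lines, targets=("OST0006", "OST0007")):
    """Keep LustreError entries and any line that names one of the targets."""
    picked = []
    for line in lines:
        if "LustreError" in line or any(t in line for t in targets):
            picked.append(line.rstrip("\n"))
    return picked


# Three lines copied from the log excerpt quoted in this thread.
SAMPLE = [
    "Dec 19 04:19:49 cluster kernel: Lustre: 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) "
    "dan3-OST0006: d957783f-e60b-07b0-2c86-ecfbc7eb57b6 reconnecting",
    "Dec 19 04:19:49 cluster kernel: Lustre: 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) "
    "Skipped 4 previous similar messages",
    "Dec 19 04:30:05 cluster kernel: LustreError: 137-5: UUID 'cluster-ost7_UUID' "
    "is not available for connect (no target)",
]

if __name__ == "__main__":
    for entry in select_lines(SAMPLE):
        print(entry)
```

To run it against a real file, pass an open file handle instead of SAMPLE, for example select_lines(open("/var/log/messages")), and adjust the target names to match your own OSTs.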
Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
On Tue, 2010-12-21 at 11:13 +0530, Daniel Raj wrote:

> I am Daniel. My OSS is getting automatically rebooted again and again

If you mean a full reboot and not a panic, this is very likely not a Lustre problem.

> kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@810400e24400 x1353488904620274/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292738958 ref 1 fl Interpret:/0/0 rc -19/0
> kernel: LustreError: 137-5: UUID 'south-ost7_UUID' is not available for connect (no target)
> kernel: LustreError: 23284:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@8101124c7c00 x1353488904620359/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292739025 ref 1 fl Interpret:/0/0 rc -19/0

None of these messages indicates a reboot or any condition that would cause a reboot. But then again, you have provided only a very small amount of the log, from which no conclusions can be drawn.

b.
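For readers decoding the messages quoted above: the "rc -19" value is a negated errno. A minimal Python sketch for looking such a code up follows; the helper name describe_rc is made up for this example. On Linux, errno 19 is ENODEV, which fits the accompanying "no target" message and does not by itself point to a reboot.

```python
import errno
import os


def describe_rc(rc):
    """Map a negative kernel-style return code to 'NAME (message)'."""
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # The rc value reported in the LustreError lines in this thread.
    print(describe_rc(-19))
```

The message text after the name comes from the local C library, so its wording can differ between systems.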
Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
Daniel,

It looks like your OST backend storage device may be having an issue. I would check the health and stability of the backend storage device or RAID you are using for an OST device. That likely wouldn't cause a reboot of your OSS system. There may be more problems, hardware and/or OS related, that are causing the system to reboot in addition to Lustre complaining that it can't find the OST storage device.

Others here on the list will likely give you a more detailed answer. The storage device is the place I would look first.

--Jeff

--
Jeff Johnson
Manager
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101 f: 858-412-3845 m: 619-204-9061
4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

On Mon, Dec 20, 2010 at 9:43 PM, Daniel Raj danielraj2...@gmail.com wrote:

> Hi Genius,
>
> Good Day !!
>
> I am Daniel. My OSS is getting automatically rebooted again and again. Kindly help me.
>
> It is showing the below error messages:
>
> kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@810400e24400 x1353488904620274/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292738958 ref 1 fl Interpret:/0/0 rc -19/0
> kernel: LustreError: 137-5: UUID 'south-ost7_UUID' is not available for connect (no target)
> kernel: LustreError: 23284:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19) r...@8101124c7c00 x1353488904620359/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292739025 ref 1 fl Interpret:/0/0 rc -19/0
>
> Regards,
> Daniel A
Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
Hi Jeff,

Thanks for your reply.

Storage information:

DL380G5 == OSS + 16GB RAM
OS == SFS G3.2-2 + CentOS 5.3 + Lustre 1.8.3
MSA60 box == OST RAID 6

Regards,
Daniel A
Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically
Daniel,

Check the health and stability of your RAID-6 volume. Make sure the RAID is healthy and online. Use whatever monitor utility came with your RAID card or check /proc/mdstat if it's a Linux mdraid. Check /var/log/messages for error messages from your RAID or other hardware.

--Jeff

---mobile signature---
Jeff Johnson - Aeon Computing
jeff.john...@aeoncomputing.com
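If the volume does turn out to be a Linux mdraid, the /proc/mdstat check suggested above can be scripted. This is only a sketch under that assumption: an array behind a hardware RAID controller will not appear in /proc/mdstat, and the vendor utility is the right tool there. The function name degraded_arrays and the SAMPLE text are invented for illustration and are not output from Daniel's system.

```python
import re
from pathlib import Path

ARRAY_RE = re.compile(r"^(md\w+)\s*:")   # e.g. "md0 : active raid6 ..."
STATE_RE = re.compile(r"\[([U_]+)\]")    # member map, e.g. "[UUU_]"


def degraded_arrays(mdstat_text):
    """Return md array names whose member map contains '_' (a missing member)."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        state = STATE_RE.search(line)
        if current and state and "_" in state.group(1):
            degraded.append(current)
            current = None
    return degraded


# Hypothetical mdstat text: a four-member RAID 6 with one member missing.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sde1[3] sdd1[2] sdc1[1] sdb1[0]
      1953260544 blocks level 6, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
"""

if __name__ == "__main__":
    path = Path("/proc/mdstat")
    text = path.read_text() if path.exists() else SAMPLE
    print(degraded_arrays(text))
```

An empty list means no md array reported a missing member; it says nothing about the health of hardware RAID volumes or about errors logged in /var/log/messages.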