Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically

2010-12-21 Thread Jeff Johnson
Daniel,

In the future you might want to consider posting some entries or pieces of a 
log rather than the entire log file. =)

Was this from the OSS that you say was rebooting or from your MDS node? I would 
look at the log file of the OSS node(s) that contain OST0006 and OST0007 and 
see if there are any RAID errors. It might be a network problem as well.
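
A minimal sketch of that first pass over an OSS log, assuming the log has already been read into a list of lines. The name filter_log and the keyword list are hypothetical, and the sample lines are taken from the excerpt quoted in this thread. The strings a RAID controller or driver actually logs vary by hardware, so those keywords would need to be added by hand.

```python
def filter_log(lines, keywords):
    """Return the lines that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [line for line in lines
            if any(k in line.lower() for k in lowered)]


# Sample lines copied from the log excerpt posted in this thread.
SAMPLE = [
    "Dec 19 04:19:49 cluster kernel: Lustre: 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) dan3-OST0006: d957783f-e60b-07b0-2c86-ecfbc7eb57b6 reconnecting",
    "Dec 19 04:19:49 cluster kernel: Lustre: 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) Skipped 4 previous similar messages",
    "Dec 19 04:30:05 cluster kernel: LustreError: 137-5: UUID 'cluster-ost7_UUID' is not available for connect (no target)",
]

# Hypothetical keyword list: the two OSTs of interest plus any LustreError.
hits = filter_log(SAMPLE, ["OST0006", "OST0007", "LustreError"])
for line in hits:
    print(line)
```

On a real OSS the lines would come from the system log (for example /var/log/messages) rather than from SAMPLE.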

Morning is coming and one of the developers will likely respond to this with 
more suggestions.

--Jeff

---mobile signature---
Jeff Johnson - Aeon Computing
jeff.john...@aeoncomputing.com
m: 619-204-9061

On Dec 20, 2010, at 23:13, Daniel Raj danielraj2...@gmail.com wrote:

 Dec 19 04:19:49 cluster kernel: Lustre: 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) dan3-OST0006: d957783f-e60b-07b0-2c86-ecfbc7eb57b6 reconnecting
 Dec 19 04:19:49 cluster kernel: Lustre: 23300:0:(ldlm_lib.c:575:target_handle_reconnect()) Skipped 4 previous similar messages
 Dec 19 04:30:05 cluster kernel: Lustre: 23308:0:(ldlm_lib.c:575:target_handle_reconnect()) dan3-OST0006: d957783f-e60b-07b0-2c86-ecfbc7eb57b6 reconnecting
 Dec 19 04:30:05 cluster kernel: LustreError: 137-5: UUID 'cluster-ost7_UUID' is not available for connect (no target)
 Dec 19 04:30:05 cluster kernel: LustreError: 23290:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19)  r...@8103fd722c00 x1355442914715019/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292713305 ref 1 fl Interpret:/0/0 rc -19/0
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically

2010-12-21 Thread Brian J. Murrell
On Tue, 2010-12-21 at 11:13 +0530, Daniel Raj wrote: 
 
 I am Daniel. My OSS keeps rebooting automatically, again and again.

If you mean a full reboot and not a panic, this is very likely not a
Lustre problem.

 kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19)  r...@810400e24400 x1353488904620274/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292738958 ref 1 fl Interpret:/0/0 rc -19/0
 kernel: LustreError: 137-5: UUID 'south-ost7_UUID' is not available for connect (no target)
 kernel: LustreError: 23284:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19)  r...@8101124c7c00 x1353488904620359/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292739025 ref 1 fl Interpret:/0/0 rc -19/0

None of these messages indicates a reboot or any condition that would
cause a reboot.  But then again, you have provided only a very small
amount of the log from which no conclusions can be drawn.

b.





Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically

2010-12-20 Thread Jeff Johnson
Daniel,

It looks like your OST backend storage device may be having an issue. I
would check the health and stability of the backend storage device or RAID
you are using for the OST. A storage problem alone is unlikely to cause
your OSS to reboot. There may be other problems, hardware and/or OS related,
that are causing the reboots, in addition to Lustre complaining that it
can't find the OST storage device.
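
For reference, the -19 in the posted messages matches the Linux errno ENODEV ("No such device"), which is consistent with Lustre not finding the OST device. The sketch below pulls the first number after "rc" out of one of the posted lines and looks up its symbolic name. decode_rc is a hypothetical helper written for illustration, not a Lustre tool, and it assumes the code is a negated errno value.

```python
import errno
import re

# Matches the trailing "rc -19/0" field seen in the posted log lines.
RC_PATTERN = re.compile(r"\brc (-?\d+)/(-?\d+)")


def decode_rc(line):
    """Return (code, errno name) for the first number after 'rc', or None."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    # Assumption: the value is a negated errno, so look up its absolute value.
    return code, errno.errorcode.get(abs(code), "UNKNOWN")


# One of the lines from the original post, joined back onto a single line.
SAMPLE = ("kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg()) "
          "@@@ processing error (-19)  r...@810400e24400 x1353488904620274/t0 "
          "o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292738958 ref 1 "
          "fl Interpret:/0/0 rc -19/0")

print(decode_rc(SAMPLE))
```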

Others here on the list will likely give you a more detailed answer. The
storage device is the place I would look first.

--Jeff

-- 
--
Jeff Johnson
Manager
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

On Mon, Dec 20, 2010 at 9:43 PM, Daniel Raj danielraj2...@gmail.com wrote:




 Hi Genius,


 Good Day  !!


 I am Daniel. My OSS keeps rebooting automatically, again and again.
 Kindly help me.

 It is showing the error messages below:


 kernel: LustreError: 23351:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19)  r...@810400e24400 x1353488904620274/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292738958 ref 1 fl Interpret:/0/0 rc -19/0
 kernel: LustreError: 137-5: UUID 'south-ost7_UUID' is not available for connect (no target)
 kernel: LustreError: 23284:0:(ldlm_lib.c:1892:target_send_reply_msg()) @@@ processing error (-19)  r...@8101124c7c00 x1353488904620359/t0 o8-?@?:0/0 lens 368/0 e 0 to 0 dl 1292739025 ref 1 fl Interpret:/0/0 rc -19/0

 Regards,

 Daniel A




Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically

2010-12-20 Thread Daniel Raj
Hi Jeff,


Thanks for your reply.

Storage information:

DL380G5    == OSS + 16GB RAM
OS         == SFS G3.2-2 + CentOS 5.3 + Lustre 1.8.3
MSA60 box  == OST
RAID 6


Regards,

Daniel A



Re: [Lustre-discuss] Fwd: Reg /// OSS rebooted automatically

2010-12-20 Thread Jeff Johnson
Daniel,

Check the health and stability of your RAID 6 volume. Make sure the array is 
healthy and online. Use whatever monitoring utility came with your RAID card, 
or check /proc/mdstat if it's a Linux md RAID. Check /var/log/messages for 
error messages from your RAID or other hardware.
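
If the volume does turn out to be Linux md RAID, a degraded array shows an underscore in the member status field of /proc/mdstat (for example [UUU_]). Below is a minimal sketch of that check. The helper name degraded_arrays and the sample text are hypothetical, made up for illustration, and a hardware RAID controller will not appear in /proc/mdstat at all.

```python
import re

# Member status as printed in /proc/mdstat, e.g. "[4/3] [UUU_]".
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
ARRAY_HEAD = re.compile(r"^(md\S+)\s*:")


def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status field contains '_'."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        head = ARRAY_HEAD.match(line)
        if head:
            current = head.group(1)
            continue
        status = STATUS.search(line)
        if current and status and "_" in status.group(3):
            degraded.append(current)
    return degraded


# Hypothetical sample: md0 has lost one of four members, md1 is healthy.
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid6 sdd[3] sdc[2] sdb[1] sda[0]
      1953260544 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/3] [UUU_]

md1 : active raid1 sdf1[1] sde1[0]
      524224 blocks [2/2] [UU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a live system the text would come from open("/proc/mdstat").read() instead of SAMPLE.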

--Jeff

