Re: [CentOS] Cause for kernel panic

2011-12-15 Thread Nicolas Ross
 From this, is there a way to determine the cause? kdump is not
 configured or used, since the fencing of the node renders kdump
 useless.

 This is the second time in a few weeks that this has happened.

 /var/log/messages should have more information; could you include it?

No, unfortunately. The last message in the log is a normal one, and after 
that comes the boot process.

I will look at netconsole as Ross suggested.

Regards, 

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Cause for kernel panic

2011-12-15 Thread Leonard den Ottolander
Hello Corey,

On Wed, 2011-12-14 at 20:50 -0700, Corey Henderson wrote:
 /var/log/messages should have more information; could you include it?

Please do not ask people to include log files or other attachments to a
public mailing list! Information like that should be pasted online (e.g.
at http://pastebin.com/ ) and a link to the resource should be used.

Regards,
Leonard.

-- 
mount -t life -o ro /dev/dna /genetic/research




Re: [CentOS] Cause for kernel panic

2011-12-15 Thread Lamar Owen
On Thursday, December 15, 2011 09:51:52 AM Leonard den Ottolander wrote:
 Please do not ask people to include log files or other attachments to a
 public mailing list! Information like that should be pasted online (e.g.
 at http://pastebin.com/ ) and a link to the resource should be used.

I must disagree with this. For IRC this is appropriate, since typical IRC chat 
logs are not indexed by Google and the like, nor are questioners encouraged to 
read the archives of the IRC logs.

I can't count the times I've searched for a solution to a problem, found 
someone with the same issue posting online, tracked down some potential 
solution, only to find that the pastebin referenced as having the solution was 
no longer there.

Ditto for links to fixes on RapidShare, Megaupload, Google Docs, and kin.  It 
would be nice to excerpt logs and fixes in the message itself, for future 
searching through Google or directly through the archives.

Or, to put it more bluntly, you shouldn't tell people to search the archives 
but then have people put essential data on an ephemeral resource that is 
dissociated from the archive.

IMHO, of course.


[CentOS] Cause for kernel panic

2011-12-14 Thread Nicolas Ross
Hi! On an 8-node cluster, one of the nodes had a kernel panic.

The only bit of information I have is from an SSH console I had open, which 
said:


Message from syslogd@node108 at Dec 14 19:00:15 ...
  kernel:[ cut here ]

Message from syslogd@node108 at Dec 14 19:00:15 ...
  kernel:invalid opcode:  [#1] SMP

Message from syslogd@node108 at Dec 14 19:00:15 ...
  kernel:last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map

Message from syslogd@node108 at Dec 14 19:00:15 ...
  kernel:Stack:

Message from syslogd@node108 at Dec 14 19:00:15 ...
  kernel:Call Trace:

Message from syslogd@node108 at Dec 14 19:00:15 ...
  kernel:Code: 01 00 00 e8 26 8a cd e0 85 c0 0f 85 0e ff ff ff 48 89 df 
e8 76 f8 ff ff e9 01 ff ff ff 31 d2 eb d4 48 89 de 31 ff e8 c3 e3 ff ff 
0f 0b eb fe 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48

Message from syslogd@node108 at Dec 14 19:00:15 ...
  kernel:Kernel panic - not syncing: Fatal exception


From this, is there a way to determine the cause? kdump is not 
configured or used, since the fencing of the node renders kdump useless.

This is the second time in a few weeks that this has happened.
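
Part of the answer may be readable from the Code: line alone. On x86, the
byte pair 0f 0b encodes the UD2 instruction, which always raises an
invalid-opcode exception, and kernels of this era compile BUG() to UD2.
Together with the "cut here" marker, that suggests a BUG() assertion fired
rather than the CPU running into garbage, though without the call trace the
failing function cannot be named. Below is a minimal Python sketch that
locates the pair in the bytes quoted above. The helper names are made up for
illustration, and a full oops normally marks the faulting byte with angle
brackets, which this copy does not show.

```python
# Bytes copied from the "Code:" line of the panic message above.
CODE_LINE = (
    "01 00 00 e8 26 8a cd e0 85 c0 0f 85 0e ff ff ff 48 89 df "
    "e8 76 f8 ff ff e9 01 ff ff ff 31 d2 eb d4 48 89 de 31 ff e8 c3 e3 ff ff "
    "0f 0b eb fe 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48"
)

# x86 encoding of UD2, the instruction that raises "invalid opcode".
UD2 = bytes([0x0F, 0x0B])


def parse_code_bytes(text):
    """Turn the space-separated hex dump into a bytes object."""
    return bytes(int(token, 16) for token in text.split())


def find_ud2(code):
    """Return every offset at which the two-byte UD2 encoding appears."""
    return [i for i in range(len(code) - 1) if code[i:i + 2] == UD2]


code = parse_code_bytes(CODE_LINE)
print("bytes in dump:", len(code))
print("UD2 found at offsets:", find_ud2(code))
```

On the bytes above this reports a single match at offset 43 of 64, directly
after a call instruction (e8 ...) and followed by eb fe, a jump to itself.
That layout is what one would expect from BUG(), but treat it as a hint, not
a diagnosis.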


Re: [CentOS] Cause for kernel panic

2011-12-14 Thread Corey Henderson
On 12/14/2011 8:49 PM, Nicolas Ross wrote:
 From this, is there a way to determine the cause? kdump is not
 configured or used, since the fencing of the node renders kdump useless.

 This is the second time in a few weeks that this has happened.

/var/log/messages should have more information; could you include it?

-- 
Corey Henderson
http://cormander.com/


Re: [CentOS] Cause for kernel panic

2011-12-14 Thread Ross Walker
On Dec 14, 2011, at 10:49 PM, Nicolas Ross rossnick-li...@cybercat.ca wrote:

 From this, is there a way to determine the cause? kdump is not 
 configured or used, since the fencing of the node renders kdump useless.
 
 This is the second time in a few weeks that this has happened.

Set up netconsole to log kernel messages to the node on the left. Then you can 
get the oops messages if any node crashes.
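
For what it's worth, the receiving side can be as small as a UDP listener,
since netconsole just sends kernel messages as UDP datagrams. The sketch
below is an illustration under assumptions, not part of netconsole itself:
the function names are invented, port 6666 is the default target port given
in the kernel's netconsole documentation, and the address and interface in
the comment are placeholders to replace with your own.

```python
import socket

# On the node being watched, something along these lines loads the sender
# (10.0.0.2 and eth0 are placeholder values, not taken from this thread):
#   modprobe netconsole netconsole=@/eth0,6666@10.0.0.2/


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket on which netconsole datagrams will arrive."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_line(sock, bufsize=4096):
    """Block until one datagram arrives; return it prefixed by the sender IP."""
    data, (sender_ip, _sender_port) = sock.recvfrom(bufsize)
    text = data.decode("utf-8", errors="replace").rstrip("\n")
    return f"{sender_ip} {text}"


def log_forever(sock, path):
    """Append every received message to a file, flushing after each line."""
    with open(path, "a", buffering=1) as log:
        while True:
            log.write(receive_line(sock) + "\n")
```

Calling log_forever(open_listener(), "/var/log/netconsole.log") on the
neighbouring node would keep a copy of each oops there; the log path is again
only an example. Pointing netconsole at an existing syslog daemon that
accepts remote UDP messages should work just as well.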

-Ross
