[NILFS users] NILFS hanging SLES 11 - advise on diagnosis needed

Barham, David Fri, 02 Oct 2009 03:17:31 -0700

Hi
I'm running SLES 11 (kernel 2.6.27.19-5-default) with NILFS2 (nilfs-2.0.16). I 
have a 1.5 TB NILFS2 partition which I am setting up as the target for 
Robocopy runs from various PCs via Samba. The Robocopy scripts run nightly and 
a checkpoint is taken once a night. A script stops Samba, unmounts the previous 
week's checkpoint, deletes it, creates a new one, then mounts it and restarts 
Samba. This should mean that at any time the user can go back to 
'snapshot_{DAY}' to get their files back.
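
For reference, the nightly rotation can be sketched roughly as below. This is 
a minimal illustration rather than the actual script: the device /dev/sdb1, 
the mount point and the checkpoint numbers are hypothetical, and it assumes 
the nilfs-utils commands chcp, rmcp and mkcp plus the cp= mount option behave 
as their man pages describe. The functions only build the command lines; 
stopping and restarting Samba around them is left out.

```python
# Sketch of the nightly snapshot rotation as a list of commands.
# Nothing is executed here; the caller decides how to run them.

def retire_snapshot(device, mountpoint, old_cno):
    """Commands to unmount and delete last week's snapshot."""
    return [
        ["umount", mountpoint],
        # A snapshot has to be turned back into a plain checkpoint
        # before rmcp will delete it.
        ["chcp", "cp", device, str(old_cno)],
        ["rmcp", device, str(old_cno)],
    ]


def create_snapshot(device):
    """Command to create a new checkpoint directly as a snapshot."""
    return [["mkcp", "-s", device]]


def mount_snapshot(device, mountpoint, new_cno):
    """Command to mount snapshot new_cno read-only at mountpoint."""
    return [["mount", "-t", "nilfs2", "-o", "ro,cp=%d" % new_cno,
             device, mountpoint]]


if __name__ == "__main__":
    dev = "/dev/sdb1"              # hypothetical device name
    mnt = "/backup/snapshot_FRI"   # hypothetical mount point
    plan = (retire_snapshot(dev, mnt, 120)
            + create_snapshot(dev)
            + mount_snapshot(dev, mnt, 187))
    for argv in plan:
        print(" ".join(argv))
```

In practice the new checkpoint number is only known after mkcp has run (for 
example by looking at the output of lscp -s), so the mount step has to be 
built after that.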


So far so good.

However, as I copy over the previously backed-up files from the old Linux 
machine where I was doing this (which only gave a 'current' copy, on reiserfs), 
I'm finding that the new machine occasionally hangs. The OS just locks up: the 
console screen is frozen, but the host still responds to ping. 

I'm trying to work out what is causing the hang. I'm getting various messages 
in the log from smartd relating to the disk which houses the NILFS partition, 
along the lines of:

 Oct  2 09:56:59 cpli6008 syslog-ng[1933]: Log statistics; 
dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0', 
processed='center(queued)=947', processed='center(received)=478', 
processed='destination(newsnotice)=0', processed='destination(acpid)=0', 
processed='destination(firewall)=0', processed='destination(mail)=12', 
processed='destination(mailinfo)=12', processed='destination(console)=151', 
processed='destination(newserr)=0', processed='destination(newscrit)=0', 
processed='destination(messages)=466', processed='destination(mailwarn)=0', 
processed='destination(localmessages)=0', processed='destination(netmgm)=0', 
processed='destination(mailerr)=0', processed='destination(xconsole)=151', 
processed='destination(warn)=155', processed='source(src)=478'
Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sda [SAT], SMART Usage 
Attribute: 194 Temperature_Celsius changed from 110 to 112
Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Prefailure 
Attribute: 1 Raw_Read_Error_Rate changed from 115 to 117
Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage 
Attribute: 189 High_Fly_Writes changed from 88 to 87
Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage 
Attribute: 190 Airflow_Temperature_Cel changed from 60 to 61
Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage 
Attribute: 194 Temperature_Celsius changed from 40 to 39
Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage 
Attribute: 195 Hardware_ECC_Recovered changed from 50 to 51

{machine stops responding and gets power cycled}

Oct  2 10:10:58 cpli6008 syslog-ng[1948]: syslog-ng starting up; version='2.0.9'
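
To make the smartd entries above easier to compare between hangs, they can be 
pulled out of the log with something like the following. It is only a sketch: 
the regular expression is written against the message format seen in this 
excerpt, the sample text is two entries copied from it, and the function name 
is made up for illustration. The text is flattened first because the log lines 
were wrapped when pasted into the mail.

```python
import re

# Matches smartd "attribute changed" messages of the form seen in the excerpt.
SMART_CHANGE = re.compile(
    r"Device: (?P<device>\S+) \[SAT\], SMART (?P<kind>Usage|Prefailure) "
    r"Attribute: (?P<id>\d+) (?P<name>\S+) "
    r"changed from (?P<old>\d+) to (?P<new>\d+)"
)


def smart_changes(log_text):
    """Return (device, kind, id, name, old, new) for each smartd change."""
    # Collapse all whitespace so entries wrapped across lines match again.
    flat = " ".join(log_text.split())
    return [
        (m.group("device"), m.group("kind"), int(m.group("id")),
         m.group("name"), int(m.group("old")), int(m.group("new")))
        for m in SMART_CHANGE.finditer(flat)
    ]


SAMPLE = """\
Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sda [SAT], SMART Usage
Attribute: 194 Temperature_Celsius changed from 110 to 112
Oct  2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Prefailure
Attribute: 1 Raw_Read_Error_Rate changed from 115 to 117
"""

if __name__ == "__main__":
    for change in smart_changes(SAMPLE):
        print(change)
```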

Do folks think the hang is caused by NILFS, or by dodgy hardware (or dodgy 
reporting from smartd)? Is there any advice on getting debug or status 
information from NILFS to help show that it isn't the cause of the problem? I 
would have expected that if it went bang I'd have seen something 'worrying' in 
the log. 

For information, the hardware is a Dell Precision 380.

Many thanks
David Barham
Siemens PLM Software


_______________________________________________
users mailing list
[email protected]
https://www.nilfs.org/mailman/listinfo/users
