Hi
I'm running SLES 11 (kernel 2.6.27.19-5-default) with NILFS2 (nilfs-2.0.16). I have a
1.5 TB NILFS2 partition which I am setting up as the target for Robocopy backups
run from various PCs via Samba. The Robocopy scripts run nightly and a
checkpoint is taken once a night. A script stops Samba, unmounts the previous
week's checkpoint, deletes that checkpoint, creates a new one, mounts it and
restarts Samba. This should mean that at any time a user can go back to
'snapshot_{DAY}' to get their files back.
So far so good.
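For anyone following along, the nightly rotation described above could be sketched roughly like this. It is a minimal sketch, not the actual script. The device `/dev/sdb1`, the mount root `/srv/backup`, the Samba init script `/etc/init.d/smb` and all the helper names are hypothetical, and it assumes one snapshot per weekday mounted at `snapshot_{DAY}`. It uses the nilfs-utils commands `mkcp -s`, `chcp cp` and `rmcp`, plus a read-only `mount -o cp=N`. By default it only prints the commands. Finding the new checkpoint number after `mkcp` (e.g. with `lscp`) is left to the caller.

```python
import shlex
import subprocess

# Hypothetical settings: the post does not give the device or mount layout.
DEVICE = "/dev/sdb1"            # assumed NILFS2 partition
SNAP_ROOT = "/srv/backup"       # assumed parent of the snapshot_{DAY} mount points
SAMBA_INIT = "/etc/init.d/smb"  # assumed SLES 11 Samba init script


def mount_point(day):
    """Mount point for one day's snapshot, e.g. /srv/backup/snapshot_Mon."""
    return f"{SNAP_ROOT}/snapshot_{day}"


def before_mkcp(day, old_cno):
    """Commands up to and including creating tonight's snapshot.

    old_cno is the checkpoint number of last week's snapshot for this day,
    or None on the first run.
    """
    steps = [[SAMBA_INIT, "stop"]]
    if old_cno is not None:
        steps += [
            ["umount", mount_point(day)],
            # Demote the snapshot to a plain checkpoint so rmcp can remove it;
            # done after the umount because NILFS refuses this for a mounted snapshot.
            ["chcp", "cp", DEVICE, str(old_cno)],
            ["rmcp", DEVICE, str(old_cno)],
        ]
    # -s creates the new checkpoint directly as a snapshot.
    steps.append(["mkcp", "-s", DEVICE])
    return steps


def after_mkcp(day, new_cno):
    """Commands once the new snapshot exists and its number is known."""
    return [
        # Snapshots are mounted read-only, selected with cp=<checkpoint number>.
        ["mount", "-t", "nilfs2", "-r", "-o", f"cp={new_cno}",
         DEVICE, mount_point(day)],
        [SAMBA_INIT, "start"],
    ]


def run(steps, dry_run=True):
    """Print each command; execute it (stopping on failure) only if dry_run is False."""
    for argv in steps:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run with made-up checkpoint numbers.
    run(before_mkcp("Mon", 42))
    run(after_mkcp("Mon", 43))
```

The plan is split in two because the new checkpoint number is only known after `mkcp` has run.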
However, while copying the previously backed-up files across from the old Linux
machine that used to do this job (on reiserfs, which only kept a 'current' copy),
I'm finding that the new machine occasionally hangs. The OS just locks up: the
console screen freezes, but the host still responds to ping.
I'm trying to work out what is causing the hang. I'm getting various messages
in the log from smartd relating to the disk that houses the NILFS partition,
along the lines of:
Oct 2 09:56:59 cpli6008 syslog-ng[1933]: Log statistics;
    dropped='pipe(/dev/xconsole)=0', dropped='pipe(/dev/tty10)=0',
    processed='center(queued)=947', processed='center(received)=478',
    processed='destination(newsnotice)=0', processed='destination(acpid)=0',
    processed='destination(firewall)=0', processed='destination(mail)=12',
    processed='destination(mailinfo)=12', processed='destination(console)=151',
    processed='destination(newserr)=0', processed='destination(newscrit)=0',
    processed='destination(messages)=466', processed='destination(mailwarn)=0',
    processed='destination(localmessages)=0', processed='destination(netmgm)=0',
    processed='destination(mailerr)=0', processed='destination(xconsole)=151',
    processed='destination(warn)=155', processed='source(src)=478'
Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 110 to 112
Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 115 to 117
Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage Attribute: 189 High_Fly_Writes changed from 88 to 87
Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 60 to 61
Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 40 to 39
Oct 2 09:57:25 cpli6008 smartd[3473]: Device: /dev/sdb [SAT], SMART Usage Attribute: 195 Hardware_ECC_Recovered changed from 50 to 51
{machine stops responding and gets power cycled}
Oct 2 10:10:58 cpli6008 syslog-ng[1948]: syslog-ng starting up; version='2.0.9'
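One way to tell whether messages like these are just fluctuation or a steady drift over several days is to tabulate them per device and attribute. The following is a minimal Python sketch. It assumes each syslog entry sits on a single line of the log file, and the log path is passed on the command line. The function name is hypothetical, and this is not a smartd or NILFS tool. As far as I know, these smartd change messages report the attribute's normalized value, where lower generally means worse.

```python
import re
import sys
from collections import defaultdict

# Matches smartd attribute-change entries such as:
# "... smartd[3473]: Device: /dev/sdb [SAT], SMART Usage Attribute: 189
#  High_Fly_Writes changed from 88 to 87" (assumed to be one entry per line).
SMARTD_CHANGE = re.compile(
    r"smartd\[\d+\]: Device: (?P<dev>\S+) \[[^\]]*\], "
    r"SMART (?:Usage|Prefailure) Attribute: (?P<id>\d+) (?P<name>\S+) "
    r"changed from (?P<old>\d+) to (?P<new>\d+)"
)


def attribute_changes(lines):
    """Map (device, attribute name) -> list of (old, new) values in log order."""
    changes = defaultdict(list)
    for line in lines:
        m = SMARTD_CHANGE.search(line)
        if m:
            changes[(m["dev"], m["name"])].append((int(m["old"]), int(m["new"])))
    return dict(changes)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for (dev, name), vals in sorted(attribute_changes(log).items()):
                # First "old" value followed by every subsequent "new" value.
                history = [vals[0][0]] + [new for _, new in vals]
                print(dev, name, " -> ".join(map(str, history)))
```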
Do folks think the hang is caused by NILFS, or by dodgy hardware (or dodgy
reporting from smartd)?
Is there any advice on getting debug or status information out of NILFS to
help show that it isn't the cause of the problem? I would have expected that if it
had gone bang, I'd have seen something 'worrying' in the log.
For information, the hardware is a Dell Precision 380.
Many thanks
David Barham
Siemens PLM Software
_______________________________________________
users mailing list
[email protected]
https://www.nilfs.org/mailman/listinfo/users