Bug#515653: uptimed: on brutal reboot, all records are lost

2011-01-18 Thread Thibaut VARENE
On Tue, Jan 18, 2011 at 8:28 AM, Laurent Bonnaud
laurent.bonn...@inpg.fr wrote:
 Hi,

 I also experienced this bug several times on a laptop that sometimes
 fails to resume from suspend and with an ext4 filesystem.

 Here is a patch that should fix the problem:

 --- urec.c~     2009-01-02 00:46:00.0 +0100
 +++ urec.c      2011-01-18 08:07:28.886203152 +0100
 @@ -263,6 +263,7 @@
 			if ((max > 0) && (++i >= max)) break;
 		}
 	}
 +	fflush(f);
 	fclose(f);
 	rename(FILE_RECORDS, FILE_RECORDS".old");
 	rename(FILE_RECORDS".tmp", FILE_RECORDS);


DESCRIPTION
   The  fclose() function flushes the stream pointed to by fp (writing any
   buffered output data using fflush(3)) and closes  the  underlying  file
   descriptor.


For the record, as has been explained before (
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=536823#29 ), what's
needed here to definitively fix this problem is added logic that
would check, upon opening the database, that it's not empty, and if
it is, would discard it and use the backup one instead.
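That added logic could be sketched roughly as follows. This is an illustrative sketch only, not uptimed's actual code: the helper name, file paths, and the "peek one character" emptiness test are all assumptions.

```c
#include <stdio.h>

/* Sketch of the suggested recovery logic: if the primary database
 * is missing or empty (e.g. zeroed out after a crash), fall back
 * to the ".old" backup.  Names and paths are illustrative. */
static FILE *open_records_with_fallback(const char *primary,
                                        const char *backup)
{
    FILE *f = fopen(primary, "r");
    if (f) {
        int c = fgetc(f);       /* peek: is there any content at all? */
        if (c != EOF) {
            ungetc(c, f);
            return f;           /* primary looks usable */
        }
        fclose(f);              /* empty file: discard it */
    }
    fprintf(stderr, "records empty or unreadable, using backup\n");
    return fopen(backup, "r");
}
```

A fuller version would also validate the records it reads, not just check for a non-empty file, since a crash can leave partial content behind.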

For what it's worth, this behaviour is expected: when fclose() is
hit, the data may still reside in the VFS cache. On journaled
filesystems, under the most usual setups, only the metadata may
actually be flushed. When a crash occurs, the journal will restore
filesystem consistency by either removing (a case covered by the use
of the backup file) or zeroing out (XFS will typically do that) files
which are in a state inconsistent with the journal.

The only way to prevent this would be to add an fsync() before every
fclose(), which would force the data to disk. But then, I suggest
reading the fsync(2) manpage to understand the implications: the
performance impact, and the constant wakeup of hard drives for users
who spin down their drives, would be totally unacceptable for a bug
that's caused by abnormal operation of the software. (NB: no, a crash
or a power loss is not normal use of the system.)
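For reference, the write-then-sync-then-rename sequence being weighed here would look roughly like this sketch. It is not uptimed's actual code; the function name and arguments are made up for illustration.

```c
#include <stdio.h>
#include <unistd.h>

/* Illustrative sketch: durably replace a records file via a temp file.
 * fflush() pushes stdio buffers to the kernel; fsync() forces the
 * kernel to push the data to disk before the rename makes it live.
 * This is the costly step the thread argues against doing on every save. */
static int save_records_durably(const char *path, const char *tmp,
                                const char *data)
{
    FILE *f = fopen(tmp, "w");
    if (!f)
        return -1;
    if (fputs(data, f) == EOF || fflush(f) == EOF ||
        fsync(fileno(f)) == -1) {
        fclose(f);
        return -1;
    }
    if (fclose(f) == EOF)
        return -1;
    return rename(tmp, path);   /* atomic replacement on POSIX */
}
```

Without the fsync() call, the rename can land on disk before the file's data does, which is exactly how a crash produces an empty or zeroed records file.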

It's a common misconception that filesystems, and especially
journaled ones, are crash-proof. They're not.

HTH

T-Bone

-- 
Thibaut VARENE
http://www.parisc-linux.org/~varenet/



--
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#515653: uptimed: on brutal reboot, all records are lost

2011-01-17 Thread Laurent Bonnaud
Hi,

I also experienced this bug several times on a laptop that sometimes
fails to resume from suspend and with an ext4 filesystem.

Here is a patch that should fix the problem:

--- urec.c~ 2009-01-02 00:46:00.0 +0100
+++ urec.c  2011-01-18 08:07:28.886203152 +0100
@@ -263,6 +263,7 @@
 			if ((max > 0) && (++i >= max)) break;
 		}
 	}
+	fflush(f);
 	fclose(f);
 	rename(FILE_RECORDS, FILE_RECORDS".old");
 	rename(FILE_RECORDS".tmp", FILE_RECORDS);

-- 
Laurent Bonnaud.
http://www.lis.inpg.fr/pages_perso/bonnaud/








Bug#515653: uptimed: on brutal reboot, all records are lost

2009-02-16 Thread Sandro Tosi
Package: uptimed
Version: 1:0.3.16-2
Severity: important

Hello,
I had to brutally reboot my box (a series of kernel oopses prevented
any other action), but when the system came up again, all uptime
records were lost :(

This is really, really annoying...

Thanks,
Sandro

-- System Information:
Debian Release: 5.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.25-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages uptimed depends on:
ii  debconf [debconf-2.0] 1.5.24 Debian configuration management sy
ii  libc6 2.7-18 GNU C Library: Shared libraries
ii  libuptimed0   1:0.3.16-2 Library for uptimed

uptimed recommends no packages.

uptimed suggests no packages.

-- debconf information:
  uptimed/mail/do_mail: Never
  uptimed/mail/address: r...@localhost
  uptimed/interval: 60
  uptimed/mail/milestones_info:
  uptimed/maxrecords: 50






Bug#515653: uptimed: on brutal reboot, all records are lost

2009-02-16 Thread Thibaut VARENE
severity 515653 normal
tags 515653 moreinfo
thanks

On Mon, Feb 16, 2009 at 7:03 PM, Sandro Tosi mo...@debian.org wrote:
 Package: uptimed
 Version: 1:0.3.16-2
 Severity: important

 Hello,
 I had to brutally reboot my box (a series of kernel oopses prevented
 any other action), but when the system came up again, all uptime
 records were lost :(

I don't get it. This was supposed to be fixed with the current version
of uptimed. What filesystem are you using on /var? Were there any
error messages when uptimed started? What's the content of
/var/spool/uptimed/records* ?

Thanks






Bug#515653: uptimed: on brutal reboot, all records are lost

2009-02-16 Thread Thibaut VARÈNE
Please don't remove the BTS from the CC-list. Bug report information  
must be recorded.


On 16 Feb 09 at 19:26, Sandro Tosi wrote:
On Mon, Feb 16, 2009 at 19:19, Thibaut VARENE vare...@debian.org  
wrote:

What filesystem are you using on /var?


xfs


By default, XFS will zero-out inconsistent files after an unclean  
mount. If that's what happened, it's likely uptimed tried to use the  
(invalid) content of the file instead of its backup, which is why you  
didn't see anything in the log (when it uses its backup database it  
prints a message).


Radek, I don't really know what to do about this. Working around  
filesystem issues is gonna be a burden. Adding supplementary checks to  
assert the validity of the file being read doesn't look really  
straightforward; do you have any suggestion about this? The way I see  
it, we could bail out in urec.c:231, and 1) give feedback on failure  
to read record entry, while 2) falling back to the backup database on  
such a failure... Of course, if the backup db is also damaged, we're  
doomed.
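The bail-out check at record-reading time could look something like the following sketch. The helper name is made up; the only assumption taken from the thread is the records line format, uptime:timestamp:sysinfo, as shown in the files quoted in this bug.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of per-record validation: a records line has the form
 * "uptime:timestamp:sysinfo".  A zeroed-out or truncated file
 * (e.g. after an XFS journal replay) fails this check, which is
 * the point where uptimed could bail out to the backup database. */
static int record_is_valid(const char *line)
{
    char *end;
    const char *p = line;

    strtol(p, &end, 10);                /* uptime */
    if (end == p || *end != ':')
        return 0;
    p = end + 1;
    strtol(p, &end, 10);                /* timestamp */
    if (end == p || *end != ':')
        return 0;
    return end[1] != '\0' && end[1] != '\n';   /* non-empty sysinfo */
}
```

On the first line that fails this check, the reader could stop, report the failure, and retry with the backup database, which covers both the "file removed" and the "file zeroed" crash outcomes.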



What's the content of /var/spool/uptimed/records* ?


$ for file in /var/spool/uptimed/records* ; do echo "--- $file ---"
; cat "$file" ; done
--- /var/spool/uptimed/records ---
35228:1234767249:Linux 2.6.25-2-amd64
5978:1234802550:Linux 2.6.25-2-amd64
--- /var/spool/uptimed/records.old ---
35228:1234767249:Linux 2.6.25-2-amd64
5918:1234802550:Linux 2.6.25-2-amd64




That's ok.

--
Thibaut VARÈNE
http://www.parisc-linux.org/~varenet/







Bug#515653: uptimed: on brutal reboot, all records are lost

2009-02-16 Thread Sandro Tosi
On Mon, Feb 16, 2009 at 20:02, Thibaut VARÈNE vare...@debian.org wrote:
 Please don't remove the BTS from the CC-list. Bug report information must be
 recorded.

I didn't mean to remove the BTS address: I clicked Reply instead of
Reply to all by mistake.

 On 16 Feb 09 at 19:26, Sandro Tosi wrote:

 On Mon, Feb 16, 2009 at 19:19, Thibaut VARENE vare...@debian.org wrote:

 What filesystem are you using on /var?

 xfs

 By default, XFS will zero-out inconsistent files after an unclean mount. If
 that's what happened, it's likely uptimed tried to use the (invalid) content
 of the file instead of its backup, which is why you didn't see anything in
 the log (when it uses its backup database it prints a message).

That might be what happened.

 Radek, I don't really know what to do about this. Working around filesystem
 issues is gonna be a burden. Adding supplementary checks to assert the
 validity of the file being read doesn't look really straightforward; do you
 have any suggestion about this? The way I see it, we could bail out in
 urec.c:231, and 1) give feedback on failure to read record entry, while 2)
 falling back to the backup database on such a failure... Of course, if the
 backup db is also damaged, we're doomed.

Falling back on the backup seems a smart move; of course, in case
both are corrupted, starting from scratch is the only option at hand.

Regards,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi


