Re: [systemd-devel] journal repair

2014-10-08 Thread Lennart Poettering
On Fri, 19.09.14 10:00, Jóhann B. Guðmundsson (johan...@gmail.com) wrote:

> Is the plan to introduce a repair switch or is the plan to inform the users
> how they should proceed if that is not the case since users are getting
> confused when they encounter journal errors like these
>
> Data object missing from hash at entry...
> Data object references invalid entry at...
> Invalid tail monotonic timestamp...
> Invalid object contents at...
> File corruption detected at...
> etc.
>
> And are wasting their time on the internet searching for means to fix those
> errors.
>
> I think we need to somehow provide the end user with the next step once a
> corruption of any kind has been detected in the relevant journal file even if
> it's just:
>
> FAIL: corruption detected, your logs are fucked delete the file.

There isn't really any point in deleting them. journalctl automatically
salvages everything it can when reading them. Since the files are
mostly append-only, corruption usually only affects half-written
entries at the end, and hence all earlier ones should just work.

I am pretty sure we simply need to document this in more detail, and
clarify that corrupted journal files are nothing to act on, and that
journalctl recovers what it can on read, implicitly, with no fsck-like
tool being necessary, and without requiring people to manually delete
anything.
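
To illustrate the point about reading without repairing first, here is a minimal sketch that verifies a journal file and then reads it regardless of the result. It assumes the `--verify`, `--file` and `--no-pager` options of journalctl; the function name and the overridable JOURNALCTL variable are hypothetical, chosen only for this example.

```shell
# Command to run; overridable so the sketch can be tried without a real journal.
JOURNALCTL=${JOURNALCTL:-journalctl}

# Verify a journal file, then read it regardless of the verification result.
read_despite_corruption() {
    file=$1
    if "$JOURNALCTL" --verify --file "$file" >/dev/null 2>&1; then
        echo "verify: OK" >&2
    else
        # A failed verification is reported but is not treated as fatal.
        echo "verify: FAILED, reading whatever is recoverable" >&2
    fi
    # Reading does not depend on the verification result.
    "$JOURNALCTL" --file "$file" --no-pager
}
```

Calling `read_despite_corruption /var/log/xxx.journal` prints the readable entries on stdout and the verification status on stderr.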

Lennart

-- 
Lennart Poettering, Red Hat
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] journal repair

2014-09-19 Thread Jóhann B. Guðmundsson
Is the plan to introduce a repair switch or is the plan to inform the 
users how they should proceed if that is not the case since users are 
getting confused when they encounter journal errors like these


Data object missing from hash at entry...
Data object references invalid entry at...
Invalid tail monotonic timestamp...
Invalid object contents at...
File corruption detected at...
etc.

And are wasting their time on the internet searching for means to fix 
those errors.


I think we need to somehow provide the end user with the next step once 
a corruption of any kind has been detected in the relevant journal file 
even if it's just:


FAIL: corruption detected, your logs are fucked delete the file.

JBG


Re: [systemd-devel] journal repair

2014-09-19 Thread Zbigniew Jędrzejewski-Szmek
On Fri, Sep 19, 2014 at 10:00:02AM +, Jóhann B. Guðmundsson wrote:
> Is the plan to introduce a repair switch or is the plan to inform
> the users how they should proceed if that is not the case since
> users are getting confused when they encounter journal errors like
> these
>
> Data object missing from hash at entry...
> Data object references invalid entry at...
> Invalid tail monotonic timestamp...
> Invalid object contents at...
> File corruption detected at...
> etc.
>
> And are wasting their time on the internet searching for means to
> fix those errors.
>
> I think we need to somehow provide the end user with the next step
> once a corruption of any kind has been detected in the relevant
> journal file even if it's just:

It is now possible to fix files by rewriting them:

  journalctl --file /var/log/xxx.journal | systemd-journal-remote --file /tmp/xxx.journal -
  mv /tmp/xxx.journal /var/log/xxx.journal

We could easily provide the functionality to do this automatically,
but I don't know how useful this would be.
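
As a rough sketch of that rewrite step, the function below pipes the readable entries into a new file and replaces the original only if the copy succeeded. It assumes the option spellings from current man pages (`journalctl -o export` and `systemd-journal-remote -o FILE -`), which differ from the `--file` spelling used for the output above. The function name, the overridable command variables and the default helper path are hypothetical and may need adjusting per distribution.

```shell
# Both commands are overridable; systemd-journal-remote is often installed
# outside $PATH, and the default path below is an assumption.
JOURNALCTL=${JOURNALCTL:-journalctl}
JOURNAL_REMOTE=${JOURNAL_REMOTE:-/usr/lib/systemd/systemd-journal-remote}

# Copy the readable entries of journal file $1 into new file $2,
# then move $2 over $1.
rewrite_journal() {
    src=$1
    tmp=$2
    # journalctl emits what it can read in export format; the receiving
    # side writes those entries into a fresh journal file.
    "$JOURNALCTL" --file "$src" -o export | "$JOURNAL_REMOTE" -o "$tmp" - || return 1
    # Replace the original only after the copy succeeded.
    mv "$tmp" "$src"
}
```

Example: `rewrite_journal /var/log/xxx.journal /tmp/xxx.journal`. Keeping a backup of the original file before running it would be prudent, since the move is not reversible.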


> FAIL: corruption detected, your logs are fucked delete the file.

The error is usually at the end, so deleting all entries just because
one is bad does not seem reasonable.

Zbyszek



Re: [systemd-devel] journal repair

2014-09-19 Thread Jóhann B. Guðmundsson


On 09/19/2014 02:10 PM, Zbigniew Jędrzejewski-Szmek wrote:

>   journalctl --file /var/log/xxx.journal | systemd-journal-remote --file /tmp/xxx.journal -
>   mv /tmp/xxx.journal /var/log/xxx.journal
>
> We could easily provide the functionality to do this automatically,
> but I don't know how useful this would be.


Giving users the ability to fix this, whether via a --repair switch, by 
offering to fix it when encountered (confirmed with yes or no), or by 
simply doing it automatically, would be very useful for those who are 
encountering these errors.

It goes hand in hand: if you provide the ability to verify, you also 
provide the ability to fix it in the process.


JBG