Re: [arch-general] journalctl and I/O errors

2014-02-08 Thread Janna Martl
On Thu, Feb 6, 2014 at 1:03 AM, Sébastien Leblanc leblancse...@gmail.comwrote:

 Conclusion (as I understand it):

 1. There is definitely a bug in Journalctl: it crashes (segfaults) on I/O
 errors.

 2. You have a drive that is failing, or your BIOS might not be set
 correctly.


Thanks, all, for the analysis. I have submitted a bug report [1] for
systemd. Also, it seems you were right about hardware failure --
though I still can't get smartctl to acknowledge anything being wrong
(except for nonzero current pending sector count), I ran badblocks,
which found a bunch of errors, including on sectors corresponding to
files outside /var/log/journal. Guess I should get a new drive.


[1] https://bugs.freedesktop.org/show_bug.cgi?id=74714


Re: [arch-general] journalctl and I/O errors

2014-02-08 Thread Sébastien Leblanc
On 8 February 2014 12:06, Janna Martl janna.martl...@gmail.com wrote:

 On Thu, Feb 6, 2014 at 1:03 AM, Sébastien Leblanc 
 leblancse...@gmail.comwrote:

  Conclusion (as I understand it):
 
  1. There is definitely a bug in Journalctl: it crashes (segfaults) on I/O
  errors.
 
  2. You have a drive that is failing, or your BIOS might not be set
  correctly.
 

 Thanks, all, for the analysis. I have submitted a bug report [1] for
 systemd. Also, it seems you were right about hardware failure --
 though I still can't get smartctl to acknowledge anything being wrong
 (except for nonzero current pending sector count), I ran badblocks,
 which found a bunch of errors, including on sectors corresponding to
 files outside /var/log/journal. Guess I should get a new drive.


 [1] https://bugs.freedesktop.org/show_bug.cgi?id=74714


Do not rely too much on SMART data; in their hard disk failure study,
Google concluded that when SMART reports a drive as unhealthy, it is
often right, but for many drives that failed, SMART was still
reporting them as healthy.

 Failure Trends in a Large Disk Drive Population, by Google Inc.
 [...]
 Our analysis identifies several parameters from the drive’s
 self monitoring facility (SMART) that correlate highly with
 failures. Despite this high correlation, we conclude that mod-
 els based on SMART parameters alone are unlikely to be useful
 for predicting individual drive failures

 http://static.googleusercontent.com/media/research.google.com/en//archive/disk_failures.pdf

-- 
Sébastien Leblanc


Re: [arch-general] journalctl and I/O errors

2014-02-05 Thread Martti Kühne
On Tue, Feb 4, 2014 at 7:33 AM, Janna Martl janna.martl...@gmail.com wrote:
 On Mon, Feb 3, 2014 at 5:48 AM, Martti Kühne mysat...@gmail.com wrote:
Mind sharing the coredump so we could have a look? :)

 Here's a backtrace, in case that helps:

 (gdb) bt full
 #0  journal_file_move_to_object (f=f@entry=0x969cbc0, type=type@entry=3,
 offset=2638264, ret=ret@entry=0xbfc6505c) at src/journal/journal-file.c:434

Uhh...

That error is inside the inner workings of the journal. The data
received there is obviously corrupted. Be really careful with that
drive now, I suggest immediate replacement unless you prefer relying
on imposition of hands and summoning the gods of sheer luck to get the
same data back that was written.

cheers!
mar77i


Re: [arch-general] journalctl and I/O errors

2014-02-05 Thread Sébastien Leblanc
Conclusion (as I understand it):

1. There is definitely a bug in Journalctl: it crashes (segfaults) on I/O
errors.

2. You have a drive that is failing, or your BIOS might not be set
correctly. This is causing the I/O errors. How large is the drive? You
might have to turn off settings such as SATA legacy compatibility or the
like -- I had a 3TB drive that would cause ATA command errors on a ~2006
computer; I found this option in the BIOS and as soon as I turned it off
everything worked perfectly.

Although journalctl should not crash on I/O errors, I think it is not
unreasonable to assume that many other apps do not tolerate I/O errors
either. So I would say: you should still report this bug upstream.

-- 
Sébastien Leblanc


Re: [arch-general] journalctl and I/O errors

2014-02-05 Thread Anatol Pomozov
Hi

I agree on both points.

On Wed, Feb 5, 2014 at 10:03 PM, Sébastien Leblanc
leblancse...@gmail.com wrote:
 Conclusion (as I understand it):

 1. There is definitely a bug in Journalctl: it crashes (segfaults) on I/O
 errors.

A few months ago I had a problem with btrfs. I set +C attribute
(disable copy-on-write) on existing journal files. Btrfs recommends to
put the attribute on empty files and seems was confused that I applied
it to non-empty files. Btrfs started returning IO error when I was
trying to read the file with 'less' and journalctl started crashing
with segfault. It is very similar to what being discussed here. And I
agree that journalctl should play nicer.

If anyone still sees this problem please run 'strace journalctl ...'.
If it shows that a filesystem operation returns IO error right before
SEGFAULT then it proves current thesis.

 2. You have a drive that is failing, or your BIOS might not be set
 correctly. This is causing the I/O errors. How large is the drive? You
 might have to turn off settings such as SATA legacy compatibility or the
 like -- I had a 3TB drive that would cause ATA command errors on a ~2006
 computer; I found this option in the BIOS and as soon as I turned it off
 everything worked perfectly.

 Although journalctl should not crash on I/O errors, I think it is not
 unreasonable to assume that many other apps do not tolerate I/O errors
 either. So I would say: you should still report this bug upstream.

 --
 Sébastien Leblanc


Re: [arch-general] journalctl and I/O errors

2014-02-03 Thread Martti Kühne
On Thu, Jan 30, 2014 at 7:28 PM, Janna Martl janna.martl...@gmail.com wrote:
 You are able to read all the journald files, aren't you? You are running
 cp -r so it looks like so.

 IO errors can also cause a process to hang, and enter the dreaded D
 state.


 It varies. Sometimes it hangs as a D-state process; sometimes it
 explicitly says 'segmentation fault' and other times it just says 'bus
 error (core dumped)'.

 Also cp -r does die with exit status 1 and errors

 /bin/cp: error reading
 'journal/8ba8e3a22718630e590610ed07dc/system@81fd64a9006446f2af954d739934edeb-00073a5b-0004f09d7cc446d8.journal':
 Input/output error
 /bin/cp: failed to extend
 'journal-new/8ba8e3a22718630e590610ed07dc/system@81fd64a9006446f2af954d739934edeb-00073a5b-0004f09d7cc446d8.journal':
 Input/output error

 Most of the files are copied though and everything works fine
 afterward.


Mind sharing the coredump so we could have a look? :)


Re: [arch-general] journalctl and I/O errors

2014-02-03 Thread Janna Martl
On Mon, Feb 3, 2014 at 5:48 AM, Martti Kühne mysat...@gmail.com wrote:
Mind sharing the coredump so we could have a look? :)

Here's a backtrace, in case that helps:

(gdb) bt full
#0  journal_file_move_to_object (f=f@entry=0x969cbc0, type=type@entry=3,
offset=2638264, ret=ret@entry=0xbfc6505c) at src/journal/journal-file.c:434
r = 1
t = 0xb2c781b8
o = 0xb2c781b8
__PRETTY_FUNCTION__ = journal_file_move_to_object
#1  0x0805fb37 in generic_array_get (f=f@entry=0x969cbc0, first=375376,
i=optimized out, ret=ret@entry=0xbfc651ac, offset=offset@entry=0xbfc651b0)
at src/journal/journal-file.c:1443
o = 0xb2c68b40
p = optimized out
a = optimized out
t = 3158
r = optimized out
ci = 0x9687280
__PRETTY_FUNCTION__ = generic_array_get
#2  0x08060269 in journal_file_next_entry (f=f@entry=0x969cbc0,
o=o@entry=0xb2c77c38,
p=2636856, direction=direction@entry=DIRECTION_DOWN, ret=ret@entry=0xbfc651ac,

offset=offset@entry=0xbfc651b0) at src/journal/journal-file.c:1920
i = 3181
r = optimized out
__PRETTY_FUNCTION__ = journal_file_next_entry
#3  0x0805855c in next_with_matches (j=j@entry=0x9670850, f=f@entry=0x969cbc0,
direction=direction@entry=DIRECTION_DOWN, ret=ret@entry=0xbfc651ac,
offset=offset@entry=0xbfc651b0)
at src/journal/sd-journal.c:814
c = 0xb2c77c38
cp = 2636856
__PRETTY_FUNCTION__ = next_with_matches
#4  0x0805a744 in next_beyond_location (offset=synthetic pointer,
ret=0xbfc651a4, direction=DIRECTION_DOWN, f=0x969cbc0, j=0x9670850) at
src/journal/sd-journal.c:836
c = 0xb2c77c38
cp = 2636856
r = optimized out
#5  real_journal_next (j=j@entry=0x9670850,
direction=direction@entry=DIRECTION_DOWN)
at src/journal/sd-journal.c:895
found = optimized out
f = 0x969cbc0
new_file = 0x969b4e0
new_offset = 399336
o = 0xb324f718
p = optimized out
i = 0x968df04
r = optimized out
__func__ = real_journal_next
#6  0x0805b08e in sd_journal_next (j=0x9670850) at
src/journal/sd-journal.c:931
No locals.
#7  0x0804daa1 in main (argc=1, argv=0xbfc65574) at
src/journal/journalctl.c:1573
flags = optimized out
r = 0
j = 0x9670850
need_seek = true
previous_boot_id = {bytes =
X\357`\266L\307O̟\274\356\232\177\241\b2, qwords = {14722204839188688728,
3605309071142337695}}
previous_boot_id_valid = true
first_line = true
n_shown = 41695
ellipsized = false
__func__ = main


Re: [arch-general] journalctl and I/O errors

2014-01-30 Thread Nowaker

A couple months ago, I started getting I/O errors (see below) whenever
I tried to do journalctl


You are able to read all the journald files, aren't you? You are running 
cp -r so it looks like so. `cp` would die with non-zero exit status if 
there were read errors I guess. If it's possible to read the file, 
journalctl should not segfault IMO, so it should be OK to file an issue.


--
Kind regards,
Damian Nowak
StratusHost
www.AtlasHost.eu


Re: [arch-general] journalctl and I/O errors

2014-01-30 Thread Paul Gideon Dann
On Thursday 30 Jan 2014 11:46:38 Nowaker wrote:
  A couple months ago, I started getting I/O errors (see below) 
whenever
  I tried to do journalctl
 
 You are able to read all the journald files, aren't you? You are running
 cp -r so it looks like so. `cp` would die with non-zero exit status if
 there were read errors I guess. If it's possible to read the file,
 journalctl should not segfault IMO, so it should be OK to file an issue.

IO errors can also cause a process to hang, and enter the dreaded D 
state.  These processes usually can't be killed, because they're blocked 
in the kernel.

Paul


Re: [arch-general] journalctl and I/O errors

2014-01-30 Thread Thomas Bächler
Am 30.01.2014 11:46, schrieb Nowaker:
 If it's possible to read the file,
 journalctl should not segfault IMO, so it should be OK to file an issue.

No program should ever segfault. Unexpected input or errors must be
handled properly.




signature.asc
Description: OpenPGP digital signature


Re: [arch-general] journalctl and I/O errors

2014-01-30 Thread Leonid Isaev
On Thu, 30 Jan 2014 01:11:54 -0500
Janna Martl janna.martl...@gmail.com wrote:

 A couple months ago, I started getting I/O errors (see below) whenever
 I tried to do journalctl -n X for sufficiently large X (and journalctl
 would segfault). I assumed my hard drive was going to die, but in the
 mean time this was annoying so I did:
 
 # cp -rp /var/log/journal{,-clone}
 # rm -rf /var/log/journal
 # mv /var/log/journal{-clone,}
 
 There were a few errors after step 1 but this made the segfaulting
 stop. A few weeks later, same problem, same fix. A few weeks later,
 ditto. In all this time, the only I/O errors I've ever seen have been
 related to accessing the journal. The SMART data for the drive says
 that the current pending sector count is 57, but the reallocated
 sector count is 0.
 
 So if I'm reading this correctly, there have been a bunch of instances
 of unreadable sectors, somehow only pertaining to one directory
 despite efforts to make an independent copy of the data (which should
 put it in random other sectors), but none of them have been confirmed
 bad. So at this point I'm guessing it's a systemd bug (and there are
 no actual bad sectors), but it would be great if someone could confirm
 that any of this actually makes sense before I file a bug report based
 on ignorance and speculation.

The errors are pretty low-level. So, this is either hard drive failing in
an obscure fashion (I/O errors do not necessarily imply bad blocks) or a
strange filesystem problem.

Try running smart long test and a thorough fsck. If those come up clean, try
mounting another partition (or bind mounting a directory from another
partition) at /var/log/journal and see if errors persist. Finally, if /var is
a separate partition, I'd back it up, fill it with zeroes (cat /dev/zero
/dev/sdXY), and if no errors are encountered, restore. Hopefully, the drive
firmware will relocate all bad sectors (if any).

Cheers,
L.
 
 
 Thanks!
 
 - J.M.
 
 
 
 ---
 Example I/O error:
 
 ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
 ata1.00: BMDMA stat 0x5
 ata1.00: failed command: READ DMA EXT
 ata1.00: cmd 25/00:08:c8:ca:92/00:00:2e:00:00/e0 tag 0 dma 4096 in
  res 51/40:05:cb:ca:92/40:00:2e:00:00/ee Emask 0x9 (media error)
 ata1.00: status: { DRDY ERR }
 ata1.00: error: { UNC }
 ata1.00: configured for UDMA/100
 sd 0:0:0:0: [sda] Unhandled sense code
 sd 0:0:0:0: [sda]
 mResult: hostbyte=0x00 driverbyte=0x08
 sd 0:0:0:0: [sda]
 mSense Key : 0x3 [current] [descriptor]
 mDescriptor sense data with sense descriptors (in hex):
 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
 2e 92 ca cb
 sd 0:0:0:0: [sda]
 mASC=0x11 ASCQ=0x4
 sd 0:0:0:0: [sda] CDB:
 mcdb[0]=0x28: 28 00 2e 92 ca c8 00 00 08 00
 end_request: I/O error, dev sda, sector 781372107
 ata1: EH complete



-- 
Leonid Isaev
GnuPG key: 0x164B5A6D
Fingerprint: C0DF 20D0 C075 C3F1 E1BE  775A A7AE F6CB 164B 5A6D


signature.asc
Description: PGP signature


Re: [arch-general] journalctl and I/O errors

2014-01-30 Thread Janna Martl
 You are able to read all the journald files, aren't you? You are running
 cp -r so it looks like so.

 IO errors can also cause a process to hang, and enter the dreaded D
 state.


It varies. Sometimes it hangs as a D-state process; sometimes it
explicitly says 'segmentation fault' and other times it just says 'bus
error (core dumped)'.

Also cp -r does die with exit status 1 and errors

/bin/cp: error reading
'journal/8ba8e3a22718630e590610ed07dc/system@81fd64a9006446f2af954d739934edeb-00073a5b-0004f09d7cc446d8.journal':
Input/output error
/bin/cp: failed to extend
'journal-new/8ba8e3a22718630e590610ed07dc/system@81fd64a9006446f2af954d739934edeb-00073a5b-0004f09d7cc446d8.journal':
Input/output error

Most of the files are copied though and everything works fine
afterward.


[arch-general] journalctl and I/O errors

2014-01-29 Thread Janna Martl
A couple months ago, I started getting I/O errors (see below) whenever
I tried to do journalctl -n X for sufficiently large X (and journalctl
would segfault). I assumed my hard drive was going to die, but in the
mean time this was annoying so I did:

# cp -rp /var/log/journal{,-clone}
# rm -rf /var/log/journal
# mv /var/log/journal{-clone,}

There were a few errors after step 1 but this made the segfaulting
stop. A few weeks later, same problem, same fix. A few weeks later,
ditto. In all this time, the only I/O errors I've ever seen have been
related to accessing the journal. The SMART data for the drive says
that the current pending sector count is 57, but the reallocated
sector count is 0.

So if I'm reading this correctly, there have been a bunch of instances
of unreadable sectors, somehow only pertaining to one directory
despite efforts to make an independent copy of the data (which should
put it in random other sectors), but none of them have been confirmed
bad. So at this point I'm guessing it's a systemd bug (and there are
no actual bad sectors), but it would be great if someone could confirm
that any of this actually makes sense before I file a bug report based
on ignorance and speculation.

Thanks!

- J.M.



---
Example I/O error:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
ata1.00: BMDMA stat 0x5
ata1.00: failed command: READ DMA EXT
ata1.00: cmd 25/00:08:c8:ca:92/00:00:2e:00:00/e0 tag 0 dma 4096 in
 res 51/40:05:cb:ca:92/40:00:2e:00:00/ee Emask 0x9 (media error)
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/100
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda]
mResult: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda]
mSense Key : 0x3 [current] [descriptor]
mDescriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
2e 92 ca cb
sd 0:0:0:0: [sda]
mASC=0x11 ASCQ=0x4
sd 0:0:0:0: [sda] CDB:
mcdb[0]=0x28: 28 00 2e 92 ca c8 00 00 08 00
end_request: I/O error, dev sda, sector 781372107
ata1: EH complete