Re: [gentoo-user] Diagnosing file corruption
On Thu, Aug 06, 2015 at 12:00:30PM +1000, wraeth wrote:
> On 06/08/15 10:34, Bryan Gardiner wrote:
> > After I make a fresh backup of my files, how would you recommend
> > troubleshooting this?  Run memtest or a hard drive tester?  Since
> > the files seemingly corrupted themselves after install without
> > being touched, I'm highly suspicious of the hard drive, but would
> > like to rule other things out (if say for example that
> > CONFIG_X86_INTEL_PSTATE CPU clock booster is dangerous, or
> > nvidia-drivers, or ...).  Haven't checked for corruption on /home
> > yet.
>
> One key question that doesn't seem to have been asked yet: have you
> performed an fsck on the partition?
>
> You could try booting to a livecd environment and running
> fsck -fc /dev/sdXY (adjusting for your device schema accordingly) on
> your apparently failing partition(s) to see if there is a filesystem
> corruption...

Thanks very much for the suggestions, everyone.  I ended up using
fsck -fc and -fcc, which resulted in no bad blocks being detected.  I
also wanted to make sure no other files in that range of disk were
corrupted, so I extracted the extents used by the bad files:

    cat bad-files | while read file; do
        echo ${file}
        debugfs -R "dump_extents ${file}" /dev/mikasa-vg/gentoo
    done > bad-extents

found the files in the regions between the bad files:

    for block in $(seq 5302485 5302486) $(seq 5302489 5302498) $(seq 5302504 5302508); do
        inode=$(debugfs -R "icheck ${block}" /dev/mikasa-vg/gentoo 2>/dev/null |
                perl -ne 'if (/^\d+\s+(\d+)$/) {print $1, "\n"}')
        if [[ -n $inode ]]; then
            echo ${block} ${inode} $(debugfs -R "ncheck ${inode}" /dev/mikasa-vg/gentoo 2>/dev/null | awk 'NR==2 {print $2}')
        else
            echo ${block}
        fi
    done

and file'd those to make sure that they were okay.

This is only a personal computer, so I'm going to call this a one-off
issue and move on, and leave the stronger approaches for another day.

Thanks again!
Bryan

-- 
If people do not believe that mathematics is simple, it is only
because they do not realize how complicated life is - von Neumann
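The block ranges passed to seq in the message above look like they were worked out by hand from the dump_extents output. A minimal sketch of automating that step follows. It assumes the physical block ranges of the bad files have already been reduced to "start end" pairs, one per line; the helper name gap_blocks is hypothetical and not part of the original post.

```shell
# gap_blocks (hypothetical helper): read "start end" physical block ranges
# on stdin, one pair per line in any order, and print every block number
# that lies strictly between two consecutive ranges.
gap_blocks() {
    sort -n -k1,1 | awk '
        # A gap exists when this range starts more than one block past
        # the highest end seen so far.
        NR > 1 && $1 > prev_end + 1 {
            for (b = prev_end + 1; b < $1; b++) print b
        }
        # Track the highest end block, so overlapping ranges are handled.
        { if (NR == 1 || $2 > prev_end) prev_end = $2 }
    '
}
```

With made-up input such as `printf '5302483 5302484\n5302487 5302488\n' | gap_blocks`, the helper prints 5302485 and 5302486, which could then feed the icheck/ncheck loop in place of the hard-coded seq calls.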
Re: [gentoo-user] Diagnosing file corruption
On 6 August 2015 at 01:34, Bryan Gardiner b...@khumba.net wrote:
> Hello list,
>
> This is the disk:
>
>   *-disk
>        description: ATA Disk
>        product: ST1000LM024 HN-M
>        vendor: Seagate
>        physical id: 0.0.0
>        bus info: scsi@4:0.0.0
>        logical name: /dev/sda
>        version: 0001
>        size: 931GiB (1TB)
>        capabilities: gpt-1.00 partitioned partitioned:gpt
>        configuration: ansiversion=5 guid=---- sectorsize=4096
>
> Thanks for any help you can provide,
> Bryan

Complex question. Simple answer... Spinrite :-)

-- 
All the best,
Robert
Re: [gentoo-user] Diagnosing file corruption
On Wednesday, August 05, 2015 5:34:43 PM Bryan Gardiner wrote:
> Hello list,
>
> On my most recent update, I had some build failures that led me to
> find that some files on my root partition have been corrupted.  This
> is a new Asus N550JK laptop, a mostly-stable amd64 install with
> gentoo-sources-4.0.5 and ext4-root-in-LVM-in-LUKS-on-HDD, and Debian
> lives in there too (no problems showed up verifying Debian's
> packages; I installed Debian on Jul 1 and used it for a week before
> getting time to set up Gentoo).
>
> These are the package merge times, package names, and files that I
> found to be corrupted via qcheck (there were also a couple Python
> headers that I fixed by rebuilding).  They appear to be filled with
> random data.  The binpkg contents in /usr/portage/packages are okay,
> so I don't know when the files were corrupted; their mtimes haven't
> been updated since the packages were installed.
>
> Thu-Jul-30-22:40:23-2015 app-arch/p7zip-9.20.1-r5
>     /usr/lib64/p7zip/Lang/va.txt
> Thu-Jul-30-22:40:23-2015 app-arch/p7zip-9.20.1-r5
>     /usr/lib64/p7zip/help/cmdline/switches/large_pages.htm
> Sun-Jul-19-22:34:30-2015 dev-libs/libzip-1.0.1
>     /usr/share/man/man3/zip_error_get_sys_type.3.bz2
> Sun-Jul-26-22:35:28-2015 dev-python/pygments-2.0.1-r1
>     /usr/lib64/python2.7/site-packages/pygments/styles/pastie.pyc
> Wed-Jul-08-23:34:56-2015 media-libs/tiff-4.0.3-r6
>     /usr/share/man/man3/TIFFGetField.3tiff.bz2
> Thu-Jul-30-10:05:31-2015 sci-mathematics/scilab-5.5.2
>     /usr/share/scilab/modules/compatibility_functions/macros/%b_l_s.bin
> -(from-stage3-on-Jul-8)- sys-apps/acl-2.2.52-r1
>     /usr/share/man/man3/acl_set_file.3.bz2
>
> I haven't had any unclean shutdowns, it looks like OpenRC is
> unmounting things cleanly on shutdown, and suspend appears to work
> fine.
>
> After I make a fresh backup of my files, how would you recommend
> troubleshooting this?  Run memtest or a hard drive tester?  Since the
> files seemingly corrupted themselves after install without being
> touched, I'm highly suspicious of the hard drive, but would like to
> rule other things out (if say for example that
> CONFIG_X86_INTEL_PSTATE CPU clock booster is dangerous, or
> nvidia-drivers, or ...).  Haven't checked for corruption on /home
> yet.
>
> This is the disk:
>
>   *-disk
>        description: ATA Disk
>        product: ST1000LM024 HN-M
>        vendor: Seagate
>        physical id: 0.0.0
>        bus info: scsi@4:0.0.0
>        logical name: /dev/sda
>        version: 0001
>        size: 931GiB (1TB)
>        capabilities: gpt-1.00 partitioned partitioned:gpt
>        configuration: ansiversion=5 guid=---- sectorsize=4096
>
> Thanks for any help you can provide,
> Bryan

You can use badblocks to rule out a bad drive (be sure to read the
documentation first if you haven't).  But I would guess that something
LUKS related is more likely.  There may be clues in your log files
(probably around the time when you installed these packages).

-- 
Fernando Rodriguez
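The log check suggested above could be sketched roughly as follows. This is only an illustration: the helper name scan_log is hypothetical, the patterns are guesses at how common kernel storage and filesystem errors are worded rather than an exhaustive list, and the log location (often /var/log/messages on a syslog-based system) depends on the logger in use.

```shell
# scan_log (hypothetical helper): print log lines that look like storage
# or filesystem errors. An optional second argument restricts the search
# to lines containing that literal string, e.g. a syslog date like
# "Jul 30" (classic syslog pads single-digit days with a space: "Aug  6").
scan_log() {
    log=$1
    day=${2:-}
    # Illustrative patterns only; adjust to the messages your kernel emits.
    pattern='I/O error|EXT4-fs error|medium error|device-mapper.*error'
    if [ -n "$day" ]; then
        grep -F -- "$day" "$log" | grep -iE -- "$pattern"
    else
        grep -iE -- "$pattern" "$log"
    fi
}
```

For example, `scan_log /var/log/messages 'Jul 30'` would narrow the search to the day most of the affected packages were merged. No matches does not prove the disk or LUKS layer is healthy; it only means nothing matching these patterns was logged.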
Re: [gentoo-user] Diagnosing file corruption
On 06/08/15 10:34, Bryan Gardiner wrote:
> After I make a fresh backup of my files, how would you recommend
> troubleshooting this?  Run memtest or a hard drive tester?  Since the
> files seemingly corrupted themselves after install without being
> touched, I'm highly suspicious of the hard drive, but would like to
> rule other things out (if say for example that
> CONFIG_X86_INTEL_PSTATE CPU clock booster is dangerous, or
> nvidia-drivers, or ...).  Haven't checked for corruption on /home
> yet.

One key question that doesn't seem to have been asked yet: have you
performed an fsck on the partition?

You could try booting to a livecd environment and running
fsck -fc /dev/sdXY (adjusting for your device schema accordingly) on
your apparently failing partition(s) to see if there is a filesystem
corruption...

-- 
wraeth wra...@wraeth.id.au
GnuPG Key: B2D9F759
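When running the check suggested above from a script, the exit status is worth reading as well as the on-screen output. According to the e2fsck(8) man page the status is a sum of flag values, so one run can report several conditions at once. A small sketch of decoding it (the helper name explain_fsck_status is hypothetical):

```shell
# explain_fsck_status (hypothetical helper): decode an e2fsck exit status,
# which is a sum of the flag values documented in e2fsck(8).
explain_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    for entry in \
        "1:filesystem errors corrected" \
        "2:filesystem errors corrected, system should be rebooted" \
        "4:filesystem errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:canceled by user request" \
        "128:shared library error"
    do
        bit=${entry%%:*}    # flag value before the colon
        msg=${entry#*:}     # description after the colon
        if [ $(( code & bit )) -ne 0 ]; then
            echo "$msg"
        fi
    done
}
```

A possible use from the livecd would be `fsck -fc /dev/sdXY; explain_fsck_status $?`, with the device name adjusted as described above.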
[gentoo-user] Diagnosing file corruption
Hello list,

On my most recent update, I had some build failures that led me to
find that some files on my root partition have been corrupted.  This
is a new Asus N550JK laptop, a mostly-stable amd64 install with
gentoo-sources-4.0.5 and ext4-root-in-LVM-in-LUKS-on-HDD, and Debian
lives in there too (no problems showed up verifying Debian's packages;
I installed Debian on Jul 1 and used it for a week before getting time
to set up Gentoo).

These are the package merge times, package names, and files that I
found to be corrupted via qcheck (there were also a couple Python
headers that I fixed by rebuilding).  They appear to be filled with
random data.  The binpkg contents in /usr/portage/packages are okay,
so I don't know when the files were corrupted; their mtimes haven't
been updated since the packages were installed.

Thu-Jul-30-22:40:23-2015 app-arch/p7zip-9.20.1-r5
    /usr/lib64/p7zip/Lang/va.txt
Thu-Jul-30-22:40:23-2015 app-arch/p7zip-9.20.1-r5
    /usr/lib64/p7zip/help/cmdline/switches/large_pages.htm
Sun-Jul-19-22:34:30-2015 dev-libs/libzip-1.0.1
    /usr/share/man/man3/zip_error_get_sys_type.3.bz2
Sun-Jul-26-22:35:28-2015 dev-python/pygments-2.0.1-r1
    /usr/lib64/python2.7/site-packages/pygments/styles/pastie.pyc
Wed-Jul-08-23:34:56-2015 media-libs/tiff-4.0.3-r6
    /usr/share/man/man3/TIFFGetField.3tiff.bz2
Thu-Jul-30-10:05:31-2015 sci-mathematics/scilab-5.5.2
    /usr/share/scilab/modules/compatibility_functions/macros/%b_l_s.bin
-(from-stage3-on-Jul-8)- sys-apps/acl-2.2.52-r1
    /usr/share/man/man3/acl_set_file.3.bz2

I haven't had any unclean shutdowns, it looks like OpenRC is
unmounting things cleanly on shutdown, and suspend appears to work
fine.

After I make a fresh backup of my files, how would you recommend
troubleshooting this?  Run memtest or a hard drive tester?  Since the
files seemingly corrupted themselves after install without being
touched, I'm highly suspicious of the hard drive, but would like to
rule other things out (if say for example that CONFIG_X86_INTEL_PSTATE
CPU clock booster is dangerous, or nvidia-drivers, or ...).  Haven't
checked for corruption on /home yet.

This is the disk:

  *-disk
       description: ATA Disk
       product: ST1000LM024 HN-M
       vendor: Seagate
       physical id: 0.0.0
       bus info: scsi@4:0.0.0
       logical name: /dev/sda
       version: 0001
       size: 931GiB (1TB)
       capabilities: gpt-1.00 partitioned partitioned:gpt
       configuration: ansiversion=5 guid=---- sectorsize=4096

Thanks for any help you can provide,
Bryan
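For readers unfamiliar with how a check like qcheck can flag files whose mtimes never changed: Portage records an MD5 sum for every installed regular file in /var/db/pkg/<category>/<package>/CONTENTS, on lines of the form "obj <path> <md5> <mtime>". The sketch below re-implements only the checksum comparison, as an illustration and not as a replacement for qcheck. The helper name verify_contents and its optional root-prefix argument are hypothetical.

```shell
# verify_contents (hypothetical helper): compare the MD5 sums recorded in a
# Portage CONTENTS file against the files on disk. Only "obj" entries are
# checked; "dir" and "sym" entries are skipped. The optional second
# argument is a directory prefix, useful when the system is mounted
# somewhere other than /.
verify_contents() {
    contents=$1
    root=${2:-}
    while IFS= read -r line; do
        case $line in
            "obj "*) ;;
            *) continue ;;
        esac
        rest=${line#obj }
        rest=${rest% *}      # drop the trailing mtime field
        want=${rest##* }     # the last remaining field is the recorded MD5
        path=${rest% *}      # everything before it is the path (may hold spaces)
        if [ ! -f "$root$path" ]; then
            echo "MISSING $path"
            continue
        fi
        have=$(md5sum < "$root$path")
        have=${have%% *}
        if [ "$have" != "$want" ]; then
            echo "MD5-MISMATCH $path"
        fi
    done < "$contents"
}
```

Something like `verify_contents /var/db/pkg/app-arch/p7zip-9.20.1-r5/CONTENTS` would list that package's damaged files. Because the comparison looks at content rather than timestamps, it catches corruption that happened underneath the filesystem, which matches the unchanged mtimes reported above.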