Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Tue, Mar 31, 2015 at 4:09 PM, Chris Murphy li...@colorremedies.com wrote:
>> A failure of the HDD cannot be ruled out, low power conditions, cheap consumer part...
>
> Well you have to rule that out before anyone on this list can really help. Try booting Fedora 21 install media, and using smartctl -x on the drive.

smartctl thinks the drive is ok. Unfortunately, it doesn't have a truth serum to distinguish whether this drive lies about writes or not...

[root@localhost liveuser]# smartctl -x /dev/sda
smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.4-301.fc21.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Laptop SSHD
Device Model:     ST500LM000-1EJ162
Serial Number:    W3709VQD
LU WWN Device Id: 5 000c50 069f901e9
Firmware Version: SM14
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr 1 11:44:43 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Write SCT (Get) XXX Error Recovery Control Command failed: scsi error aborted command
Wt Cache Reorder: N/A

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 139) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  98) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1081) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   112   099   006    -    46707576
  3 Spin_Up_Time            PO----   099   099   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    147
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   078   060   030    -    65832005
  9 Power_On_Hours          -O--CK   092   092   000    -    7775
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    159
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    1
189 High_Fly_Writes         -O-RCK   095   095   000    -    5
190 Airflow_Temperature_Cel -O---K   070   058   045    -    30 (Min/Max 27/31)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    25
193 Load_Cycle_Count        -O--CK   066   066   000    -    68484
194 Temperature_Celsius     -O---K   030   042   000    -    30 (0 16 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
254 Free_Fall_Sensor        -O--CK   100   100   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
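For anyone triaging a similar drive: the rows that usually matter for "is the disk itself failing" are the reallocated, pending and uncorrectable sector counts, and in the table above their raw values are all 0. A minimal sketch for pulling just those rows out, assuming the brief `-x` column layout shown above (raw value in the eighth column). The function name `smart_raw` is made up for this example.

```shell
#!/bin/sh
# smart_raw: read smartctl's brief attribute table on stdin and print
# "ATTRIBUTE_NAME RAW_VALUE" for a few sector-health attributes.
# Assumed column order: ID# NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
smart_raw() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|Reported_Uncorrect)$/ { print $2, $8 }'
}

# Typical use (needs root and smartmontools installed):
#   smartctl -x /dev/sda | smart_raw
```

Zeros here say nothing about whether the drive honours cache flushes, which is the open question in this thread; they only make a plain media failure less likely.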
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
Hi Chris, list,

thanks for your debugging ideas so far. Now this gets interesting. I booted off a LiveUSB disk, and it just mounted sysroot. WTH?

See below. Perhaps the newer kernel (in latest F21) has regressed in handling some kinds of errors during mount, or the dracut/systemd mounting process is less resilient than mounting under a fully booted system?

[root@localhost liveuser]# uname -a
Linux localhost 3.17.4-301.fc21.x86_64 #1 SMP Thu Nov 27 19:09:10 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

## Before booting into liveUSB, I made a copy of
## rdsosreport.txt in the /home partition
## which is a separate btrfs fs, and seems
## to not be affected by the problem at all
[root@localhost liveuser]# mkdir /myhome
[root@localhost liveuser]# mkdir /mysysroot
[root@localhost liveuser]# mount /dev/sda2 /myhome
[root@localhost liveuser]# ls /myhome
home  rdsosreport.txt
[root@localhost liveuser]# fpaste /myhome/rdsosreport.txt
Uploading (93.4KiB)...
http://ur1.ca/k2zue - http://paste.fedoraproject.org/205971/01928142

## Strange, on first boot from live USB F21 image, it just mounts
## (I tried about half a dozen cold boots earlier -- all resulting in
## the same initramfs/dracut/systemd emergency shell...)
[root@localhost liveuser]# mount /dev/sda6 /mysysroot
Apr 01 11:26:51 localhost kernel: BTRFS info (device sda6): disk space caching is enabled
Apr 01 11:26:56 localhost kernel: BTRFS: checking UUID tree
Apr 01 11:26:56 localhost kernel: SELinux: initialized (dev sda6, type btrfs), uses xattr
[root@localhost liveuser]# ls /mysysroot
root
[root@localhost liveuser]# ls /mysysroot/root
bin boot dev etc home lib lib64 media mnt opt proc root run sbin srv sys sysroot tmp usr var
[root@localhost liveuser]# umount /dev/sda6
[root@localhost liveuser]# btrfs check /dev/sda6
Checking filesystem on /dev/sda6
UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
checking extents
checking free space cache
checking fs roots
root 256 inode 39841 errors 400, nbytes wrong
found 7747100703 bytes used err is 1
total csum bytes: 11912932
total tree bytes: 476725248
total fs tree bytes: 434733056
total extent tree bytes: 22986752
btree space waste bytes: 83962424
file data blocks allocated: 30820143104
 referenced 11997040640
Btrfs v3.17
[root@localhost liveuser]# btrfs check --repair /dev/sda6
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sda6
UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 256 inode 39841 errors 400, nbytes wrong
found 7747100703 bytes used err is 1
total csum bytes: 11912932
total tree bytes: 476725248
total fs tree bytes: 434733056
total extent tree bytes: 22986752
btree space waste bytes: 83962424
file data blocks allocated: 30820143104
 referenced 11997040640
Btrfs v3.17

EOM

cheers,

martin
-- 
martin.langh...@gmail.com
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
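Since the only complaint in both check runs above is a single "root ... inode ... errors ..." line, it can help to pull those lines out of a long check log. A small sketch, assuming the line format seen in this message; `flagged_inodes` is a hypothetical helper name, not something shipped with btrfs-progs.

```shell
#!/bin/sh
# flagged_inodes: read `btrfs check` output on stdin and print
# "ROOT INODE" for every line shaped like
#   root 256 inode 39841 errors 400, nbytes wrong
flagged_inodes() {
    awk '$1 == "root" && $3 == "inode" && $5 == "errors" { print $2, $4 }'
}

# Typical use (the filesystem should be unmounted, as above):
#   btrfs check /dev/sda6 2>&1 | flagged_inodes
```

For the output in this message that would print `256 39841`, i.e. inode 39841 in the subvolume with root ID 256.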
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 11:42 AM, Martin Langhoff martin.langh...@gmail.com wrote:
> See below. Perhaps the newer kernel (in latest F21) has regressed in handling some kinds of errors during mount, or the dracut/systemd mounting process is less resilient than mounting under a fully booted system?

This is getting even more interesting.

Under 3.17.4-301.fc21.x86 from LiveUSB, I could mount, even repair the disk.

Since the repair, the on-disk latest kernel (3.18.9-200.fc21) tries to boot, but dracut/systemd time out on mounting sysroot after waiting for quite a while. I don't get a dracut shell anymore so the failure mode has changed. I may try to set a breakpoint to force a shell.

I do have an earlier F21 kernel on disk -- 3.18.7-200.fc21 -- and this boots the system without a glitch.

After a complete boot with 3.18.7-200 and a clean shutdown, booting into 3.18.9-200 is still broken, same failure mode.

Will try to capture some info from a dracut breakpoint (I'll try mount). At this point this really looks like a regression.

cheers,

martin
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
Whenever I have these boot problems, I'm noticing that sometimes the device, /dev/sda5, is showing up with lsblk (libblkid) as /dev/block/8:5 while everything else (not-Btrfs) on that device shows up as /dev/sdaX. Does anyone know what that might mean?

Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 2:04 PM, Chris Murphy li...@colorremedies.com wrote:
> When I had this same btrfs check error, it was the exact inode number and same /etc/shadow file. I didn't diff the two shadow files, but I

That's too bizarre for words. Two folks, on two different systems, getting btrfs problems on similar kernels on the exact same filepath.

In my case, the file was last frobbed by yum/rpm. Do we have a strange interaction between a kernel regression and yum/rpm rubbing the filesystem the wrong way?

BTW, I did not change/touch the file at all. My only fix action was the btrfs check --repair mentioned earlier.

Right now, on the booted system I did

# uname -a
Linux tp-martin.remote-learner.net 3.18.9-200.fc21.x86_64 #1 SMP Mon Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs scrub start -BrR /
scrub done for 94637b35-a294-4be2-aa47-82c52d6d53ef
        scrub started at Wed Apr 1 13:46:20 2015 and finished after 266 seconds
        data_extents_scrubbed: 344155
        tree_extents_scrubbed: 58048
        data_bytes_scrubbed: 11896840192
        tree_bytes_scrubbed: 951058432
        read_errors: 0
        csum_errors: 0
        verify_errors: 0
        no_csum: 20268
        csum_discards: 254459
        super_errors: 0
        malloc_errors: 0
        uncorrectable_errors: 0
        unverified_errors: 0
        corrected_errors: 0
        last_physical: 23928504320

cheers,

m
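The raw scrub report above has every `*_errors` counter at 0, even though btrfs check still flags inode 39841. That is plausible if scrub mainly verifies checksums and the nbytes mismatch is an accounting problem rather than a checksum failure, but that reading is an assumption and not something confirmed in this thread. A rough sketch for checking the counters in a script, assuming the `-R` raw format shown above; `scrub_errors` is an invented name.

```shell
#!/bin/sh
# scrub_errors: read raw scrub statistics ("name: value" lines) on stdin,
# print every *_errors counter that is greater than zero as "name=value",
# and exit non-zero if at least one such counter was found.
scrub_errors() {
    awk -F': *' '
        ($1 ~ /_errors$/) && (($2 + 0) > 0) {
            gsub(/^[ \t]+/, "", $1)   # drop the leading indentation
            print $1 "=" $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Typical use (same flags as in the message above):
#   btrfs scrub start -BrR / | scrub_errors && echo "scrub counters clean"
```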
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 12:16 PM, Martin Langhoff martin.langh...@gmail.com wrote:
> On Wed, Apr 1, 2015 at 2:04 PM, Chris Murphy li...@colorremedies.com wrote:
>> When I had this same btrfs check error, it was the exact inode number and same /etc/shadow file. I didn't diff the two shadow files, but I
>
> That's too bizarre for words. Two folks, on two different systems, getting btrfs problems on similar kernels on the exact same filepath.
>
> In my case, the file was last frobbed by yum/rpm. Do we have a strange interaction between a kernel regression and yum/rpm rubbing the filesystem the wrong way?

No idea, but it happened to me more than once, same inode number, same file.

> BTW, I did not change/touch the file at all. My only fix action was the btrfs check --repair mentioned earlier.

That won't fix it. Once errors 400 appears, at this point you have to replace the affected file.

-- 
Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 2:20 PM, Chris Murphy li...@colorremedies.com wrote:
> That won't fix it. Once errors 400 appears, at this point you have to replace the affected file.

Interesting. Right now I am booting without problems. I have no evidence of continued problems.

What would I do to check whether I see an error similar to yours on this fs? Trying to ascertain whether my fs is cured, and whether we can learn something else about this oddity...

cheers,

m
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 9:42 AM, Martin Langhoff martin.langh...@gmail.com wrote:
> # mount /dev/sda6 /mysysroot
> Apr 01 11:26:51 localhost kernel: BTRFS info (device sda6): disk space caching is enabled
> Apr 01 11:26:56 localhost kernel: BTRFS: checking UUID tree
> Apr 01 11:26:56 localhost kernel: SELinux: initialized (dev sda6, type btrfs), uses xattr

Right so it mounts fine with no errors from live media, but won't mount at boot time. Same problem I was having.

> # btrfs check /dev/sda6
> Checking filesystem on /dev/sda6
> UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
> checking extents
> checking free space cache
> checking fs roots
> root 256 inode 39841 errors 400, nbytes wrong

mount /dev/sda6 /mnt
btrfs inspect-internal inode-resolve 39841 /mnt

It should resolve a path to file for that inode. Chances are you can just use cp to make a new copy of it, delete the original, and rename the copy to match the original file name. Unmount. And now the btrfs check error won't happen.

> [root@localhost liveuser]# btrfs check --repair /dev/sda6
> enabling repair mode
> Fixed 0 roots.
> Checking filesystem on /dev/sda6
> UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
> checking extents
> checking free space cache
> cache and super generation don't match, space cache will be invalidated
> checking fs roots
> root 256 inode 39841 errors 400, nbytes wrong
> found 7747100703 bytes used err is 1
> total csum bytes: 11912932
> total tree bytes: 476725248
> total fs tree bytes: 434733056
> total extent tree bytes: 22986752
> btree space waste bytes: 83962424
> file data blocks allocated: 30820143104
>  referenced 11997040640
> Btrfs v3.17

Yeah I don't know what this errors 400 nbytes wrong means, but at the moment btrfs-progs doesn't fix it.

-- 
Chris Murphy
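The copy, delete, rename workaround described in this message could be sketched roughly as below. This is an illustration, not a tested recovery procedure: `replace_by_copy` and the temporary file name are made up for the example, and the path passed to it is assumed to be whatever `btrfs inspect-internal inode-resolve` printed for the flagged inode.

```shell
#!/bin/sh
# replace_by_copy FILE: make a fresh copy of FILE (which gets a new inode),
# remove the original, then rename the copy into place.
replace_by_copy() {
    f=$1
    tmp="$f.new.$$"                    # hypothetical temp name next to FILE
    cp -p -- "$f" "$tmp" || return 1   # -p keeps mode, owner and timestamps
    rm -- "$f" || return 1
    mv -- "$tmp" "$f"
}

# Typical use from live media, following the steps in the message above
# (the path is an example, use what inode-resolve reports):
#   mount /dev/sda6 /mnt
#   btrfs inspect-internal inode-resolve 39841 /mnt
#   replace_by_copy /mnt/root/etc/shadow-
#   umount /mnt
#   btrfs check /dev/sda6
```

On an SELinux system it is probably worth checking the file's label afterwards, since `cp -p` on its own does not promise to carry the security context over.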
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 10:15 AM, Martin Langhoff martin.langh...@gmail.com wrote:
> On Wed, Apr 1, 2015 at 11:42 AM, Martin Langhoff martin.langh...@gmail.com wrote:
>> See below. Perhaps the newer kernel (in latest F21) has regressed in handling some kinds of errors during mount, or the dracut/systemd mounting process is less resilient than mounting under a fully booted system?
>
> This is getting even more interesting.
>
> Under 3.17.4-301.fc21.x86 from LiveUSB, I could mount, even repair the disk.
>
> Since the repair, the on-disk latest kernel (3.18.9-200.fc21) tries to boot, but dracut/systemd time out on mounting sysroot after waiting for quite a while. I don't get a dracut shell anymore so the failure mode has changed. I may try to set a breakpoint to force a shell.
>
> I do have an earlier F21 kernel on disk -- 3.18.7-200.fc21 -- and this boots the system without a glitch.
>
> After a complete boot with 3.18.7-200 and a clean shutdown, booting into 3.18.9-200 is still broken, same failure mode.
>
> Will try to capture some info from a dracut breakpoint (I'll try mount). At this point this really looks like a regression.

Yeah I don't know what's going on, but with a new file system, and disabled i915 to avoid crashes, and thus no crashes since the new fs was created, I get boot failure with 3.19.3 but not 3.19.2, and I can't figure out why. I get the systemd cylon eye with 5 services pending so I can't actually tell which one it's hung up on, but one of them is looking for the fs volume UUID and apparently can't find it, which is completely bogus.

-- 
Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 1:03 PM, Chris Murphy li...@colorremedies.com wrote:
> mount /dev/sda6 /mnt
> btrfs inspect-internal inode-resolve 39841 /mnt

on the booted system...

# uname -a
Linux tp-martin.remote-learner.net 3.18.9-200.fc21.x86_64 #1 SMP Mon Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs inspect-internal inode-resolve 39841 /
//etc/shadow-
# diff -u /etc/shadow{,-}
--- /etc/shadow 2015-03-04 02:26:59.478255332 -0500
+++ /etc/shadow- 2015-03-04 02:26:59.0 -0500
@@ -42,4 +42,3 @@
 systemd-timesync:!!:16498::
 systemd-network:!!:16498::
 systemd-resolve:!!:16498::
-systemd-bus-proxy:!!:16498::

Bizarre.

cheers,

m
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 12:29 PM, Martin Langhoff martin.langh...@gmail.com wrote:
> On Wed, Apr 1, 2015 at 2:20 PM, Chris Murphy li...@colorremedies.com wrote:
>> That won't fix it. Once errors 400 appears, at this point you have to replace the affected file.
>
> Interesting. Right now I am booting without problems. I have no evidence of continued problems.
>
> What would I do to check whether I see an error similar to yours on this fs? Trying to ascertain whether my fs is cured, and whether we can learn something else about this oddity...

Re-run the btrfs check. The error is still there even after a --repair.

-- 
Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 2:54 PM, Chris Murphy li...@colorremedies.com wrote:
> Re-run the btrfs check. The error is still there even after a --repair.

Bingo! You are right the error persists. It has no effect on my use of the system right now.

Is anyone interested in debugging this further?

cheers,

martin
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 11:26 AM, Martin Langhoff martin.langh...@gmail.com wrote:
> On Wed, Apr 1, 2015 at 1:03 PM, Chris Murphy li...@colorremedies.com wrote:
>> mount /dev/sda6 /mnt
>> btrfs inspect-internal inode-resolve 39841 /mnt
>
> on the booted system...
>
> # uname -a
> Linux tp-martin.remote-learner.net 3.18.9-200.fc21.x86_64 #1 SMP Mon Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
> # btrfs inspect-internal inode-resolve 39841 /
> //etc/shadow-
> # diff -u /etc/shadow{,-}
> --- /etc/shadow 2015-03-04 02:26:59.478255332 -0500
> +++ /etc/shadow- 2015-03-04 02:26:59.0 -0500
> @@ -42,4 +42,3 @@
>  systemd-timesync:!!:16498::
>  systemd-network:!!:16498::
>  systemd-resolve:!!:16498::
> -systemd-bus-proxy:!!:16498::
>
> Bizarre.

When I had this same btrfs check error, it was the exact inode number and same /etc/shadow file. I didn't diff the two shadow files, but I did the cp, mv, rm routine, and then the system booted. Goofy cakes. It's almost like an April Fools joke.

-- 
Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
Related bugs:
https://bugzilla.kernel.org/show_bug.cgi?id=68411
https://bugzilla.redhat.com/show_bug.cgi?id=1037963

The RHBZ one also mentioned the shadow file. Anyway, it seems to be a somewhat known problem, but it's just not known yet what causes it.

Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Wed, Apr 1, 2015 at 1:23 PM, Martin Langhoff martin.langh...@gmail.com wrote:
> On Wed, Apr 1, 2015 at 2:54 PM, Chris Murphy li...@colorremedies.com wrote:
>> Re-run the btrfs check. The error is still there even after a --repair.
>
> Bingo! You are right the error persists. It has no effect on my use of the system right now.
>
> Is anyone interested in debugging this further?

400 errors, nbytes wrong, isn't repaired by current btrfs check
https://bugzilla.kernel.org/show_bug.cgi?id=90071

What's interesting in that bug report that I'd forgotten about?

# btrfs inspect inode 804 /mnt/root
/mnt/root/etc/shadow-

Different inode number, but the shadow file is affected. In every single case I've had now (about 1/2 dozen) with this errors 400 message, it's involved the shadow file. I have no idea what's going on between Btrfs and the shadow file, but something seems to be. Or it's quite a coincidence.

-- 
Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Tue, Mar 31, 2015 at 5:54 PM, Martin Langhoff martin.langh...@gmail.com wrote:
> On Tue, Mar 31, 2015 at 4:09 PM, Chris Murphy li...@colorremedies.com wrote:
>> There should be a reference to an rdsosreport.txt in /run/... so find a way to get that posted somewhere.
>
> I'll try, but it truly says nothing of interest from a block device / btrfs PoV. I have ample background debugging boot issues, disk corruption, etc from years of work w OLPC.

If there is no reference in this dracut shell to rdsosreport.txt, then use:

journalctl -b -l -o short-monotonic

You can mount anything at /sysroot including the boot partition if you want, or a USB stick. The usual directories aren't available in the initramfs before switchroot happens.

> That's a good idea! I was referring to something else -- I guess what I'm trying to say is: I'm not sure if this scrambled disk partition is a btrfs/kernel bug, or the cheap HDD lied about flushing a write to disk.

This is the realm of both esoteric knowledge and an active area of research how to get reliable information about what happened when the power cut out. So you're not the only one not sure.

-- 
Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
Chris Murphy li...@colorremedies.com wrote:
> On Tue, Mar 31, 2015 at 4:15 PM, Chris Murphy li...@colorremedies.com wrote:
>> The i915 regression right now is really annoying. With a Samsung 840 EVO I've had inexplicable and non-deterministic boot failures.
>
> Clarification: the boot failures happen following the i915 panic and subsequent forced power off.

Yeah I thought that, too, because after hitting reset it looked like one hard disk didn't appear in dmesg and thus btrfs didn't mount (btrfs-raid). So I turned the machine off completely because I had similar issues with i915 freezes and strange boot issues during the following boot before. It looks like the GPU is not necessarily completely reset when hitting the reset button.

But that's another story. In my case the hard disk was there - I just didn't scan hard enough through the huge pile of logs. I had to btrfs-zero-log, typed reboot into the rescue shell, kernel came back, mount still locking up and sitting there until systemd decided to throw me to emergency after 5 minutes of waiting or so. I rebooted again, and the machine came up.

This was a few reboots after the machine was powered off, so I'd rule any GPU freeze artifacts out here. I just needed multiple reboots to come to terms with my dracut/systemd combo super hero voodoo abilities (read: I laboriously tried everything until one thing worked while swearing at my innocent monitor, well sort of, it's powered by the GPU).

On every reboot it felt like bcache was replaying cache transactions - but I think this is by design (read: bcache is always dirty, even after a clean shutdown, if using write-back mode) and not part of the problem.

-- 
Replies to list only preferred.
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Tue, Mar 31, 2015 at 4:09 PM, Chris Murphy li...@colorremedies.com wrote:
> There should be a reference to an rdsosreport.txt in /run/... so find a way to get that posted somewhere.

I'll try, but it truly says nothing of interest from a block device / btrfs PoV. I have ample background debugging boot issues, disk corruption, etc from years of work w OLPC.

>> - kernel is 3.1.9-200.fc21
>
> This is probably 3.18.9, which is the current F21 kernel.

Correct, thanks. I typo'd that.

>> A failure of the HDD cannot be ruled out, low power conditions, cheap consumer part...
>
> Well you have to rule that out before anyone on this list can really help. Try booting Fedora 21 install media, and using smartctl -x on the drive.

That's a good idea! I was referring to something else -- I guess what I'm trying to say is: I'm not sure if this scrambled disk partition is a btrfs/kernel bug, or the cheap HDD lied about flushing a write to disk.

cheers,

m
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Tue, Mar 31, 2015 at 4:11 PM, Chris Murphy li...@colorremedies.com wrote:
> While you're at it, try to mount the Btrfs volume in question normally and report kernel messages. If mount fails, try it with -o recovery mount option, and also report kernel messages and whether that fails.

Oh, I should have mentioned this -- in the context of the initramfs/systemd diagnostic shell (which is single-user), it just hangs. No messages.

I'll get a bootable usb going and try under that.

cheers,

m
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
Chris Murphy li...@colorremedies.com wrote:
> On Tue, Mar 31, 2015 at 2:09 PM, Chris Murphy li...@colorremedies.com wrote:
>> Well you have to rule that out before anyone on this list can really help. Try booting Fedora 21 install media, and using smartctl -x on the drive.
>
> While you're at it, try to mount the Btrfs volume in question normally and report kernel messages. If mount fails, try it with -o recovery mount option, and also report kernel messages and whether that fails.

I had this happen, too, lately. It's quite often happening after an unclean shutdown (which currently quite often happened to me due to the xorg intel driver having GPU freezes). SysRq+W shows that the mount process is locked somewhere in the btrfs code path and won't quit if Ctrl+C'd...

Only way to fix it was to btrfs-zero-log. But it still took some reboots from initramfs until it successfully mounted again (I could mount it in initramfs right after zero-log but upon reboot it hung again, though probably at a different stage).

So I guess there's some race on the one hand (happens from time to time, unrelated to fixing it with zero-log), and a deadlock on the other hand after some unclean shutdowns (more or less random).

My setup is 3-device btrfs-mraid1-draid0 on bcache. Bcache wasn't involved in the backtrace of SysRq+W, however. Apparently I don't have a screenshot of it because my smart phone is currently fried...

-- 
Replies to list only preferred.
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Tue, Mar 31, 2015 at 4:15 PM, Chris Murphy li...@colorremedies.com wrote:
> The i915 regression right now is really annoying. With a Samsung 840 EVO I've had inexplicable and non-deterministic boot failures.

Clarification: the boot failures happen following the i915 panic and subsequent forced power off.

-- 
Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
Kai Krakow hurikha...@gmail.com wrote:
> Chris Murphy li...@colorremedies.com wrote:
>> On Tue, Mar 31, 2015 at 2:09 PM, Chris Murphy li...@colorremedies.com wrote:
>>> Well you have to rule that out before anyone on this list can really help. Try booting Fedora 21 install media, and using smartctl -x on the drive.
>>
>> While you're at it, try to mount the Btrfs volume in question normally and report kernel messages. If mount fails, try it with -o recovery mount option, and also report kernel messages and whether that fails.
>
> I had this happen, too, lately. It's quite often happening after an unclean shutdown (which currently quite often happened to me due to the xorg intel driver having GPU freezes). SysRq+W shows that the mount process is locked somewhere in the btrfs code path and won't quit if Ctrl+C'd...
>
> Only way to fix it was to btrfs-zero-log. But it still took some reboots from initramfs until it successfully mounted again (I could mount it in initramfs right after zero-log but upon reboot it hung again, though probably at a different stage).
>
> So I guess there's some race on the one hand (happens from time to time, unrelated to fixing it with zero-log), and a deadlock on the other hand after some unclean shutdowns (more or less random).
>
> My setup is 3-device btrfs-mraid1-draid0 on bcache. Bcache wasn't involved in the backtrace of SysRq+W, however. Apparently I don't have a screenshot of it because my smart phone is currently fried...

BTW: I tried all kernels from current 3.19.x back to 3.18.0 which still live on my boot partition - each with the same result and very similar backtrace (SysRq+W)...

-- 
Replies to list only preferred.
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Tue, Mar 31, 2015 at 2:39 PM, Matt Grant matt.gr...@foodstuffs-si.co.nz wrote:
> Seen this before at home. You have to mount -o recovery off a 3.19 kernel to fix it... If you can get the SSD out, attach it to a desktop, as there will be no Install CDs using 3.19 yet.

Fedora 22 Workstation alpha has 4.0.0 (rc1 I think), and the current TC6 beta has 4.0.0-rc4. It's possible to use the netinstall, which is much smaller, and use boot param single or rescue to avoid the installer launching.

And actually 3.19.2 is the stable kernel for Fedora 21, with 3.19.3 just pushed today (it takes mirrors a day or two to catch up), not 3.18.9 as I reported earlier.

--
Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Tue, Mar 31, 2015 at 3:45 PM, Kai Krakow hurikha...@gmail.com wrote:
> I had this happen, too, lately. It's quite often happening after an unclean shutdown (which currently quite often happened to me due to the xorg intel driver having GPU freezes). SysRq+W shows that the mount process is locked somewhere in the btrfs code path and won't quit if Ctrl+C'd... Only way to fix it was to btrfs-zero-log.

The i915 regression right now is really annoying. With a Samsung 840 EVO I've had inexplicable and non-deterministic boot failures. When running btrfs check from the initramfs (booting with rd.break=pre-mount) I get a very long pile of complaints... minutes of scrolling text of horrible sounding problems. Yet with the same btrfs-progs and the same kernel from Fedora 22 install media: zero complaints, and it mounts fine. So I have no idea what's going on right now.

It even corrupts the EFI System partition, these crashes.

--
Chris Murphy
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Tue, Mar 31, 2015 at 2:09 PM, Chris Murphy li...@colorremedies.com wrote:
> Well you have to rule that out before anyone on this list can really help. Try booting Fedora 21 install media, and using smartctl -x on the drive.

While you're at it, try to mount the Btrfs volume in question normally and report kernel messages. If mount fails, try it with -o recovery mount option, and also report kernel messages and whether that fails.

--
Chris Murphy
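The sequence suggested above (normal mount first, -o recovery only if that fails, kernel messages captured each time) might be scripted along these lines. This is only a sketch: the device /dev/sdX2, the mount point /mnt, the log file names and the DRY_RUN switch are all assumptions for illustration and do not come from the thread.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust to the affected partition and a free mount point.
DEV="${DEV:-/dev/sdX2}"
MNT="${MNT:-/mnt}"
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command; execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Try to mount with the given options ("" for defaults), then save the
# kernel log to the given file so it can be posted to the list.
try_mount() {
    opts="$1"
    log="$2"
    if [ -n "$opts" ]; then
        run mount -o "$opts" "$DEV" "$MNT"
    else
        run mount "$DEV" "$MNT"
    fi
    status=$?
    if [ "$DRY_RUN" != "1" ]; then
        dmesg > "$log"
    fi
    return "$status"
}

# Normal mount first; fall back to -o recovery only if that fails.
try_mount "" /tmp/dmesg-normal.txt || try_mount recovery /tmp/dmesg-recovery.txt
```

In dry-run mode the first step is treated as successful, so only the plain mount command is printed; with DRY_RUN=0 (as root) the fallback runs only when the normal mount returns an error.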
Re: F21 fails to mount root part, btrfs check: Couldn't open file system
On Tue, Mar 31, 2015 at 12:55 PM, Martin Langhoff martin.langh...@gmail.com wrote:
> Hi BTRFS folks,
>
> one of my dev boxes is a Thinkpad x220, with a single hybrid (HDD+Flash) disk, running F21 with BTRFS partitions for /home and / .
>
> After losing power (ran out of battery, possibly while trying to hibernate) -- the system will not boot. The initrd breaks out to a shell where I find that the partition holding / is failing to mount.

There should be a reference to an rdsosreport.txt in /run/... so find a way to get that posted somewhere.

> - kernel is 3.1.9-200.fc21

This is probably 3.18.9, which is the current F21 kernel.

> A failure of the HDD cannot be ruled out, low power conditions, cheap consumer part...

Well you have to rule that out before anyone on this list can really help. Try booting Fedora 21 install media, and using smartctl -x on the drive.

--
Chris Murphy