Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Tue, Mar 31, 2015 at 4:09 PM, Chris Murphy li...@colorremedies.com wrote:
 A failure of the
 HDD cannot be ruled out, low power conditions, cheap consumer part...

 Well you have to rule that out before anyone on this list can really
 help. Try booting Fedora 21 install media, and using smartctl -x on
 the drive.

smartctl thinks the drive is ok. Unfortunately, it doesn't have a
truth serum to distinguish whether this drive lies about writes or
not...

[root@localhost liveuser]# smartctl -x /dev/sda
smartctl 6.2 2014-07-16 r3952 [x86_64-linux-3.17.4-301.fc21.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Laptop SSHD
Device Model:     ST500LM000-1EJ162
Serial Number:    W3709VQD
LU WWN Device Id: 5 000c50 069f901e9
Firmware Version: SM14
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Apr  1 11:44:43 2015 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     128 (minimum power consumption without standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Write SCT (Get) XXX Error Recovery Control Command failed: scsi error aborted command
Wt Cache Reorder: N/A

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  139) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  98) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1081) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   112   099   006    -    46707576
  3 Spin_Up_Time            PO----   099   099   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    147
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   078   060   030    -    65832005
  9 Power_On_Hours          -O--CK   092   092   000    -    7775
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    159
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    1
189 High_Fly_Writes         -O-RCK   095   095   000    -    5
190 Airflow_Temperature_Cel -O---K   070   058   045    -    30 (Min/Max 27/31)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    25
193 Load_Cycle_Count        -O--CK   066   066   000    -    68484
194 Temperature_Celsius     -O---K   030   042   000    -    30 (0 16 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
254 Free_Fall_Sensor        -O--CK   100   100   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
Hi Chris, list,

thanks for your debugging ideas so far. Now this gets interesting. I
booted off a LiveUSB disk, and it just mounted sysroot. WTH?

See below. Perhaps the newer kernel (in latest F21) has regressed in
handling some kinds of errors during mount, or the dracut/systemd
mounting process is less resilient than mounting under a fully booted
system?


[root@localhost liveuser]# uname -a
Linux localhost 3.17.4-301.fc21.x86_64 #1 SMP Thu Nov 27 19:09:10 UTC
2014 x86_64 x86_64 x86_64 GNU/Linux

## Before booting into liveUSB, I made a copy of
## rdsosreport.txt in the /home partition
## which is a separate btrfs fs, and seems
## to not be affected by the problem at all
[root@localhost liveuser]# mkdir /myhome
[root@localhost liveuser]# mkdir /mysysroot
[root@localhost liveuser]# mount /dev/sda2 /myhome
[root@localhost liveuser]# ls /myhome
home  rdsosreport.txt

[root@localhost liveuser]# fpaste  /myhome/rdsosreport.txt
Uploading (93.4KiB)...
http://ur1.ca/k2zue - http://paste.fedoraproject.org/205971/01928142

## Strange, on first boot from live USB F21 image, it just mounts
## (I tried about half a dozen cold boots earlier -- all resulting in
##  the same initramfs/dracut/systemd emergency shell...)
[root@localhost liveuser]# mount /dev/sda6 /mysysroot

Apr 01 11:26:51 localhost kernel: BTRFS info (device sda6): disk space
caching is enabled
Apr 01 11:26:56 localhost kernel: BTRFS: checking UUID tree
Apr 01 11:26:56 localhost kernel: SELinux: initialized (dev sda6, type
btrfs), uses xattr

[root@localhost liveuser]# ls /mysysroot
root
[root@localhost liveuser]# ls /mysysroot/root
bin  boot  dev  etc  home  lib  lib64  media  mnt  opt  proc  root
run  sbin  srv  sys  sysroot  tmp  usr  var

[root@localhost liveuser]# umount /dev/sda6
[root@localhost liveuser]# btrfs check /dev/sda6
Checking filesystem on /dev/sda6
UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
checking extents
checking free space cache
checking fs roots
root 256 inode 39841 errors 400, nbytes wrong
found 7747100703 bytes used err is 1
total csum bytes: 11912932
total tree bytes: 476725248
total fs tree bytes: 434733056
total extent tree bytes: 22986752
btree space waste bytes: 83962424
file data blocks allocated: 30820143104
 referenced 11997040640
Btrfs v3.17

[root@localhost liveuser]# btrfs check --repair /dev/sda6
enabling repair mode
Fixed 0 roots.
Checking filesystem on /dev/sda6
UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 256 inode 39841 errors 400, nbytes wrong
found 7747100703 bytes used err is 1
total csum bytes: 11912932
total tree bytes: 476725248
total fs tree bytes: 434733056
total extent tree bytes: 22986752
btree space waste bytes: 83962424
file data blocks allocated: 30820143104
 referenced 11997040640
Btrfs v3.17
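
To compare runs before and after a repair attempt, the per-inode error lines can be pulled out of the check output. A minimal sketch, assuming the line format shown above ("root N inode M errors X, ..."); the helper name is hypothetical.

```shell
# Hypothetical helper: reads `btrfs check` output on stdin and prints
# "ROOT INODE ERRORS" for each per-inode error line of the form
#   root 256 inode 39841 errors 400, nbytes wrong
check_bad_inodes() {
    awk '$1 == "root" && $3 == "inode" && $5 == "errors" {
        sub(/,$/, "", $6)      # drop the trailing comma after the error mask
        print $2, $4, $6
    }'
}

# Usage (filesystem unmounted, as root):
#   btrfs check /dev/sda6 2>&1 | check_bad_inodes
```

If the same root and inode pair shows up in both runs, the repair did not touch it.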



EOM

cheers,



martin
-- 
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 11:42 AM, Martin Langhoff
martin.langh...@gmail.com wrote:
 See below. Perhaps the newer kernel (in latest F21) has regressed in
 handling some kinds of errors during mount, or the dracut/systemd
 mounting process is less resilient than mounting under a fully booted
 system?

This is getting even more interesting.

Under 3.17.4-301.fc21.x86_64 from the LiveUSB, I could mount the disk and even run a repair on it.

Since the repair, the on-disk latest kernel (3.18.9-200.fc21) tries to
boot, but dracut/systemd time out on mounting sysroot after waiting
for quite a while. I don't get a dracut shell anymore so the failure
mode has changed. I may try to set a breakpoint to force a shell.

I do have an earlier F21 kernel on disk -- 3.18.7-200.fc21 -- and this
boots the system without a glitch. After a complete boot with
3.18.7-200 and a clean shutdown, booting into 3.18.9-200 is still
broken, with the same failure mode.

Will try to capture some info from a dracut breakpoint  (I'll try
mount). At this point this really looks like a regression.
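
A dracut breakpoint is normally requested with the rd.break= kernel parameter, added to the kernel command line from the boot loader. The sketch below only builds the edited command line so it can be pasted in. The helper name is hypothetical, and defaulting to the pre-mount stage is an assumption about where it is most useful for this problem.

```shell
# Hypothetical helper: print a kernel command line with a dracut breakpoint
# appended, unless one is already present.
# $1 = existing command line, $2 = dracut stage (assumed default: pre-mount)
with_dracut_break() {
    wdb_cmdline=$1
    wdb_stage=${2:-pre-mount}
    case " $wdb_cmdline " in
        *" rd.break"*) printf '%s\n' "$wdb_cmdline" ;;
        *) printf '%s rd.break=%s\n' "$wdb_cmdline" "$wdb_stage" ;;
    esac
}

# Usage:
#   with_dracut_break "$(cat /proc/cmdline)"
```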

cheers,



martin


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
Whenever I have these boot problems, I'm noticing that sometimes the
device, /dev/sda5, is showing up with lsblk (libblkid) as
/dev/block/8:5 while everything else (not-Btrfs) on that device shows
up as /dev/sdaX. Does anyone know what that might mean?
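
For reference, entries under /dev/block/ are normally udev-created symlinks named MAJOR:MINOR that point at the kernel device node, so 8:5 would usually be the same device as /dev/sda5. A sketch to check what such a path points at; the helper name is made up for illustration, and it does not explain why lsblk prints that form.

```shell
# Hypothetical helper: print the canonical device node behind a path such as
# /dev/block/8:5 by following symlinks.
resolve_blockdev() {
    readlink -f -- "$1"
}

# Usage:
#   resolve_blockdev /dev/block/8:5
```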


Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 2:04 PM, Chris Murphy li...@colorremedies.com wrote:
 When I had this same btrfs check error, it was the exact inode number
 and same /etc/shadow file. I didn't diff the two shadow files, but I

That's too bizarre for words. Two folks, on two different systems,
getting btrfs problems on similar kernels on the exact same filepath.
In my case, the file was last frobbed by yum/rpm. Do we have a strange
interaction between a kernel regression and yum/rpm rubbing the
filesystem the wrong way?

BTW, I did not change/touch the file at all. My only fix action was
the btrfs check --repair mentioned earlier. Right now, on the booted
system I did

# uname -a
Linux tp-martin.remote-learner.net 3.18.9-200.fc21.x86_64 #1 SMP Mon
Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs scrub start -BrR   /
scrub done for 94637b35-a294-4be2-aa47-82c52d6d53ef
scrub started at Wed Apr  1 13:46:20 2015 and finished after 266 seconds
data_extents_scrubbed: 344155
tree_extents_scrubbed: 58048
data_bytes_scrubbed: 11896840192
tree_bytes_scrubbed: 951058432
read_errors: 0
csum_errors: 0
verify_errors: 0
no_csum: 20268
csum_discards: 254459
super_errors: 0
malloc_errors: 0
uncorrectable_errors: 0
unverified_errors: 0
corrected_errors: 0
last_physical: 23928504320
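
The raw statistics can be reduced to one number for a quick look. A minimal sketch, assuming the "name: value" layout printed by btrfs scrub start -BrR; the helper name is hypothetical. It simply adds up every counter whose name ends in _errors, so a non-zero result means "go and read the individual counters", not an exact error count.

```shell
# Hypothetical helper: reads raw scrub statistics on stdin and prints the sum
# of all counters whose name ends in "_errors".
scrub_error_sum() {
    awk -F': *' '$1 ~ /_errors$/ { total += $2 } END { print total + 0 }'
}

# Usage (as root, on the mounted filesystem):
#   btrfs scrub start -BrR / | scrub_error_sum
```

A zero here only says that scrub found nothing wrong; it does not clear the nbytes message from btrfs check.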

cheers,



m


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 12:16 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
 On Wed, Apr 1, 2015 at 2:04 PM, Chris Murphy li...@colorremedies.com wrote:
 When I had this same btrfs check error, it was the exact inode number
 and same /etc/shadow file. I didn't diff the two shadow files, but I

 That's too bizarre for words. Two folks, on two different systems,
 getting btrfs problems on similar kernels on the exact same filepath.
 In my case, the file was last frobbed by yum/rpm. Do we have a strange
 interaction between a kernel regression and yum/rpm rubbing the
 filesystem the wrong way?

No idea, but it happened to me more than once, same inode number, same file.



 BTW, I did not change/touch the file at all. My only fix action was
 the btrfs check --repair mentioned earlier.

That won't fix it. Once errors 400 appears, you have to replace the
affected file.





-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 2:20 PM, Chris Murphy li...@colorremedies.com wrote:
 That won't fix it. Once errors 400 appears, at this point you have to
 replace the affected file.

Interesting.

Right now I am booting without problems. I have no evidence of
continued problems. What would I do to check whether I see an error
similar to yours on this fs?

Trying to ascertain whether my fs is cured, and whether we can learn
something else about this oddity...

cheers,


m


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 9:42 AM, Martin Langhoff
martin.langh...@gmail.com wrote:
 # mount /dev/sda6 /mysysroot

 Apr 01 11:26:51 localhost kernel: BTRFS info (device sda6): disk space
 caching is enabled
 Apr 01 11:26:56 localhost kernel: BTRFS: checking UUID tree
 Apr 01 11:26:56 localhost kernel: SELinux: initialized (dev sda6, type
 btrfs), uses xattr

Right so it mounts fine with no errors from live media, but won't
mount at boot time. Same problem I was having.


 # btrfs check /dev/sda6
 Checking filesystem on /dev/sda6
 UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
 checking extents
 checking free space cache
 checking fs roots
 root 256 inode 39841 errors 400, nbytes wrong

mount /dev/sda6 /mnt
btrfs inspect-internal inode-resolve 39841 /mnt

It should resolve a path to file for that inode. Chances are you can
just use cp to make a new copy of it, delete the original, and rename
the copy to match the original file name. Unmount. And now the btrfs
check error won't happen.
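
The copy, delete and rename routine above could be sketched roughly as follows. This is an illustration, not a tested recipe for this bug: the helper name and the temporary file suffix are made up, and it folds "delete the original" and "rename the copy" into a single mv, which replaces the original in one step. cp -a is used so that mode, owner and timestamps carry over to the new inode (and the SELinux context, where cp supports it).

```shell
# Hypothetical helper: replace FILE with a freshly written copy of itself,
# so that the contents end up under a new inode.
rewrite_file() {
    rf_src=$1
    rf_tmp=$1.rewrite.$$
    # Write the new copy next to the original, keeping its attributes.
    cp -a -- "$rf_src" "$rf_tmp" || return 1
    # Renaming over the original drops the old inode and keeps the name.
    mv -f -- "$rf_tmp" "$rf_src"
}

# Usage, with the path printed by inode-resolve (as root):
#   rewrite_file /mnt/etc/shadow-
```

After that, unmount and re-run btrfs check to see whether the message is gone.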



 [root@localhost liveuser]# btrfs check --repair /dev/sda6
 enabling repair mode
 Fixed 0 roots.
 Checking filesystem on /dev/sda6
 UUID: 94637b35-a294-4be2-aa47-82c52d6d53ef
 checking extents
 checking free space cache
 cache and super generation don't match, space cache will be invalidated
 checking fs roots
 root 256 inode 39841 errors 400, nbytes wrong
 found 7747100703 bytes used err is 1
 total csum bytes: 11912932
 total tree bytes: 476725248
 total fs tree bytes: 434733056
 total extent tree bytes: 22986752
 btree space waste bytes: 83962424
 file data blocks allocated: 30820143104
  referenced 11997040640
 Btrfs v3.17

Yeah I don't know what this errors 400 nbytes wrong means, but at the
moment btrfs-progs doesn't fix it.


-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 10:15 AM, Martin Langhoff
martin.langh...@gmail.com wrote:
 On Wed, Apr 1, 2015 at 11:42 AM, Martin Langhoff
 martin.langh...@gmail.com wrote:
 See below. Perhaps the newer kernel (in latest F21) has regressed in
 handling some kinds of errors during mount, or the dracut/systemd
 mounting process is less resilient than mounting under a fully booted
 system?

 This is getting even more interesting.

 Under 3.17.4-301.fc21.x86 from LiveUSB, I could mount, even repair the disk.

 Since the repair, the on-disk latest kernel (3.18.9-200.fc21) tries to
 boot, but dracut/systemd time out on mounting sysroot after waiting
 for quite a while. I don't get a dracut shell anymore so the failure
 mode has changed. I may try to set a breakpoint to force a shell.

 I do have an earlier F21 kernel on disk-- 3.18.7-200.fc21 -- and this
 boots the system without a glitch. After a complete boot with
 3.18.7.200, clean shutdown and booting into 3.18.9-200 is still
 broken, same failure mode.

 Will try to capture some info from a dracut breakpoint  (I'll try
 mount). At this point this really looks like a regression.

Yeah I don't know what's going on, but with a new file system, and
disabled i915 to avoid crashes, and thus no crashes since the new fs
was created, I get boot failure with 3.19.3 but not 3.19.2, and I
can't figure out why. I get the systemd cylon eye with 5 services
pending so I can't actually tell which one it's hung up on, but one of
them is looking for the fs volume UUID and apparently can't find it
which is completely bogus.

-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 1:03 PM, Chris Murphy li...@colorremedies.com wrote:
 mount /dev/sda6 /mnt
 btrfs inspect-internal inode-resolve 39841 /mnt

on the booted system...
# uname -a
Linux tp-martin.remote-learner.net 3.18.9-200.fc21.x86_64 #1 SMP Mon
Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
# btrfs inspect-internal inode-resolve 39841 /
//etc/shadow-
# diff -u /etc/shadow{,-}
--- /etc/shadow  2015-03-04 02:26:59.478255332 -0500
+++ /etc/shadow- 2015-03-04 02:26:59.000000000 -0500
@@ -42,4 +42,3 @@
 systemd-timesync:!!:16498::
 systemd-network:!!:16498::
 systemd-resolve:!!:16498::
-systemd-bus-proxy:!!:16498::

Bizarre.

cheers,



m


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 12:29 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
 On Wed, Apr 1, 2015 at 2:20 PM, Chris Murphy li...@colorremedies.com wrote:
 That won't fix it. Once errors 400 appears, at this point you have to
 replace the affected file.

 Interesting.

 Right now I am booting without problems. I have no evidence of
 continued problems. What would I do to check whether I see an error
 similar to yours on this fs?

 Trying to ascertain whether my fs is cured, and whether we can learn
 something else about this oddity...

Re-run the btrfs check. The error is still there even after a --repair.

-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Martin Langhoff
On Wed, Apr 1, 2015 at 2:54 PM, Chris Murphy li...@colorremedies.com wrote:
 Re-run the btrfs check. The error is still there even after a --repair.

Bingo! You are right, the error persists.

It has no effect on my use of the system right now. Is anyone
interested in debugging this further?

cheers,



martin


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 11:26 AM, Martin Langhoff
martin.langh...@gmail.com wrote:
 On Wed, Apr 1, 2015 at 1:03 PM, Chris Murphy li...@colorremedies.com wrote:
 mount /dev/sda6 /mnt
 btrfs inspect-internal inode-resolve 39841 /mnt

 on the booted system...
 # uname -a
 Linux tp-martin.remote-learner.net 3.18.9-200.fc21.x86_64 #1 SMP Mon
 Mar 9 15:10:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
 # btrfs inspect-internal inode-resolve 39841 /
 //etc/shadow-
 # diff -u /etc/shadow{,-}
 --- /etc/shadow 2015-03-04 02:26:59.478255332 -0500
 +++ /etc/shadow-2015-03-04 02:26:59.0 -0500
 @@ -42,4 +42,3 @@
  systemd-timesync:!!:16498::
  systemd-network:!!:16498::
  systemd-resolve:!!:16498::
 -systemd-bus-proxy:!!:16498::

 Bizarre.

When I had this same btrfs check error, it was the exact inode number
and same /etc/shadow file. I didn't diff the two shadow files, but I
did the cp, mv, rm routine, and then the system booted. Goofy cakes.
It's almost like an April Fools joke.

-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
Related bugs:

https://bugzilla.kernel.org/show_bug.cgi?id=68411
https://bugzilla.redhat.com/show_bug.cgi?id=1037963

The RHBZ one also mentioned the shadow file.

Anyway, it seems to be a somewhat known problem, but it's just not
known yet what causes it.

Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-04-01 Thread Chris Murphy
On Wed, Apr 1, 2015 at 1:23 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
 On Wed, Apr 1, 2015 at 2:54 PM, Chris Murphy li...@colorremedies.com wrote:
 Re-run the btrfs check. The error is still there even after a --repair.

 Bingo! You are right the error persists.

 It has no effect on my use of the system right now. Is anyone
 interested in debugging this further?

400 errors, nbytes wrong, isn't repaired by current btrfs check
https://bugzilla.kernel.org/show_bug.cgi?id=90071

What's interesting in that bug report that I'd forgotten about?

# btrfs inspect inode 804 /mnt/root
/mnt/root/etc/shadow-

Different inode number, but the shadow file is affected. In every
single case I've had now (about 1/2 dozen) with this errors 400
message, it's involved the shadow file. I have no idea what's going on
between Btrfs and the shadow file, but something seems to be. Or it's
quite a coincidence.




-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Chris Murphy
On Tue, Mar 31, 2015 at 5:54 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
 On Tue, Mar 31, 2015 at 4:09 PM, Chris Murphy li...@colorremedies.com wrote:
 There should be a reference to an rdsosreport.txt in /run/... so find
 a way to get that posted somewhere.

 I'll try, but it truly says nothing of interest from a block device /
 btrfs PoV. I have ample background debugging boot issues, disk
 corruption, etc from years of work w OLPC.

If there is no reference in this dracut shell to rdsosreport.txt, then use:
journalctl -b -l -o short-monotonic

You can mount anything at /sysroot including the boot partition if you
want, or a USB stick. The usual directories aren't available in the
initramfs before switchroot happens.


 That's a good idea! I was referring to something else -- I guess what
 I'm trying to say is: I'm not sure if this scrambled disk partition is
 a btrfs/kernel bug, or the cheap HDD lied about flushing a write to
 disk.

How to get reliable information about what happened when the power cut
out is the realm of both esoteric knowledge and active research. So
you're not the only one who isn't sure.


-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Kai Krakow
Chris Murphy li...@colorremedies.com schrieb:

 On Tue, Mar 31, 2015 at 4:15 PM, Chris Murphy li...@colorremedies.com
 wrote:
 
 The i915 regression right now is really annoying. With a Samsung 840
 EVO I've  had inexplicable and non-deterministic boot failures.
 
 Clarification: the boot failures happen following the i915 panic and
 subsequent forced power off.

Yeah I thought that, too, because after hitting reset it looked like one 
hard disk didn't appear in dmesg and thus btrfs didn't mount (btrfs-raid). 
So I turned the machine off completely because I had similar issues with 
i915 freezes and strange boot issues during the following boot before. It 
looks like the GPU is not necessarily completely reset when hitting the 
reset button. But that's another story.

In my case the hard disk was there - I just didn't scan hard enough through
the huge pile of logs. I had to run btrfs-zero-log and typed reboot in the
rescue shell. The kernel came back, but mount still locked up and sat there
until systemd decided to throw me into emergency mode after 5 minutes of
waiting or so. I rebooted again, and the machine came up. This was a few
reboots after the machine was powered off, so I'd rule out any GPU freeze
artifacts here. I just needed multiple reboots to come to terms with my
dracut/systemd combo super hero voodoo abilities (read: I cumbersomely tried
everything until one thing worked while swearing at my innocent monitor,
well sort of, it's powered by the GPU).

On every reboot it felt like bcache was replaying cache transactions - but I 
think this is by design (read: bcache is always dirty, even after a clean 
shutdown, if using write-back mode) and not part of the problem.

-- 
Replies to list only preferred.



Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Martin Langhoff
On Tue, Mar 31, 2015 at 4:09 PM, Chris Murphy li...@colorremedies.com wrote:
 There should be a reference to an rdsosreport.txt in /run/... so find
 a way to get that posted somewhere.

I'll try, but it truly says nothing of interest from a block device /
btrfs PoV. I have ample background debugging boot issues, disk
corruption, etc from years of work w OLPC.

  - kernel is 3.1.9-200.fc21

 This is probably 3.18.9, which is the current F21 kernel.

Correct, thanks. I typo'd that.

 A failure of the
 HDD cannot be ruled out, low power conditions, cheap consumer part...

 Well you have to rule that out before anyone on this list can really
 help. Try booting Fedora 21 install media, and using smartctl -x on
 the drive.

That's a good idea! I was referring to something else -- I guess what
I'm trying to say is: I'm not sure if this scrambled disk partition is
a btrfs/kernel bug, or the cheap HDD lied about flushing a write to
disk.

cheers,


m


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Martin Langhoff
On Tue, Mar 31, 2015 at 4:11 PM, Chris Murphy li...@colorremedies.com wrote:
 While you're at it, try to mount the Btrfs volume in question normally
 and report kernel messages. If mount fails, try it with -o recovery
 mount option, and also report kernel messages and whether that fails.

Oh, I should have mentioned this -- in the context of the
initramfs/systemd diagnostic shell (which is single-user), it just
hangs. No messages.

I'll get a bootable usb going and try under that.

cheers,



m


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Kai Krakow
Chris Murphy li...@colorremedies.com schrieb:

 On Tue, Mar 31, 2015 at 2:09 PM, Chris Murphy li...@colorremedies.com
 wrote:
 
 Well you have to rule that out before anyone on this list can really
 help. Try booting Fedora 21 install media, and using smartctl -x on
 the drive.
 
 While you're at it, try to mount the Btrfs volume in question normally
 and report kernel messages. If mount fails, try it with -o recovery
 mount option, and also report kernel messages and whether that fails.

I had this happen, too, lately. It quite often happens after an unclean
shutdown (which currently happens to me quite often due to the xorg intel
driver having GPU freezes). SysRq+W shows that the mount process is locked
somewhere in the btrfs code path and won't quit if Ctrl+C'd...

Only way to fix it was to btrfs-zero-log. But it still took some reboots 
from initramfs until it successfully mounted again (I could mount it in 
initramfs right after zero-log but upon reboot it hung again though at a 
different stage probably).

So I guess there's some race on the one hand (it happens from time to time,
unrelated to fixing it with zero-log), and a deadlock on the other hand after
some unclean shutdowns (more or less random).

My setup is 3-device btrfs-mraid1-draid0 on bcache. Bcache wasn't involved 
in the backtrace of SysRq+W, however. Apparently I don't have a screenshot 
of it because my smart phone is currently fried...

-- 
Replies to list only preferred.



Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Chris Murphy
On Tue, Mar 31, 2015 at 4:15 PM, Chris Murphy li...@colorremedies.com wrote:

 The i915 regression right now is really annoying. With a Samsung 840
 EVO I've  had inexplicable and non-deterministic boot failures.

Clarification: the boot failures happen following the i915 panic and
subsequent forced power off.


-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Kai Krakow
Kai Krakow hurikha...@gmail.com wrote:

 Chris Murphy li...@colorremedies.com wrote:
 
 On Tue, Mar 31, 2015 at 2:09 PM, Chris Murphy li...@colorremedies.com
 wrote:
 
 Well you have to rule that out before anyone on this list can really
 help. Try booting Fedora 21 install media, and using smartctl -x on
 the drive.
 
 While you're at it, try to mount the Btrfs volume in question normally
 and report kernel messages. If mount fails, try it with -o recovery
 mount option, and also report kernel messages and whether that fails.
 
 I had this happen, too, lately. It's quite often happening after an
 unclean shutdown (which currently quite often happened to me due to the
 xorg intel driver having GPU freezes). SysRq+W shows that the mount
 process is locked somewhere in the btrfs code path and won't quit if
 Ctrl+C'd...
 
 Only way to fix it was to btrfs-zero-log. But it still took some reboots
 from initramfs until it successfully mounted again (I could mount it in
 initramfs right after zero-log but upon reboot it hung again though at a
 different stage probably).
 
 So I guess there's some race on the one hand (happens from time to time
 unrelated to fixing it with zero-log), and a deadlock on the other hand
 after some unclean shutdowns (more or less random).
 
 My setup is 3-device btrfs-mraid1-draid0 on bcache. Bcache wasn't involved
 in the backtrace of SysRq+W, however. Apparently I don't have a screenshot
 of it because my smart phone is currently fried...

BTW: I tried all the kernels still on my boot partition, from the current
3.19.x back to 3.18.0, each with the same result and a very similar
backtrace (SysRq+W)...

-- 
Replies to list only preferred.



Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Chris Murphy
On Tue, Mar 31, 2015 at 2:39 PM, Matt Grant
matt.gr...@foodstuffs-si.co.nz wrote:
 Seen this before at home.

 You have to mount -o recovery off a 3.19 kernel to fix it...

 If you can get the SSD out, attach it to a desktop, as there will be no 
 Install CDs using 3.19 yet.

Fedora 22 Workstation alpha has 4.0.0 (rc1, I think), and the current
TC6 beta has 4.0.0-rc4. It's possible to use the netinstall image, which
is much smaller, and boot with the single or rescue parameter to keep the
installer from launching.

And actually 3.19.2 is the stable kernel for Fedora 21, with 3.19.3
just pushed today (mirrors take a day or two to catch up), not 3.18.9
as I reported earlier.

-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Chris Murphy
On Tue, Mar 31, 2015 at 3:45 PM, Kai Krakow hurikha...@gmail.com wrote:

 I had this happen, too, lately. It's quite often happening after an unclean
 shutdown (which currently quite often happened to me due to the xorg intel
 driver having GPU freezes). SysRq+W shows that the mount process is locked
 somewhere in the btrfs code path and won't quit if Ctrl+C'd...

 Only way to fix it was to btrfs-zero-log.

The i915 regression right now is really annoying. With a Samsung 840
EVO I've had inexplicable and non-deterministic boot failures. When
running btrfs check from the initramfs (booting with
rd.break=pre-mount) I get a very long pile of complaints... minutes of
scrolling text of horrible-sounding problems. Yet with the same btrfs-progs
and the same kernel booted from Fedora 22 install media, there are zero
complaints and it mounts fine.

So I have no idea what's going on right now. These crashes even corrupt
the EFI System partition.

-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Chris Murphy
On Tue, Mar 31, 2015 at 2:09 PM, Chris Murphy li...@colorremedies.com wrote:

 Well you have to rule that out before anyone on this list can really
 help. Try booting Fedora 21 install media, and using smartctl -x on
 the drive.

While you're at it, try to mount the Btrfs volume in question normally
and report kernel messages. If mount fails, try it with -o recovery
mount option, and also report kernel messages and whether that fails.
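
The sequence above could be scripted roughly as follows. This is a sketch under assumptions, not a tested recovery procedure: the function name, /dev/sdXN and /mnt are placeholders, and on newer kernels the recovery option goes by the name usebackuproot instead.

```shell
#!/bin/sh
# Sketch only: try a normal mount first, then fall back to -o recovery,
# printing recent kernel messages after each failed attempt.
# try_btrfs_mount is a hypothetical helper name for this example.
try_btrfs_mount() {
    dev=$1
    mnt=$2
    if mount "$dev" "$mnt"; then
        echo "normal mount succeeded"
        return 0
    fi
    echo "normal mount failed; kernel messages:"
    dmesg | tail -n 30
    if mount -o recovery "$dev" "$mnt"; then
        echo "mount with -o recovery succeeded"
        return 0
    fi
    echo "mount with -o recovery failed; kernel messages:"
    dmesg | tail -n 30
    return 1
}
```

Something like `try_btrfs_mount /dev/sdXN /mnt 2>&1 | tee mount-report.txt` would capture both the results and the kernel messages for posting to the list.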

-- 
Chris Murphy


Re: F21 fails to mount root part, btrfs check: Couldn't open file system

2015-03-31 Thread Chris Murphy
On Tue, Mar 31, 2015 at 12:55 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
 Hi BTRFS folks,

 one of my dev boxes is a Thinkpad x220, with a single hybrid
 (HDD+Flash) disk, running F21 with BTRFS partitions for /home and / .

 After losing power (ran out of battery, possibly while trying to
 hibernate) -- the system will not boot. The initrd breaks out to a
 shell where I find that the partition holding / is failing to mount.

There should be a reference to an rdsosreport.txt in /run/... so find
a way to get that posted somewhere.

  - kernel is 3.1.9-200.fc21

This is probably 3.18.9, which is the current F21 kernel.

 A failure of the HDD cannot be ruled out, low power conditions, cheap
 consumer part...

Well you have to rule that out before anyone on this list can really
help. Try booting Fedora 21 install media, and using smartctl -x on
the drive.
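
A small sketch for collecting that report in a form that can be posted, assuming a root shell on the live media. The function name and the default file name are invented for this example; /dev/sda is just the usual first disk and may differ on a given machine.

```shell
#!/bin/sh
# Sketch only: save the full smartctl -x report to a file and show the
# exit status. smart_report is a hypothetical helper name.
smart_report() {
    dev=$1
    report=${2:-smart-report.txt}
    smartctl -x "$dev" > "$report"
    status=$?
    # A nonzero status from smartctl is a bitmask; see smartctl(8).
    echo "smartctl exit status $status, report saved to $report"
    return "$status"
}
```

For example, `smart_report /dev/sda` writes smart-report.txt in the current directory.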


-- 
Chris Murphy