Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-10 Thread Phillip Susi

On 12/9/2014 10:10 PM, Anand Jain wrote:
 In the test case provided earlier who is triggering the scan ? 
 grub-probe ?

The scan is initiated by udev.  grub-probe only comes into it because
it looks at /proc/mounts to find out what device is mounted, and
/proc/mounts is lying.

 But we had to revert, since the btrfs bug became a feature for the
 system boot process and fixing it breaks mount at boot with
 subvol.

How is this?  Also are we talking about updating the cached list of
devices that *can* be mounted, or what device already *is* mounted?  I
can see doing the former, but the latter should never happen.

 if the device is already mounted, just the device path is updated
 but the original device will still be in use (bug).

Yep, that is the bug that started all of this.


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-09 Thread Phillip Susi

On 12/8/2014 5:25 PM, Konstantin wrote:
 
 Phillip Susi wrote on 08.12.2014 at 15:59:
  The bios does not know or care about partitions.  All you need is a

 That's only true for older BIOSs. With current EFI boards they not
 only care but some also mess around with GPT partition tables.

EFI is a whole other beast that we aren't talking about.

  partition table in the MBR and you can install grub there and
  have it boot the system from a mdadm 1.1 or 1.2 format array
  housed in a partition on the rest of the disk.  The only time you
  really *have* to

 I was thinking of this solution as well, but as I'm not aware of
 any partitioning tool that cares about mdadm metadata, I rejected it.
 It requires a non-standard layout leaving reserved empty spaces for
 mdadm metadata. It's possible but it isn't documented as far as I know,
 and before losing hours of trying I chose the obvious one.

What on earth are you talking about?  Partitioning tool that cares
about mdadm?  non-standard layout?  I am talking about the bog
standard layout where you create a partition, then use that partition
to build an mdadm array.  mdadm takes care of its own metadata.  There
isn't anything unusual, non obvious, or undocumented here.

  use 0.9 or 1.0 ( and you really should be using 1.0 instead since
  it handles larger arrays and can't be confused vis. whole disk
  vs. partition components ) is if you are running a raid1 on the
  raw disk, with no partition table and then partition inside the
  array instead, and really, you just shouldn't be doing that.

 That's exactly what I want to do - running RAID1 on the whole disk
 as most hardware based RAID systems do. Before that I was running
 RAID on disk partitions for some years but this was quite a pain in
 comparison. Hot(un)plugging a drive brings you a lot of issues with
 failing mdadm commands as they don't like concurrent execution when
 the same physical device is affected. And rebuild of RAID
 partitions is done sequentially with no deterministic order. We
 could talk for hours about that but if interested maybe better in
 private as it is not BTRFS related.

So don't create more than one raid partition on the disk.

  dmraid solves the problem by removing the partitions from the
  underlying physical device ( /dev/sda ), and only exposing them
  on the array ( /dev/mapper/whatever ).  LVM only has the problem
  when you take a snapshot.  User space tools face the same issue
  and they resolve it by ignoring or deprioritizing the snapshot.

 I don't agree. dmraid and mdraid both remove the partitions. This
 is not a solution; BTRFS will still crash the PC using
 /dev/mapper/whatever or whatever device appears in the system
 providing the BTRFS volume.

You just said btrfs will crash by accessing the *correct* volume after
the *incorrect* one has been removed.  You aren't making any sense.
The problem only arises when the same partition is visible on *both*
the raw disk, and the md device.

 Speaking of BTRFS tools, I am still somehow confused that the
 problem confusing or mixing devices happens at all. I don't know
 the metadata of a BTRFS RAID setup but I assume there must be
 something like a drive index in there, as the order of RAID5 drives
 does matter. So having a second device with identical metadata
 should be considered invalid for auto-adding anyway.

Again, the problem is when you first boot up and/or mount the volume.
Which of the duplicate devices shows up first is indeterminate, so
just saying "ignore the second one" doesn't help.  Even saying "well,
error out if there are two" doesn't help since that leaves open a race
condition where the second volume has not appeared yet at the time you
do the check.
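
The ordering problem described above can be sketched with a small
simulation. This is only an illustration under stated assumptions: the
function names, the policy names and the identity string are all
hypothetical, and nothing here reflects how the kernel or udev is
actually implemented. It shows that "ignore the second one" depends on
arrival order, and that "error out if there are two" cannot fire when
the check runs before the duplicate has appeared.

```python
def pick_device(arrival_order, policy):
    """Return {identity: path} as a naive scanner would settle on it.

    arrival_order: list of (path, identity) tuples, in the order the
    devices show up. policy is "first_wins" or "error_on_duplicate".
    Both policies are hypothetical, for illustration only.
    """
    chosen = {}
    for path, identity in arrival_order:
        if identity in chosen:
            if policy == "error_on_duplicate":
                raise ValueError(
                    f"duplicate {identity}: {chosen[identity]} and {path}"
                )
            continue  # first_wins: silently ignore the later duplicate
        chosen[identity] = path
    return chosen


def mount_after(arrival_order, seen_so_far, policy):
    """Run the check when only the first `seen_so_far` devices exist."""
    return pick_device(arrival_order[:seen_so_far], policy)


ident = "fsid-A/devid-1"  # made-up identity shared by both devices
md_first = [("/dev/md0p1", ident), ("/dev/sdb1", ident)]
raw_first = list(reversed(md_first))

# Same policy, different arrival order, different device chosen.
print(pick_device(md_first, "first_wins"))
print(pick_device(raw_first, "first_wins"))

# The duplicate check passes because /dev/md0p1 has not appeared yet.
print(mount_after(raw_first, 1, "error_on_duplicate"))
```

With both devices present, the "error_on_duplicate" policy raises, but
only if the check happens late enough to see both.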




Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-09 Thread Anand Jain



On 08/12/2014 22:59, Phillip Susi wrote:


On 12/7/2014 7:32 PM, Konstantin wrote:

I'm guessing you are using metadata format 0.9 or 1.0, which put
the metadata at the end of the drive and the filesystem still
starts in sector zero.  1.2 is now the default and would not have
this problem as its metadata is at the start of the disk ( well,
4k from the start ) and the fs starts further down.

I know this and I'm using 0.9 on purpose. I need to boot from
these disks so I can't use 1.2 format as the BIOS wouldn't
recognize the partitions. Having an additional non-RAID disk for
booting introduces a single point of failure which is contrary to the
idea of RAID0.


The bios does not know or care about partitions.  All you need is a
partition table in the MBR and you can install grub there and have it
boot the system from a mdadm 1.1 or 1.2 format array housed in a
partition on the rest of the disk.  The only time you really *have* to
use 0.9 or 1.0 ( and you really should be using 1.0 instead since it
handles larger arrays and can't be confused vis. whole disk vs.
partition components ) is if you are running a raid1 on the raw disk,
with no partition table and then partition inside the array instead,
and really, you just shouldn't be doing that.


Anyway, to avoid a futile discussion, mdraid and its format is not
the problem, it is just an example of the problem. Using dm-raid
would do the same trouble, LVM apparently, too. I could think of a
bunch of other cases including the use of hardware based RAID
controllers. OK, it's not the majority's problem, but that's not
the argument to keep a bug/flaw capable of crashing your system.


dmraid solves the problem by removing the partitions from the
underlying physical device ( /dev/sda ), and only exposing them on the
array ( /dev/mapper/whatever ).  LVM only has the problem when you
take a snapshot.  User space tools face the same issue and they
resolve it by ignoring or deprioritizing the snapshot.


As it is a nice feature that the kernel apparently scans for drives
and automatically identifies BTRFS ones, it seems to me that this
feature is useless. When in a live system a BTRFS RAID disk fails,
it is not sufficient to hot-replace it, the kernel will not
automatically rebalance. Commands are still needed for the task as
are with mdraid. So the only point I can see at the moment where
this auto-detect feature makes sense is when mounting the device
for the first time. If I remember the documentation correctly, you
mount one of the RAID devices and the others are automagically
attached as well. But outside of the mount process, what is this
auto-detect used for?

So here a couple of rather simple solutions which, as far as I can
see, could solve the problem:

1. Limit the auto-detect to the mount process and don't do it when
devices are appearing.


 In the test case provided earlier who is triggering the scan ?
 grub-probe ?



2. When a BTRFS device is detected and its metadata is identical to
one already mounted, just ignore it.


 Seems like patch:
   commit b96de000bc8bc9688b3a2abea4332bd57648a49f
   Author: Anand Jain anand.j...@oracle.com
   Date:   Thu Jul 3 18:22:05 2014 +0800

 Btrfs: device_list_add() should not update list when mounted


But we had to revert, since the btrfs bug became a feature for the system
boot process and fixing it breaks mount at boot with subvol.


 commit 0f23ae74f589304bf33233f85737f4fd368549eb
 Author: Chris Mason c...@fb.com
 Date:   Thu Sep 18 07:49:05 2014 -0700

   Revert "Btrfs: device_list_add() should not update list when mounted"

 This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.



That doesn't really solve the problem since you can still pick the
wrong one to mount in the first place.


 The question is: do both devices have the same generation number?
 If not, then this fix will take care of picking the device
 with the larger generation number during mount.

commit 77bdae4d136e167bab028cbec58b988f91cf73c0
Author: Anand Jain anand.j...@oracle.com
Date:   Thu Jul 3 18:22:06 2014 +0800

btrfs: check generation as replace duplicates devid+uuid


 Yes, if there are two devices with the same
   fsid + devid + uuid + generation

 then it uses the last one probed during mount,
 OR
 if the device is already mounted, just the device path is updated
 but the original device will still be in use (bug).
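
The behaviour described above can be modelled in a few lines. This is
a toy model written from the description in this message, not the
kernel code: the class names, fields and methods (Probe, DeviceList,
reported_path, io_path) are all hypothetical. It assumes devices are
keyed by fsid + devid + uuid, that a probe with a larger or equal
generation replaces the known entry (so the last one probed wins on a
tie), and that a mounted filesystem keeps using whatever it opened at
mount time.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    path: str
    fsid: str
    devid: int
    uuid: str
    generation: int


class DeviceList:
    """Toy model of a scanned-device list; not the real btrfs structure."""

    def __init__(self):
        self.devices = {}  # (fsid, devid, uuid) -> latest accepted Probe
        self.in_use = {}   # (fsid, devid, uuid) -> path opened at mount

    def scan(self, probe):
        key = (probe.fsid, probe.devid, probe.uuid)
        known = self.devices.get(key)
        # A larger generation wins; on a tie the last one probed wins.
        if known is None or probe.generation >= known.generation:
            self.devices[key] = probe

    def mount(self, fsid):
        # Remember which path was actually opened for each member device.
        for key, probe in self.devices.items():
            if key[0] == fsid:
                self.in_use[key] = probe.path

    def reported_path(self, key):
        return self.devices[key].path  # what a listing would show

    def io_path(self, key):
        return self.in_use[key]        # where I/O really goes


dl = DeviceList()
key = ("F", 1, "U")
dl.scan(Probe("/dev/mapper/vg-rootlv", "F", 1, "U", generation=10))
dl.mount("F")
# A snapshot with identical metadata is scanned while mounted.
dl.scan(Probe("/dev/mapper/vg-z", "F", 1, "U", generation=10))
print(dl.reported_path(key), dl.io_path(key))
```

In this model the reported path and the path in use diverge after the
second scan, which is the mismatch the thread is about.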

Thanks




Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-08 Thread Phillip Susi

On 12/7/2014 7:32 PM, Konstantin wrote:
 I'm guessing you are using metadata format 0.9 or 1.0, which put
 the metadata at the end of the drive and the filesystem still
 starts in sector zero.  1.2 is now the default and would not have
 this problem as its metadata is at the start of the disk ( well,
 4k from the start ) and the fs starts further down.
 I know this and I'm using 0.9 on purpose. I need to boot from
 these disks so I can't use 1.2 format as the BIOS wouldn't
 recognize the partitions. Having an additional non-RAID disk for
 booting introduces a single point of failure which is contrary to the
 idea of RAID0.

The bios does not know or care about partitions.  All you need is a
partition table in the MBR and you can install grub there and have it
boot the system from a mdadm 1.1 or 1.2 format array housed in a
partition on the rest of the disk.  The only time you really *have* to
use 0.9 or 1.0 ( and you really should be using 1.0 instead since it
handles larger arrays and can't be confused vis. whole disk vs.
partition components ) is if you are running a raid1 on the raw disk,
with no partition table and then partition inside the array instead,
and really, you just shouldn't be doing that.
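
As a rough illustration of the point above, the table below records
where each md metadata version keeps its superblock. The 0.9, 1.0 and
1.2 entries restate what is said in this thread; the 1.1 entry (at the
start of the device) follows the mdadm documentation. The dictionary
and the function name are hypothetical and only serve the example.

```python
# Superblock placement per md metadata version. This is a lookup table
# for illustration, not something read from a real array.
MD_SUPERBLOCK_AT = {
    "0.9": "end",
    "1.0": "end",
    "1.1": "start",
    "1.2": "4k from start",
}


def fs_starts_at_sector_zero(version):
    """True if the array's data area begins at sector 0 of the member.

    When this is the case, a filesystem signature inside the array is
    also visible at the usual offset on the raw member device.
    """
    return MD_SUPERBLOCK_AT[version] == "end"


for version in MD_SUPERBLOCK_AT:
    print(version, fs_starts_at_sector_zero(version))
```

Only the formats for which this returns True expose the same
filesystem signature on both the array and its raw member.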

 Anyway, to avoid a futile discussion, mdraid and its format is not
 the problem, it is just an example of the problem. Using dm-raid
 would do the same trouble, LVM apparently, too. I could think of a
 bunch of other cases including the use of hardware based RAID
 controllers. OK, it's not the majority's problem, but that's not
 the argument to keep a bug/flaw capable of crashing your system.

dmraid solves the problem by removing the partitions from the
underlying physical device ( /dev/sda ), and only exposing them on the
array ( /dev/mapper/whatever ).  LVM only has the problem when you
take a snapshot.  User space tools face the same issue and they
resolve it by ignoring or deprioritizing the snapshot.

 As it is a nice feature that the kernel apparently scans for drives
 and automatically identifies BTRFS ones, it seems to me that this
 feature is useless. When in a live system a BTRFS RAID disk fails,
 it is not sufficient to hot-replace it, the kernel will not
 automatically rebalance. Commands are still needed for the task as
 are with mdraid. So the only point I can see at the moment where
 this auto-detect feature makes sense is when mounting the device
 for the first time. If I remember the documentation correctly, you
 mount one of the RAID devices and the others are automagically
 attached as well. But outside of the mount process, what is this
 auto-detect used for?
 
 So here a couple of rather simple solutions which, as far as I can
 see, could solve the problem:
 
 1. Limit the auto-detect to the mount process and don't do it when 
 devices are appearing.
 
 2. When a BTRFS device is detected and its metadata is identical to
 one already mounted, just ignore it.

That doesn't really solve the problem since you can still pick the
wrong one to mount in the first place.



Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-08 Thread Robert White

On 12/07/2014 04:32 PM, Konstantin wrote:

I know this and I'm using 0.9 on purpose. I need to boot from these
disks so I can't use 1.2 format as the BIOS wouldn't recognize the
partitions. Having an additional non-RAID disk for booting introduces a
single point of failure which is contrary to the idea of RAID0.


GRUB2 has raid 1.1 and 1.2 metadata support via the mdraid1x module. LVM 
is also supported. I don't know if a stack of both is supported.


There is, BTW, no such thing as a (commodity) computer without a single 
point of failure in it somewhere. I've watched government contracts 
chase this demon for decades. Be it disk, controller, network card, bus 
chip, cpu or stick-of-ram you've got a single point of failure 
somewhere. Actually you likely have several such points of potential 
failure.


For instance, are you _sure_ your BIOS is going to check the second 
drive if it gets a read failure after starting in on your first drive? 
Chances are it won't because that four-hundred bytes-or-so boot loader 
on that first disk has no way to branch back into the bios.


You can waste a lot of your life chasing that ghost and you'll still 
discover you've missed it and have to whip out your backup boot media.


It may well be worth having a second copy of /boot around, but make sure 
you stay out of bandersnatch territory when designing your system. The 
more you over-think the plumbing, the easier it is to stop up the pipes.



Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-08 Thread Konstantin

Phillip Susi wrote on 08.12.2014 at 15:59:
 On 12/7/2014 7:32 PM, Konstantin wrote:
  I'm guessing you are using metadata format 0.9 or 1.0, which put
  the metadata at the end of the drive and the filesystem still
  starts in sector zero.  1.2 is now the default and would not have
  this problem as its metadata is at the start of the disk ( well,
  4k from the start ) and the fs starts further down.
  I know this and I'm using 0.9 on purpose. I need to boot from
  these disks so I can't use 1.2 format as the BIOS wouldn't
  recognize the partitions. Having an additional non-RAID disk for
  booting introduces a single point of failure which is contrary to the
  idea of RAID0.

 The bios does not know or care about partitions.  All you need is a
That's only true for older BIOSs. With current EFI boards they not only
care but some also mess around with GPT partition tables.
 partition table in the MBR and you can install grub there and have it
 boot the system from a mdadm 1.1 or 1.2 format array housed in a
 partition on the rest of the disk.  The only time you really *have* to
I was thinking of this solution as well, but as I'm not aware of any
partitioning tool that cares about mdadm metadata, I rejected it. It
requires a non-standard layout leaving reserved empty spaces for mdadm
metadata. It's possible but it isn't documented as far as I know, and
before losing hours of trying I chose the obvious one.
 use 0.9 or 1.0 ( and you really should be using 1.0 instead since it
 handles larger arrays and can't be confused vis. whole disk vs.
 partition components ) is if you are running a raid1 on the raw disk,
 with no partition table and then partition inside the array instead,
 and really, you just shouldn't be doing that.
That's exactly what I want to do - running RAID1 on the whole disk as
most hardware based RAID systems do. Before that I was running RAID on
disk partitions for some years but this was quite a pain in comparison.
Hot(un)plugging a drive brings you a lot of issues with failing mdadm
commands as they don't like concurrent execution when the same physical
device is affected. And rebuild of RAID partitions is done sequentially
with no deterministic order. We could talk for hours about that but if
interested maybe better in private as it is not BTRFS related.
  Anyway, to avoid a futile discussion, mdraid and its format is not
  the problem, it is just an example of the problem. Using dm-raid
  would do the same trouble, LVM apparently, too. I could think of a
  bunch of other cases including the use of hardware based RAID
  controllers. OK, it's not the majority's problem, but that's not
  the argument to keep a bug/flaw capable of crashing your system.

 dmraid solves the problem by removing the partitions from the
 underlying physical device ( /dev/sda ), and only exposing them on the
 array ( /dev/mapper/whatever ).  LVM only has the problem when you
 take a snapshot.  User space tools face the same issue and they
 resolve it by ignoring or deprioritizing the snapshot.
I don't agree. dmraid and mdraid both remove the partitions. This is not
a solution; BTRFS will still crash the PC using /dev/mapper/whatever or
whatever device appears in the system providing the BTRFS volume.
  As it is a nice feature that the kernel apparently scans for drives
  and automatically identifies BTRFS ones, it seems to me that this
  feature is useless. When in a live system a BTRFS RAID disk fails,
  it is not sufficient to hot-replace it, the kernel will not
  automatically rebalance. Commands are still needed for the task as
  are with mdraid. So the only point I can see at the moment where
  this auto-detect feature makes sense is when mounting the device
  for the first time. If I remember the documentation correctly, you
  mount one of the RAID devices and the others are automagically
  attached as well. But outside of the mount process, what is this
  auto-detect used for?

  So here a couple of rather simple solutions which, as far as I can
  see, could solve the problem:

  1. Limit the auto-detect to the mount process and don't do it when
  devices are appearing.

  2. When a BTRFS device is detected and its metadata is identical to
  one already mounted, just ignore it.

 That doesn't really solve the problem since you can still pick the
 wrong one to mount in the first place.
Oh, it does solve the problem; you are speaking of another problem
which is always there when having several disks in a system. Mounting
the wrong device can happen in the case I'm describing if you use UUID,
label or some other metadata related information to mount it. You won't
try to do that when you insert a disk you know has the same metadata. It
will not happen (except when user tools outsmart you ;-)) when using the
device name(s). I think it could be expected from a user mounting things
manually to know or learn which device node is which drive. On the other
hand in my case one of the drives is already mounted so getting it

Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-08 Thread Konstantin

Robert White wrote on 08.12.2014 at 18:20:
 On 12/07/2014 04:32 PM, Konstantin wrote:
 I know this and I'm using 0.9 on purpose. I need to boot from these
 disks so I can't use 1.2 format as the BIOS wouldn't recognize the
 partitions. Having an additional non-RAID disk for booting introduces a
 single point of failure which is contrary to the idea of RAID0.

 GRUB2 has raid 1.1 and 1.2 metadata support via the mdraid1x module.
 LVM is also supported. I don't know if a stack of both is supported.

 There is, BTW, no such thing as a (commodity) computer without a
 single point of failure in it somewhere. I've watched government
 contracts chase this demon for decades. Be it disk, controller,
 network card, bus chip, cpu or stick-of-ram you've got a single point
 of failure somewhere. Actually you likely have several such points of
 potential failure.

 For instance, are you _sure_ your BIOS is going to check the second
 drive if it gets read failure after starting in on your first drive?
 Chances are it won't because that four-hundred bytes-or-so boot loader
 on that first disk has no way to branch back into the bios.

 You can waste a lot of your life chasing that ghost and you'll still
 discover you've missed it and have to whip out your backup boot media.

 It may well be worth having a second copy of /boot around, but make
 sure you stay out of bandersnatch territory when designing your
 system. The more you over-think the plumbing, the easier it is to
 stop up the pipes.
You are right, there is almost always a single point of failure
somewhere, even if it is the power plant providing your electricity ;-).
I should have written "introduces an additional single point of failure"
to be 100% correct but I thought this was obvious. As I have replaced
dozens of damaged hard disks but only a few CPUs, RAMs etc. it is more
important for me to reduce the most frequent and easy-to-solve points of
failure. For more important systems there are high availability
solutions which alleviate many of the problems you mention but that's
not the point here when speaking about the major bug in BTRFS which can
make your system crash.




Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-08 Thread Robert White

On 12/08/2014 02:38 PM, Konstantin wrote:

For more important systems there are high availability
solutions which alleviate many of the problems you mention but that's
not the point here when speaking about the major bug in BTRFS which can
make your system crash.


I think you missed the part where I told you that you could use GRUB2 
and then you could use the 1.2 metadata on your raid and then have your 
system work as desired.


Trying to make this all about BTRFS is more than a touch disingenuous as 
you are doing things that can make many systems fail in exactly the same 
way.


Undefined behavior is undefined.

The MDADM people made the later metadata layouts to address your issue, 
and it's up to you to use them. Need it to boot? GRUB2 will boot it, and 
it's up to you to use it.


New software fixes problems evident in the old, but once you decide to 
stick with the old despite the new, your problem becomes uninteresting 
because it was already fixed.


So yes, if you use the woefully out of date metadata and boot loader you 
will have problems. If you use the distro scripts that scan the volumes 
you don't want scanned, you will have problems. People are working on 
making sure that those problems have workarounds. And sometimes the 
workaround for "doctor, it hurts when I do this" is "don't do that any 
more."


It is multiplicatively impossible to build BTRFS such that it can dance 
through the entire Cartesian Product of all possible storage management 
solutions. Just as it was impossible for LVM and MDADM before them. If 
your system is layered, _you_ bear the burden of making sure that the 
layers are applied. Each tool is evolving to help you, but it's still you 
doing the system design.


GRUB has put in modules for everything you need (so far) to boot. mdadm 
has better signatures if you use them. LVM always had device offsets 
built into its metadata block.


But answering the negative is different. The sort of question that might 
be phrased "how do you know it's _not_ mdadm old style signatures" is an 
unbounded coding problem, not because any one case is impossible to code, 
but because an endless stream of possibilities is coming in the pipe. A 
striped storage controller might make a system look like /dev/sdb is a 
stand-alone BTRFS file system if the controller doesn't start and the 
mdadm and lvm signatures are on /dev/sda and take up just the right 
amount of room.


If I do an mkfs.ext2 on a medium, then do a cryptsetup luksFormat on that 
same medium, I can mount it either way, with disastrous consequences for 
the other semantic layout.


The bad combinations available are virtually limitless.

There comes a point where the System Architect that decided how to build 
the individual system has to take responsibility for his actions.


Note that the same "it didn't protect me" errors can happen _easily_ 
with other filesystems. Try building an NTFS on a disk, then build an 
ext4 on the same disk then mount as either or both. (Though nowadays you 
may need to build the ext4 then the NTFS since I think mkfs.ext4 may now 
have a little dedicated wiper to de-NTFS a disk after that went sour a 
few too many times.)


When storage signatures conflict you will get exciting outcomes. It 
will always be that way, and it's not an error in any of that 
filesystem code. You, the System Architect, bear a burden here.
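
One way to see such a conflict before mounting anything is to look for
more than one signature on the same medium. The sketch below is a
minimal example, not a replacement for tools like blkid or wipefs: the
function name is hypothetical, it checks only four well-known magic
values at their standard offsets, and it operates on a byte string so
that it can be tried on a synthetic image.

```python
import struct


def find_signatures(image):
    """Return the names of the on-disk signatures found in `image` (bytes)."""
    found = []
    # LUKS header magic at the very start of the device.
    if image[0:6] == b"LUKS\xba\xbe":
        found.append("luks")
    # NTFS OEM ID in the boot sector, at byte offset 3.
    if image[3:11] == b"NTFS    ":
        found.append("ntfs")
    # ext2/3/4 superblock starts at 1024; its magic sits 56 bytes in.
    if len(image) >= 1082 and struct.unpack_from("<H", image, 1080)[0] == 0xEF53:
        found.append("ext2/3/4")
    # btrfs superblock starts at 64 KiB; its magic sits 64 bytes in.
    if image[65600:65608] == b"_BHRfS_M":
        found.append("btrfs")
    return found


# Synthetic image carrying two signatures at once.
image = bytearray(70000)
image[0:6] = b"LUKS\xba\xbe"
struct.pack_into("<H", image, 1080, 0xEF53)
print(find_signatures(bytes(image)))
```

Any result with more than one entry means the medium can be read in
more than one way, and it is up to whoever designed the layout to say
which reading is the intended one.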


The system isn't shooting itself when you do certain things. The 
System Architect is shooting the system with a bad layout bullet.


You don't want some LV to be scanned... don't scan it... If your tools 
scan it automatically, don't use those tools that way. "But my distro 
automatically" is just a reason to look twice at your distro or your design.





Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-07 Thread Konstantin

Anand Jain wrote on 02.12.2014 at 12:54:



 On 02/12/2014 19:14, Goffredo Baroncelli wrote:
 I investigated this issue further.

 MegaBrutal reported the following issue: after taking an LVM snapshot of
 the device of a mounted btrfs fs, the new snapshot device name replaces
 the name of the original device in the output of /proc/mounts. This
 confused tools like grub-probe, which report a wrong root device.
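
A tool that trusts /proc/mounts can at least cross-check what it reads.
The sketch below is a hedged example, not what grub-probe does: the
function names are hypothetical, the first sample line pair is taken
from the reproduction later in this message, and the symlink table is
an assumption standing in for os.path.realpath() on a live system.

```python
import os


def mounted_source(mounts_text, mount_point):
    """Return the device /proc/mounts reports for `mount_point`.

    Later entries shadow earlier ones, so the last match is returned.
    """
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            source = fields[0]
    return source


def same_device(reported, expected, resolve=os.path.realpath):
    """Compare two device names after resolving symlinks."""
    return resolve(reported) == resolve(expected)


sample = """rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0
"""

# Hypothetical symlink table; on a real system os.path.realpath is used.
links = {"/dev/mapper/vg-rootlv": "/dev/dm-1", "/dev/mapper/vg-z": "/dev/dm-2"}


def resolve(path):
    return links.get(path, path)


reported = mounted_source(sample, "/")
print(reported, same_device(reported, "/dev/mapper/vg-rootlv", resolve))
```

If the reported entry changed to the snapshot's node after a scan, the
comparison against the device named on the kernel command line would
fail, which is a cheap way to notice the mismatch.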

 very good test case indeed thanks.

 Actual IO would still go to the original device, until FS is remounted.
This seems to be correct at least at the beginning but I wouldn't be so
sure - why else would the system crash in my case after a while when the
second drive is present?! If the kernel were not using it in some way,
nothing should happen apart from the wrong /proc/mounts entry.


 It has to be pointed out that, by contrast, the link under
 /sys/fs/btrfs/<fsid>/devices is correct.

 In this context the above sysfs path will be out of sync with
 reality; it's just a stale sysfs entry.


 What happens is that, *even if the filesystem is mounted*, doing a
 btrfs dev scan of a snapshot (of the real volume) replaces the device
 name of the filesystem with the snapshot one.

 We have some fundamentally wrong stuff. My original patch tried
 to fix it, but we later discovered that some external entities like
 systemd and the boot process are using that bug as a feature and we had
 to revert the patch.

 Fundamentally, the SCSI inquiry serial number is the only number which is
 unique to the device (including virtual devices, though there could be
 some legacy virtual devices which don't follow that strictly; those I
 deem to be a device-side issue). Btrfs depends on the combination of
 fsid, uuid and devid (and generation number) to identify the unique
 device volume, which is weak and easy to get wrong.


 Anand, with b96de000b, tried to fix it; however, a further regression
 appeared and Chris reverted this commit (see below).

 BR
 G.Baroncelli

 commit b96de000bc8bc9688b3a2abea4332bd57648a49f
 Author: Anand Jain anand.j...@oracle.com
 Date:   Thu Jul 3 18:22:05 2014 +0800

  Btrfs: device_list_add() should not update list when mounted
 [...]


 commit 0f23ae74f589304bf33233f85737f4fd368549eb
 Author: Chris Mason c...@fb.com
 Date:   Thu Sep 18 07:49:05 2014 -0700

  Revert "Btrfs: device_list_add() should not update list when mounted"

  This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.

  This commit is triggering failures to mount by subvolume id in some
  configurations.  The main problem is how many different ways this
  scanning function is used, both for scanning while mounted and
  unmounted.  A proper cleanup is too big for late rcs.

 [...]

 On 12/02/2014 09:28 AM, MegaBrutal wrote:
 2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli kreij...@inwind.it:
 On 12/02/2014 01:15 AM, MegaBrutal wrote:
 2014-12-02 0:24 GMT+01:00 Robert White rwh...@pobox.com:
 On 12/01/2014 02:10 PM, MegaBrutal wrote:

 Since having duplicate UUIDs on devices is not a problem for me, as
 I can tell them apart by LVM names, the discussion is of little
 relevance to my use case. Of course it's interesting and I like to
 read along, but it is not about the actual problem at hand.


 Which is why you use the device= mount option, which would take LVM
 names and which was repeatedly discussed as solving this very problem.

 Once you decide to duplicate the UUIDs with LVM snapshots you take up
 the burden of disambiguating your storage.

 Which is part of why re-reading was suggested, as this was covered in
 some depth and _is_ _exactly_ about the problem at hand.

 Nope.

 root@reproduce-1391429:~# cat /proc/cmdline
 BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
 root=/dev/mapper/vg-rootlv ro
 rootflags=device=/dev/mapper/vg-rootlv,subvol=@

 Observe, device= mount option is added.

 device= options is needed only in a btrfs multi-volume scenario.
 If you have only one disk, this is not needed


 I know. I only did this as a demonstration for Robert. He insisted it
 will certainly solve the problem. Well, it doesn't.



 root@reproduce-1391429:~# ./reproduce-1391429.sh
 #!/bin/sh -v
 lvs
LV VG   Attr  LSize   Pool Origin Data%  Move Log
 Copy%  Convert
rootlv vg   -wi-ao---   1.00g
swap0  vg   -wi-ao--- 256.00m

 grub-probe --target=device /
 /dev/mapper/vg-rootlv

 grep  /  /proc/mounts
 rootfs / rootfs rw 0 0
 /dev/dm-1 / btrfs rw,relatime,space_cache 0 0

 lvcreate --snapshot --size=128M --name z vg/rootlv
Logical volume z created

 lvs
LV VG   Attr  LSize   Pool Origin Data%  Move Log
 Copy%  Convert
rootlv vg   owi-aos--   1.00g
swap0  vg   -wi-ao--- 256.00m
z  vg   swi-a-s-- 128.00m  rootlv   0.11

 ls -l /dev/vg/
 total 0
 lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv - ../dm-1
 lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 - ../dm-0
 lrwxrwxrwx 1 root root 7 Dec  2 00:12 z - ../dm-2

 grub-probe --target=device /
 /dev/mapper/vg-z

 grep  /  

Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-07 Thread Konstantin
Phillip Susi wrote on 02.12.2014 at 20:19:
 On 12/1/2014 4:45 PM, Konstantin wrote:
  The bug appears also when using mdadm RAID1 - when one of the
  drives is detached from the array then the OS discovers it and
  after a while (not directly, it takes several minutes) it appears
  under /proc/mounts: instead of /dev/md0p1 I see there /dev/sdb1.
  And usually after some hour or so (depending on system workload)
  the PC completely freezes. So discussion about the uniqueness of
  UUIDs or not, a crashing kernel is telling me that there is a
  serious bug.

 I'm guessing you are using metadata format 0.9 or 1.0, which put the
 metadata at the end of the drive and the filesystem still starts in
 sector zero.  1.2 is now the default and would not have this problem
 as its metadata is at the start of the disk ( well, 4k from the start
 ) and the fs starts further down.
I know this, and I'm using 0.9 on purpose. I need to boot from these
disks, so I can't use the 1.2 format as the BIOS wouldn't recognize the
partitions. Having an additional non-RAID disk for booting introduces a
single point of failure, which is contrary to the idea of RAID1.

Anyway, to avoid a futile discussion: mdraid and its format are not the
problem, they are just an example of the problem. Using dm-raid would cause
the same trouble, and apparently LVM does too. I could think of a bunch of
other cases, including the use of hardware-based RAID controllers. OK, it's
not the majority's problem, but that's no argument for keeping a bug/flaw
capable of crashing your system.

Nice as it is that the kernel apparently scans for drives and
automatically identifies BTRFS ones, it seems to me that this feature is
useless. When a BTRFS RAID disk fails in a live system, it is not
sufficient to hot-replace it; the kernel will not automatically
rebalance. Commands are still needed for the task, as they are with mdraid.
So the only point I can see at the moment where this auto-detect feature
makes sense is when mounting the device for the first time. If I
remember the documentation correctly, you mount one of the RAID devices
and the others are automagically attached as well. But outside of the
mount process, what is this auto-detect used for?

So here are a couple of rather simple solutions which, as far as I can see,
could solve the problem:

1. Limit the auto-detect to the mount process and don't do it when
devices are appearing.

2. When a BTRFS device is detected and its metadata is identical to one
already mounted, just ignore it.
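
Option 2 can be illustrated with a toy model. The sketch below is not kernel code and does not reflect how btrfs actually tracks devices; the DeviceRegistry class, its methods, and the sample fsid are hypothetical names chosen for illustration. It only shows the policy: a scan that reports the same (fsid, devid) as an already mounted device, but under a different path, is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class KnownDevice:
    path: str
    mounted: bool = False


@dataclass
class DeviceRegistry:
    """Toy model of a scan cache keyed by (fsid, devid). Not kernel code."""
    devices: Dict[Tuple[str, int], KnownDevice] = field(default_factory=dict)

    def scan(self, fsid: str, devid: int, path: str) -> bool:
        """Record a scanned device. Return False if the scan was ignored."""
        key = (fsid, devid)
        known = self.devices.get(key)
        if known is not None and known.mounted and known.path != path:
            # Same metadata as a mounted device, different path: ignore it.
            return False
        mounted = known.mounted if known is not None else False
        self.devices[key] = KnownDevice(path, mounted)
        return True

    def mark_mounted(self, fsid: str, devid: int) -> None:
        self.devices[(fsid, devid)].mounted = True


# Hypothetical fsid; device names follow the reproduction in this thread.
registry = DeviceRegistry()
registry.scan("example-fsid", 1, "/dev/dm-1")
registry.mark_mounted("example-fsid", 1)
registry.scan("example-fsid", 1, "/dev/dm-2")      # snapshot appears, ignored
print(registry.devices[("example-fsid", 1)].path)  # /dev/dm-1
```

Note that the revert quoted elsewhere in this thread reports that a kernel change along similar lines caused failures to mount by subvolume id in some configurations, so this is an illustration of the idea only, not a claim that it is safe to apply.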




Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-04 Thread MegaBrutal
2014-12-04 6:15 GMT+01:00 Duncan 1i5t5.dun...@cox.net:

 Which is why I'm running an initramfs for the first time since I've
 switched to btrfs raid1 mode root, as I quit with initrds back before
 initramfs was an option.  An initramfs appended to the kernel image beats
 a separate initrd, but I'd still love to see the kernel commandline
 parsing fixed so it broke at the correct = in rootflags=device= (which
 seemed to be the problem, the kernel then didn't seem to recognize
 rootflags at all, as it was apparently seeing it as a parameter called
 rootflags=device, instead of rootflags), so I could be rid of the
 initramfs again.


Are you sure it isn't fixed? At least, it parses rootflags=subvol=@
well, which also has multiple = signs. And the last time I tried this,
it didn't cause any problems:
rootflags=device=/dev/mapper/vg-rootlv,subvol=@. device=
shouldn't have any effect in this case anyway, but I didn't get any
complaints about it. Then again, I use an initrd.
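
To illustrate the point about multiple = signs: splitting a parameter at the first = only keeps the rest of the value intact. This is a minimal sketch in Python, not the kernel's or any initramfs's actual parser; both function names are made up for this example.

```python
from typing import Dict, Tuple


def split_param(param: str) -> Tuple[str, str]:
    """Split 'name=value' at the first '=' only."""
    name, _, value = param.partition("=")
    return name, value


def parse_mount_options(value: str) -> Dict[str, str]:
    """Split a comma-separated option string into key/value pairs."""
    options = {}
    for item in value.split(","):
        key, _, val = item.partition("=")
        options[key] = val
    return options


name, value = split_param("rootflags=device=/dev/mapper/vg-rootlv,subvol=@")
print(name)                        # rootflags
print(parse_mount_options(value))
```

Splitting at the last = instead would produce a parameter name containing = characters, which is the kind of misparse described in the quoted text.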


Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-04 Thread Duncan
MegaBrutal posted on Thu, 04 Dec 2014 09:20:12 +0100 as excerpted:

 Are you sure it isn't fixed? At least, it parses rootflags=subvol=@
 well, which also has multiple = signs. And last time I've tried this,
 and didn't cause any problems:
 rootflags=device=/dev/mapper/vg-rootlv,subvol=@. Though device=
 shouldn't have an effect in this case anyway, but I didn't get any
 complaints against it. Though I use an initrd.

AFAIK lvm requires userspace anyway, thus an initr*, and once you have
that initr* handling the lvm, it's almost certainly the initr* parsing
the rootflags= from the kernel commandline as well.  So in that case the
kernel doesn't /need/ to be able to parse rootflags=, as all it does is
pass the kernel commandline straight through to the initr*, which would
seem, in your case at least, to parse it correctly.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-03 Thread Goffredo Baroncelli
On 12/02/2014 08:11 PM, Phillip Susi wrote:
 On 12/2/2014 7:23 AM, Austin S Hemmelgarn wrote:
 Stupid thought, why don't we just add blacklisting based on device
 path like LVM has for pvscan?
 
 That isn't logic that belongs in the kernel, so that is going down the
 path of yanking out the device auto probing from btrfs and instead
 writing a mount.btrfs helper that can use policies like blacklisting
 to auto locate all of the correct devices and pass them all to the
 kernel at mount time.
 
I am thinking about that. Today the device discovery happens:
a) when a device appears: two udev rules run btrfs dev scan <device>

/lib/udev/rules.d/70-btrfs.rules
/lib/udev/rules.d/80-btrfs-lvm.rules

b) during boot: a btrfs device scan is run, which scans all
the devices (this happens in Debian; other distros may differ)

c) after mkfs.btrfs, which starts a device scan on each device of
the new filesystem

d) on request by the user


Regarding a), the problem is simply solved by adding a line like:

ENV{DM_UDEV_LOW_PRIORITY_FLAG}=="1", GOTO="btrfs_end"

Regarding c), it is not a problem.

Regarding b) and d), the only solution that I found is to query the
udev DB inside the btrfs dev scan program and to skip the devices
with DM_UDEV_LOW_PRIORITY_FLAG==1. Implementing this would, however,
solve all the points a), b), c), d) in one shot!
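
The check described for b) and d) could look roughly like this. The sketch assumes the udev properties of a device are already available as KEY=value text (for example from udevadm info --query=property); the function names and the sample property text are hypothetical, and this is not the actual btrfs-progs code.

```python
from typing import Dict


def parse_udev_properties(text: str) -> Dict[str, str]:
    """Parse KEY=value lines into a dict, ignoring lines without '='."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def should_scan(props: Dict[str, str]) -> bool:
    """Skip devices that device-mapper marked as low priority."""
    return props.get("DM_UDEV_LOW_PRIORITY_FLAG") != "1"


# Made-up property text for a snapshot and for its origin.
snapshot = parse_udev_properties(
    "DEVNAME=/dev/dm-2\nDM_NAME=vg-z\nDM_UDEV_LOW_PRIORITY_FLAG=1"
)
origin = parse_udev_properties("DEVNAME=/dev/dm-1\nDM_NAME=vg-rootlv")
print(should_scan(snapshot))  # False
print(should_scan(origin))    # True
```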


BR
G.Baroncelli

P.S.
This is the comment in the LVM sources about DM_UDEV_LOW_PRIORITY_FLAG:

/*
 * DM_UDEV_LOW_PRIORITY_FLAG is set in case we need to instruct the
 * udev rules to give low priority to the device that is currently
 * processed. For example, this provides a way to select which symlinks
 * could be overwritten by high priority ones if their names are equal.
 * Common situation is a name based on FS UUID while using origin and
 * snapshot devices.
 */
#define DM_UDEV_LOW_PRIORITY_FLAG 0x0010

https://git.fedorahosted.org/cgit/lvm2.git/tree/libdm/libdevmapper.h#n1969



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-03 Thread Phillip Susi
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA512

On 12/03/2014 03:24 AM, Goffredo Baroncelli wrote:
 I am thinking about that. Today the device discovery happens: a)
 when a device appears, two udev rules run btrfs dev scan
 device
 
 /lib/udev/rules.d/70-btrfs.rules 
 /lib/udev/rules.d/80-btrfs-lvm.rules
 
 b) during the boot it is ran a btrfs device scan, which scan all
  the device (this happens in debian for other distros may be
 different)
 
 c) after a btrfs.mkfs, which starts a device scan on each devices
 of the new filesystem
 
 d) by the user

Are you sure the kernel only gains awareness of btrfs volumes when
user space runs btrfs device scan?  If that is so then that means you
can not boot from a multi device btrfs root without using an
initramfs.  I thought the kernel auto scanned all devices if you tried
to mount a multi device volume, but if this is so, then yes, the udev
rules could be fixed to not call btrfs device scan on an lvm snapshot.


-BEGIN PGP SIGNATURE-
Version: GnuPG v1

iQEcBAEBCgAGBQJUf9BpAAoJENRVrw2cjl5RgcQIALCGfplK/xgX/QaiRjNW96l2
DWNPQMIhPesci0gF7Th3sNboew0hrc3g6S0a55wAO12CBhMPdzHxHjd9iFVpKi9O
vzvU36XyzwdcPJkBqRdPJMT2kX+428gYUW7jkyC8usj5eSCyeiIodJuxirGDL5Nb
3TttEJOpbPHGlTzHjAqEcK2ybzYi9HCN3CD3fuLagP9n+4zmFE7tGaGglZ9+7P58
wZjlP5xKDCR4Cu5Hr+5ErrmT2EoOvFC+PLKOT8xXhD9Y2emk2AtuY+5l/w7I+SIS
42gTUqPOx/8AOxBhOhkI0pPO8eK7S/lP1LKoXF0WWHhX8CgJLIHwj5KniDYcjBA=
=HI90
-END PGP SIGNATURE-


Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-03 Thread Duncan
Phillip Susi posted on Wed, 03 Dec 2014 22:09:29 -0500 as excerpted:

 Are you sure the kernel only gains awareness of btrfs volumes when user
 space runs btrfs device scan?  If that is so then that means you can not
 boot from a multi device btrfs root without using an initramfs.  I
 thought the kernel auto scanned all devices if you tried to mount a
 multi device volume, but if this is so, then yes, the udev rules could
 be fixed to not call btrfs device scan on an lvm snapshot.

That has indeed been the case in the past, and to my knowledge remains 
the case.

Unless it has changed in the last cycle or two (and I've not seen patches 
to that effect on the list nor any hint of such, so I doubt it) the 
kernel doesn't do any such scanning without userspace telling it to.  The 
device= mount option can be used instead, but it didn't work with 
rootflags= on the kernel commandline last I tried so for a multidevice 
btrfs root, yes, an initramfs/initrd is required.

Which is why I'm running an initramfs for the first time since I've 
switched to btrfs raid1 mode root, as I quit with initrds back before 
initramfs was an option.  An initramfs appended to the kernel image beats 
a separate initrd, but I'd still love to see the kernel commandline 
parsing fixed so it broke at the correct = in rootflags=device= (which 
seemed to be the problem, the kernel then didn't seem to recognize 
rootflags at all, as it was apparently seeing it as a parameter called 
rootflags=device, instead of rootflags), so I could be rid of the 
initramfs again.

FWIW, I'm using dracut to generate the cpio archive, which with the right 
kernel config options set, the kernel build process then appends to the 
kernel.  Dracut btrfs module enabled of course, most of the rest force-
disabled as I run a monolithic kernel so don't need module loading, etc.

-- 
Duncan - List replies preferred.   No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.  Richard Stallman



Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-02 Thread MegaBrutal
2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli kreij...@inwind.it:
 On 12/02/2014 01:15 AM, MegaBrutal wrote:
 2014-12-02 0:24 GMT+01:00 Robert White rwh...@pobox.com:
 On 12/01/2014 02:10 PM, MegaBrutal wrote:

 Since having duplicate UUIDs on devices is not a problem for me since
 I can tell them apart by LVM names, the discussion is of little
 relevance to my use case. Of course it's interesting and I like to
 read it along, it is not about the actual problem at hand.


 Which is why you use the device= mount option, which would take LVM names
 and which was repeatedly discussed as solving this very problem.

 Once you decide to duplicate the UUIDs with LVM snapshots you take up the
 burden of disambiguating your storage.

 Which is part of why re-reading was suggested as this was covered in some
 depth and _is_ _exactly_ about the problem at hand.

 Nope.

 root@reproduce-1391429:~# cat /proc/cmdline
 BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
 root=/dev/mapper/vg-rootlv ro
 rootflags=device=/dev/mapper/vg-rootlv,subvol=@

 Observe, device= mount option is added.

 device= options is needed only in a btrfs multi-volume scenario.
 If you have only one disk, this is not needed


I know. I only did this as a demonstration for Robert. He insisted it
will certainly solve the problem. Well, it doesn't.



 root@reproduce-1391429:~# ./reproduce-1391429.sh
 #!/bin/sh -v
 lvs
   LV VG   Attr  LSize   Pool Origin Data%  Move Log Copy%  Convert
   rootlv vg   -wi-ao---   1.00g
   swap0  vg   -wi-ao--- 256.00m

 grub-probe --target=device /
 /dev/mapper/vg-rootlv

 grep  /  /proc/mounts
 rootfs / rootfs rw 0 0
 /dev/dm-1 / btrfs rw,relatime,space_cache 0 0

 lvcreate --snapshot --size=128M --name z vg/rootlv
   Logical volume z created

 lvs
   LV VG   Attr  LSize   Pool Origin Data%  Move Log Copy%  Convert
   rootlv vg   owi-aos--   1.00g
   swap0  vg   -wi-ao--- 256.00m
   z  vg   swi-a-s-- 128.00m  rootlv   0.11

 ls -l /dev/vg/
 total 0
 lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
 lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
 lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2

 grub-probe --target=device /
 /dev/mapper/vg-z

 grep  /  /proc/mounts
 rootfs / rootfs rw 0 0
 /dev/dm-2 / btrfs rw,relatime,space_cache 0 0

 What /proc/self/mountinfo contains ?

Before creating snapshot:

15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
17 20 0:5 / /dev rw,relatime - devtmpfs udev rw,size=241692k,nr_inodes=60423,mode=755
18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs rw,size=50084k,mode=755
20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-1 rw,space_cache    <- THIS!
21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=5120k
26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=102400k,mode=755
28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw


After creating snapshot:

15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
17 20 0:5 / /dev rw,relatime - devtmpfs udev rw,size=241692k,nr_inodes=60423,mode=755
18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs rw,size=50084k,mode=755
20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache    <- WTF?!
21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=5120k
26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=102400k,mode=755
28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw


So it's consistent with what /proc/mounts reports.
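
The changed source device can be picked out of the mountinfo text mechanically. A minimal sketch, assuming the documented /proc/self/mountinfo layout (mount point in the fifth field; filesystem type and source after the " - " separator); mount_source is a hypothetical helper name, and the two sample lines are the root entries from the listings above.

```python
from typing import Optional, Tuple


def mount_source(mountinfo_text: str, mount_point: str = "/") -> Optional[Tuple[str, str]]:
    """Return (fstype, source) for mount_point, or None if it is not listed."""
    for line in mountinfo_text.splitlines():
        left, sep, right = line.partition(" - ")
        if not sep:
            continue  # not a complete mountinfo record
        fields = left.split()
        tail = right.split()
        if len(fields) >= 5 and fields[4] == mount_point and len(tail) >= 2:
            return tail[0], tail[1]
    return None


before = "20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-1 rw,space_cache"
after = "20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache"
print(mount_source(before))  # ('btrfs', '/dev/dm-1')
print(mount_source(after))   # ('btrfs', '/dev/dm-2')
```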



 And more important question: it is only the value
 returned by /proc/mount wrongly or also the filesystem
 content is affected ?


I quote my bug report on this:

The information reported in /proc/mounts is certainly bogus, since
still the origin device is being written, the kernel does not actually
mix up the devices for write operations, and such, the phenomenon does
not cause 

Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-02 Thread Goffredo Baroncelli
I investigated this issue further.

MegaBrutal reported the following issue: after taking an LVM snapshot of the
device of a mounted btrfs filesystem, the new snapshot device name replaces the
name of the original device in the output of /proc/mounts. This confuses tools
like grub-probe, which then report the wrong root device.

It has to be pointed out that the link under /sys/fs/btrfs/<fsid>/devices
is, instead, correct.


What happens is that *even if the filesystem is mounted*, doing a
btrfs dev scan of a snapshot (of the real volume) replaces the device name
of the filesystem with that of the snapshot.

Anand, with b96de000b, tried to fix it; however, further regressions appeared
and Chris reverted this commit (see below).

BR
G.Baroncelli

commit b96de000bc8bc9688b3a2abea4332bd57648a49f
Author: Anand Jain anand.j...@oracle.com
Date:   Thu Jul 3 18:22:05 2014 +0800

Btrfs: device_list_add() should not update list when mounted
[...]


commit 0f23ae74f589304bf33233f85737f4fd368549eb
Author: Chris Mason c...@fb.com
Date:   Thu Sep 18 07:49:05 2014 -0700

Revert "Btrfs: device_list_add() should not update list when mounted"

This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.

This commit is triggering failures to mount by subvolume id in some
configurations.  The main problem is how many different ways this
scanning function is used, both for scanning while mounted and
unmounted.  A proper cleanup is too big for late rcs.

[...]


Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-02 Thread Anand Jain




On 02/12/2014 19:14, Goffredo Baroncelli wrote:

I investigated this issue further.

MegaBrutal reported the following issue: after taking an LVM snapshot of the
device of a mounted btrfs filesystem, the new snapshot device name replaces the
name of the original device in the output of /proc/mounts. This confuses tools
like grub-probe, which then report the wrong root device.


Very good test case indeed, thanks.

Actual I/O would still go to the original device until the FS is remounted.



It has to be pointed out that the link under /sys/fs/btrfs/<fsid>/devices
is, instead, correct.


In this context the above sysfs path will be out of sync with
reality; it's just a stale sysfs entry.




What happens is that *even if the filesystem is mounted*, doing a
btrfs dev scan of a snapshot (of the real volume), the device name of the
filesystem is replaced with the snapshot one.


We have some fundamentally wrong stuff. My original patch tried
to fix it, but we later discovered that some external entities like
systemd and the boot process were using that bug as a feature, and we had
to revert the patch.

Fundamentally, the SCSI inquiry serial number is the only number which is
unique to the device (including virtual devices, though there could be some
legacy virtual devices which don't follow that strictly; anyway, those
I deem to be a device-side issue). Btrfs depends on the combination of
fsid, uuid and devid (and generation number) to identify a unique
device volume, which is weak and easy to get wrong.
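
Why such an identity is easy to get wrong can be shown with a small sketch. Assuming the identity is just the tuple (fsid, uuid, devid, generation), a block-level snapshot carries exactly the same tuple as its origin, so the two cannot be told apart by it. The class, the helper, and all values below are hypothetical and purely illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DeviceIdentity:
    fsid: str
    dev_uuid: str
    devid: int
    generation: int


def find_collisions(devices: Dict[str, DeviceIdentity]) -> List[List[str]]:
    """Group device paths that share the same identity tuple."""
    groups = defaultdict(list)
    for path, identity in devices.items():
        groups[identity].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Made-up values: dm-2 is a block-level copy of dm-1, so its tuple is equal.
devices = {
    "/dev/dm-1": DeviceIdentity("example-fsid", "example-dev-uuid", 1, 42),
    "/dev/dm-2": DeviceIdentity("example-fsid", "example-dev-uuid", 1, 42),
    "/dev/sdx1": DeviceIdentity("other-fsid", "other-dev-uuid", 1, 7),
}
print(find_collisions(devices))  # [['/dev/dm-1', '/dev/dm-2']]
```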



Anand, with b96de000b, tried to fix it; however further regression appeared
and Chris reverted this commit (see below).

BR
G.Baroncelli

commit b96de000bc8bc9688b3a2abea4332bd57648a49f
Author: Anand Jain anand.j...@oracle.com
Date:   Thu Jul 3 18:22:05 2014 +0800

 Btrfs: device_list_add() should not update list when mounted
[...]


commit 0f23ae74f589304bf33233f85737f4fd368549eb
Author: Chris Mason c...@fb.com
Date:   Thu Sep 18 07:49:05 2014 -0700

 Revert "Btrfs: device_list_add() should not update list when mounted"

 This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.

 This commit is triggering failures to mount by subvolume id in some
 configurations.  The main problem is how many different ways this
 scanning function is used, both for scanning while mounted and
 unmounted.  A proper cleanup is too big for late rcs.

[...]


Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-02 Thread Austin S Hemmelgarn

On 2014-12-02 06:54, Anand Jain wrote:




On 02/12/2014 19:14, Goffredo Baroncelli wrote:

I further investigate this issue.

MegaBrutal, reported the following issue: doing a lvm snapshot of the
device of a
mounted btrfs fs, the new snapshot device name replaces the name of
the original
device in the output of /proc/mounts. This confused tools like
grub-probe which
report a wrong root device.


very good test case indeed thanks.

Actual IO would still go to the original device, until FS is remounted.



It has to be pointed out that instead the link under
/sys/fs/btrfs/fsid/devices is
correct.


In this context the above sysfs path will be out of sync with
reality; it's just a stale sysfs entry.



What happens is that *even if the filesystem is mounted*, doing a
btrfs dev scan of a snapshot (of the real volume), the device name
of the
filesystem is replaced with the snapshot one.


We have some fundamentally wrong stuff. My original patch tried
to fix it, but we later discovered that some external entities like
systemd and the boot process were using that bug as a feature, and we had
to revert the patch.

Fundamentally, the SCSI inquiry serial number is the only number which is
unique to the device (including virtual devices, though there could be some
legacy virtual devices which don't follow that strictly; anyway, those
I deem to be a device-side issue). Btrfs depends on the combination of
fsid, uuid and devid (and generation number) to identify a unique
device volume, which is weak and easy to get wrong.



Anand, with b96de000b, tried to fix it; however further regression
appeared
and Chris reverted this commit (see below).

BR
G.Baroncelli

commit b96de000bc8bc9688b3a2abea4332bd57648a49f
Author: Anand Jain anand.j...@oracle.com
Date:   Thu Jul 3 18:22:05 2014 +0800

 Btrfs: device_list_add() should not update list when mounted
[...]


commit 0f23ae74f589304bf33233f85737f4fd368549eb
Author: Chris Mason c...@fb.com
Date:   Thu Sep 18 07:49:05 2014 -0700

 Revert Btrfs: device_list_add() should not update list when
mounted

 This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.

 This commit is triggering failures to mount by subvolume id in some
 configurations.  The main problem is how many different ways this
 scanning function is used, both for scanning while mounted and
 unmounted.  A proper cleanup is too big for late rcs.

[...]

On 12/02/2014 09:28 AM, MegaBrutal wrote:

2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli kreij...@inwind.it:

On 12/02/2014 01:15 AM, MegaBrutal wrote:

2014-12-02 0:24 GMT+01:00 Robert White rwh...@pobox.com:

On 12/01/2014 02:10 PM, MegaBrutal wrote:


Since having duplicate UUIDs on devices is not a problem for me
since
I can tell them apart by LVM names, the discussion is of little
relevance to my use case. Of course it's interesting and I like to
read it along, it is not about the actual problem at hand.



Which is why you use the device= mount option, which would take
LVM names
and which was repeatedly discussed as solving this very problem.

Once you decide to duplicate the UUIDs with LVM snapshots you take
up the
burden of disambiguating your storage.

Which is part of why re-reading was suggested as this was covered
in some
depth and _is_ _exactly_ about the problem at hand.


Nope.

root@reproduce-1391429:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
root=/dev/mapper/vg-rootlv ro
rootflags=device=/dev/mapper/vg-rootlv,subvol=@

Observe, device= mount option is added.


device= options is needed only in a btrfs multi-volume scenario.
If you have only one disk, this is not needed



I know. I only did this as a demonstration for Robert. He insisted it
will certainly solve the problem. Well, it doesn't.




root@reproduce-1391429:~# ./reproduce-1391429.sh
#!/bin/sh -v
lvs
   LV VG   Attr  LSize   Pool Origin Data%  Move Log Copy%  Convert
   rootlv vg   -wi-ao---   1.00g
   swap0  vg   -wi-ao--- 256.00m

grub-probe --target=device /
/dev/mapper/vg-rootlv

grep  /  /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0

lvcreate --snapshot --size=128M --name z vg/rootlv
   Logical volume z created

lvs
   LV VG   Attr  LSize   Pool Origin Data%  Move Log Copy%  Convert
   rootlv vg   owi-aos--   1.00g
   swap0  vg   -wi-ao--- 256.00m
   z  vg   swi-a-s-- 128.00m  rootlv   0.11

ls -l /dev/vg/
total 0
lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2

grub-probe --target=device /
/dev/mapper/vg-z

grep  /  /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-2 / btrfs rw,relatime,space_cache 0 0


What does /proc/self/mountinfo contain?


Before creating snapshot:

15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
17 20 0:5 / /dev rw,relatime - devtmpfs udev
rw,size=241692k,nr_inodes=60423,mode=755
18 

Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-02 Thread Phillip Susi

On 12/2/2014 7:23 AM, Austin S Hemmelgarn wrote:
 Stupid thought, why don't we just add blacklisting based on device
 path like LVM has for pvscan?

That isn't logic that belongs in the kernel, so that is going down the
path of yanking out the device auto probing from btrfs and instead
writing a mount.btrfs helper that can use policies like blacklisting
to auto locate all of the correct devices and pass them all to the
kernel at mount time.




Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-02 Thread Phillip Susi

On 12/2/2014 6:54 AM, Anand Jain wrote:
 we have some fundamentally wrong stuff. My original patch tried to
 fix it. But we later discovered that some external entities like
 systemd and the boot process are using that bug as a feature and we had
 to revert the patch.

If systemd is depending on the kernel lying about what device it has
mounted then something is *extremely* broken there and that should be
fixed instead of breaking the kernel.




Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-02 Thread Phillip Susi

On 12/1/2014 4:45 PM, Konstantin wrote:
 The bug also appears when using mdadm RAID1: when one of the
 drives is detached from the array, the OS discovers it, and
 after a while (not immediately, it takes several minutes) it appears
 in /proc/mounts: instead of /dev/md0p1 I see /dev/sdb1 there.
 And usually after an hour or so (depending on system workload)
 the PC freezes completely. So whatever the outcome of the discussion
 about the uniqueness of UUIDs, a crashing kernel is telling me that
 there is a serious bug.

I'm guessing you are using metadata format 0.9 or 1.0, which put the
metadata at the end of the drive and the filesystem still starts in
sector zero.  1.2 is now the default and would not have this problem
as its metadata is at the start of the disk ( well, 4k from the start
) and the fs starts further down.
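
The metadata placement described above can be sketched as a small lookup. This is only an illustration: the function names are made up for this example, the 1.1 case comes from the mdadm documentation rather than from this thread, and the helper assumes that `mdadm --examine` prints a "Version : X" line for a member device.

```shell
#!/bin/sh
# Hypothetical helper: read "mdadm --examine" output on stdin and
# print the metadata version found on the "Version : X" line.
md_version_from_examine() {
    awk '$1 == "Version" && $2 == ":" { print $3; exit }'
}

# Hypothetical helper: map an md metadata version to the place where
# the md superblock lives on the member device.
md_superblock_location() {
    case $1 in
        0.9|0.90|0.90.*|1.0) echo "end of device" ;;   # fs starts at sector zero
        1.1) echo "start of device" ;;
        1.2) echo "4K from start of device" ;;          # fs starts further down
        *) echo "unknown" ;;
    esac
}

# Example (not run here, /dev/sdb is a placeholder):
#   md_superblock_location "$(mdadm --examine /dev/sdb | md_version_from_examine)"
```

With the superblock at the end of the device, the file system is visible at the same offset on the raw member as on the array, which is the situation guessed at above.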





PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-01 Thread MegaBrutal
Hi all,

I've reported the bug I've previously posted about in "BTRFS messes up
snapshot LV with origin" in the Kernel Bug Tracker.
https://bugzilla.kernel.org/show_bug.cgi?id=89121

Since the other thread went off into theoretical debates about UUIDs
and their generic relation to BTRFS, their everyday use cases, and the
philosophical meaning behind uniqueness of copies and UUIDs; I'd like
to specifically ask you to only post here about the ACTUAL problem at
hand. Don't get me wrong, I find the discussion in the other thread
really interesting, I'm following it, but it is only very remotely
related to the original issue, so please keep it there! If you're
interested to catch up about the actual bug symptoms, please read the
bug report linked above, and (optionally) reproduce the problem
yourself!

A virtual machine image on which I've already reproduced the
conditions can be downloaded here:
http://undead.megabrutal.com/kvm-reproduce-1391429.img.xz
(Download size: 113 MB; Unpacked image size: 2 GB.)

Re-tested with mainline kernel 3.18.0-rc7 just today.


Regards,
MegaBrutal


Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-01 Thread Robert White

On 12/01/2014 04:56 AM, MegaBrutal wrote:

Since the other thread went off into theoretical debates about UUIDs
and their generic relation to BTRFS, their everyday use cases, and the
philosophical meaning behind uniqueness of copies and UUIDs; I'd like
to specifically ask you to only post here about the ACTUAL problem at
hand. Don't get me wrong, I find the discussion in the other thread
really interesting, I'm following it, but it is only very remotely
related to the original issue, so please keep it there! If you're
interested to catch up about the actual bug symptoms, please read the
bug report linked above, and (optionally) reproduce the problem
yourself!


That discussion _was_ the actual discussion of the actual problem. A 
problem that is not particularly theoretical, a problem that is common 
to block-level snapshots, and a discussion that contained the actual 
work-arounds.


I suggest a re-read. 8-)



Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-01 Thread Konstantin

MegaBrutal schrieb am 01.12.2014 um 13:56:
 Hi all,

 I've reported the bug I've previously posted about in "BTRFS messes up
 snapshot LV with origin" in the Kernel Bug Tracker.
 https://bugzilla.kernel.org/show_bug.cgi?id=89121

Hi MegaBrutal. If I understand your report correctly, I can give you
another example where this bug is appearing. It is so bad that it leads
to freezing the system and I'm quite sure it's the same thing. I was
thinking about filing a bug but didn't have the time for that yet. Maybe
you could add this case to your bug report as well.

The bug also appears when using mdadm RAID1: when one of the drives is
detached from the array, the OS discovers it, and after a while (not
immediately, it takes several minutes) it appears in /proc/mounts:
instead of /dev/md0p1 I see /dev/sdb1 there. And usually after an hour
or so (depending on system workload) the PC freezes completely. So
whatever the outcome of the discussion about the uniqueness of UUIDs, a
crashing kernel is telling me that there is a serious bug.

While in my case detaching was intentional, there are several real
situations in which a RAID1 disk can get detached, and currently this
leads to crashing the server when using BTRFS. That's not what is intended
when using RAID ;-).

In my case I wanted to do something which was working perfectly all the
years before with all other file systems - checking the file system of
the root disk while the server is running. The procedure is simple:

1. detach one of the disks
2. do fsck on the disk device
3. mdadm --zero-superblock on the device so it gets completely rewritten
4. mdadm --add it to the array
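
The four steps above can be sketched as a dry-run shell function. This is a sketch under assumptions, not the exact commands that were used: the function name is invented, the device names are placeholders taken from the example in this message, the detach is assumed to be done with mdadm --fail/--remove, and the check is assumed to be btrfsck. The function only prints the commands and executes nothing.

```shell
#!/bin/sh
# Hypothetical dry run of the detach / check / re-add cycle.
# $1 = md array, $2 = member device, $3 = device holding the file
# system on the detached member. All three are placeholders.
raid1_check_plan() {
    array=$1
    member=$2
    fsdev=$3
    # 1. detach one of the disks from the array
    echo "mdadm $array --fail $member --remove $member"
    # 2. check the file system on the detached disk
    echo "btrfsck $fsdev"
    # 3. wipe the md superblock so the member is rewritten completely
    echo "mdadm --zero-superblock $member"
    # 4. add the disk back to the array
    echo "mdadm $array --add $member"
}

# Print the plan for the placeholder devices.
raid1_check_plan /dev/md0 /dev/sdb /dev/sdb1
```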

There were some surprises with BTRFS: if step 2 is not done directly after
step 1, btrfsck refuses to check the disk because /proc/mounts reports it
as mounted. And during step 2, or even after finishing it, the system would
freeze. If I got to step 4 fast enough, everything was OK, but
again, that's not what I expect from a good operating system. Any
objections?

Konstantin



Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-01 Thread MegaBrutal
2014-12-01 18:27 GMT+01:00 Robert White rwh...@pobox.com:
 On 12/01/2014 04:56 AM, MegaBrutal wrote:

 Since the other thread went off into theoretical debates about UUIDs
 and their generic relation to BTRFS, their everyday use cases, and the
 philosophical meaning behind uniqueness of copies and UUIDs; I'd like
 to specifically ask you to only post here about the ACTUAL problem at
 hand. Don't get me wrong, I find the discussion in the other thread
 really interesting, I'm following it, but it is only very remotely
 related to the original issue, so please keep it there! If you're
 interested to catch up about the actual bug symptoms, please read the
 bug report linked above, and (optionally) reproduce the problem
 yourself!


 That discussion _was_ the actual discussion of the actual problem. A problem
 that is not particularly theoretical, a problem that is common to
 block-level snapshots, and a discussion that contained the actual
 work-arounds.

 I suggest a re-read. 8-)


The majority of the discussion was about how the kernel should react
UPON mounting a file system when more than one device with the same UUID
exists on the system. While that is a legitimate problem worth discussing
and mitigating, it is not the same situation as how the kernel behaves
when an identical device appears WHILE the file system is already
mounted.

Actually, I would not identify devices by UUIDs when I know that
duplicates could exist due to snapshots, therefore I mount devices by
LVM paths. And when a file system is already mounted with all its
devices, that is a clear situation: all devices are open and locked by
the kernel, any mixup at that point is an error. What is the case with
multiple-device file systems? Supply all their devices with device=
mount options. Just don't identify devices by UUIDs when you know
there could be duplicates. Use UUIDs when you don't use LVM.
Identifying file systems by UUIDs was invented because classic
/dev/sdXX device names might change. But LVM names don't change. They
only change when you intentionally change them e.g. with lvrename.
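
As an illustration of supplying every device by path, here is a minimal sketch that builds the option string from a list of LVM paths. The function name and the LV names are made up for the example, and it only assembles the string; later messages in this thread show that passing device= did not prevent the reported mixup.

```shell
#!/bin/sh
# Hypothetical helper: join device paths into a btrfs mount option
# string with one device= entry per path.
btrfs_device_opts() {
    opts=""
    for dev in "$@"; do
        # Prepend a comma only when something is already in the list.
        opts="${opts:+$opts,}device=$dev"
    done
    printf '%s\n' "$opts"
}

# Example (not run here, LV names are placeholders):
#   mount -t btrfs \
#       -o "$(btrfs_device_opts /dev/mapper/vg-data1 /dev/mapper/vg-data2)" \
#       /dev/mapper/vg-data1 /mnt
```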

Since having duplicate UUIDs on devices is not a problem for me since
I can tell them apart by LVM names, the discussion is of little
relevance to my use case. Of course it's interesting and I like to
read it along, it is not about the actual problem at hand.


Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-01 Thread Robert White

On 12/01/2014 02:10 PM, MegaBrutal wrote:

Since having duplicate UUIDs on devices is not a problem for me since
I can tell them apart by LVM names, the discussion is of little
relevance to my use case. Of course it's interesting and I like to
read it along, it is not about the actual problem at hand.



Which is why you use the device= mount option, which would take LVM 
names and which was repeatedly discussed as solving this very problem.


Once you decide to duplicate the UUIDs with LVM snapshots you take up 
the burden of disambiguating your storage.


Which is part of why re-reading was suggested as this was covered in 
some depth and _is_ _exactly_ about the problem at hand.



Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-01 Thread MegaBrutal
2014-12-02 0:24 GMT+01:00 Robert White rwh...@pobox.com:
 On 12/01/2014 02:10 PM, MegaBrutal wrote:

 Since having duplicate UUIDs on devices is not a problem for me since
 I can tell them apart by LVM names, the discussion is of little
 relevance to my use case. Of course it's interesting and I like to
 read it along, it is not about the actual problem at hand.


 Which is why you use the device= mount option, which would take LVM names
 and which was repeatedly discussed as solving this very problem.

 Once you decide to duplicate the UUIDs with LVM snapshots you take up the
 burden of disambiguating your storage.

 Which is part of why re-reading was suggested as this was covered in some
 depth and _is_ _exactly_ about the problem at hand.

Nope.

root@reproduce-1391429:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
root=/dev/mapper/vg-rootlv ro
rootflags=device=/dev/mapper/vg-rootlv,subvol=@

Observe, device= mount option is added.


root@reproduce-1391429:~# ./reproduce-1391429.sh
#!/bin/sh -v
lvs
  LV VG   Attr  LSize   Pool Origin Data%  Move Log Copy%  Convert
  rootlv vg   -wi-ao---   1.00g
  swap0  vg   -wi-ao--- 256.00m

grub-probe --target=device /
/dev/mapper/vg-rootlv

grep  /  /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0

lvcreate --snapshot --size=128M --name z vg/rootlv
  Logical volume z created

lvs
  LV VG   Attr  LSize   Pool Origin Data%  Move Log Copy%  Convert
  rootlv vg   owi-aos--   1.00g
  swap0  vg   -wi-ao--- 256.00m
  z  vg   swi-a-s-- 128.00m  rootlv   0.11

ls -l /dev/vg/
total 0
lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2

grub-probe --target=device /
/dev/mapper/vg-z

grep  /  /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-2 / btrfs rw,relatime,space_cache 0 0

lvremove --force vg/z
  Logical volume z successfully removed

grub-probe --target=device /
/dev/mapper/vg-rootlv

grep  /  /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0


Problem still reproduces.
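
The check that the transcript above performs by eye can be sketched as a small function. It is an illustration only: the function name is invented, it assumes the /proc/mounts column layout shown above, and it ignores the escaping that /proc/mounts applies to unusual characters in paths.

```shell
#!/bin/sh
# Hypothetical helper: print the source device of the last entry for
# mount point $2 in the mounts table $1 (laid out like /proc/mounts),
# skipping the initial "rootfs" pseudo entry.
mount_source() {
    awk -v mp="$2" '
        $2 == mp && $1 != "rootfs" { src = $1 }
        END { if (src != "") print src }
    ' "$1"
}

# Example (not run here): compare the reported device with the one
# that was actually mounted, resolving the LVM symlink first.
#   [ "$(mount_source /proc/mounts /)" = "$(readlink -f /dev/mapper/vg-rootlv)" ] \
#       || echo "/proc/mounts reports a different root device"
```

In the run above this comparison would fail while the snapshot z exists, because /proc/mounts reports /dev/dm-2 although rootlv is /dev/dm-1.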


Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-01 Thread MegaBrutal
2014-12-01 22:45 GMT+01:00 Konstantin newsbox1...@web.de:

 MegaBrutal schrieb am 01.12.2014 um 13:56:
 Hi all,

 I've reported the bug I've previously posted about in "BTRFS messes up
 snapshot LV with origin" in the Kernel Bug Tracker.
 https://bugzilla.kernel.org/show_bug.cgi?id=89121
 Hi MegaBrutal. If I understand your report correctly, I can give you
 another example where this bug is appearing. It is so bad that it leads
 to freezing the system and I'm quite sure it's the same thing. I was
 thinking about filing a bug but didn't have the time for that yet. Maybe
 you could add this case to your bug report as well.

 The bug also appears when using mdadm RAID1: when one of the drives is
 detached from the array, the OS discovers it, and after a while (not
 immediately, it takes several minutes) it appears in /proc/mounts:
 instead of /dev/md0p1 I see /dev/sdb1 there. And usually after an hour
 or so (depending on system workload) the PC freezes completely. So
 whatever the outcome of the discussion about the uniqueness of UUIDs, a
 crashing kernel is telling me that there is a serious bug.


Hmm, I also suspect our symptoms have the same root cause. It seems
the same thing happens: the BTRFS module notices another device with
the same file system and starts to report it as the root device. It
seems like it has no idea that it's part of a RAID configuration or
anything.


Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots

2014-12-01 Thread Goffredo Baroncelli
On 12/02/2014 01:15 AM, MegaBrutal wrote:
 2014-12-02 0:24 GMT+01:00 Robert White rwh...@pobox.com:
 On 12/01/2014 02:10 PM, MegaBrutal wrote:

 Since having duplicate UUIDs on devices is not a problem for me since
 I can tell them apart by LVM names, the discussion is of little
 relevance to my use case. Of course it's interesting and I like to
 read it along, it is not about the actual problem at hand.


 Which is why you use the device= mount option, which would take LVM names
 and which was repeatedly discussed as solving this very problem.

 Once you decide to duplicate the UUIDs with LVM snapshots you take up the
 burden of disambiguating your storage.

 Which is part of why re-reading was suggested as this was covered in some
 depth and _is_ _exactly_ about the problem at hand.
 
 Nope.
 
 root@reproduce-1391429:~# cat /proc/cmdline
 BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic
 root=/dev/mapper/vg-rootlv ro
 rootflags=device=/dev/mapper/vg-rootlv,subvol=@
 
 Observe, device= mount option is added.

The device= option is needed only in a btrfs multi-device scenario.
If you have only one disk, it is not needed.

 
 
 root@reproduce-1391429:~# ./reproduce-1391429.sh
 #!/bin/sh -v
 lvs
   LV VG   Attr  LSize   Pool Origin Data%  Move Log Copy%  Convert
   rootlv vg   -wi-ao---   1.00g
   swap0  vg   -wi-ao--- 256.00m
 
 grub-probe --target=device /
 /dev/mapper/vg-rootlv
 
 grep  /  /proc/mounts
 rootfs / rootfs rw 0 0
 /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
 
 lvcreate --snapshot --size=128M --name z vg/rootlv
   Logical volume z created
 
 lvs
   LV VG   Attr  LSize   Pool Origin Data%  Move Log Copy%  Convert
   rootlv vg   owi-aos--   1.00g
   swap0  vg   -wi-ao--- 256.00m
   z  vg   swi-a-s-- 128.00m  rootlv   0.11
 
 ls -l /dev/vg/
 total 0
 lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
 lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
 lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2
 
 grub-probe --target=device /
 /dev/mapper/vg-z

 grep  /  /proc/mounts
 rootfs / rootfs rw 0 0
 /dev/dm-2 / btrfs rw,relatime,space_cache 0 0

What does /proc/self/mountinfo contain?

And a more important question: is only the value
reported by /proc/mounts wrong, or is the filesystem
content affected as well?
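
To answer the first question, the relevant fields can be pulled out of /proc/self/mountinfo with a few lines of awk. This is a sketch with an invented function name; it relies on the documented mountinfo layout, in which the fields after the " - " separator are the file system type and the mount source.

```shell
#!/bin/sh
# Hypothetical helper: for mount point $2 in the mountinfo-style file
# $1, print "major:minor fstype source". The optional fields that may
# follow field 6 are skipped by searching for the "-" separator.
mountinfo_entry() {
    awk -v mp="$2" '
        $5 == mp {
            for (i = 7; i <= NF; i++)
                if ($i == "-") { print $3, $(i + 1), $(i + 2); break }
        }
    ' "$1"
}

# Example (not run here):
#   mountinfo_entry /proc/self/mountinfo /
```

Running it before and after creating the snapshot would show whether the source field changes in the same way as the device name in /proc/mounts.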

 
 lvremove --force vg/z
   Logical volume z successfully removed
 
 grub-probe --target=device /
 /dev/mapper/vg-rootlv
 
 grep  /  /proc/mounts
 rootfs / rootfs rw 0 0
 /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
 
 
 Problem still reproduces.
 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5