Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/9/2014 10:10 PM, Anand Jain wrote:
> In the test case provided earlier, who is triggering the scan? grub-probe?

The scan is initiated by udev. grub-probe only comes into it because it looks at /proc/mounts to find out what device is mounted, and /proc/mounts is lying.

> But we had to revert it, since the btrfs bug had become a feature for the system boot process, and fixing it breaks mounting at boot with subvol.

How is this? Also, are we talking about updating the cached list of devices that *can* be mounted, or what device already *is* mounted? I can see doing the former, but the latter should never happen.

> If the device is already mounted, only the device path is updated, while the original device remains in use (bug).

Yep, that is the bug that started all of this.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
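The exchange above turns on tools such as grub-probe trusting the device name that /proc/mounts reports for a mount point. As a rough illustration only (this is not grub-probe's code), that lookup can be sketched in Python. The function name `mounted_source` and the `SAMPLE` text are assumptions made for this example; the sample is modelled on the /proc/mounts output quoted in this thread.

```python
from typing import Optional


def mounted_source(mounts_text: str, mount_point: str = "/") -> Optional[str]:
    """Return the source device that /proc/mounts-style text lists for mount_point.

    Later entries shadow earlier ones, so the last matching line wins.
    """
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field layout: <source> <mount point> <fs type> <options> <dump> <pass>
        if len(fields) >= 3 and fields[1] == mount_point:
            source = fields[0]
    return source


# Hypothetical sample text, shaped like the entries quoted in the thread.
SAMPLE = (
    "rootfs / rootfs rw 0 0\n"
    "/dev/dm-1 / btrfs rw,relatime,space_cache 0 0\n"
)

if __name__ == "__main__":
    print(mounted_source(SAMPLE))
```

A tool that works this way has no independent check: if the kernel rewrites the device path of a mounted filesystem, the lookup simply returns the new (wrong) name.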
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/8/2014 5:25 PM, Konstantin wrote:
> Phillip Susi wrote on 08.12.2014 at 15:59:
>> The BIOS does not know or care about partitions. All you need is a
>
> That's only true for older BIOSs. With current EFI boards they not only care but some also mess around with GPT partition tables.

EFI is a whole other beast that we aren't talking about.

>> partition table in the MBR and you can install grub there and have it boot the system from a mdadm 1.1 or 1.2 format array housed in a partition on the rest of the disk. The only time you really *have* to
>
> I was thinking of this solution as well, but as I'm not aware of any partitioning tool caring about mdadm metadata, I rejected it. It requires a non-standard layout leaving reserved empty spaces for mdadm metadata. It's possible, but it isn't documented as far as I know, and before losing hours of trying I chose the obvious one.

What on earth are you talking about? Partitioning tool that cares about mdadm? Non-standard layout? I am talking about the bog-standard layout where you create a partition, then use that partition to build an mdadm array. mdadm takes care of its own metadata. There isn't anything unusual, non-obvious, or undocumented here.

>> use 0.9 or 1.0 ( and you really should be using 1.0 instead since it handles larger arrays and can't be confused vis. whole disk vs. partition components ) is if you are running a raid1 on the raw disk, with no partition table and then partition inside the array instead, and really, you just shouldn't be doing that.
>
> That's exactly what I want to do - running RAID1 on the whole disk as most hardware based RAID systems do. Before that I was running RAID on disk partitions for some years but this was quite a pain in comparison. Hot(un)plugging a drive brings you a lot of issues with failing mdadm commands as they don't like concurrent execution when the same physical device is affected. And rebuild of RAID partitions is done sequentially with no deterministic order. We could talk for hours about that but if interested maybe better in private as it is not BTRFS related.

So don't create more than one raid partition on the disk.

>> dmraid solves the problem by removing the partitions from the underlying physical device ( /dev/sda ), and only exposing them on the array ( /dev/mapper/whatever ). LVM only has the problem when you take a snapshot. User space tools face the same issue and they resolve it by ignoring or deprioritizing the snapshot.
>
> I don't agree. dmraid and mdraid both remove the partitions. This is not a solution: BTRFS will still crash the PC using /dev/mapper/whatever or whatever device appears in the system providing the BTRFS volume.

You just said btrfs will crash by accessing the *correct* volume after the *incorrect* one has been removed. You aren't making any sense. The problem only arises when the same partition is visible on *both* the raw disk and the md device.

> Speaking of BTRFS tools, I am still somewhat confused that the problem of confusing or mixing devices happens at all. I don't know the metadata of a BTRFS RAID setup but I assume there must be something like a drive index in there, as the order of RAID5 drives does matter. So having a second device with identical metadata should be considered invalid for auto-adding anyway.

Again, the problem is when you first boot up and/or mount the volume. Which of the duplicate devices shows up first is indeterminate, so just saying "ignore the second one" doesn't help. Even saying "well, error out if there are two" doesn't help, since that leaves open a race condition where the second volume has not appeared yet at the time you do the check.
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 08/12/2014 22:59, Phillip Susi wrote:
> On 12/7/2014 7:32 PM, Konstantin wrote:
>>> I'm guessing you are using metadata format 0.9 or 1.0, which put the metadata at the end of the drive and the filesystem still starts in sector zero. 1.2 is now the default and would not have this problem as its metadata is at the start of the disk ( well, 4k from the start ) and the fs starts further down.
>>
>> I know this and I'm using 0.9 on purpose. I need to boot from these disks so I can't use 1.2 format as the BIOS wouldn't recognize the partitions. Having an additional non-RAID disk for booting introduces a single point of failure, which is contrary to the idea of RAID1.
>
> The BIOS does not know or care about partitions. All you need is a partition table in the MBR and you can install grub there and have it boot the system from a mdadm 1.1 or 1.2 format array housed in a partition on the rest of the disk. The only time you really *have* to use 0.9 or 1.0 ( and you really should be using 1.0 instead since it handles larger arrays and can't be confused vis. whole disk vs. partition components ) is if you are running a raid1 on the raw disk, with no partition table and then partition inside the array instead, and really, you just shouldn't be doing that.
>
>> Anyway, to avoid a futile discussion, mdraid and its format is not the problem, it is just an example of the problem. Using dm-raid would cause the same trouble, LVM apparently, too. I could think of a bunch of other cases including the use of hardware based RAID controllers. OK, it's not the majority's problem, but that's not an argument to keep a bug/flaw capable of crashing your system.
>
> dmraid solves the problem by removing the partitions from the underlying physical device ( /dev/sda ), and only exposing them on the array ( /dev/mapper/whatever ). LVM only has the problem when you take a snapshot. User space tools face the same issue and they resolve it by ignoring or deprioritizing the snapshot.
>
>> Nice as it is that the kernel apparently scans for drives and automatically identifies BTRFS ones, it seems to me that this feature is useless. When in a live system a BTRFS RAID disk fails, it is not sufficient to hot-replace it, the kernel will not automatically rebalance. Commands are still needed for the task as they are with mdraid. So the only point I can see at the moment where this auto-detect feature makes sense is when mounting the device for the first time. If I remember the documentation correctly, you mount one of the RAID devices and the others are automagically attached as well. But outside of the mount process, what is this auto-detect used for?
>>
>> So here are a couple of rather simple solutions which, as far as I can see, could solve the problem:
>>
>> 1. Limit the auto-detect to the mount process and don't do it when devices are appearing.

In the test case provided earlier, who is triggering the scan? grub-probe?

>> 2. When a BTRFS device is detected and its metadata is identical to one already mounted, just ignore it.

That looks like this patch:

    commit b96de000bc8bc9688b3a2abea4332bd57648a49f
    Author: Anand Jain anand.j...@oracle.com
    Date:   Thu Jul 3 18:22:05 2014 +0800

        Btrfs: device_list_add() should not update list when mounted

But we had to revert it, since the btrfs bug had become a feature for the system boot process, and fixing it breaks mounting at boot with subvol.

    commit 0f23ae74f589304bf33233f85737f4fd368549eb
    Author: Chris Mason c...@fb.com
    Date:   Thu Sep 18 07:49:05 2014 -0700

        Revert "Btrfs: device_list_add() should not update list when mounted"

        This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.

> That doesn't really solve the problem since you can still pick the wrong one to mount in the first place.

The question is whether both devices have the same generation number. If not, this fix takes care of picking the device with the larger generation number during mount:

    commit 77bdae4d136e167bab028cbec58b988f91cf73c0
    Author: Anand Jain anand.j...@oracle.com
    Date:   Thu Jul 3 18:22:06 2014 +0800

        btrfs: check generation as replace duplicates devid+uuid

If there are two devices with the same fsid + devid + uuid + generation, then the one probed last is used during mount. Or, if the device is already mounted, only the device path is updated, while the original device remains in use (bug).

Thanks
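The selection rule described in the message above (prefer the device with the larger generation; on a tie, the device probed last wins) can be sketched as follows. This is only an illustration of the policy as stated in the thread, written in Python; it is not the kernel's `device_list_add()` code, and the `Candidate` type and `pick_device` function are hypothetical names. It assumes every candidate passed in already shares the same fsid, devid and uuid.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One probed block device claiming to be a given btrfs member (illustrative)."""
    path: str
    fsid: str
    devid: int
    uuid: str
    generation: int


def pick_device(probed: Iterable[Candidate]) -> Optional[Candidate]:
    """Pick a device among duplicates, iterating in probe order.

    A larger generation always wins. With equal generations the later
    probe replaces the earlier one, which is the "last probed" behaviour.
    """
    chosen = None
    for cand in probed:
        if chosen is None or cand.generation >= chosen.generation:
            chosen = cand
    return chosen
```

Note what the sketch makes visible: when a snapshot has the same generation as its origin, the outcome depends purely on probe order, which is the indeterminacy the thread is arguing about.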
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/7/2014 7:32 PM, Konstantin wrote:
>> I'm guessing you are using metadata format 0.9 or 1.0, which put the metadata at the end of the drive and the filesystem still starts in sector zero. 1.2 is now the default and would not have this problem as its metadata is at the start of the disk ( well, 4k from the start ) and the fs starts further down.
>
> I know this and I'm using 0.9 on purpose. I need to boot from these disks so I can't use 1.2 format as the BIOS wouldn't recognize the partitions. Having an additional non-RAID disk for booting introduces a single point of failure, which is contrary to the idea of RAID1.

The BIOS does not know or care about partitions. All you need is a partition table in the MBR and you can install grub there and have it boot the system from a mdadm 1.1 or 1.2 format array housed in a partition on the rest of the disk. The only time you really *have* to use 0.9 or 1.0 ( and you really should be using 1.0 instead since it handles larger arrays and can't be confused vis. whole disk vs. partition components ) is if you are running a raid1 on the raw disk, with no partition table and then partition inside the array instead, and really, you just shouldn't be doing that.

> Anyway, to avoid a futile discussion, mdraid and its format is not the problem, it is just an example of the problem. Using dm-raid would cause the same trouble, LVM apparently, too. I could think of a bunch of other cases including the use of hardware based RAID controllers. OK, it's not the majority's problem, but that's not an argument to keep a bug/flaw capable of crashing your system.

dmraid solves the problem by removing the partitions from the underlying physical device ( /dev/sda ), and only exposing them on the array ( /dev/mapper/whatever ). LVM only has the problem when you take a snapshot. User space tools face the same issue and they resolve it by ignoring or deprioritizing the snapshot.

> Nice as it is that the kernel apparently scans for drives and automatically identifies BTRFS ones, it seems to me that this feature is useless. When in a live system a BTRFS RAID disk fails, it is not sufficient to hot-replace it, the kernel will not automatically rebalance. Commands are still needed for the task as they are with mdraid. So the only point I can see at the moment where this auto-detect feature makes sense is when mounting the device for the first time. If I remember the documentation correctly, you mount one of the RAID devices and the others are automagically attached as well. But outside of the mount process, what is this auto-detect used for?
>
> So here are a couple of rather simple solutions which, as far as I can see, could solve the problem:
>
> 1. Limit the auto-detect to the mount process and don't do it when devices are appearing.
>
> 2. When a BTRFS device is detected and its metadata is identical to one already mounted, just ignore it.

That doesn't really solve the problem since you can still pick the wrong one to mount in the first place.
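The point about metadata placement made above can be shown with a small sketch. With the md metadata at the end of the member (formats 0.9 and 1.0), the filesystem starts at offset zero of the raw member, so a scanner looking at the raw disk finds the btrfs superblock exactly where it expects it. With metadata at the start (1.2), the filesystem begins further down and a scan of the raw member at the usual offset finds nothing. The Python below is an in-memory illustration, not a real scanner: `looks_like_btrfs` and `fake_member` are hypothetical helpers, and the 1 MiB data offset is an assumed value (the real offset is chosen by mdadm and varies). The btrfs constants (primary superblock at 64 KiB, magic `_BHRfS_M` at byte 0x40 of the superblock) reflect the on-disk format.

```python
BTRFS_SUPERBLOCK_OFFSET = 0x10000  # primary superblock sits 64 KiB into the filesystem
BTRFS_MAGIC_OFFSET = 0x40          # offset of the magic field inside the superblock
BTRFS_MAGIC = b"_BHRfS_M"

# Assumed for illustration only; mdadm picks the real data offset.
HYPOTHETICAL_DATA_OFFSET = 1024 * 1024


def looks_like_btrfs(image: bytes, fs_start: int = 0) -> bool:
    """Check for the btrfs magic, assuming the filesystem begins at fs_start."""
    pos = fs_start + BTRFS_SUPERBLOCK_OFFSET + BTRFS_MAGIC_OFFSET
    return image[pos:pos + len(BTRFS_MAGIC)] == BTRFS_MAGIC


def fake_member(fs_start: int, size: int = 2 * 1024 * 1024) -> bytes:
    """Build an in-memory stand-in for a RAID member whose filesystem begins at fs_start."""
    buf = bytearray(size)
    pos = fs_start + BTRFS_SUPERBLOCK_OFFSET + BTRFS_MAGIC_OFFSET
    buf[pos:pos + len(BTRFS_MAGIC)] = BTRFS_MAGIC
    return bytes(buf)


# Metadata at the end (0.9 / 1.0 style): filesystem starts at sector zero.
end_metadata_member = fake_member(0)
# Metadata at the start (1.2 style): filesystem starts at the data offset.
start_metadata_member = fake_member(HYPOTHETICAL_DATA_OFFSET)

if __name__ == "__main__":
    print(looks_like_btrfs(end_metadata_member))
    print(looks_like_btrfs(start_metadata_member))
    print(looks_like_btrfs(start_metadata_member, HYPOTHETICAL_DATA_OFFSET))
```

Only the first layout makes the raw member indistinguishable from the array when scanned at offset zero, which is why the duplicate-device problem shows up with the end-of-disk formats.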
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/07/2014 04:32 PM, Konstantin wrote:
> I know this and I'm using 0.9 on purpose. I need to boot from these disks so I can't use 1.2 format as the BIOS wouldn't recognize the partitions. Having an additional non-RAID disk for booting introduces a single point of failure, which is contrary to the idea of RAID1.

GRUB2 has raid 1.1 and 1.2 metadata support via the mdraid1x module. LVM is also supported. I don't know if a stack of both is supported.

There is, BTW, no such thing as a (commodity) computer without a single point of failure in it somewhere. I've watched government contracts chase this demon for decades. Be it disk, controller, network card, bus chip, cpu or stick-of-ram, you've got a single point of failure somewhere. Actually you likely have several such points of potential failure.

For instance, are you _sure_ your BIOS is going to check the second drive if it gets a read failure after starting in on your first drive? Chances are it won't, because that four-hundred bytes-or-so boot loader on that first disk has no way to branch back into the BIOS.

You can waste a lot of your life chasing that ghost and you'll still discover you've missed it and have to whip out your backup boot media.

It may well be worth having a second copy of /boot around, but make sure you stay out of bandersnatch territory when designing your system. The more you over-think the plumbing, the easier it is to stop up the pipes.
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
Phillip Susi wrote on 08.12.2014 at 15:59:
> On 12/7/2014 7:32 PM, Konstantin wrote:
>>> I'm guessing you are using metadata format 0.9 or 1.0, which put the metadata at the end of the drive and the filesystem still starts in sector zero. 1.2 is now the default and would not have this problem as its metadata is at the start of the disk ( well, 4k from the start ) and the fs starts further down.
>>
>> I know this and I'm using 0.9 on purpose. I need to boot from these disks so I can't use 1.2 format as the BIOS wouldn't recognize the partitions. Having an additional non-RAID disk for booting introduces a single point of failure, which is contrary to the idea of RAID1.
>
> The BIOS does not know or care about partitions. All you need is a

That's only true for older BIOSs. With current EFI boards they not only care but some also mess around with GPT partition tables.

> partition table in the MBR and you can install grub there and have it boot the system from a mdadm 1.1 or 1.2 format array housed in a partition on the rest of the disk. The only time you really *have* to

I was thinking of this solution as well, but as I'm not aware of any partitioning tool caring about mdadm metadata, I rejected it. It requires a non-standard layout leaving reserved empty spaces for mdadm metadata. It's possible, but it isn't documented as far as I know, and before losing hours of trying I chose the obvious one.

> use 0.9 or 1.0 ( and you really should be using 1.0 instead since it handles larger arrays and can't be confused vis. whole disk vs. partition components ) is if you are running a raid1 on the raw disk, with no partition table and then partition inside the array instead, and really, you just shouldn't be doing that.

That's exactly what I want to do - running RAID1 on the whole disk as most hardware based RAID systems do. Before that I was running RAID on disk partitions for some years but this was quite a pain in comparison. Hot(un)plugging a drive brings you a lot of issues with failing mdadm commands as they don't like concurrent execution when the same physical device is affected. And rebuild of RAID partitions is done sequentially with no deterministic order. We could talk for hours about that but if interested maybe better in private as it is not BTRFS related.

>> Anyway, to avoid a futile discussion, mdraid and its format is not the problem, it is just an example of the problem. Using dm-raid would cause the same trouble, LVM apparently, too. I could think of a bunch of other cases including the use of hardware based RAID controllers. OK, it's not the majority's problem, but that's not an argument to keep a bug/flaw capable of crashing your system.
>
> dmraid solves the problem by removing the partitions from the underlying physical device ( /dev/sda ), and only exposing them on the array ( /dev/mapper/whatever ). LVM only has the problem when you take a snapshot. User space tools face the same issue and they resolve it by ignoring or deprioritizing the snapshot.

I don't agree. dmraid and mdraid both remove the partitions. This is not a solution: BTRFS will still crash the PC using /dev/mapper/whatever or whatever device appears in the system providing the BTRFS volume.

>> Nice as it is that the kernel apparently scans for drives and automatically identifies BTRFS ones, it seems to me that this feature is useless. When in a live system a BTRFS RAID disk fails, it is not sufficient to hot-replace it, the kernel will not automatically rebalance. Commands are still needed for the task as they are with mdraid. So the only point I can see at the moment where this auto-detect feature makes sense is when mounting the device for the first time. If I remember the documentation correctly, you mount one of the RAID devices and the others are automagically attached as well. But outside of the mount process, what is this auto-detect used for?
>>
>> So here are a couple of rather simple solutions which, as far as I can see, could solve the problem:
>>
>> 1. Limit the auto-detect to the mount process and don't do it when devices are appearing.
>>
>> 2. When a BTRFS device is detected and its metadata is identical to one already mounted, just ignore it.
>
> That doesn't really solve the problem since you can still pick the wrong one to mount in the first place.

Oh, it does solve the problem; you are speaking of another problem which is always there when having several disks in a system. Mounting the wrong device can happen in the case I'm describing if you use UUID, label or some other metadata related information to mount it. You won't try to do that when you insert a disk you know has the same metadata. It will not happen (except user tools outsmart you ;-)) when using the device name(s). I think it could be expected from a user mounting things manually to know or learn which device node is which drive. On the other hand in my case one of the drives is already mounted so getting it
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
Robert White wrote on 08.12.2014 at 18:20:
> On 12/07/2014 04:32 PM, Konstantin wrote:
>> I know this and I'm using 0.9 on purpose. I need to boot from these disks so I can't use 1.2 format as the BIOS wouldn't recognize the partitions. Having an additional non-RAID disk for booting introduces a single point of failure, which is contrary to the idea of RAID1.
>
> GRUB2 has raid 1.1 and 1.2 metadata support via the mdraid1x module. LVM is also supported. I don't know if a stack of both is supported.
>
> There is, BTW, no such thing as a (commodity) computer without a single point of failure in it somewhere. I've watched government contracts chase this demon for decades. Be it disk, controller, network card, bus chip, cpu or stick-of-ram, you've got a single point of failure somewhere. Actually you likely have several such points of potential failure.
>
> For instance, are you _sure_ your BIOS is going to check the second drive if it gets a read failure after starting in on your first drive? Chances are it won't, because that four-hundred bytes-or-so boot loader on that first disk has no way to branch back into the BIOS.
>
> You can waste a lot of your life chasing that ghost and you'll still discover you've missed it and have to whip out your backup boot media.
>
> It may well be worth having a second copy of /boot around, but make sure you stay out of bandersnatch territory when designing your system. The more you over-think the plumbing, the easier it is to stop up the pipes.

You are right, there is almost always a single point of failure somewhere, even if it is the power plant providing your electricity ;-). I should have written "introduces an additional single point of failure" to be 100% correct, but I thought this was obvious. As I have replaced dozens of damaged hard disks but only a few CPUs, RAMs etc., it is more important for me to reduce the most frequent and easy-to-solve points of failure. For more important systems there are high availability solutions which alleviate many of the problems you mention, but that's not the point here when speaking about the major bug in BTRFS which can make your system crash.
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/08/2014 02:38 PM, Konstantin wrote:
> For more important systems there are high availability solutions which alleviate many of the problems you mention, but that's not the point here when speaking about the major bug in BTRFS which can make your system crash.

I think you missed the part where I told you that you could use GRUB2, and then you could use the 1.2 metadata on your raid and have your system work as desired.

Trying to make this all about BTRFS is more than a touch disingenuous, as you are doing things that can make many systems fail in exactly the same way.

Undefined behavior is undefined.

The MDADM people made the later metadata layouts to address your issue, and it's up to you to use them. Need it to boot? GRUB2 will boot it, and it's up to you to use it.

New software fixes problems evident in the old, but once you decide to stick with the old despite the new, your problem becomes uninteresting because it was already fixed.

So yes, if you use the woefully out of date metadata and boot loader you will have problems. If you use the distro scripts that scan the volumes you don't want scanned, you will have problems. People are working on making sure that those problems have workarounds. And sometimes the workaround for "doctor, it hurts when I do this" is "don't do that any more".

It is multiplicatively impossible to build BTRFS such that it can dance through the entire Cartesian product of all possible storage management solutions. Just as it was impossible for LVM and MDADM before them. If your system is layered, _you_ bear the burden of making sure that the layers are applied. Each tool is evolving to help you, but it's still you doing the system design.

GRUB has put in modules for everything you need (so far) to boot. mdadm has better signatures if you use them. LVM always had device offsets built into its metadata block.

But consider answering the negative. The sort of question that might be phrased "how do you know it's _not_ mdadm old style signatures" is unbounded coding work, not because any one case is impossible to code, but because an endless stream of possibilities is coming in the pipe. A striped storage controller might make a system look like /dev/sdb is a stand-alone BTRFS file system if the controller doesn't start and the mdadm and lvm signatures are on /dev/sda and take up just the right amount of room.

If I do an mkfs.ext2 on a media, then do a cryptsetup luksFormat on that same media, I can mount it either way, with disastrous consequences for the other semantic layout.

The bad combinations available are virtually limitless.

There comes a point where the System Architect who decided how to build the individual system has to take responsibility for his actions.

Note that the same "it didn't protect me" errors can happen _easily_ with other filesystems. Try building an NTFS on a disk, then build an ext4 on the same disk, then mount as either or both. (Though nowadays you may need to build the ext4 then the NTFS, since I think mkfs.ext4 may now have a little dedicated wiper to de-NTFS a disk after that went sour a few too many times.)

When storage signatures conflict you will get exciting outcomes. It will always be that way, and it's not an error in any of that filesystem code. You, the System Architect, bear a burden here.

The system isn't shooting itself when you do certain things. The System Architect is shooting the system with a bad layout bullet.

You don't want some LV to be scanned... don't scan it... If your tools scan it automatically, don't use those tools that way. "But my distro automatically..." is just a reason to look twice at your distro or your design.
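The signature conflicts described above can be made concrete with a small detector. The sketch below, in Python, checks an in-memory image for a handful of well-known magic values and reports when more than one is present. It is an illustration under stated assumptions, not how blkid or any real probing library works: `KNOWN_SIGNATURES`, `find_signatures` and `has_conflict` are hypothetical names, only four formats are covered, and only the primary location of each magic is checked.

```python
from typing import List

# (name, byte offset from start of device, magic bytes)
KNOWN_SIGNATURES = [
    ("LUKS", 0, b"LUKS\xba\xbe"),            # LUKS header magic at the very start
    ("NTFS", 3, b"NTFS    "),                # OEM ID field of the NTFS boot sector
    ("ext2/3/4", 1024 + 56, b"\x53\xef"),    # s_magic 0xEF53, little-endian, in the superblock at 1 KiB
    ("btrfs", 0x10000 + 0x40, b"_BHRfS_M"),  # magic inside the primary superblock at 64 KiB
]


def find_signatures(image: bytes) -> List[str]:
    """Return the names of all known signatures present in the image."""
    return [
        name
        for name, offset, magic in KNOWN_SIGNATURES
        if image[offset:offset + len(magic)] == magic
    ]


def has_conflict(image: bytes) -> bool:
    """True when the image would be claimed by more than one format."""
    return len(find_signatures(image)) > 1
```

Because each format keeps its magic at a different offset, writing one does not necessarily erase another, which is how a single device can end up looking valid to two different tools at once.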
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
Anand Jain wrote on 02.12.2014 at 12:54:
> On 02/12/2014 19:14, Goffredo Baroncelli wrote:
>> I investigated this issue further.
>>
>> MegaBrutal reported the following issue: when taking an LVM snapshot of the device of a mounted btrfs fs, the new snapshot device name replaces the name of the original device in the output of /proc/mounts. This confuses tools like grub-probe, which report a wrong root device.
>
> Very good test case indeed, thanks.
>
> Actual IO would still go to the original device until the FS is remounted.

This seems to be correct at least at the beginning, but I wouldn't be so sure - why else would the system crash in my case after a while when the second drive is present?! If the kernel were not using it in some way, nothing should happen apart from the wrong /proc/mounts entry.

>> It has to be pointed out that the link under /sys/fs/btrfs/<fsid>/devices is correct instead.
>
> In this context the above sysfs path will be out of sync with reality; it's just a stale sysfs entry.
>
>> What happens is that *even if the filesystem is mounted*, doing a "btrfs dev scan" of a snapshot (of the real volume), the device name of the filesystem is replaced with the snapshot one.
>
> We have some fundamentally wrong stuff. My original patch tried to fix it. But we later discovered that some external entities like systemd and the boot process are using that bug as a feature, and we had to revert the patch.
>
> Fundamentally, the SCSI inquiry serial number is the only number which is unique to the device (including virtual devices, but there could be some legacy virtual devices which didn't follow that strictly; anyway, those I deem to be a device-side issue). Btrfs depends on the combination of fsid, uuid and devid (and generation number) to identify the unique device volume, which is weak and easy to get wrong.
>
>> Anand, with b96de000b, tried to fix it; however further regression appeared and Chris reverted this commit (see below).
>>
>> BR
>> G.Baroncelli
>>
>> commit b96de000bc8bc9688b3a2abea4332bd57648a49f
>> Author: Anand Jain anand.j...@oracle.com
>> Date:   Thu Jul 3 18:22:05 2014 +0800
>>
>>     Btrfs: device_list_add() should not update list when mounted
>> [...]
>>
>> commit 0f23ae74f589304bf33233f85737f4fd368549eb
>> Author: Chris Mason c...@fb.com
>> Date:   Thu Sep 18 07:49:05 2014 -0700
>>
>>     Revert "Btrfs: device_list_add() should not update list when mounted"
>>
>>     This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.
>>
>>     This commit is triggering failures to mount by subvolume id in some configurations. The main problem is how many different ways this scanning function is used, both for scanning while mounted and unmounted. A proper cleanup is too big for late rcs.
>> [...]
>>
>> On 12/02/2014 09:28 AM, MegaBrutal wrote:
>>> 2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli kreij...@inwind.it:
>>>> On 12/02/2014 01:15 AM, MegaBrutal wrote:
>>>>> 2014-12-02 0:24 GMT+01:00 Robert White rwh...@pobox.com:
>>>>>> On 12/01/2014 02:10 PM, MegaBrutal wrote:
>>>>>>> Since having duplicate UUIDs on devices is not a problem for me since I can tell them apart by LVM names, the discussion is of little relevance to my use case. Of course it's interesting and I like to read it along, it is not about the actual problem at hand.
>>>>>>
>>>>>> Which is why you use the device= mount option, which would take LVM names and which was repeatedly discussed as solving this very problem.
>>>>>>
>>>>>> Once you decide to duplicate the UUIDs with LVM snapshots you take up the burden of disambiguating your storage.
>>>>>>
>>>>>> Which is part of why re-reading was suggested, as this was covered in some depth and _is_ _exactly_ about the problem at hand.
>>>>>
>>>>> Nope.
>>>>>
>>>>> root@reproduce-1391429:~# cat /proc/cmdline
>>>>> BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic root=/dev/mapper/vg-rootlv ro rootflags=device=/dev/mapper/vg-rootlv,subvol=@
>>>>>
>>>>> Observe, device= mount option is added.
>>>>
>>>> The device= option is needed only in a btrfs multi-volume scenario. If you have only one disk, this is not needed.
>>>
>>> I know. I only did this as a demonstration for Robert. He insisted it will certainly solve the problem. Well, it doesn't.
>>>
>>>>> root@reproduce-1391429:~# ./reproduce-1391429.sh
>>>>> #!/bin/sh -v
>>>>> lvs
>>>>>   LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>>>>>   rootlv vg   -wi-ao--- 1.00g
>>>>>   swap0  vg   -wi-ao--- 256.00m
>>>>> grub-probe --target=device /
>>>>> /dev/mapper/vg-rootlv
>>>>> grep / /proc/mounts
>>>>> rootfs / rootfs rw 0 0
>>>>> /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
>>>>> lvcreate --snapshot --size=128M --name z vg/rootlv
>>>>>   Logical volume z created
>>>>> lvs
>>>>>   LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>>>>>   rootlv vg   owi-aos-- 1.00g
>>>>>   swap0  vg   -wi-ao--- 256.00m
>>>>>   z      vg   swi-a-s-- 128.00m      rootlv   0.11
>>>>> ls -l /dev/vg/
>>>>> total 0
>>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 rootlv -> ../dm-1
>>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 swap0 -> ../dm-0
>>>>> lrwxrwxrwx 1 root root 7 Dec  2 00:12 z -> ../dm-2
>>>>> grub-probe --target=device /
>>>>> /dev/mapper/vg-z
>>>>> grep /
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
Phillip Susi wrote on 02.12.2014 at 20:19:
> On 12/1/2014 4:45 PM, Konstantin wrote:
>> The bug also appears when using mdadm RAID1: when one of the drives is detached from the array, the OS discovers it and after a while (not directly, it takes several minutes) it appears under /proc/mounts: instead of /dev/md0p1 I see /dev/sdb1 there. And usually after an hour or so (depending on system workload) the PC completely freezes. So whatever the discussion about the uniqueness of UUIDs, a crashing kernel tells me that there is a serious bug.
>
> I'm guessing you are using metadata format 0.9 or 1.0, which put the metadata at the end of the drive and the filesystem still starts in sector zero. 1.2 is now the default and would not have this problem as its metadata is at the start of the disk ( well, 4k from the start ) and the fs starts further down.

I know this and I'm using 0.9 on purpose. I need to boot from these disks, so I can't use the 1.2 format as the BIOS wouldn't recognize the partitions. Having an additional non-RAID disk for booting introduces a single point of failure, which is contrary to the idea of RAID1.

Anyway, to avoid a futile discussion: mdraid and its format are not the problem, they are just an example of the problem. Using dm-raid would cause the same trouble, and apparently LVM does, too. I could think of a bunch of other cases, including the use of hardware-based RAID controllers. OK, it's not the majority's problem, but that's no argument for keeping a bug/flaw capable of crashing your system.

While it is a nice feature that the kernel apparently scans for drives and automatically identifies BTRFS ones, it seems to me that this feature is of little use. When a BTRFS RAID disk fails in a live system, it is not sufficient to hot-replace it; the kernel will not automatically rebalance. Commands are still needed for the task, as they are with mdraid. So the only point I can see at the moment where this auto-detect feature makes sense is when mounting the device for the first time. If I remember the documentation correctly, you mount one of the RAID devices and the others are automagically attached as well. But outside of the mount process, what is this auto-detect used for?

So here are a couple of rather simple solutions which, as far as I can see, could solve the problem:

1. Limit the auto-detect to the mount process and don't do it when devices are appearing.

2. When a BTRFS device is detected and its metadata is identical to one already mounted, just ignore it.
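Proposal 2 can be pictured with a small toy model. This is only a sketch under stated assumptions, not the kernel's device_list_add(): the `Registry` class, its fields and the "fsid-a" identifier are hypothetical names invented for illustration. It shows a scan that refuses to overwrite the recorded path of a device whose filesystem is already mounted.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Toy model of a scanned-device list (hypothetical, not kernel code)."""

    # (fsid, devid) -> device path last reported by a scan
    paths: dict = field(default_factory=dict)
    # fsids of filesystems that are currently mounted
    mounted: set = field(default_factory=set)

    def scan(self, fsid, devid, path):
        """Record a scanned device; return True if the stored path changed."""
        key = (fsid, devid)
        if fsid in self.mounted and key in self.paths:
            # Proposal 2: a device with identical metadata shows up while
            # the filesystem is mounted, so leave the recorded path alone.
            return False
        changed = self.paths.get(key) != path
        self.paths[key] = path
        return changed

    def mount(self, fsid):
        """Mark a filesystem as mounted."""
        self.mounted.add(fsid)


reg = Registry()
reg.scan("fsid-a", 1, "/dev/dm-1")   # original volume seen before mount
reg.mount("fsid-a")
reg.scan("fsid-a", 1, "/dev/dm-2")   # snapshot with the same metadata
print(reg.paths[("fsid-a", 1)])      # still the original device path
```

As noted elsewhere in the thread, a kernel patch along these lines was reverted because it triggered failures to mount by subvolume id in some configurations, so the real fix is evidently less simple than this model.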
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
2014-12-04 6:15 GMT+01:00 Duncan 1i5t5.dun...@cox.net:
> Which is why I'm running an initramfs for the first time since I've switched to btrfs raid1 mode root, as I quit with initrds back before initramfs was an option. An initramfs appended to the kernel image beats a separate initrd, but I'd still love to see the kernel commandline parsing fixed so it broke at the correct = in rootflags=device= (which seemed to be the problem, the kernel then didn't seem to recognize rootflags at all, as it was apparently seeing it as a parameter called rootflags=device, instead of rootflags), so I could be rid of the initramfs again.

Are you sure it isn't fixed? At least, it parses rootflags=subvol=@ well, which also has multiple = signs. And the last time I tried the following, it didn't cause any problems: rootflags=device=/dev/mapper/vg-rootlv,subvol=@. Admittedly, device= shouldn't have an effect in this case anyway, but I didn't get any complaints about it. Then again, I use an initrd.
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
MegaBrutal posted on Thu, 04 Dec 2014 09:20:12 +0100 as excerpted:
> Are you sure it isn't fixed? At least, it parses rootflags=subvol=@ well, which also has multiple = signs. And last time I've tried this, and didn't cause any problems: rootflags=device=/dev/mapper/vg-rootlv,subvol=@. Though device= shouldn't have an effect in this case anyway, but I didn't get any complaints against it. Though I use an initrd.

AFAIK lvm requires userspace anyway, thus an initr*, and once you have that initr* handling the lvm, it's almost certainly the initr* parsing the rootflags= from the kernel commandline as well.

So in that case the kernel doesn't /need/ to be able to parse rootflags=, as all it does is pass the kernel commandline straight thru to the initr*, which would seem, in your case at least, to parse it correctly.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/02/2014 08:11 PM, Phillip Susi wrote:
> On 12/2/2014 7:23 AM, Austin S Hemmelgarn wrote:
>> Stupid thought, why don't we just add blacklisting based on device path like LVM has for pvscan?
>
> That isn't logic that belongs in the kernel, so that is going down the path of yanking out the device auto probing from btrfs and instead writing a mount.btrfs helper that can use policies like blacklisting to auto locate all of the correct devices and pass them all to the kernel at mount time.

I am thinking about that. Today the device discovery happens:

a) when a device appears, two udev rules run "btrfs dev scan <device>":
   /lib/udev/rules.d/70-btrfs.rules
   /lib/udev/rules.d/80-btrfs-lvm.rules
b) during boot, when a "btrfs device scan" is run that scans all devices (this happens in Debian; other distros may differ)
c) after mkfs.btrfs, which starts a device scan on each device of the new filesystem
d) when the user runs a scan

Regarding a), the problem is simply solved by adding a line like:

   ENV{DM_UDEV_LOW_PRIORITY_FLAG}=="1", GOTO="btrfs_end"

Regarding c), it is not a problem.

Regarding b) and d), the only solution that I found is to query the udev DB inside the "btrfs dev scan" program and to skip the devices with DM_UDEV_LOW_PRIORITY_FLAG==1. Implementing this would solve all the points a), b), c), d) in one shot!

BR
G.Baroncelli

P.S. This is the comment in the LVM sources about DM_UDEV_LOW_PRIORITY_FLAG:

/*
 * DM_UDEV_LOW_PRIORITY_FLAG is set in case we need to instruct the
 * udev rules to give low priority to the device that is currently
 * processed. For example, this provides a way to select which symlinks
 * could be overwritten by high priority ones if their names are equal.
 * Common situation is a name based on FS UUID while using origin and
 * snapshot devices.
 */
#define DM_UDEV_LOW_PRIORITY_FLAG 0x0010

https://git.fedorahosted.org/cgit/lvm2.git/tree/libdm/libdevmapper.h#n1969

--
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
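The filtering idea for points b) and d) can be sketched as follows. This is a minimal illustration, assuming the udev properties of each candidate device have already been fetched into a plain dictionary; `should_scan`, `devices_to_scan` and the sample device table are hypothetical names and data, and no real udev query is performed here. Only the property name DM_UDEV_LOW_PRIORITY_FLAG comes from the message above.

```python
def should_scan(udev_props):
    """Return False for devices flagged as low priority by LVM.

    udev_props is a dict of udev property names to string values,
    assumed to have been read from the udev database beforehand.
    """
    return udev_props.get("DM_UDEV_LOW_PRIORITY_FLAG") != "1"


def devices_to_scan(candidates):
    """Keep only the device paths that a scan should register."""
    return [dev for dev, props in candidates.items() if should_scan(props)]


# Hypothetical property sets for an origin volume, its snapshot and swap.
candidates = {
    "/dev/dm-1": {"DM_NAME": "vg-rootlv"},
    "/dev/dm-2": {"DM_NAME": "vg-z", "DM_UDEV_LOW_PRIORITY_FLAG": "1"},
    "/dev/dm-0": {"DM_NAME": "vg-swap0"},
}
print(devices_to_scan(candidates))
```

Doing the check inside the scan program rather than only in the udev rule is what would make it apply to the boot-time and user-initiated scans as well.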
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/03/2014 03:24 AM, Goffredo Baroncelli wrote:
> I am thinking about that. Today the device discovery happens:
>
> a) when a device appears, two udev rules run "btrfs dev scan <device>":
>    /lib/udev/rules.d/70-btrfs.rules
>    /lib/udev/rules.d/80-btrfs-lvm.rules
> b) during boot, when a "btrfs device scan" is run that scans all devices (this happens in Debian; other distros may differ)
> c) after mkfs.btrfs, which starts a device scan on each device of the new filesystem
> d) when the user runs a scan

Are you sure the kernel only gains awareness of btrfs volumes when user space runs btrfs device scan? If that is so, then that means you can not boot from a multi device btrfs root without using an initramfs. I thought the kernel auto scanned all devices if you tried to mount a multi device volume, but if this is so, then yes, the udev rules could be fixed to not call btrfs device scan on an lvm snapshot.
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
Phillip Susi posted on Wed, 03 Dec 2014 22:09:29 -0500 as excerpted:
> Are you sure the kernel only gains awareness of btrfs volumes when user space runs btrfs device scan? If that is so then that means you can not boot from a multi device btrfs root without using an initramfs. I thought the kernel auto scanned all devices if you tried to mount a multi device volume, but if this is so, then yes, the udev rules could be fixed to not call btrfs device scan on an lvm snapshot.

That has indeed been the case in the past, and to my knowledge remains the case. Unless it has changed in the last cycle or two (and I've not seen patches to that effect on the list nor any hint of such, so I doubt it) the kernel doesn't do any such scanning without userspace telling it to.

The device= mount option can be used instead, but it didn't work with rootflags= on the kernel commandline last I tried, so for a multidevice btrfs root, yes, an initramfs/initrd is required.

Which is why I'm running an initramfs for the first time since I've switched to btrfs raid1 mode root, as I quit with initrds back before initramfs was an option. An initramfs appended to the kernel image beats a separate initrd, but I'd still love to see the kernel commandline parsing fixed so it broke at the correct = in rootflags=device= (which seemed to be the problem, the kernel then didn't seem to recognize rootflags at all, as it was apparently seeing it as a parameter called rootflags=device, instead of rootflags), so I could be rid of the initramfs again.

FWIW, I'm using dracut to generate the cpio archive, which, with the right kernel config options set, the kernel build process then appends to the kernel. Dracut btrfs module enabled of course, most of the rest force-disabled as I run a monolithic kernel so don't need module loading, etc.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
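The parsing question Duncan raises comes down to where a "name=value" parameter is split. The sketch below is not the kernel's parser; it only contrasts splitting at the first "=" with splitting at the last one, which would produce the "rootflags=device" parameter name he describes. All function names and the /dev/sdb5 device path are hypothetical, chosen for illustration.

```python
def split_at_first_equals(param):
    """Split 'name=value' at the first '=', keeping later '=' in the value."""
    name, sep, value = param.partition("=")
    return (name, value) if sep else (param, None)


def split_at_last_equals(param):
    """Split at the last '=', which misparses nested options."""
    name, sep, value = param.rpartition("=")
    return (name, value) if sep else (param, None)


def parse_mount_options(flags):
    """Turn 'a=1,b' into {'a': '1', 'b': None}; each option splits once."""
    return dict(split_at_first_equals(opt) for opt in flags.split(","))


param = "rootflags=device=/dev/sdb5"
print(split_at_first_equals(param))
print(split_at_last_equals(param))

name, flags = split_at_first_equals(
    "rootflags=device=/dev/mapper/vg-rootlv,subvol=@"
)
print(name, parse_mount_options(flags))
```

Splitting once at the first "=" leaves the whole option string intact as the value, so the individual mount options can then be separated on commas and split once more each.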
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
2014-12-02 8:50 GMT+01:00 Goffredo Baroncelli kreij...@inwind.it:
> On 12/02/2014 01:15 AM, MegaBrutal wrote:
>> 2014-12-02 0:24 GMT+01:00 Robert White rwh...@pobox.com:
>>> On 12/01/2014 02:10 PM, MegaBrutal wrote:
>>>> Since having duplicate UUIDs on devices is not a problem for me since I can tell them apart by LVM names, the discussion is of little relevance to my use case. Of course it's interesting and I like to read it along, it is not about the actual problem at hand.
>>>
>>> Which is why you use the device= mount option, which would take LVM names and which was repeatedly discussed as solving this very problem.
>>>
>>> Once you decide to duplicate the UUIDs with LVM snapshots you take up the burden of disambiguating your storage.
>>>
>>> Which is part of why re-reading was suggested as this was covered in some depth and _is_ _exactly_ about the problem at hand.
>>
>> Nope.
>>
>> root@reproduce-1391429:~# cat /proc/cmdline
>> BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic root=/dev/mapper/vg-rootlv ro rootflags=device=/dev/mapper/vg-rootlv,subvol=@
>>
>> Observe, device= mount option is added.
>
> device= options is needed only in a btrfs multi-volume scenario. If you have only one disk, this is not needed

I know. I only did this as a demonstration for Robert. He insisted it will certainly solve the problem. Well, it doesn't.

>> root@reproduce-1391429:~# ./reproduce-1391429.sh
>> #!/bin/sh -v
>> lvs
>>   LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>>   rootlv vg   -wi-ao--- 1.00g
>>   swap0  vg   -wi-ao--- 256.00m
>> grub-probe --target=device /
>> /dev/mapper/vg-rootlv
>> grep " / " /proc/mounts
>> rootfs / rootfs rw 0 0
>> /dev/dm-1 / btrfs rw,relatime,space_cache 0 0
>> lvcreate --snapshot --size=128M --name z vg/rootlv
>>   Logical volume "z" created
>> lvs
>>   LV     VG   Attr      LSize   Pool Origin Data%  Move Log Copy%  Convert
>>   rootlv vg   owi-aos-- 1.00g
>>   swap0  vg   -wi-ao--- 256.00m
>>   z      vg   swi-a-s-- 128.00m      rootlv   0.11
>> ls -l /dev/vg/
>> total 0
>> lrwxrwxrwx 1 root root 7 Dec 2 00:12 rootlv -> ../dm-1
>> lrwxrwxrwx 1 root root 7 Dec 2 00:12 swap0 -> ../dm-0
>> lrwxrwxrwx 1 root root 7 Dec 2 00:12 z -> ../dm-2
>> grub-probe --target=device /
>> /dev/mapper/vg-z
>> grep " / " /proc/mounts
>> rootfs / rootfs rw 0 0
>> /dev/dm-2 / btrfs rw,relatime,space_cache 0 0
>
> What /proc/self/mountinfo contains ?

Before creating snapshot:

15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
17 20 0:5 / /dev rw,relatime - devtmpfs udev rw,size=241692k,nr_inodes=60423,mode=755
18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs rw,size=50084k,mode=755
20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-1 rw,space_cache   <- THIS!
21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=5120k
26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=102400k,mode=755
28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw

After creating snapshot:

15 20 0:15 / /sys rw,nosuid,nodev,noexec,relatime - sysfs sysfs rw
16 20 0:3 / /proc rw,nosuid,nodev,noexec,relatime - proc proc rw
17 20 0:5 / /dev rw,relatime - devtmpfs udev rw,size=241692k,nr_inodes=60423,mode=755
18 17 0:12 / /dev/pts rw,nosuid,noexec,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs rw,size=50084k,mode=755
20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache   <- WTF?!
21 15 0:20 / /sys/fs/cgroup rw,relatime - tmpfs none rw,size=4k,mode=755
22 15 0:21 / /sys/fs/fuse/connections rw,relatime - fusectl none rw
23 15 0:6 / /sys/kernel/debug rw,relatime - debugfs none rw
24 15 0:10 / /sys/kernel/security rw,relatime - securityfs none rw
25 19 0:22 / /run/lock rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=5120k
26 19 0:23 / /run/shm rw,nosuid,nodev,relatime - tmpfs none rw
27 19 0:24 / /run/user rw,nosuid,nodev,noexec,relatime - tmpfs none rw,size=102400k,mode=755
28 15 0:25 / /sys/fs/pstore rw,relatime - pstore none rw
29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw

So it's consistent with what /proc/mounts reports.

> And more important question: it is only the value returned by /proc/mount wrongly or also the filesystem content is affected ?

I quote my bug report on this:

The information reported in /proc/mounts is certainly bogus, since still the origin device is being written, the kernel does not actually mix up the devices for write operations, and such, the phenomenon does not cause
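For anyone wanting to repeat the before/after comparison without reading the listings by eye, here is a small sketch that extracts the source device of a mount point from mountinfo text. It relies on the documented layout of /proc/self/mountinfo (mount point in the fifth field, then a lone "-" separator followed by filesystem type and mount source); the function names are hypothetical, and the sample lines are copied from the listing taken after the snapshot was created.

```python
def parse_mountinfo_line(line):
    """Split one mountinfo line into the fields of interest."""
    left, _, right = line.partition(" - ")
    fields = left.split()
    after = right.split()
    return {
        "root": fields[3],          # root of the mount within the filesystem
        "mount_point": fields[4],
        "fstype": after[0],
        "source": after[1],
    }


def source_of(mountinfo_text, mount_point):
    """Return the source device of the last mount on mount_point, or None."""
    found = None
    for line in mountinfo_text.splitlines():
        if not line.strip():
            continue
        entry = parse_mountinfo_line(line)
        if entry["mount_point"] == mount_point:
            found = entry["source"]   # later entries shadow earlier ones
    return found


SAMPLE = """\
19 20 0:16 / /run rw,nosuid,noexec,relatime - tmpfs tmpfs rw,size=50084k,mode=755
20 0 0:17 /@ / rw,relatime - btrfs /dev/dm-2 rw,space_cache
29 20 253:1 / /boot rw,relatime - ext2 /dev/vda1 rw
"""

print(source_of(SAMPLE, "/"))
print(source_of(SAMPLE, "/boot"))
```

On a live system the same function could be fed the contents of /proc/self/mountinfo read before and after creating the snapshot, to confirm whether the reported source for / changes.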
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
I investigated this issue further. MegaBrutal reported the following: when an LVM snapshot is taken of the device holding a mounted btrfs filesystem, the name of the new snapshot device replaces the name of the original device in the output of /proc/mounts. This confuses tools like grub-probe, which then report the wrong root device.

It has to be pointed out that the link under /sys/fs/btrfs/<fsid>/devices, by contrast, is correct.

What happens is that *even if the filesystem is mounted*, doing a "btrfs dev scan" of a snapshot (of the real volume) replaces the device name of the filesystem with that of the snapshot.

Anand, with b96de000b, tried to fix it; however, further regressions appeared and Chris reverted this commit (see below).

BR
G.Baroncelli

commit b96de000bc8bc9688b3a2abea4332bd57648a49f
Author: Anand Jain anand.j...@oracle.com
Date: Thu Jul 3 18:22:05 2014 +0800

    Btrfs: device_list_add() should not update list when mounted
[...]

commit 0f23ae74f589304bf33233f85737f4fd368549eb
Author: Chris Mason c...@fb.com
Date: Thu Sep 18 07:49:05 2014 -0700

    Revert "Btrfs: device_list_add() should not update list when mounted"

    This reverts commit b96de000bc8bc9688b3a2abea4332bd57648a49f.

    This commit is triggering failures to mount by subvolume id in some configurations. The main problem is how many different ways this scanning function is used, both for scanning while mounted and unmounted. A proper cleanup is too big for late rcs.
[...]
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 02/12/2014 19:14, Goffredo Baroncelli wrote:
> I further investigate this issue. MegaBrutal, reported the following issue: doing a lvm snapshot of the device of a mounted btrfs fs, the new snapshot device name replaces the name of the original device in the output of /proc/mounts. This confused tools like grub-probe which report a wrong root device.

Very good test case indeed, thanks. Actual IO would still go to the original device until the FS is remounted.

> It has to be pointed out that instead the link under /sys/fs/btrfs/<fsid>/devices is correct.

In this context the above sysfs path will be out of sync with reality; it's just a stale sysfs entry.

> What happens is that *even if the filesystem is mounted*, doing a btrfs dev scan of a snapshot (of the real volume), the device name of the filesystem is replaced with the snapshot one.

We have some fundamentally wrong stuff here. My original patch tried to fix it, but we later discovered that some external entities like systemd and the boot process are using that bug as a feature, and we had to revert the patch.

Fundamentally, the SCSI inquiry serial number is the only number which is unique to the device (including virtual devices; there could be some legacy virtual devices which didn't follow that strictly, but those I deem to be a device-side issue). Btrfs depends on the combination of fsid, uuid and devid (and generation number) to identify the unique device volume, which is weak and easy to get wrong.

> Anand, with b96de000b, tried to fix it; however further regression appeared and Chris reverted this commit (see below).
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/2/2014 7:23 AM, Austin S Hemmelgarn wrote:
> Stupid thought, why don't we just add blacklisting based on device path like LVM has for pvscan?

That isn't logic that belongs in the kernel, so that is going down the path of yanking out the device auto probing from btrfs and instead writing a mount.btrfs helper that can use policies like blacklisting to auto locate all of the correct devices and pass them all to the kernel at mount time.
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/2/2014 6:54 AM, Anand Jain wrote:

We have some fundamentally wrong stuff. My original patch tried to fix it, but we later discovered that some external entities like systemd and the boot process are using that bug as a feature, and we had to revert the patch.

If systemd is depending on the kernel lying about what device it has mounted, then something is *extremely* broken there, and that should be fixed instead of breaking the kernel.
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/1/2014 4:45 PM, Konstantin wrote:

The bug also appears when using mdadm RAID1: when one of the drives is detached from the array, the OS discovers it and after a while (not immediately, it takes several minutes) it appears in /proc/mounts: instead of /dev/md0p1 I see /dev/sdb1 there. And usually after an hour or so (depending on system workload) the PC completely freezes. So whatever the discussion about the uniqueness of UUIDs concludes, a crashing kernel tells me that there is a serious bug.

I'm guessing you are using metadata format 0.9 or 1.0, which put the metadata at the end of the drive, so the filesystem still starts in sector zero. 1.2 is now the default and would not have this problem, as its metadata is at the start of the disk (well, 4k from the start) and the fs starts further down.
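The point about metadata placement can be summarised in a small lookup. This is only a sketch: the function names are made up for illustration, and the locations are the ones documented for mdadm's --metadata option (0.90 and 1.0 at the end of the device, 1.1 at the start, 1.2 at 4K from the start). On a real system the version of a member can be read from the "Version" line of `mdadm --examine` output.

```shell
#!/bin/sh
# Map an md metadata version to where its superblock sits on a member device.
md_superblock_location() {
    case "$1" in
        0.9|0.90) echo "end" ;;
        1.0)      echo "end" ;;
        1.1)      echo "start" ;;
        1.2)      echo "4K from start" ;;
        *)        echo "unknown" ;;
    esac
}

# Succeeds when the array data begins at sector zero of the member, i.e. when
# a detached member can be mistaken for a plain copy of the file system.
md_member_exposes_fs() {
    [ "$(md_superblock_location "$1")" = "end" ]
}

for v in 0.90 1.0 1.1 1.2; do
    printf '%s: superblock at %s\n' "$v" "$(md_superblock_location "$v")"
done
```

With formats whose superblock is at the end, a detached member looks like a bare btrfs device to any scanner, which would explain why /dev/sdb1 can show up in place of /dev/md0p1.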
PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
Hi all,

I've reported the bug I previously posted about in "BTRFS messes up snapshot LV with origin" in the Kernel Bug Tracker.

https://bugzilla.kernel.org/show_bug.cgi?id=89121

Since the other thread went off into theoretical debates about UUIDs and their generic relation to BTRFS, their everyday use cases, and the philosophical meaning behind uniqueness of copies and UUIDs, I'd like to specifically ask you to only post here about the ACTUAL problem at hand. Don't get me wrong, I find the discussion in the other thread really interesting and I'm following it, but it is only very remotely related to the original issue, so please keep it there!

If you're interested in catching up on the actual bug symptoms, please read the bug report linked above, and (optionally) reproduce the problem yourself! A virtual machine image on which I've already reproduced the conditions can be downloaded here:

http://undead.megabrutal.com/kvm-reproduce-1391429.img.xz

(Download size: 113 MB; unpacked image size: 2 GB.)

Re-tested with mainline kernel 3.18.0-rc7 just today.

Regards,
MegaBrutal
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/01/2014 04:56 AM, MegaBrutal wrote:

Since the other thread went off into theoretical debates about UUIDs and their generic relation to BTRFS, their everyday use cases, and the philosophical meaning behind uniqueness of copies and UUIDs, I'd like to specifically ask you to only post here about the ACTUAL problem at hand. Don't get me wrong, I find the discussion in the other thread really interesting and I'm following it, but it is only very remotely related to the original issue, so please keep it there! If you're interested in catching up on the actual bug symptoms, please read the bug report linked above, and (optionally) reproduce the problem yourself!

That discussion _was_ the actual discussion of the actual problem. A problem that is not particularly theoretical, a problem that is common to block-level snapshots, and a discussion that contained the actual work-arounds. I suggest a re-read. 8-)
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
MegaBrutal wrote on 01.12.2014 at 13:56:

Hi all, I've reported the bug I previously posted about in "BTRFS messes up snapshot LV with origin" in the Kernel Bug Tracker. https://bugzilla.kernel.org/show_bug.cgi?id=89121

Hi MegaBrutal. If I understand your report correctly, I can give you another example where this bug appears. It is so bad that it freezes the system, and I'm quite sure it's the same thing. I was thinking about filing a bug but haven't had the time for that yet. Maybe you could add this case to your bug report as well.

The bug also appears when using mdadm RAID1: when one of the drives is detached from the array, the OS discovers it and after a while (not immediately, it takes several minutes) it appears in /proc/mounts: instead of /dev/md0p1 I see /dev/sdb1 there. And usually after an hour or so (depending on system workload) the PC completely freezes. So whatever the discussion about the uniqueness of UUIDs concludes, a crashing kernel tells me that there is a serious bug.

While in my case detaching was intentional, there are several realistic ways a RAID1 disk can get detached, and currently this leads to crashing the server when using BTRFS. That's not what is intended when using RAID ;-).

In my case I wanted to do something which had worked perfectly for years with all other file systems: checking the file system of the root disk while the server is running. The procedure is simple:

1. detach one of the disks
2. do fsck on the disk device
3. mdadm --zero-superblock on the device so it gets completely rewritten
4. mdadm --add it to the array

There were some surprises with BTRFS: if step 2 is not done directly after step 1, btrfsck refuses to check the disk, as /proc/mounts reports it as mounted. And during step 2, or even after finishing it, the system froze. If I got to step 4 fast enough everything was OK, but again, that's not what I expect from a good operating system. Any objections?

Konstantin
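The four steps above can be sketched as a dry run that only prints the commands. This is an illustration under assumptions, not the exact commands used in the message: the device names /dev/md0, /dev/sdb and /dev/sdb1 are placeholders inferred from the /proc/mounts description, `btrfs check` stands in for the btrfsck mentioned above, and detaching is shown with mdadm's --fail and --remove options. Since this is the procedure reported to freeze the system, the sketch deliberately executes nothing.

```shell
#!/bin/sh
# Print (do not run) the commands for the offline-check procedure.
# $1 = md array, $2 = member to detach, $3 = file system device on that member
print_offline_check_steps() {
    array=$1 member=$2 fs_dev=$3
    # 1. detach one of the disks from the array
    echo "mdadm $array --fail $member --remove $member"
    # 2. check the file system on the detached device
    echo "btrfs check $fs_dev"
    # 3. wipe the md superblock so the member gets completely rewritten
    echo "mdadm --zero-superblock $member"
    # 4. add the device back to the array
    echo "mdadm $array --add $member"
}

print_offline_check_steps /dev/md0 /dev/sdb /dev/sdb1
```

Printing rather than executing keeps the order of operations reviewable; the window between steps 1 and 4 is where the report says the detached disk shows up in /proc/mounts.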
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
2014-12-01 18:27 GMT+01:00 Robert White rwh...@pobox.com:

On 12/01/2014 04:56 AM, MegaBrutal wrote:

Since the other thread went off into theoretical debates about UUIDs and their generic relation to BTRFS, their everyday use cases, and the philosophical meaning behind uniqueness of copies and UUIDs, I'd like to specifically ask you to only post here about the ACTUAL problem at hand. Don't get me wrong, I find the discussion in the other thread really interesting and I'm following it, but it is only very remotely related to the original issue, so please keep it there! If you're interested in catching up on the actual bug symptoms, please read the bug report linked above, and (optionally) reproduce the problem yourself!

That discussion _was_ the actual discussion of the actual problem. A problem that is not particularly theoretical, a problem that is common to block-level snapshots, and a discussion that contained the actual work-arounds. I suggest a re-read. 8-)

The majority of the discussion was about how the kernel should react UPON mounting a file system when more than one device with the same UUID exists on the system. While that is a very legitimate problem worth discussing and mitigating, it is not the same situation as how the kernel behaves when an identical device appears WHILE the file system is mounted.

Actually, I would not identify devices by UUIDs when I know that duplicates could exist due to snapshots, therefore I mount devices by LVM paths. And when a file system is already mounted with all its devices, that is a clear situation: all devices are open and locked by the kernel, and any mix-up at that point is an error.

What about multiple-device file systems? Supply all their devices with device= mount options. Just don't identify devices by UUIDs when you know there could be duplicates. Use UUIDs when you don't use LVM. Identifying file systems by UUIDs was invented because classic /dev/sdXX device names might change. But LVM names don't change. They only change when you intentionally change them, e.g. with lvrename.

Since having duplicate UUIDs on devices is not a problem for me, because I can tell them apart by LVM names, the discussion is of little relevance to my use case. Of course it's interesting and I like to follow it, but it is not about the actual problem at hand.
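As an illustration of supplying every member of a multiple-device file system by path, the following sketch builds the option string from a list of device paths. The function name `btrfs_device_opts` and the LV names are hypothetical; the only thing taken from btrfs itself is that the `device=` mount option may be repeated, once per device.

```shell
#!/bin/sh
# Join device paths into a btrfs mount option string: device=A,device=B,...
btrfs_device_opts() {
    opts=""
    for dev in "$@"; do
        # Prepend a comma only when opts already holds an entry.
        opts="${opts:+$opts,}device=$dev"
    done
    printf '%s\n' "$opts"
}

# Hypothetical LVM paths of a two-device file system.
btrfs_device_opts /dev/mapper/vg-data1 /dev/mapper/vg-data2
```

The result could be passed along the lines of `mount -o "$(btrfs_device_opts ...)" /dev/mapper/vg-data1 /mnt`. Whether naming devices this way prevents the mix-up is exactly what the following messages argue about.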
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/01/2014 02:10 PM, MegaBrutal wrote:

Since having duplicate UUIDs on devices is not a problem for me, because I can tell them apart by LVM names, the discussion is of little relevance to my use case. Of course it's interesting and I like to follow it, but it is not about the actual problem at hand.

Which is why you use the device= mount option, which would take LVM names and which was repeatedly discussed as solving this very problem. Once you decide to duplicate the UUIDs with LVM snapshots, you take up the burden of disambiguating your storage. Which is part of why re-reading was suggested, as this was covered in some depth and _is_ _exactly_ about the problem at hand.
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
2014-12-02 0:24 GMT+01:00 Robert White rwh...@pobox.com:

On 12/01/2014 02:10 PM, MegaBrutal wrote:

Since having duplicate UUIDs on devices is not a problem for me, because I can tell them apart by LVM names, the discussion is of little relevance to my use case. Of course it's interesting and I like to follow it, but it is not about the actual problem at hand.

Which is why you use the device= mount option, which would take LVM names and which was repeatedly discussed as solving this very problem. Once you decide to duplicate the UUIDs with LVM snapshots, you take up the burden of disambiguating your storage. Which is part of why re-reading was suggested, as this was covered in some depth and _is_ _exactly_ about the problem at hand.

Nope.

root@reproduce-1391429:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic root=/dev/mapper/vg-rootlv ro rootflags=device=/dev/mapper/vg-rootlv,subvol=@

Observe that the device= mount option is added.

root@reproduce-1391429:~# ./reproduce-1391429.sh
#!/bin/sh -v
lvs
  LV     VG Attr      LSize   Pool Origin Data% Move Log Copy% Convert
  rootlv vg -wi-ao---   1.00g
  swap0  vg -wi-ao--- 256.00m
grub-probe --target=device /
/dev/mapper/vg-rootlv
grep / /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0
lvcreate --snapshot --size=128M --name z vg/rootlv
  Logical volume z created
lvs
  LV     VG Attr      LSize   Pool Origin Data% Move Log Copy% Convert
  rootlv vg owi-aos--   1.00g
  swap0  vg -wi-ao--- 256.00m
  z      vg swi-a-s-- 128.00m      rootlv   0.11
ls -l /dev/vg/
total 0
lrwxrwxrwx 1 root root 7 Dec 2 00:12 rootlv -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec 2 00:12 swap0 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec 2 00:12 z -> ../dm-2
grub-probe --target=device /
/dev/mapper/vg-z
grep / /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-2 / btrfs rw,relatime,space_cache 0 0
lvremove --force vg/z
  Logical volume z successfully removed
grub-probe --target=device /
/dev/mapper/vg-rootlv
grep / /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0

Problem still reproduces.
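The check that the transcript performs by eye can be sketched as a small function that extracts the device recorded for a mount point from a mounts-format file. The function name `mounted_source` is hypothetical; the field layout (device first, mount point second) is that of /proc/mounts as shown in the transcript, and the `rootfs` entry is skipped because it is not a real block device.

```shell
#!/bin/sh
# Print the source device recorded for mount point $1 in the mounts-format
# file $2 (default: /proc/mounts), ignoring the initial "rootfs" entry.
# If several entries match, the last one listed is printed.
mounted_source() {
    awk -v mp="$1" '$2 == mp && $1 != "rootfs" { src = $1 }
                    END { if (src != "") print src }' "${2:-/proc/mounts}"
}

# Report what the kernel currently lists as the device behind /.
echo "root is reported as: $(mounted_source /)"
```

Running this before and after `lvcreate --snapshot` and comparing the result with `readlink -f /dev/mapper/vg-rootlv` would flag the switch from /dev/dm-1 to /dev/dm-2 seen above.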
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
2014-12-01 22:45 GMT+01:00 Konstantin newsbox1...@web.de:

MegaBrutal wrote on 01.12.2014 at 13:56:

Hi all, I've reported the bug I previously posted about in "BTRFS messes up snapshot LV with origin" in the Kernel Bug Tracker. https://bugzilla.kernel.org/show_bug.cgi?id=89121

Hi MegaBrutal. If I understand your report correctly, I can give you another example where this bug appears. It is so bad that it freezes the system, and I'm quite sure it's the same thing. I was thinking about filing a bug but haven't had the time for that yet. Maybe you could add this case to your bug report as well.

The bug also appears when using mdadm RAID1: when one of the drives is detached from the array, the OS discovers it and after a while (not immediately, it takes several minutes) it appears in /proc/mounts: instead of /dev/md0p1 I see /dev/sdb1 there. And usually after an hour or so (depending on system workload) the PC completely freezes. So whatever the discussion about the uniqueness of UUIDs concludes, a crashing kernel tells me that there is a serious bug.

Hmm, I also suspect our symptoms have the same root cause. It seems the same thing happens: the BTRFS module notices another device with the same file system and starts to report it as the root device. It seems to have no idea that the device is part of a RAID configuration.
Re: PROBLEM: #89121 BTRFS mixes up mounted devices with their snapshots
On 12/02/2014 01:15 AM, MegaBrutal wrote:

2014-12-02 0:24 GMT+01:00 Robert White rwh...@pobox.com:

On 12/01/2014 02:10 PM, MegaBrutal wrote:

Since having duplicate UUIDs on devices is not a problem for me, because I can tell them apart by LVM names, the discussion is of little relevance to my use case. Of course it's interesting and I like to follow it, but it is not about the actual problem at hand.

Which is why you use the device= mount option, which would take LVM names and which was repeatedly discussed as solving this very problem. Once you decide to duplicate the UUIDs with LVM snapshots, you take up the burden of disambiguating your storage. Which is part of why re-reading was suggested, as this was covered in some depth and _is_ _exactly_ about the problem at hand.

Nope.

root@reproduce-1391429:~# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.18.0-031800rc5-generic root=/dev/mapper/vg-rootlv ro rootflags=device=/dev/mapper/vg-rootlv,subvol=@

Observe that the device= mount option is added.

The device= option is needed only in a btrfs multi-volume scenario. If you have only one disk, it is not needed.

root@reproduce-1391429:~# ./reproduce-1391429.sh
#!/bin/sh -v
lvs
  LV     VG Attr      LSize   Pool Origin Data% Move Log Copy% Convert
  rootlv vg -wi-ao---   1.00g
  swap0  vg -wi-ao--- 256.00m
grub-probe --target=device /
/dev/mapper/vg-rootlv
grep / /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0
lvcreate --snapshot --size=128M --name z vg/rootlv
  Logical volume z created
lvs
  LV     VG Attr      LSize   Pool Origin Data% Move Log Copy% Convert
  rootlv vg owi-aos--   1.00g
  swap0  vg -wi-ao--- 256.00m
  z      vg swi-a-s-- 128.00m      rootlv   0.11
ls -l /dev/vg/
total 0
lrwxrwxrwx 1 root root 7 Dec 2 00:12 rootlv -> ../dm-1
lrwxrwxrwx 1 root root 7 Dec 2 00:12 swap0 -> ../dm-0
lrwxrwxrwx 1 root root 7 Dec 2 00:12 z -> ../dm-2
grub-probe --target=device /
/dev/mapper/vg-z
grep / /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-2 / btrfs rw,relatime,space_cache 0 0

What does /proc/self/mountinfo contain? And a more important question: is only the value returned by /proc/mounts wrong, or is the filesystem content affected as well?

lvremove --force vg/z
  Logical volume z successfully removed
grub-probe --target=device /
/dev/mapper/vg-rootlv
grep / /proc/mounts
rootfs / rootfs rw 0 0
/dev/dm-1 / btrfs rw,relatime,space_cache 0 0

Problem still reproduces.

--
gpg @keyserver.linux.it: Goffredo Baroncelli kreijackATinwind.it
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5