btrfs gives kernel call trace just because send/receive on a ro fs

2016-08-16 Thread Christoph Anton Mitterer
Hey. $ uname -a Linux heisenberg 4.6.0-1-amd64 #1 SMP Debian 4.6.4-1 (2016-07-18) x86_64 GNU/Linux The following call trace happens just because of a send/receive to a read-only mounted btrfs. Isn't that a bit overkill, and shouldn't one rather get just a user-space warning/error? Aug 16 21:15

does btrfs-receive use/compare the checksums from the btrfs-send side?

2016-08-27 Thread Christoph Anton Mitterer
Hey. I've often wondered: When I do a send/receive, does the receiving side use the checksums from the sending side (either by directly storing them or by comparing them with calculated checksums and failing if they don't match after the transfer)? Cause that would effectively secure any transpor

Re: does btrfs-receive use/compare the checksums from the btrfs-send side?

2016-08-28 Thread Christoph Anton Mitterer
On Sun, 2016-08-28 at 11:35 -0600, Chris Murphy wrote: > I don't see evidence of them in the btrfs send file, so I don't think > csums are in the stream. hmm... isn't that kinda unfortunate not to make use of the information that's already there? IMO, to the extent this is possible, btrfs should

Re: does btrfs-receive use/compare the checksums from the btrfs-send side?

2016-08-28 Thread Christoph Anton Mitterer
On Sun, 2016-08-28 at 22:19 +0200, Adam Borowski wrote: > Transports over which you're likely to send a filesystem stream > already > protect against corruption. Well... in some cases,... but not always... just consider a plain old netcat... > It'd still be nice to have something for those which

Re: does btrfs-receive use/compare the checksums from the btrfs-send side?

2016-09-03 Thread Christoph Anton Mitterer
On Mon, 2016-08-29 at 16:25 +0800, Qu Wenruo wrote: > Send will generate checksum for each command. What does "command" mean here? Or better said how much data is secured with one CRC32? > For send stream, it's CRC32 for the whole command. And this is verified then on the receiving end? Wouldn'
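Qu's description (one CRC32 per command) can be illustrated with a generic framing sketch. This is not the actual btrfs send-stream layout, and the kernel uses CRC32C rather than zlib's plain CRC32; the sketch only shows how a per-command checksum, computed with the checksum field zeroed, catches in-transit corruption of that one command:

```python
import struct
import zlib

def frame_command(cmd: int, payload: bytes) -> bytes:
    """Build a command frame: u32 payload length, u16 command id,
    u32 checksum (computed over the frame with the csum field zeroed).
    Illustrative layout only, not the real btrfs send-stream format."""
    zeroed = struct.pack("<IHI", len(payload), cmd, 0) + payload
    crc = zlib.crc32(zeroed)  # kernel uses CRC32C; zlib CRC32 as stand-in
    return struct.pack("<IHI", len(payload), cmd, crc) + payload

def verify_command(frame: bytes) -> bool:
    """Recompute the checksum with the csum field zeroed and compare."""
    length, cmd, crc = struct.unpack_from("<IHI", frame)
    zeroed = struct.pack("<IHI", length, cmd, 0) + frame[10:]
    return zlib.crc32(zeroed) == crc

frame = frame_command(7, b"some subvolume metadata")
assert verify_command(frame)
# Flipping any single byte is guaranteed to be caught by CRC32:
assert not verify_command(frame[:-1] + bytes([frame[-1] ^ 0xFF]))
```

Such a per-command checksum protects each command in transit, but says nothing about whether the receiving side re-verifies it against the original data checksums, which is the question raised in the thread.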

gazillions of Incorrect local/global backref count

2016-09-03 Thread Christoph Anton Mitterer
Hey. I just did a btrfs check on my notebook's root fs, with: $ uname -a Linux heisenberg 4.7.0-1-amd64 #1 SMP Debian 4.7.2-1 (2016-08-28) x86_64 GNU/Linux $ btrfs --version btrfs-progs v4.7.1 during: checking extents it found gazillions of these: Incorrect local backref count on 1107980288 roo

Re: gazillions of Incorrect local/global backref count

2016-09-03 Thread Christoph Anton Mitterer
On Sun, 2016-09-04 at 05:33 +, Paul Jones wrote: > The errors are wrong. I nearly ruined my filesystem a few days ago by > trying to repair similar errors, thankfully all seems ok. > Check again with btrfs-progs 4.6.1 and see if the errors go away, > mine did. > See open bug https://bugzilla.ke

Re: gazillions of Incorrect local/global backref count

2016-09-05 Thread Christoph Anton Mitterer
On Mon, 2016-09-05 at 09:27 +0200, David Sterba wrote: > As others replied, it's a false positive. There's a fix on the way, > once > it's done I'll release 4.7.2. Yeah... thanks again for confirming... and sorry that I've missed the obvious earlier post :-/ Best wishes, Chris.

Re: Security implications of btrfs receive?

2016-09-07 Thread Christoph Anton Mitterer
On Tue, 2016-09-06 at 18:20 +0100, Graham Cobb wrote: > they know the UUID of the subvolume? Unfortunately, btrfs seems to be pretty problematic when anyone knows your UUIDs... Look for my thread "attacking btrfs filesystems via UUID collisions?" in the list archives. From accidental corruptions t

Re: Security implications of btrfs receive?

2016-09-07 Thread Christoph Anton Mitterer
On Wed, 2016-09-07 at 07:58 -0400, Austin S. Hemmelgarn wrote: > if you want proper security you should be using a real container system Won't these probably use the same filesystems? Cheers, Chris.

Re: Security implications of btrfs receive?

2016-09-07 Thread Christoph Anton Mitterer
On Wed, 2016-09-07 at 11:06 -0400, Austin S. Hemmelgarn wrote: > This is an issue with any filesystem, Not really... any other filesystem I know of (not sure about ZFS) keeps working when there are UUID collisions... or at least it won't cause arbitrary corruptions, which then in the end may even be
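One practical mitigation for the UUID-collision concern is to give a cloned or restored filesystem a fresh fsid before it is ever attached alongside the original. A sketch with btrfs-progs (the filesystem must be unmounted; device path illustrative):

```shell
# Give the filesystem a new, random fsid so it can no longer
# collide with the original's UUID:
btrfstune -u /dev/sdX1

# Or set a specific UUID instead of a random one:
# btrfstune -U 12345678-1234-1234-1234-123456789abc /dev/sdX1
```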

Re: stability matrix (was: Is stability a joke?)

2016-09-14 Thread Christoph Anton Mitterer
Hey. As for the stability matrix... In general: - I think another column should be added, which tells when and for which kernel version the feature-status of each row was revised/updated the last time and especially by whom. If a core dev makes a statement on a particular feature, this p

Re: Is stability a joke? (wiki updated)

2016-09-15 Thread Christoph Anton Mitterer
On Thu, 2016-09-15 at 14:20 -0400, Austin S. Hemmelgarn wrote: > 3. Fsck should be needed only for un-mountable filesystems. Ideally, > we should be handling things like Windows does. Perform slightly > better checking when reading data, and if we see an error, flag the > filesystem for

Re: stability matrix (was: Is stability a joke?)

2016-09-19 Thread Christoph Anton Mitterer
+1 for all your changes with the following comments in addition... On Mon, 2016-09-19 at 17:27 +0200, David Sterba wrote: > That's more like a usecase, that's out of the scope of the tabular > overview. But we have an existing page UseCases that I'd like to > transform to a more structured and com

Re: stability matrix

2016-09-19 Thread Christoph Anton Mitterer
On Mon, 2016-09-19 at 13:18 -0400, Austin S. Hemmelgarn wrote: > > > - even mounting a fs ro, may cause it to be changed > > This would go to the UseCases > My same argument about the UUID issues applies here, just without the security aspect. I personally could agree to have that "just"

Re: stability matrix

2016-09-19 Thread Christoph Anton Mitterer
On Mon, 2016-09-19 at 16:07 -0400, Chris Mason wrote: > That's in the blockdev command (blockdev --setro /dev/xxx). Well, I know that ;-) ... but I bet most end-users don't (just as most end-users assume mount -r is truly ro)... At least this is nowadays documented in the mount manpage... so in a w
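A sketch of the distinction being discussed (device path illustrative):

```shell
# mount -r / mount -o ro: the *filesystem* is mounted read-only, but
# btrfs may still write to the device (e.g. log-tree replay) unless
# nologreplay/norecovery is also given.
mount -o ro /dev/sdX1 /mnt

# blockdev --setro: the *block device* itself rejects all writes.
blockdev --setro /dev/sdX1
blockdev --getro /dev/sdX1   # prints 1 once the device is read-only
```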

spurious call trace during send

2016-09-19 Thread Christoph Anton Mitterer
Hey. FYI: Just got this call trace during a send/receive (with -p) between two btrfs on 4.7.0. Neither btrfs-send nor -receive showed an error though, and both seem to have completed successfully (at least a diff of the changes implied that). Sep 19 20:24:38 heisenberg kernel: BTRFS info (device dm-2

Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-09-21 Thread Christoph Anton Mitterer
On Thu, 2016-09-22 at 10:08 +0800, Qu Wenruo wrote: > And I don't see the necessity to csum the parity. > Why csum a csum again? I'd say simply for the following reason: Imagine the smallest RAID5: 2x data D1 D2, 1x parity P. If D2 is lost it could be recalculated via D1 and P. What if only (all)
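The scenario above (two data strips plus one parity strip) can be sketched with plain XOR parity. The point of the thread is that without a checksum over P itself, a silently corrupted parity strip reconstructs garbage undetected. A toy illustration, not btrfs's actual raid5 code:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length strips."""
    return bytes(x ^ y for x, y in zip(a, b))

d1 = b"\x01\x02\x03\x04"   # data strip D1
d2 = b"\x10\x20\x30\x40"   # data strip D2
p = xor(d1, d2)            # parity strip P

# D2 lost: reconstruct it from D1 and P.
assert xor(d1, p) == d2

# If P itself was silently corrupted, reconstruction yields wrong
# data -- with no checksum over the parity, nothing catches this.
p_bad = bytes([p[0] ^ 0xFF]) + p[1:]
assert xor(d1, p_bad) != d2
```

Data-block checksums would flag the bad reconstruction after the fact, but a parity checksum would let scrub identify P itself as the corrupt strip.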

Re: btrfs rare silent data corruption with kernel data leak

2016-09-22 Thread Christoph Anton Mitterer
On Thu, 2016-09-22 at 19:49 +0200, Kai Krakow wrote: > I think mysql data files have their own checksumming Last time I've checked, none of the major DBs or VM-image formats had this... postgresql being the only one supporting something close to fs-level csumming (but again not per default).
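As a point of reference for the PostgreSQL remark: its page-level data checksums are indeed off by default and are chosen at cluster-creation time. A sketch (paths illustrative; pg_checksums exists as of PostgreSQL 12):

```shell
# Enable page checksums when creating the cluster (off by default):
initdb --data-checksums -D /var/lib/postgresql/data

# Later, verify checksums on an offline (stopped) cluster:
pg_checksums --check -D /var/lib/postgresql/data
```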

Re: Does data checksumming remain for files with No_COW file attribute?

2016-09-24 Thread Christoph Anton Mitterer
On Sat, 2016-09-24 at 17:40 +0500, Roman Mamedov wrote: > Yes. IIRC the reasoning was that it's more difficult to track > checksums of > data which is being overwritten in-place (as opposed to CoW). AFAIU it wouldn't be more difficult, since the meta-data itself is still subject to CoW... There's

Re: Does data checksumming remain for files with No_COW file attribute?

2016-09-24 Thread Christoph Anton Mitterer
On Sat, 2016-09-24 at 12:43 +, Hugo Mills wrote: > It's because you can't update the data and the checksum atomically > -- at some point in the writing process, they must be inconsistent. > This is considered a Bad Thing. It's not worse at all than simply not checksumming... in both cases you
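Hugo's atomicity argument can be illustrated with a toy in-place-update simulation (assumed layout, plain CRC32, not btrfs code): if the data is overwritten first and the checksum second, a crash between the two steps leaves intact new data that nonetheless fails verification.

```python
import zlib

class Block:
    """A data block with a stored checksum, updated in place (no CoW)."""
    def __init__(self, data: bytes):
        self.data = data
        self.csum = zlib.crc32(data)

    def verifies(self) -> bool:
        return zlib.crc32(self.data) == self.csum

blk = Block(b"old contents")

# In-place (nodatacow-style) update: two non-atomic steps.
blk.data = b"new contents"        # step 1: new data hits the disk
assert not blk.verifies()         # <-- a crash here leaves a "corrupt" block
blk.csum = zlib.crc32(blk.data)   # step 2: checksum updated
assert blk.verifies()
```

With CoW, by contrast, new data and new checksum land in freshly allocated space and become visible together at the next atomic superblock/tree update, so this window does not exist.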

Re: Does data checksumming remain for files with No_COW file attribute?

2016-09-24 Thread Christoph Anton Mitterer
On Sat, 2016-09-24 at 23:44 +0200, Adam Borowski wrote: > This would require teaching btrfs that, in some cases, a csum > mismatch is no > big thing and it can legitimately return junk data (like most other > filesystems) rather than complaining.  Same for scrub and btrfs > check. Well, I see no po

Re: Does data checksumming remain for files with No_COW file attribute?

2016-09-25 Thread Christoph Anton Mitterer
On Sun, 2016-09-25 at 15:49 +0200, Goffredo Baroncelli wrote: > I think that the bigger cost is the lower performance due to the > write of checksums. Which would make btrfs' default mode of CoW+checksumming also unusable... If for anyone checksumming comes at too high a cost, he can simply use no

Re: dm-integrity + mdadm + btrfs = no journal?

2019-01-30 Thread Christoph Anton Mitterer
On Wed, 2019-01-30 at 07:58 -0500, Austin S. Hemmelgarn wrote: > Running dm-integrity without a journal is roughly equivalent to using > the nobarrier mount option (the journal is used to provide the same > guarantees that barriers do). IOW, don't do this unless you are willing > to lose th

Re: dm-integrity + mdadm + btrfs = no journal?

2019-01-30 Thread Christoph Anton Mitterer
On Wed, 2019-01-30 at 11:00 -0500, Austin S. Hemmelgarn wrote: > Running dm-integrity on a device which doesn't support barriers without > a journal is risky, because the journal can help mitigate the issues > arising from the lack of barrier support. Does it? Isn't it then suffering from the

Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7

2019-02-12 Thread Christoph Anton Mitterer
Hey. Sounds like a highly severe (and long standing) bug? Is anyone doing anything about it? Cheers, Chris.

Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7

2019-02-14 Thread Christoph Anton Mitterer
On Thu, 2019-02-14 at 01:22 +, Filipe Manana wrote: > The following one liner fixes it: > https://friendpaste.com/22t4OdktHQTl0aMGxcWLj3 Great to see that fixed... is there any advice that can be given to users/admins? Like whether and how any occurred corruptions can be detected (right now

Re: "btrfs: harden agaist duplicate fsid" spams syslog

2019-07-11 Thread Christoph Anton Mitterer
I've also been seeing these for quite a while on Debian sid: Jul 11 13:33:56 heisenberg kernel: BTRFS info (device dm-0): device fsid 60[...]3c devid 1 moved old:/dev/mapper/system new:/dev/dm-0 Jul 11 13:33:56 heisenberg kernel: BTRFS info (device dm-0): device fsid 60[...]3c devid 1 moved old:/dev

Re: Massive filesystem corruption since kernel 5.2 (ARCH)

2019-08-24 Thread Christoph Anton Mitterer
Hey. Anything new about the issue described here: https://www.spinics.net/lists/linux-btrfs/msg91046.html It was said that it might be a regression in 5.2 actually and not a hardware thing... so I just wonder whether I can safely move to 5.2? Cheers, Chris.

Re: Massive filesystem corruption since kernel 5.2 (ARCH)

2019-08-26 Thread Christoph Anton Mitterer
Hey. On Sun, 2019-08-25 at 12:00 +0200, Swâmi Petaramesh wrote: > I haven't seen any filesystem issue since, but I haven't used the > system > very much yet. Hmm strange... so could it have been a hardware issue? Cheers, Chris.

Re: Massive filesystem corruption since kernel 5.2 (ARCH)

2019-08-29 Thread Christoph Anton Mitterer
On Thu, 2019-08-29 at 14:46 +0200, Oliver Freyermuth wrote: > This thread made me check on my various BTRFS volumes and for almost > all of them (in different machines), I find cases of > failed to load free space cache for block group , rebuilding it now > at several points during the last

Re: BTRFS state on kernel 5.2

2019-09-02 Thread Christoph Anton Mitterer
On Mon, 2019-09-02 at 20:10 -0400, Remi Gauvin wrote: > AFAIK, checksum and Nocow files is technically not possible While this has been claimed numerous times, I still don't see any reason why it should be true. I even used to have an off-the-list conversation with Chris Mason about just that: me

Re: Massive filesystem corruption since kernel 5.2 (ARCH)

2019-09-12 Thread Christoph Anton Mitterer
On Thu, 2019-09-12 at 12:53 +0200, Swâmi Petaramesh wrote: > Yep, I assume that a big flashing red neon sign should be raised for a > confirmed bug that can trash your filesystem into ashes, and actually > did so for two of mine... I doubt this will happen... I've asked for something like th

Re: Massive filesystem corruption since kernel 5.2 (ARCH)

2019-09-12 Thread Christoph Anton Mitterer
Hi. First, thanks for finding & fixing this :-) On Thu, 2019-09-12 at 08:50 +0100, Filipe Manana wrote: > 1) either a hang when committing a transaction, reported by several > users recently and hit it myself too twice when running fstests (test > case generic/475 and generic/561) after I upgraded

Re: Massive filesystem corruption since kernel 5.2 (ARCH)

2019-09-12 Thread Christoph Anton Mitterer
On Thu, 2019-09-12 at 15:28 +0100, Filipe Manana wrote: > This is case 2), the corruption with the error messages > "parent transid verify failed ..." in dmesg/syslog after mounting the > filesystem again. Hmm so "at least" it will never go unnoticed, right? This is IMO a pretty important piece of advice,

BUG: unable to handle kernel paging request at ffff9fb75f827100

2018-02-20 Thread Christoph Anton Mitterer
Hi. Not sure if that's a bug in btrfs... maybe someone's interested in it. Cheers, Chris. # uname -a Linux heisenberg 4.14.0-3-amd64 #1 SMP Debian 4.14.17-1 (2018-02-14) x86_64 GNU/Linux Feb 21 04:55:51 heisenberg kernel: BUG: unable to handle kernel paging request at ffff9fb75f827100 Feb 21

Re: BUG: unable to handle kernel paging request at ffff9fb75f827100

2018-02-21 Thread Christoph Anton Mitterer
Hi Nikolay. Thanks. On Wed, 2018-02-21 at 08:34 +0200, Nikolay Borisov wrote: > This looks like the one fixed by > e8f1bc1493855e32b7a2a019decc3c353d94daf6 . It's tagged for stable so you > should get it eventually. Another consequence of this was that I couldn't sync/umount or shutdown anymor

Re: BUG: unable to handle kernel paging request at ffff9fb75f827100

2018-02-21 Thread Christoph Anton Mitterer
Interestingly, I got another one only within minutes after the scrub: Feb 21 15:23:49 heisenberg kernel: BTRFS warning (device dm-0): csum failed root 257 ino 7703 off 56852480 csum 0x42d1b69c expected csum 0x3ce55621 mirror 1 Feb 21 15:23:52 heisenberg kernel: BTRFS warning (device dm-0): csum fa

Re: BUG: unable to handle kernel paging request at ffff9fb75f827100

2018-02-21 Thread Christoph Anton Mitterer
Spurious corruptions seem to continue [ 69.688652] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.688656] BTRFS critical (device dm-0): unable to find logical 4503658729209856 length 4096 [ 69.688658] BTRFS critical (device dm-0): unable to find l

Re: BUG: unable to handle kernel paging request at ffff9fb75f827100

2018-02-21 Thread Christoph Anton Mitterer
A scrub now gave: # btrfs scrub start -Br /dev/disk/by-label/system ERROR: scrubbing /dev/disk/by-label/system failed for device id 1: ret=-1, errno=5 (Input/output error) scrub canceled for b6050e38-716a-40c3-a8df-fcf1dd7e655d scrub started at Wed Feb 21 17:42:39 2018 and was aborted afte

Re: BUG: unable to handle kernel paging request at ffff9fb75f827100

2018-02-21 Thread Christoph Anton Mitterer
e8f1bc1493855e32b7a2a019decc3c353d94daf6 That bug... When was that introduced, and how can I find out whether an fs was affected/corrupted by this? Cause I've mounted and written to some extremely important (to me) fs recently. Thanks, Chris.

Re: BUG: unable to handle kernel paging request at ffff9fb75f827100

2018-02-21 Thread Christoph Anton Mitterer
And do you have any other ideas on how to debug that filesystem? Or at least back up as much as possible? Thanks, Chris.

Re: spurious full btrfs corruption

2018-03-05 Thread Christoph Anton Mitterer
Hey Qu. On Thu, 2018-03-01 at 09:25 +0800, Qu Wenruo wrote: > > - For my personal data, I have one[0] Seagate 8 TB SMR HDD, which I > > backup (send/receive) on two further such HDDs (all these are > > btrfs), and (rsync) on one further with ext4. > > These files have all their SHA512 sums a

Re: spurious full btrfs corruption

2018-03-08 Thread Christoph Anton Mitterer
Hey. On Tue, 2018-03-06 at 09:50 +0800, Qu Wenruo wrote: > > These were the two files: > > -rw-r--r-- 1 calestyo calestyo 90112 Feb 22 16:46 'Lady In The > > Water/05.mp3' > > -rw-r--r-- 1 calestyo calestyo 4892407 Feb 27 23:28 > > '/home/calestyo/share/music/Lady In The Water/05.mp3' > > > >

call trace on btrfs send/receive

2018-03-09 Thread Christoph Anton Mitterer
Hey. The following still happens with 4.15 kernel/progs: btrfs send -p oldsnap newsnap | btrfs receive /some/other/fs Mar 10 00:48:10 heisenberg kernel: WARNING: CPU: 5 PID: 32197 at /build/linux-PFKtCE/linux-4.15.4/fs/btrfs/send.c:6487 btrfs_ioctl_send+0x48f/0xfb0 [btrfs] Mar 10 00:48:10 heis

zerofree btrfs support?

2018-03-09 Thread Christoph Anton Mitterer
Hi. Just wondered... was it ever planned (or is there some equivalent) to get support for btrfs in zerofree? Thanks, Chris.

Re: zerofree btrfs support?

2018-03-10 Thread Christoph Anton Mitterer
On Sat, 2018-03-10 at 09:16 +0100, Adam Borowski wrote: > Do you want zerofree for thin storage optimization, or for security? I don't think one can really use it for security (neither on SSD or HDD). On both, zeroed blocks may still be readable by forensic measures. So optimisation, i.e. digging

Re: Ongoing Btrfs stability issues

2018-03-10 Thread Christoph Anton Mitterer
On Sat, 2018-03-10 at 14:04 +0200, Nikolay Borisov wrote: > So for OLTP workloads you definitely want nodatacow enabled, bear in > mind this also disables crc checksumming, but your db engine should > already have such functionality implemented in it. Unlike repeated claims made here on the list a

Re: zerofree btrfs support?

2018-03-10 Thread Christoph Anton Mitterer
On Sat, 2018-03-10 at 19:37 +0500, Roman Mamedov wrote: > Note you can use it on HDDs too, even without QEMU and the like: via using LVM > "thin" volumes. I use that on a number of machines, the benefit is that since > TRIMed areas are "stored nowhere", those partitions allow for incredibly f
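For btrfs itself, the effect zerofree achieves on ext filesystems (letting thin storage reclaim unused space) is usually obtained by discarding free space rather than zeroing it; a sketch (mount point illustrative):

```shell
# Discard unused blocks on a mounted btrfs; with LVM thin, qcow2, or
# SSDs this releases the underlying space (-v reports how much).
fstrim -v /mnt

# Or mount with continuous discard instead of periodic fstrim:
# mount -o discard /dev/sdX1 /mnt
```

As noted in the thread, this is a space optimization, not a secure-erase mechanism.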

Re: zerofree btrfs support?

2018-03-10 Thread Christoph Anton Mitterer
On Sat, 2018-03-10 at 16:50 +0100, Adam Borowski wrote: > Since we're on a btrfs mailing list Well... my original question was whether someone could add zerofree support for btrfs (which I think would best be done by someone who knows how btrfs really works)... thus I directed the question to this list

Re: zerofree btrfs support?

2018-03-10 Thread Christoph Anton Mitterer
On Sat, 2018-03-10 at 23:31 +0500, Roman Mamedov wrote: > QCOW2 would add a second layer of COW > on top of > Btrfs, which sounds like a nightmare. I've just seen there is even a nocow option "specifically" for btrfs... it seems however that it doesn't disable the CoW of qcow, but rather that of b

Re: Ongoing Btrfs stability issues

2018-03-11 Thread Christoph Anton Mitterer
On Sun, 2018-03-11 at 18:51 +0100, Goffredo Baroncelli wrote: > > COW is needed to properly checksum the data. Otherwise is not > possible to ensure the coherency between data and checksum (however I > have to point out that BTRFS fails even in this case [*]). > We could rearrange this sentence, s

Re: Ongoing Btrfs stability issues

2018-03-12 Thread Christoph Anton Mitterer
On Mon, 2018-03-12 at 22:22 +0100, Goffredo Baroncelli wrote: > Unfortunately no, the likelihood might be 100%: there are some > patterns which trigger this problem quite easily. See the link which > I posted in my previous email. There was a program which creates a > bad checksum (in COW+DATASUM m

Re: Ongoing Btrfs stability issues

2018-03-13 Thread Christoph Anton Mitterer
On Tue, 2018-03-13 at 20:36 +0100, Goffredo Baroncelli wrote: > A checksum mismatch, is returned as -EIO by a read() syscall. This is > an event handled badly by most part of the programs. Then these programs must simply be fixed... otherwise they'll also fail under normal circumstances with btrfs,

Re: zerofree btrfs support?

2018-03-14 Thread Christoph Anton Mitterer
Hey. On Wed, 2018-03-14 at 20:38 +0100, David Sterba wrote: > I have a prototype code for that and after the years, seeing the request > again, I'm not against adding it as long as it's not advertised as a > security feature. I'd expect that anyone in the security area should know that securely

Re: spurious full btrfs corruption

2018-03-15 Thread Christoph Anton Mitterer
Hey. Found some time to move on with this: First, I think from my side (i.e. restoring as much as possible) I'm basically done now, so everything left over here is looking for possible bugs/etc. I have from my side no indication that my corruptions were actually a bug in btrfs... the new notebo

Re: [PATCH] btrfs-progs: mkfs: add uuid and otime to ROOT_ITEM of FS_TREE

2018-03-19 Thread Christoph Anton Mitterer
On Mon, 2018-03-19 at 14:02 +0100, David Sterba wrote: > We can do that by a special purpose tool. No average user will ever run (or even know about) that... Could you perhaps either do it automatically in fsck (which is IMO also a bad idea, as fsck should be read-only per default)... or at least add

Re: Status of RAID5/6

2018-03-21 Thread Christoph Anton Mitterer
Hey. Some things would IMO be nice to get done/clarified (i.e. documented in the Wiki and manpages) from users'/admin's POV: Some basic questions: - Starting with which kernels (including stable kernel versions) does it contain the fixes for the bigger issues from some time ago? - Exactly what

Re: spurious full btrfs corruption

2018-03-21 Thread Christoph Anton Mitterer
Just some addition on this: On Fri, 2018-03-16 at 01:03 +0100, Christoph Anton Mitterer wrote: > The issue that newer btrfs-progs/kernel don't restore anything at all > from my corrupted fs: 4.13.3 seems to be already buggy... 4.7.3 works, but interestingly btrfs-find-super seems to

Re: spurious full btrfs corruption

2018-03-26 Thread Christoph Anton Mitterer
Hey Qu. Some update on the corruption issue on my Fujitsu notebook: Finally got around running some memtest on it... and few seconds after it started I already got this: https://paste.pics/1ff8b13b94f31082bc7410acfb1c6693 So plenty of bad memory... I'd say it's probably not so unlikely that *t

Re: Btrfs progs release 4.16.1

2018-04-25 Thread Christoph Anton Mitterer
On Wed, 2018-04-25 at 07:22 -0400, Austin S. Hemmelgarn wrote: > While I can understand Duncan's point here, I'm inclined to agree with > David Same from my side... and I run a multi-PiB storage site (though not with btrfs). Cosmetically one shouldn't do this in a bugfix release, this should h

is back and forth incremental send/receive supported/stable?

2021-01-29 Thread Christoph Anton Mitterer
Hey. I regularly do the following with btrfs, which seems to have worked pretty stably for years: - having n+1 filesystems MASTER and COPY_n - creating snapshots on MASTER, e.g. one each month - incremental send/receive of the new snapshot from MASTER to each of COPY_n (which already have the previous s
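The routine described can be sketched as follows (paths and snapshot names illustrative):

```shell
# On MASTER: create this month's read-only snapshot.
btrfs subvolume snapshot -r /master/data /master/snapshots/2021-01

# Incrementally replicate to each COPY_n, using last month's snapshot
# (already present on both sides) as the parent:
btrfs send -p /master/snapshots/2020-12 /master/snapshots/2021-01 \
    | btrfs receive /copy1/snapshots
```

The question in the thread is whether this remains supported when the roles of MASTER and a COPY_n are swapped for the next incremental cycle.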

Re: is back and forth incremental send/receive supported/stable?

2021-01-31 Thread Christoph Anton Mitterer
Hey Hugo. Thanks for your explanation. I assume such a swapped send/receive would fail at least gracefully? On Fri, 2021-01-29 at 19:20 +, Hugo Mills wrote: > In your scenario with MASTER and COPY-1 swapped, you'd have to > match the received_uuid from the sending side (on old COPY-1) to

Re: is back and forth incremental send/receive supported/stable?

2021-02-01 Thread Christoph Anton Mitterer
On Mon, 2021-02-01 at 10:46 +, Hugo Mills wrote: > It'll fail *obviously*. I'm not sure how graceful it is. :) Okay, that doesn't sound like it was very trustworthy... :-/ Especially this from the manpage: You must not specify clone sources unless you guarantee that these snap

Re: [PATCH v2 1/2] btrfs-progs: Rename OPEN_CTREE_FS_PARTIAL to OPEN_CTREE_TEMPORARY_SUPER

2018-07-12 Thread Christoph Anton Mitterer
Hey. Better late than never ;-) Just to confirm: At least since 4.16.1, I could btrfs-restore from the broken fs image again (that I've described in "spurious full btrfs corruption" from around mid March). So the regression in btrfsprogs has in fact been fixed by these patches, it seems. Thank

fsck lowmem mode only: ERROR: errors found in fs roots

2018-08-30 Thread Christoph Anton Mitterer
Hey. I've the following on a btrfs that's basically the system fs for my notebook: When booting from a USB stick with: # uname -a Linux heisenberg 4.17.0-3-amd64 #1 SMP Debian 4.17.17-1 (2018-08-18) x86_64 GNU/Linux # btrfs --version btrfs-progs v4.17 ... a lowmem mode fsck gives no error: #
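For reference, the two invocations being compared here, both read-only (device path as in the report):

```shell
# Default (original) mode, which reported no error:
btrfs check --readonly /dev/mapper/system

# Lowmem mode, which additionally reported
# "ERROR: errors found in fs roots":
btrfs check --mode=lowmem --readonly /dev/mapper/system
```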

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-09-03 Thread Christoph Anton Mitterer
Hey. On Fri, 2018-08-31 at 10:33 +0800, Su Yue wrote: > Can you please fetch btrfs-progs from my repo and run lowmem check > in readonly? > Repo: https://github.com/Damenly/btrfs-progs/tree/lowmem_debug > It's based on v4.17.1 plus additional output for debug only. I've adapted your patch to 4.17

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-09-04 Thread Christoph Anton Mitterer
On Tue, 2018-09-04 at 17:14 +0800, Qu Wenruo wrote: > However the backtrace can't tell which process caused such fsync call. > (Maybe LVM user space code?) Well, it was literally just before btrfs-check exited... so I blindly guessed... but arguably it could be just some coincidence. LVM tools a

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-09-05 Thread Christoph Anton Mitterer
On Wed, 2018-09-05 at 15:04 +0800, Su Yue wrote: > Agreed with Qu, btrfs-check shall not try to do any write. Well.. it could have been just some coincidence :-) > I found the errors should blame to something about inode_extref check > in lowmem mode. So you mean errors in btrfs-check... and it

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-10-18 Thread Christoph Anton Mitterer
Hey. So I'm back from a longer vacation and now had the time to try out your patches from below: On Wed, 2018-09-05 at 15:04 +0800, Su Yue wrote: > I found the errors should blame to something about inode_extref check > in lowmem mode. > I have written three patches to detect and report errors ab

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-10-27 Thread Christoph Anton Mitterer
Hey. Without the last patches on 4.17: checking extents checking free space cache checking fs roots ERROR: errors found in fs roots Checking filesystem on /dev/mapper/system UUID: 6050ca10-e778-4d08-80e7-6d27b9c89b3c found 619543498752 bytes used, error(s) found total csum bytes: 602382204 total

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-11-02 Thread Christoph Anton Mitterer
Hey Su. Anything further I need to do in this matter, or can I consider it "solved" and you won't need further testing from my side, but just PR the patches of that branch? :-) Thanks, Chris. On Sat, 2018-10-27 at 14:15 +0200, Christoph Anton Mitterer wrote: > Hey.

Re: fsck lowmem mode only: ERROR: errors found in fs roots

2018-11-02 Thread Christoph Anton Mitterer
On Sat, 2018-11-03 at 09:34 +0800, Su Yue wrote: > Sorry for the late reply cause I'm busy at other things. No worries :-) > I just looked through related codes and found the bug. > The patches can fix it. So no need to do more tests. > Thanks to your tests and patience. :) Thanks for fixing :-)

Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7

2019-03-04 Thread Christoph Anton Mitterer
Hey. Thanks for your elaborate explanations :-) On Fri, 2019-02-15 at 00:40 -0500, Zygo Blaxell wrote: > The problem occurs only on reads. Data that is written to disk will > be OK, and can be read correctly by a fixed kernel. > > A kernel without the fix will give corrupt data on reads with

Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7

2019-03-04 Thread Christoph Anton Mitterer
On Fri, 2019-02-15 at 12:02 +, Filipe Manana wrote: > Upgrade to a kernel with the patch (none yet) or build it from > source? > Not sure what kind of advice you are looking for. Well more something of the kind that Zygo wrote in his mail, i.e some explanation of the whole issue in order to fi

Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7

2019-03-14 Thread Christoph Anton Mitterer
Hey again. And again thanks for your time and further elaborate explanations :-) On Thu, 2019-03-07 at 15:07 -0500, Zygo Blaxell wrote: > In 2016 there were two kernel bugs that silently corrupted reads of > compressed data. In 2015 there were... 4? 5? Before 2015 the problems > are worse, a

Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7

2019-03-14 Thread Christoph Anton Mitterer
Hey again. Just wondered about the inclusion status of this patch? The first merge I could find from Linus was 2 days ago for the upcoming 5.1. It doesn't seem to be in any of the stable kernels yet, neither in 5.0.x? Is this still coming to the stable kernels for distros or could it have gotten

Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7

2019-03-14 Thread Christoph Anton Mitterer
On Fri, 2019-03-08 at 07:20 -0500, Austin S. Hemmelgarn wrote: > On 2019-03-07 15:07, Zygo Blaxell wrote: > > Legacy POSIX doesn't have the hole-punching concept, so legacy > > tools won't do it; however, people add features to GNU tools all the > > time, so it's hard to be 100% sure without do

Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7

2019-03-14 Thread Christoph Anton Mitterer
Oh and just for double checking: In the original patch you've posted and which Zygo tested, AFAIU, you had one line replaced. ( https://friendpaste.com/22t4OdktHQTl0aMGxcWLj3 ) In the one submitted, there were two occurrences of replacing em->orig_start with em->start. ( https://lore.kernel.org/li

Re: Reproducer for "compressed data + hole data corruption bug, 2018 edition" still works on 4.20.7

2019-03-16 Thread Christoph Anton Mitterer
On Fri, 2019-03-15 at 01:28 -0400, Zygo Blaxell wrote: > > But maybe there should be something like a btrfs-announce list, i.e. a > > low-volume mailing list, in which (interested) users are informed about > > more grave issues. > … > I don't know if it would be a low-volume list... every ke

delayed_refs has NO entry / btrfs_update_root:136: Aborting unused transaction(No space left).

2019-03-16 Thread Christoph Anton Mitterer
(resending,... seems this hasn't gotten through to the list when I've sent it the first time) Hi. On Debian's 4.19.28-2 kernel (which includes the recent read-corruption-on-compression fix) the following happens: As a consequence of the bug from the "Reproducer for "compressed data + hole da

Re: delayed_refs has NO entry / btrfs_update_root:136: Aborting unused transaction(No space left).

2019-03-19 Thread Christoph Anton Mitterer
Anything I should do with respect to this? I.e. is further debug info needed for an interested developer? or can I simply scrap that particular image (which is not an important one)? Cheers, Chris. On Sun, 2019-03-17 at 04:42 +0100, Christoph Anton Mitterer wrote: > (resending,... seems t

Re: delayed_refs has NO entry / btrfs_update_root:136: Aborting unused transaction(No space left).

2019-04-12 Thread Christoph Anton Mitterer
On Wed, 2019-03-20 at 10:59 +0100, Johannes Thumshirn wrote: > First of all, have you tried a more recent kernel than the Debian > kernels you referenced? E.g. Linus' current master or David's misc- > next > branch? Just so we don't try to hunt down a bug that's already fixed. I haven't and that's

Re: delayed_refs has NO entry / btrfs_update_root:136: Aborting unused transaction(No space left).

2019-04-12 Thread Christoph Anton Mitterer
On Sat, 2019-04-13 at 00:46 +0200, Christoph Anton Mitterer wrote: > If you repeat the above from the losetup point, but with -r ... s/with -r/without -r/

Re: delayed_refs has NO entry / btrfs_update_root:136: Aborting unused transaction(No space left).

2019-05-02 Thread Christoph Anton Mitterer
Hey. Just asking... was anyone able to reproduce these errors (as described below)? Cheers, Chris. On Sat, 2019-04-13 at 00:46 +0200, Christoph Anton Mitterer wrote: > On Wed, 2019-03-20 at 10:59 +0100, Johannes Thumshirn wrote: > > First of all, have you tried a more recent kernel

Re: delayed_refs has NO entry / btrfs_update_root:136: Aborting unused transaction(No space left).

2019-05-16 Thread Christoph Anton Mitterer
Since no one seems to show any big interest in this issue, I've added it for the records in https://bugzilla.kernel.org/show_bug.cgi?id=203621 Cheers, Chris.

Re: Massive filesystem corruption after balance + fstrim on Linux 5.1.2

2019-05-28 Thread Christoph Anton Mitterer
Hey. Just to be on the safe side... AFAIU this issue only occurred in 5.1.2 and later, right? Starting with which 5.1.x and 5.2.x versions has the fix been merged? Cheers, Chris.

Re: Patch "Btrfs: do not start a transaction during fiemap"

2019-05-29 Thread Christoph Anton Mitterer
Hey David. Regarding your patch "Btrfs: do not start a transaction during fiemap"... I assume since the block device had to be set read-only in order for the bug to happen... all these aborted transactions, etc. couldn't cause any corruptions/etc. upon the fs,... so there's nothing further one wou

in which directions does btrfs send -p | btrfs receive work

2018-06-06 Thread Christoph Anton Mitterer
Hey. Just wondered about the following: When I have a btrfs which acts as a master and from which I make copies of snapshots via send/receive (using -p at send) to other btrfs which act as copies, like this:

master
 +--> copy1
 +--> copy2
 \--> copy3

and if now e.g. the dev
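The practical constraint behind this question is that an incremental stream produced with `btrfs send -p <parent> <child>` can only be applied on a destination that already holds the parent snapshot (matched by received UUID). A toy Python model (not btrfs code; the UUID names are made up for illustration) of that rule:

```python
# Toy model of the incremental-receive constraint (assumption: this mirrors
# the documented behavior that `btrfs receive` of a `-p` stream needs the
# parent snapshot already present on the destination filesystem).
def can_receive(stream_parent_uuid, dest_received_uuids):
    """True if the destination can apply the stream.

    A full stream (no -p, parent is None) always applies; an incremental
    stream applies only if its parent was already received on the destination.
    """
    return stream_parent_uuid is None or stream_parent_uuid in dest_received_uuids


# master sent snap1 fully, then snap2 incrementally against snap1:
copy1 = {"snap1-uuid"}                       # copy1 already received snap1
assert can_receive("snap1-uuid", copy1)      # incremental snap2 applies on copy1
assert not can_receive("snap1-uuid", set())  # a fresh copy lacks the parent
assert can_receive(None, set())              # a full stream always applies
```

So a copy can only ever be incrementally updated "downstream" from snapshots it already shares with the sender, which is exactly why losing a common parent on one side breaks the -p chain for that direction.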

call trace: WARNING: at /build/linux-uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 btrfs_update_device

2018-06-28 Thread Christoph Anton Mitterer
Hey. On a 4.16.16 kernel with a RAID 1 btrfs I got the following messages since today. Data seems still to be readable (correctly)... and there are no other errors (like SATA errors) in the kernel log. Any idea what these could mean? Thanks, Chris. [ 72.168662] WARNING: CPU: 0 PID: 242 at

Re: call trace: WARNING: at /build/linux-uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 btrfs_update_device

2018-06-28 Thread Christoph Anton Mitterer
On Thu, 2018-06-28 at 22:09 +0800, Qu Wenruo wrote: > > [ 72.168662] WARNING: CPU: 0 PID: 242 at /build/linux- > > uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 > > btrfs_update_device+0x1b2/0x1c0 > It looks like it's the old WARN_ON() for unaligned device size. > Would you please verify if it is the

Re: call trace: WARNING: at /build/linux-uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 btrfs_update_device

2018-06-28 Thread Christoph Anton Mitterer
Hey Qu and Nikolay. On Thu, 2018-06-28 at 22:58 +0800, Qu Wenruo wrote: > Nothing special. Btrfs-progs will handle it pretty well. Since this is a remote system where the ISP provides only a rescue image with a pretty old kernel/btrfs-progs, I had to copy a current local binary and use that... but tha

Re: call trace: WARNING: at /build/linux-uwVqDp/linux-4.16.16/fs/btrfs/ctree.h:1565 btrfs_update_device

2018-06-29 Thread Christoph Anton Mitterer
On Fri, 2018-06-29 at 09:10 +0800, Qu Wenruo wrote: > Maybe it's the old mkfs causing the problem? > Although mkfs.btrfs added device size alignment much earlier than > kernel, it's still possible that the old mkfs doesn't handle the > initial > device and extra device (mkfs.btrfs will always creat
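The WARN_ON discussed in this thread fires when a device's recorded size is not a multiple of the filesystem sector size (an assumption here: 4096-byte sectors, the common default). A small Python sketch of the alignment condition and of the round-down that fixing tools apply:

```python
# Illustration only (not kernel code): the alignment check behind the
# "unaligned device size" WARN_ON, assuming a 4096-byte sector size.
SECTORSIZE = 4096

def is_aligned(nbytes: int, align: int = SECTORSIZE) -> bool:
    """A device size passes the check iff it is a multiple of the sector size."""
    return nbytes % align == 0

def round_down(nbytes: int, align: int = SECTORSIZE) -> int:
    """The aligned size a repair would shrink an unaligned device to."""
    return nbytes - (nbytes % align)

odd_size = 10 * 1024 * 1024 * 1024 + 1000   # 10 GiB plus a stray 1000 bytes
assert not is_aligned(odd_size)             # this is what triggers the WARN_ON
assert is_aligned(round_down(odd_size))
assert round_down(odd_size) == 10 * 1024 * 1024 * 1024
```

Newer btrfs-progs ship `btrfs rescue fix-device-size` to apply exactly this kind of round-down to the on-disk device item, which is why a current binary had to be copied onto the old rescue image.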

Re: Announcing btrfs-dedupe

2016-11-07 Thread Christoph Anton Mitterer
On Mon, 2016-11-07 at 15:02 +0100, David Sterba wrote: > I think adding a whole-file dedup mode to duperemove would be better > (from user's POV) than writing a whole new tool What would IMO be really good from a user's POV is if one of the tools, deemed to be the "best", were added to the b

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-27 Thread Christoph Anton Mitterer
On Sat, 2016-11-26 at 14:12 +0100, Goffredo Baroncelli wrote: > I can't agree. If the filesystem is mounted read-only this behavior > may be correct; but in other cases I don't see any reason not to > correct wrong data even in the read case. If your RAM is unreliable > you have big problems anyway.

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-27 Thread Christoph Anton Mitterer
On Mon, 2016-11-28 at 06:53 +0300, Andrei Borzenkov wrote: > If you allow any write to filesystem before resuming from hibernation > you risk corrupted filesystem. I strongly believe that "ro" must be > really read-only You're aware that "ro" already doesn't mean "no changes to the block device" o

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Christoph Anton Mitterer
On Mon, 2016-11-28 at 19:45 +0100, Goffredo Baroncelli wrote: > I am understanding that the status of RAID5/6 code is so badly Just some random thought: If the code for raid56 is really as bad as it's often claimed (I haven't read it, to be honest) could it perhaps make sense to consider to s

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Christoph Anton Mitterer
On Mon, 2016-11-28 at 19:32 +0100, Goffredo Baroncelli wrote: > I am assuming that a corruption is a quite rare event. So > occasionally it could happen that a page is corrupted and the system > corrects it. This shouldn't have an impact on the workloads. Probably, but it still makes sense to mak

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-28 Thread Christoph Anton Mitterer
On Mon, 2016-11-28 at 16:48 -0500, Zygo Blaxell wrote: > If a drive's > embedded controller RAM fails, you get corruption on the majority of > reads from a single disk, and most writes will be corrupted (even if > they > were not before). Administrating a multi-PiB Tier-2 for the LHC Computing Gri

Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

2016-11-29 Thread Christoph Anton Mitterer
On Tue, 2016-11-29 at 08:35 +0100, Adam Borowski wrote: > I administer no real storage at this time, and got only 16 disks > (plus a few > disk-likes) to my name right now. Yet in a ~2 months span I've seen > three > cases of silent data corruption I didn't mean to say we'd have no silent data c
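The reason btrfs can turn "silent" corruption into a detected (and, with redundancy, repairable) error is that it stores a per-block checksum and verifies it on every read. A minimal sketch of that detection step — plain CRC32 from Python's zlib stands in here for btrfs's CRC32C, so this is an illustration of the principle, not the exact algorithm:

```python
# Sketch: a stored checksum catches a single silent bit flip on read.
import zlib

block = bytearray(b"some data block contents " * 100)
stored_csum = zlib.crc32(bytes(block))   # checksum written alongside the data

block[7] ^= 0x10                         # simulate a silent single-bit flip

# On read, the recomputed checksum no longer matches the stored one,
# so the read fails (or is repaired from a good mirror) instead of
# silently returning bad data.  CRC detects all single-bit errors.
assert zlib.crc32(bytes(block)) != stored_csum
```

This is the property the thread keeps coming back to: without such a read-time check, all three of those corruption cases would simply have handed bad bytes to the application.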
