Re: Kernel crash related to LZO compression
On 2018-10-25 20:49, Chris Murphy wrote:
> I would say the first step, no matter what, if you're using an older kernel, is to boot a current Fedora or Arch live or install media, mount the Btrfs, and try to read the problem files and see if the problem still happens. I can't even begin to estimate the tens of thousands of line changes since kernel 4.9.

Good point, Chris. Indeed booting a fresh kernel is never a problem. Actually I forgot to mention that I've seen the same problem with kernel 4.12.13 (attached).

> What profile are you using for this Btrfs? Is this a raid56? What do you get for 'btrfs fi us'?

It is a RAID1 volume for both metadata and data, but unfortunately I haven't recorded the actual output before the failure. The configuration was like this:

# btrfs filesystem show /var/log
Label: none  uuid: 5b45ac8e-fd8c-4759-854a-94e45069959d
        Total devices 2 FS bytes used 11.13GiB
        devid 3 size 50.00GiB used 14.03GiB path /dev/sda3
        devid 4 size 50.00GiB used 14.03GiB path /dev/sdc1

On 2018-10-25 20:49, Chris Murphy wrote:
> It should be safe even with that kernel. I'm not sure this is compression related. There is a corruption bug related to inline extents that had been fairly elusive, but I think it's fixed now. I haven't run into it though.

On 2018-10-26 02:09, Qu Wenruo wrote:
>> Are there any updates / fixes done in that area? Is lzo option safe to use?
>
> Yes, we have commits to harden lzo decompress code in v4.18:
>
> de885e3ee281a88f52283c7e8994e762e3a5f6bd btrfs: lzo: Harden inline lzo compressed extent decompression
> 314bfa473b6b6d3efe68011899bd718b349f29d7 btrfs: lzo: Add header length check to avoid potential out-of-bounds acc
>
> And for the root cause, it's compressed data without csum, then scrub could make it corrupted.
It's also fixed in v4.18: 665d4953cde6d9e75c62a07ec8f4f8fd7d396ade btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() ac0b4145d662a3b9e34085dea460fb06ede9b69b btrfs: scrub: Don't use inode pages for device replace Thanks, Qu, for this information. Actually one time I've seen the binary crap (not zeros) in text log files (/var/log/*.log) and I was surprised that btrfs returned me data which is corrupted instead of signalling I/O error. Could it be because of "compressed data without csum" problem? Thanks! -- With best regards, Dmitry [Sun Dec 3 19:39:55 2017] BUG: unable to handle kernel paging request at f80a3000 [Sun Dec 3 19:39:55 2017] IP: memcpy+0x11/0x20 [Sun Dec 3 19:39:55 2017] *pde = 370bb067 [Sun Dec 3 19:39:55 2017] *pte = [Sun Dec 3 19:39:55 2017] Oops: 0002 [#1] SMP [Sun Dec 3 19:39:55 2017] Modules linked in: bridge stp llc arc4 iTCO_wdt iTCO_vendor_support ppdev ath5k evdev ath mac80211 cfg80211 i915 coretemp pcspkr rfkill snd_hda_codec_realtek serio_raw snd_hda_codec_generic video snd_hda_intel drm_kms_helper snd_hda_codec lpc_ich drm snd_hda_core snd_hwdep i2c_algo_bit snd_pcm_oss snd_mixer_oss fb_sys_fops sg snd_pcm syscopyarea snd_timer sysfillrect rng_core snd sysimgblt soundcore parport_pc parport shpchp button acpi_cpufreq binfmt_misc w83627hf hwmon_vid ip_tables x_tables autofs4 ses enclosure scsi_transport_sas xfs libcrc32c hid_generic usbhid hid btrfs crc32c_generic xor raid6_pq uas usb_storage sr_mod cdrom sd_mod ata_generic ata_piix i2c_i801 libata scsi_mod firewire_ohci firewire_core crc_itu_t ehci_pci e1000e ptp pps_core uhci_hcd ehci_hcd usbcore usb_common [Sun Dec 3 19:39:55 2017] CPU: 1 PID: 100 Comm: kworker/u4:2 Tainted: G W 4.12.0-2-686 #1 Debian 4.12.13-1 [Sun Dec 3 19:39:55 2017] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 
03/02/2007 [Sun Dec 3 19:39:55 2017] Workqueue: btrfs-endio btrfs_endio_helper [btrfs] [Sun Dec 3 19:39:55 2017] task: f7337280 task.stack: f695c000 [Sun Dec 3 19:39:55 2017] EIP: memcpy+0x11/0x20 [Sun Dec 3 19:39:55 2017] EFLAGS: 00010206 CPU: 1 [Sun Dec 3 19:39:55 2017] EAX: f80a2ff8 EBX: 1000 ECX: 03fe EDX: ff998000 [Sun Dec 3 19:39:55 2017] ESI: ff998008 EDI: f80a3000 EBP: ESP: f695de88 [Sun Dec 3 19:39:55 2017] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [Sun Dec 3 19:39:55 2017] CR0: 80050033 CR2: f9c00140 CR3: 36bc7000 CR4: 06d0 [Sun Dec 3 19:39:55 2017] Call Trace: [Sun Dec 3 19:39:55 2017] ? lzo_decompress_bio+0x19f/0x2b0 [btrfs] [Sun Dec 3 19:39:55 2017] ? end_compressed_bio_read+0x28d/0x360 [btrfs] [Sun Dec 3 19:39:55 2017] ? btrfs_scrubparity_helper+0xb6/0x2c0 [btrfs] [Sun Dec 3 19:39:55 2017] ? process_one_work+0x135/0x2f0 [Sun Dec 3 19:39:55 2017] ? worker_thread+0x39/0x3a0 [Sun Dec 3 19:39:55 2017] ? kthread+0xd7/0x110 [Sun Dec 3 19:39:55 2017] ? process_one_work+0x2f0/0x2f0 [Sun Dec 3 19:39:55 2017] ? kthread_create_on_node+0x30/0x30 [Sun Dec 3 19:39:55 2017] ? ret_from_fork+0x19/0x24 [Sun Dec 3 19:39:55 2017] Code: 43 58 2b 43 50 88 43 4e 5b eb ed 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
Kernel crash related to LZO compression
Dear btrfs community,

My apologies for the dumps from a rather old kernel (4.9.25); nevertheless I would appreciate your opinion on the kernel crashes reported below. As I understand the situation (correct me if I am wrong), some data block became corrupted, which resulted in the following kernel trace during boot:

kernel BUG at /build/linux-fB36Cv/linux-4.9.25/fs/btrfs/extent_io.c:2318!
invalid opcode: [#1] SMP
Call Trace:
[] ? end_bio_extent_readpage+0x4e9/0x680 [btrfs]
[] ? end_compressed_bio_read+0x3b/0x2d0 [btrfs]
[] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
[] ? process_one_work+0x141/0x380
[] ? worker_thread+0x41/0x460
[] ? kthread+0xb4/0xd0
[] ? process_one_work+0x380/0x380
[] ? kthread_park+0x50/0x50
[] ? ret_from_fork+0x1b/0x28

The problematic file turned out to be the one used by systemd-journald, /var/log/journal/c496cea41ebc4700a0dfaabf64a21be4/system.journal, which it was trying to read (or append to) during boot, and that was crashing the system (see attached bootN_dmesg.txt). I rebooted in safe mode and tried to copy the data from this partition to another location using btrfs-restore; however, the kernel was crashing as well, with a slightly different symptom (see attached copyN_dmesg.txt):

Call Trace:
[] ? lzo_decompress_biovec+0x1b0/0x2b0 [btrfs]
[] ? vmalloc+0x38/0x40
[] ? end_compressed_bio_read+0x265/0x2d0 [btrfs]
[] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
[] ? process_one_work+0x141/0x380
[] ? worker_thread+0x41/0x460
[] ? kthread+0xb4/0xd0
[] ? ret_from_fork+0x1b/0x28

Just to keep away from the problem, I've removed this file and also removed the "compress=lzo" mount option. Are there any updates / fixes done in that area? Is lzo option safe to use?

P.S.
Perhaps relative issue is in "Warnings" section: https://wiki.debian.org/Btrfs#Warnings / https://www.spinics.net/lists/linux-btrfs/msg56563.html -- With best regards, Dmitry[ 13.100666] BTRFS critical (device sda3): stripe index math went horribly wrong, got stripe_index=4294936575, num_stripes=2 [ 13.100901] BTRFS critical (device sda3): stripe index math went horribly wrong, got stripe_index=4294936575, num_stripes=2 [ 13.101096] BTRFS critical (device sda3): stripe index math went horribly wrong, got stripe_index=4294936575, num_stripes=2 [ 13.101178] [ cut here ] [ 13.101182] kernel BUG at /build/linux-fB36Cv/linux-4.9.25/fs/btrfs/extent_io.c:2318! [ 13.101185] invalid opcode: [#1] SMP [ 13.101257] Modules linked in: binfmt_misc bridge stp llc iTCO_wdt iTCO_vendor_support arc4 ppdev coretemp ath5k pcspkr ath sr9700 mac80211 dm9601 serio_raw usbnet cfg80211 snd_hda_codec_realtek snd_hda_codec_generic mii rfkill lpc_ich snd_hda_intel i915 mfd_core snd_hda_codec evdev sg snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss rng_core snd_pcm snd_timer video snd drm_kms_helper soundcore drm parport_pc parport i2c_algo_bit shpchp button acpi_cpufreq netconsole configfs w83627hf hwmon_vid ip_tables x_tables autofs4 xfs libcrc32c btrfs crc32c_generic xor raid6_pq ses enclosure scsi_transport_sas uas hid_generic usbhid usb_storage hid sd_mod sr_mod cdrom i2c_i801 i2c_smbus firewire_ohci ata_generic firewire_core crc_itu_t ehci_pci ata_piix libata uhci_hcd ehci_hcd scsi_mod e1000e ptp pps_core [ 13.101261] usbcore usb_common [ 13.101267] CPU: 0 PID: 96 Comm: kworker/u4:2 Tainted: GW 4.9.0-3-686-pae #1 Debian 4.9.25-1 [ 13.101269] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 
03/02/2007 [ 13.101326] Workqueue: btrfs-endio btrfs_endio_helper [btrfs] [ 13.101328] task: f6d409c0 task.stack: f6d46000 [ 13.101332] EIP: 0060:[] EFLAGS: 00010203 CPU: 0 [ 13.101373] EIP is at btrfs_check_repairable+0x12c/0x130 [btrfs] [ 13.101375] EAX: 8800 EBX: f292dd80 ECX: 8801 EDX: 0002 [ 13.101378] ESI: f69c EDI: f678bc5c EBP: f6d47e50 ESP: f6d47e30 [ 13.101381] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [ 13.101383] CR0: 80050033 CR2: b64c6db0 CR3: 36c115a0 CR4: 06f0 [ 13.101386] Stack: [ 13.101395] 1000 f292dd80 d04e93d0 f35885d8 f35885d8 [ 13.101402] f6d47ed8 f8c63739 0001 f6d47ec4 f8c951eb f3bb4800 0001 [ 13.101412] 0009 f678bc00 f35884b0 0001 [ 13.101413] Call Trace: [ 13.101457] [] ? end_bio_extent_readpage+0x4e9/0x680 [btrfs] [ 13.101497] [] ? end_compressed_bio_read+0x3b/0x2d0 [btrfs] [ 13.101538] [] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs] [ 13.101548] [] ? process_one_work+0x141/0x380 [ 13.101553] [] ? worker_thread+0x41/0x460 [ 13.101557] [] ? kthread+0xb4/0xd0 [ 13.101561] [] ? process_one_work+0x380/0x380 [ 13.101566] [] ? kthread_park+0x50/0x50 [ 13.101572] [] ? ret_from_fork+0x1b/0x28 [ 13.104547] Modules linked in: binfmt_misc bridge stp llc iTCO_wdt iTCO_vendor_support
Re: Failover for unattached USB device
On 2018-10-24 20:05, Chris Murphy wrote:
> I think about the best we can expect in the short term is that Btrfs goes read-only before the file system becomes corrupted in a way it can't recover with a normal mount. And I'm not certain it is in this state of development right now for all cases. And I say the same thing for other file systems as well.
>
> Running Btrfs on USB devices is fine, so long as they're well behaved. I have such a setup with USB 3.0 devices. Perhaps I got a bit lucky, because there are a lot of known bugs with USB controllers, USB bridge chipsets, and USB hubs.
>
> Having user-definable switches for when to go read-only is, I think, misleading to the user, and very likely will mislead the file system. The file system needs to go read-only when it gets confused, period. It doesn't matter what the error rate is.

In general I agree. I just wonder why it couldn't happen quicker. For example, from the log I've originally attached one can see that btrfs made 1867 attempts to read (perhaps the same) block from both devices in the RAID1 volume, without success:

BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0
BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1867, flush 0, corrupt 0, gen 0

The attempts lasted for 29 minutes.

> The workaround is really to do the hard work of making the devices stable, not asking Btrfs to paper over known unstable hardware. In my case, I started out with rare disconnects and resets with directly attached drives. This was a couple years ago. It was a Btrfs raid1 setup, and the drives would not go missing at the same time, but both would just drop off from time to time. Btrfs would complain of dropped writes; I vaguely remember it going read-only. But normal mounts worked, sometimes with scary errors, but always finding a good copy on the other drive and doing passive fixups. Scrub would always fix up the rest.
> I'm still using those same file systems on those devices, but now they go through a dyconn USB 3.0 hub with a decently good power supply. I originally thought the drop-offs were power related, so I explicitly looked for a USB hub that could supply at least 2 A, and this one is 12 VDC @ 2500 mA. A laptop drive will draw nearly 1 A on spin-up, but at that point P = A * V. Laptop drives during read/write use 1.5 W to 2.5 W @ 5 VDC:
>
> 1.5-2.5 W = A * 5 V, therefore A = 0.3-0.5 A
>
> And for 4 drives at possibly 0.5 A (although my drives are all at the 1.6 W read/write), that's 2 A @ 5 V, which is easily maintained by the hub power supply (which by my calculation could do 6 A @ 5 V, not accounting for any resistance).
>
> Anyway, as it turns out I don't think it was power related, as the Intel NUC in question probably had just enough amps per port. What it really was, was an incompatibility between the Intel controller and the bridge chipset in the USB-SATA cases. A USB hub is similar to an ethernet hub: it actually reads the USB stream and rewrites it out. So hubs are actually pretty complicated little things, and having a good one matters.

Thanks for this information. I have a situation similar to yours, with the only important difference that my drives are put into a USB dock with independent power and cooling, like this one: https://www.ebay.com/itm/Mediasonic-ProBox-4-Bay-3-5-Hard-Drive-Enclosure-USB-3-0-eSATA-Sata-3-6-0Gbps/273161164246 so I don't think I need to worry about amps. This dock is connected directly to a USB port on the motherboard. However, indeed there could be bugs both on the dock side and in the south bridge. Moreover, I could imagine that a USB reset happens due to another USB device, like a wave started in one place turning into a tsunami for the whole USB subsystem.

> There are pending patches for something similar that you can find in the archives. I think the reason they haven't been merged yet is there haven't been enough comments and feedback (?).
> I think Anand Jain is the author of those patches, so you might dig around in the archives. In a way you have an ideal setup for testing them out. Just make sure you have backups...

Thanks for the reference. Should I look for this patch here: https://patchwork.kernel.org/project/linux-btrfs/list/?submitter=34632=-date or was this patch only floating around in this mailing list?

> 'btrfs check' without the --repair flag is safe and read-only, but takes a long time because it'll read all metadata. The fastest safe way is to mount it ro and read a directory recently being written to and see if there are any kernel errors. You could recursively copy files from a directory to /dev/null and then check kernel messages for any errors. So long as metadata is DUP, there is a good chance a bad copy of metadata can be automatically fixed up with a good copy. If there's only a single copy of metadata, or both copies get corrupt, then it's difficult. Usually recovery of data is possible, but depending on what's damaged, repair might not be possible.

I think "btrfs check" would be too heavy.
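Chris's /dev/null trick can be wrapped in a tiny script. This is only a sketch: the mount point is an example, and the grep pattern is a guess at typical btrfs kernel messages, not an exhaustive one.

```shell
#!/bin/sh
# Stream every file under the given directory to /dev/null; a corrupted
# extent whose automatic repair fails surfaces as a read error here and
# as a message in the kernel log.
read_check() {
    find "$1" -xdev -type f -exec cat {} + > /dev/null || return 1
    # show recent btrfs complaints, if we are allowed to read the log
    dmesg 2>/dev/null | grep -i 'btrfs.*\(error\|corrupt\)' | tail -n 20
    return 0
}

# example invocation, guarded so the sketch is harmless elsewhere
[ -d /mnt/backups ] && read_check /mnt/backups || true
```

This reads only through the normal mount, so (as Chris notes) it exercises the same code paths as real usage without the cost of a full metadata walk.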
Re: Failover for unattached USB device
On 2018-10-17 00:14, Dmitry Katsubo wrote:
> As a workaround I can monitor dmesg output, but:
> 1. It would be nice if I could tell btrfs that I would like to mount read-only after a certain error rate per minute is reached.
> 2. It would be nice if btrfs could detect that both drives are not available and unmount the filesystem (as mounting read-only won't help much).
> Kernel log for Linux v4.14.2 is attached.

I wonder if somebody could advise on a workaround. I understand that running a btrfs volume over USB devices is not good, but I think btrfs could play some role here as well. In particular, I wonder if btrfs could detect that all devices in a RAID1 volume became inaccessible and, instead of reporting an ever-increasing "write error" counter to the kernel log, simply render the volume read-only. "Inaccessible" could mean that the same block cannot be written back to the minimum number of devices in the RAID volume, so btrfs gives up.

Maybe someone can advise a more sophisticated way of quickly checking that the filesystem is healthy? Right now the only way I see is to make a tiny write (like creating a file and instantly removing it) to make it die faster... Checking for write I/O errors in "btrfs dev stats /mnt/backups" output could be an option, provided that the delta is computed over some period of time and the write-error counter increases for both devices in the volume (as apparently I am not interested in one failing block which btrfs tries to write again and again, increasing the write-error counter).

Thanks for any feedback.

--
With best regards,
Dmitry
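The dev-stats delta idea above can be sketched as a small script. The mount point, polling interval, and the commented remount action are examples, not an endorsed policy; the parsing assumes the `[/dev/sdX].write_io_errs N` format that `btrfs dev stats` prints.

```shell
#!/bin/sh
# Sum write_io_errs across all devices from `btrfs dev stats` output on stdin.
stats_write_errs() {
    awk '/write_io_errs/ { s += $NF } END { print s + 0 }'
}

# Poll the counters twice and report only if they grew in between,
# i.e. only on *fresh* write errors, not on historic ones.
# Usage: watch_write_errs <mountpoint> <seconds>
watch_write_errs() {
    before=$(btrfs dev stats "$1" | stats_write_errs)
    sleep "$2"
    after=$(btrfs dev stats "$1" | stats_write_errs)
    if [ "$after" -gt "$before" ]; then
        echo "write errors rose from $before to $after on $1"
        # the failover action discussed above, e.g.:
        # mount -o remount,ro "$1"
    fi
}
```

A cron job calling `watch_write_errs /mnt/backups 60` would approximate the "error rate per minute" trigger from the wish list, without kernel changes.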
Failover for unattached USB device
Dear btrfs team / community,

Sometimes it happens that the kernel resets the USB subsystem (it looks like a hardware problem), after which all USB devices are detached and attached back. After a few hours of such struggle, btrfs finally reaches the point where a read-only filesystem mount is necessary. During this time, when I try to access the mounted filesystem (/mnt/backups), it reports success for some directories and an error for others:

root@debian:~# ll /mnt/backups/
total 14334
drwxr-xr-x 1 adm users    116 Sep 12 00:35 .
drwxrwxr-x 1 adm users    164 Sep 19 22:44 ..
-rw-r--r-- 1 adm users  79927 Feb  7  2018 contacts.zip
drwxr-xr-x 1 adm users    254 Feb  4  2018 attic
drwxr-xr-x 1 adm users     16 Feb 23  2018 recent
...
root@debian:~# ll /mnt/backups/attic/
ls: reading directory '/mnt/backups/attic/': Input/output error
total 0
drwxr-xr-x 1 adm users 254 Feb  4  2018 .
drwxr-xr-x 1 adm users 116 Sep 12 00:35 ..

It looks like this depends on whether the content is in the disk cache... What is surprising: when I try to create a file, I succeed:

root@debian:~# touch /mnt/backups/.mounted
root@debian:~# ll /mnt/backups/.mounted
-rw-r--r-- 1 root root 0 Sep 20 16:52 /mnt/backups/.mounted
root@debian:~# rm /mnt/backups/.mounted

My btrfs volume consists of two identical drives combined into a RAID1 volume:

# btrfs filesystem df /mnt/backups
Data, RAID1: total=880.00GiB, used=878.96GiB
System, RAID1: total=8.00MiB, used=144.00KiB
Metadata, RAID1: total=2.00GiB, used=1.13GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs filesystem show /mnt/backups
Label: none  uuid: a657364b-36d2-4c1f-8e5d-dc3d28166190
        Total devices 2 FS bytes used 880.09GiB
        devid 1 size 3.64TiB used 882.01GiB path /dev/sdf
        devid 2 size 3.64TiB used 882.01GiB path /dev/sde

As a workaround I can monitor dmesg output, but:
1. It would be nice if I could tell btrfs that I would like to mount read-only after a certain error rate per minute is reached.
2.
It would be nice if btrfs could detect that both drives are not available and unmount (as mount read-only won't help much) the filesystem. Kernel log for Linux v4.14.2 is attached. -- With best regards, Dmitry Jun 29 18:54:56 debian kernel: [1197865.440396] usb 4-2: USB disconnect, device number 3 Jun 29 18:54:56 debian kernel: [1197865.440403] usb 4-2.2: USB disconnect, device number 5 Jun 29 18:54:56 debian kernel: [1197865.476118] usb 4-2.3: USB disconnect, device number 8 Jun 29 18:54:56 debian kernel: [1197865.549379] usb 4-2.4: USB disconnect, device number 7 ... Jun 29 18:54:58 debian kernel: [1197867.517728] usb-storage 4-2.3:1.0: USB Mass Storage device detected Jun 29 18:54:58 debian kernel: [1197867.524021] usb-storage 4-2.3:1.0: Quirks match for vid 152d pid 0567: 500 Jun 29 18:54:58 debian kernel: [1197867.603859] usb 4-2.4: new full-speed USB device number 13 using ehci-pci Jun 29 18:54:58 debian kernel: [1197867.725595] usb-storage 4-2.4:1.2: USB Mass Storage device detected Jun 29 18:54:58 debian kernel: [1197867.728602] scsi host9: usb-storage 4-2.4:1.2 Jun 29 18:54:59 debian kernel: [1197868.528737] scsi 7:0:0:0: Direct-Access ST4000DM 004-2CV104 0125 PQ: 0 ANSI: 6 Jun 29 18:54:59 debian kernel: [1197868.529310] scsi 7:0:0:1: Direct-Access ST4000DM 004-2CV104 0125 PQ: 0 ANSI: 6 Jun 29 18:54:59 debian kernel: [1197868.530093] sd 7:0:0:0: Attached scsi generic sg5 type 0 Jun 29 18:54:59 debian kernel: [1197868.530588] sd 7:0:0:1: Attached scsi generic sg6 type 0 Jun 29 18:54:59 debian kernel: [1197868.533064] sd 7:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16). 
Jun 29 18:54:59 debian kernel: [1197868.533619] sd 7:0:0:1: [sdh] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) Jun 29 18:54:59 debian kernel: [1197868.533626] sd 7:0:0:1: [sdh] 4096-byte physical blocks Jun 29 18:54:59 debian kernel: [1197868.534063] sd 7:0:0:1: [sdh] Write Protect is off Jun 29 18:54:59 debian kernel: [1197868.534069] sd 7:0:0:1: [sdh] Mode Sense: 67 00 10 08 Jun 29 18:54:59 debian kernel: [1197868.534422] sd 7:0:0:1: [sdh] No Caching mode page found Jun 29 18:54:59 debian kernel: [1197868.534542] sd 7:0:0:1: [sdh] Assuming drive cache: write through Jun 29 18:54:59 debian kernel: [1197868.535563] sd 7:0:0:1: [sdh] Very big device. Trying to use READ CAPACITY(16). Jun 29 18:54:59 debian kernel: [1197868.536702] sd 7:0:0:0: [sdg] Very big device. Trying to use READ CAPACITY(16). Jun 29 18:54:59 debian kernel: [1197868.537454] sd 7:0:0:0: [sdg] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB) Jun 29 18:54:59 debian kernel: [1197868.537459] sd 7:0:0:0: [sdg] 4096-byte physical blocks Jun 29 18:54:59 debian kernel: [1197868.538327] sd 7:0:0:0: [sdg] Write Protect is off Jun 29 18:54:59 debian kernel: [1197868.538331] sd 7:0:0:0: [sdg] Mode Sense: 67 00 10 08 ... Jun 29 20:22:35 debian kernel: [1203125.061068] BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
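Workaround (1) above (reacting to an error rate in dmesg) can be sketched in userspace. The threshold, the log source, and the echoed remount command are all examples; a real monitor would track the count per time window rather than the absolute total.

```shell
#!/bin/sh
# Count BTRFS error lines in a kernel log stream on stdin.
btrfs_err_count() {
    grep -c 'BTRFS error'
}

# Read the current kernel log and act once the count passes a threshold.
THRESHOLD=100
errs=$(dmesg 2>/dev/null | btrfs_err_count) || true
if [ "${errs:-0}" -gt "$THRESHOLD" ]; then
    # the action workaround (1) asks for; shown, not executed, in this sketch
    echo "error storm detected: mount -o remount,ro /mnt/backups"
fi
```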
btrfs warning at ctree.h:1564 btrfs_update_device+0x220/0x230
Dear btrfs team,

I often observe kernel traces on linux-4.14.0 (most likely due to background "btrfs scrub") which contain the following "characterizing" line (for the rest see attachments):

btrfs_remove_chunk+0x26a/0x7e0 [btrfs]

I wonder if somebody from the developers team knows anything about this problem. It seems that after such a dump the btrfs volume continues to function OK. Thanks for any information!

--
With best regards,
Dmitry

Jun 7 16:26:31 debian kernel: [1176060.298759] [ cut here ]
Jun 7 16:26:31 debian kernel: [1176060.298820] WARNING: CPU: 0 PID: 566 at /build/linux-SCFPgu/linux-4.14.2/fs/btrfs/ctree.h:1564 btrfs_update_device+0x220/0x230 [btrfs]
Jun 7 16:26:31 debian kernel: [1176060.298823] Modules linked in: option usb_wwan usbserial ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter xt_REDIRECT nf_nat_redirect xt_physdev br_netfilter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c xt_tcpudp iptable_mangle arc4 bridge stp llc iTCO_wdt iTCO_vendor_support ppdev coretemp ath5k pcspkr serio_raw ath mac80211 sr9700 dm9601 cfg80211 usbnet mii i915 rfkill snd_hda_codec_realtek lpc_ich snd_hda_codec_generic mfd_core evdev snd_hda_intel snd_hda_codec sg snd_hda_core snd_hwdep snd_pcm_oss rng_core snd_mixer_oss video snd_pcm drm_kms_helper snd_timer drm snd parport_pc soundcore i2c_algo_bit parport shpchp button acpi_cpufreq binfmt_misc w83627hf hwmon_vid ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd
Jun 7 16:26:31 debian kernel: [1176060.298930] aes_i586 btrfs crc32c_generic xor zstd_decompress zstd_compress xxhash raid6_pq hid_generic usbhid hid uas usb_storage sr_mod cdrom sd_mod ata_generic i2c_i801 ata_piix libata firewire_ohci scsi_mod firewire_core crc_itu_t e1000e ptp pps_core ehci_pci uhci_hcd ehci_hcd usbcore usb_common
Jun 7 16:26:31 debian kernel: [1176060.298981] CPU: 0 PID: 566 Comm: btrfs-cleaner Tainted: GW 4.14.0-1-686-pae #1 Debian 4.14.2-1
Jun 7 16:26:31 debian
kernel: [1176060.299162] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007 Jun 7 16:26:31 debian kernel: [1176060.299327] task: f287e200 task.stack: f24e2000 Jun 7 16:26:31 debian kernel: [1176060.299448] EIP: btrfs_update_device+0x220/0x230 [btrfs] Jun 7 16:26:31 debian kernel: [1176060.299450] EFLAGS: 00010206 CPU: 0 Jun 7 16:26:31 debian kernel: [1176060.299454] EAX: EBX: f68bee00 ECX: 000c EDX: 0200 Jun 7 16:26:31 debian kernel: [1176060.299457] ESI: ef0d9320 EDI: EBP: f24e3e9c ESP: f24e3e5c Jun 7 16:26:31 debian kernel: [1176060.299460] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Jun 7 16:26:31 debian kernel: [1176060.299463] CR0: 80050033 CR2: 02aa3000 CR3: 32b6ece0 CR4: 06f0 Jun 7 16:26:31 debian kernel: [1176060.299467] Call Trace: Jun 7 16:26:31 debian kernel: [1176060.299561] btrfs_remove_chunk+0x26a/0x7e0 [btrfs] Jun 7 16:26:31 debian kernel: [1176060.299686] btrfs_delete_unused_bgs+0x321/0x3f0 [btrfs] Jun 7 16:26:31 debian kernel: [1176060.299819] cleaner_kthread+0x13c/0x150 [btrfs] Jun 7 16:26:31 debian kernel: [1176060.299907] kthread+0xf3/0x110 Jun 7 16:26:31 debian kernel: [1176060.33] ? __btree_submit_bio_start+0x20/0x20 [btrfs] Jun 7 16:26:31 debian kernel: [1176060.300099] ? 
kthread_create_on_node+0x20/0x20 Jun 7 16:26:31 debian kernel: [1176060.300182] ret_from_fork+0x19/0x24 Jun 7 16:26:31 debian kernel: [1176060.300249] Code: e9 81 fe ff ff 8d b6 00 00 00 00 bf f4 ff ff ff e9 78 fe ff ff 8d b6 00 00 00 00 f3 90 eb a8 8d 74 26 00 f3 90 e9 2b ff ff ff 90 <0f> ff e9 7a ff ff ff e8 14 4d 4c dc 8d 74 26 00 3e 8d 74 26 00 Jun 7 16:26:31 debian kernel: [1176060.300626] ---[ end trace 32773559e9ec5e68 ]--- Jul 1 07:07:31 debian kernel: [1328228.484772] [ cut here ] Jul 1 07:07:31 debian kernel: [1328228.484822] WARNING: CPU: 0 PID: 26193 at /build/linux-SCFPgu/linux-4.14.2/fs/btrfs/ctree.h:1564 btrfs_update_device+0x220/0x230 [btrfs] Jul 1 07:07:31 debian kernel: [1328228.484824] Modules linked in: cpuid nfs lockd grace sunrpc fscache ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter xt_REDIRECT nf_nat_redirect xt_physdev br_netfilter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c xt_tcpudp iptable_mangle option usb_wwan usbserial arc4 bridge stp llc iTCO_wdt iTCO_vendor_support ppdev evdev ath5k ath mac80211 coretemp cfg80211 sr9700 rfkill serio_raw dm9601 i915 usbnet pcspkr snd_hda_codec_realtek mii lpc_ich snd_hda_codec_generic mfd_core snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep rng_core video snd_pcm_oss sg drm_kms_helper snd_mixer_oss drm snd_pcm snd_timer i2c_algo_bit snd soundcore parport_pc parport button shpchp acpi_cpufreq binfmt_misc w83627hf hwmon_vid ip_tables x_tables autofs4 ext4 crc16 mbcache Jul 1 07:07:31 debian kernel:
Re: Kernel crash during btrfs scrub
On 2018-01-03 05:58, Qu Wenruo wrote:
> On 2018年01月03日 09:12, Dmitry Katsubo wrote:
>> Dear btrfs team,
>>
>> I send a kernel crash report which I have observed recently during btrfs scrub.
>> It looks like scrub itself has completed without errors.
>
> It's not a kernel crash (if I didn't miss anything), but just a kernel warning.
>
> The warning is caused by the fact that your fs (mostly created by old mkfs.btrfs) has a device with unaligned size.
>
> You could either resize the device down a little (e.g. -4K), and a newer kernel (the one you're using should be new enough) could handle it well.
>
> Or you could update your btrfs-progs (I assume you're using Arch, which is already shipping btrfs-progs v4.14) and use "btrfs rescue fix-device-size" to fix other device related problems offline.
> (Not only the warning, but also potential superblock size mismatch)
>
> Thanks,
> Qu

Thanks for the reply! Why couldn't the warning message be issued as a one-liner, with a proper description and without a scary stack trace? E.g.:

btrfs /dev/sda1 warning: device size is not aligned with FS (mostly created by old mkfs.btrfs), see https://btrfs.wiki.kernel.org/index.php/FAQ#...

--
With best regards,
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
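"Unaligned size" here means the recorded device size is not a multiple of the 4 KiB sector size. The arithmetic below is a generic sketch of that check (the sample sizes are illustrative); the commented repair commands are the two routes Qu names, with example paths.

```shell
#!/bin/sh
# Report whether a byte size is a multiple of the 4 KiB sector size.
aligned_4k() {
    if [ $(( $1 % 4096 )) -eq 0 ]; then echo aligned; else echo unaligned; fi
}

aligned_4k 53687091200    # exactly 50 GiB -> aligned
aligned_4k 53687091713    # an odd size such as old mkfs.btrfs could leave -> unaligned

# The repairs Qu suggests (example paths; the second requires the
# filesystem to be unmounted and btrfs-progs >= 4.14):
#   btrfs filesystem resize 1:-4K /var/log
#   btrfs rescue fix-device-size /dev/sda3
```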
Kernel crash during btrfs scrub
Dear btrfs team, I send a kernel crash report which I have observed recently during btrfs scrub. It looks like scrub itself has completed without errors. # btrfs scrub status /home scrub status for 83a3cb60-3334-4d11-9fdf-70b8e8703167 scrub started at Mon Jan 1 06:52:01 2018 and finished after 00:30:47 total bytes scrubbed: 87.55GiB with 0 errors # btrfs scrub status /var/log scrub status for 5b45ac8e-fd8c-4759-854a-94e45069959d scrub started at Mon Jan 1 06:52:01 2018 and finished after 00:15:45 total bytes scrubbed: 23.39GiB with 0 errors Linux kernel v4.14.2-1 btrfs-progs v4.7.3-1 -- With best regards, Dmitry [Mon Jan 1 07:04:44 2018] [ cut here ] [Mon Jan 1 07:04:44 2018] WARNING: CPU: 0 PID: 13583 at /build/linux-SCFPgu/linux-4.14.2/fs/btrfs/ctree.h:1564 btrfs_update_device+0x220/0x230 [btrfs] [Mon Jan 1 07:04:44 2018] Modules linked in: md4 nls_utf8 cifs ccm dns_resolver fscache option usb_wwan usbserial isofs loop ses enclosure scsi_transport_sas hid_generic usbhid hid ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter xt_REDIRECT nf_nat_redirect xt_physdev br_netfilter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c xt_tcpudp iptable_mangle bridge stp llc arc4 iTCO_wdt iTCO_vendor_support ppdev evdev snd_hda_codec_realtek snd_hda_codec_generic ath5k ath mac80211 cfg80211 snd_hda_intel i915 rfkill coretemp snd_hda_codec snd_hda_core snd_hwdep serio_raw snd_pcm_oss pcspkr snd_mixer_oss lpc_ich snd_pcm mfd_core snd_timer snd video soundcore drm_kms_helper sg drm shpchp i2c_algo_bit rng_core parport_pc parport button acpi_cpufreq binfmt_misc w83627hf hwmon_vid [Mon Jan 1 07:04:44 2018] ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb crypto_simd cryptd aes_i586 btrfs crc32c_generic xor zstd_decompress zstd_compress xxhash raid6_pq uas usb_storage sr_mod sd_mod cdrom ata_generic ata_piix i2c_i801 libata firewire_ohci scsi_mod firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd e1000e ptp pps_core usbcore 
usb_common [Mon Jan 1 07:04:44 2018] CPU: 0 PID: 13583 Comm: btrfs Tainted: GW 4.14.0-1-686-pae #1 Debian 4.14.2-1 [Mon Jan 1 07:04:44 2018] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007 [Mon Jan 1 07:04:44 2018] task: eba6a000 task.stack: ca216000 [Mon Jan 1 07:04:44 2018] EIP: btrfs_update_device+0x220/0x230 [btrfs] [Mon Jan 1 07:04:44 2018] EFLAGS: 00210206 CPU: 0 [Mon Jan 1 07:04:44 2018] EAX: EBX: f6908400 ECX: 000c EDX: 0200 [Mon Jan 1 07:04:44 2018] ESI: f69e2280 EDI: EBP: ca217bd8 ESP: ca217b98 [Mon Jan 1 07:04:44 2018] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 [Mon Jan 1 07:04:44 2018] CR0: 80050033 CR2: b795da00 CR3: 1a2e2460 CR4: 06f0 [Mon Jan 1 07:04:44 2018] Call Trace: [Mon Jan 1 07:04:44 2018] btrfs_finish_chunk_alloc+0xf3/0x480 [btrfs] [Mon Jan 1 07:04:44 2018] ? btrfs_free_path.part.26+0x1c/0x20 [btrfs] [Mon Jan 1 07:04:44 2018] ? btrfs_insert_item+0x66/0xd0 [btrfs] [Mon Jan 1 07:04:44 2018] btrfs_create_pending_block_groups+0x139/0x250 [btrfs] [Mon Jan 1 07:04:44 2018] __btrfs_end_transaction+0x78/0x2e0 [btrfs] [Mon Jan 1 07:04:44 2018] btrfs_end_transaction+0xf/0x20 [btrfs] [Mon Jan 1 07:04:44 2018] btrfs_inc_block_group_ro+0xea/0x190 [btrfs] [Mon Jan 1 07:04:44 2018] scrub_enumerate_chunks+0x215/0x660 [btrfs] [Mon Jan 1 07:04:44 2018] btrfs_scrub_dev+0x1e8/0x4e0 [btrfs] [Mon Jan 1 07:04:44 2018] btrfs_ioctl+0x1480/0x28b0 [btrfs] [Mon Jan 1 07:04:44 2018] ? kmem_cache_alloc+0x30c/0x540 [Mon Jan 1 07:04:44 2018] ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs] [Mon Jan 1 07:04:44 2018] do_vfs_ioctl+0x90/0x650 [Mon Jan 1 07:04:44 2018] ? do_vfs_ioctl+0x90/0x650 [Mon Jan 1 07:04:44 2018] ? create_task_io_context+0x78/0xe0 [Mon Jan 1 07:04:44 2018] ? 
get_task_io_context+0x3d/0x80 [Mon Jan 1 07:04:44 2018] SyS_ioctl+0x58/0x70 [Mon Jan 1 07:04:44 2018] do_fast_syscall_32+0x71/0x1a0 [Mon Jan 1 07:04:44 2018] entry_SYSENTER_32+0x4e/0x7c [Mon Jan 1 07:04:44 2018] EIP: 0xb7f81cf9 [Mon Jan 1 07:04:44 2018] EFLAGS: 0246 CPU: 0 [Mon Jan 1 07:04:44 2018] EAX: ffda EBX: 0003 ECX: c400941b EDX: 092e21b8 [Mon Jan 1 07:04:44 2018] ESI: 092e21b8 EDI: 003d0f00 EBP: b7cff1e8 ESP: b7cff188 [Mon Jan 1 07:04:44 2018] DS: 007b ES: 007b FS: GS: 0033 SS: 007b [Mon Jan 1 07:04:44 2018] Code: e9 81 fe ff ff 8d b6 00 00 00 00 bf f4 ff ff ff e9 78 fe ff ff 8d b6 00 00 00 00 f3 90 eb a8 8d 74 26 00 f3 90 e9 2b ff ff ff 90 <0f> ff e9 7a ff ff ff e8 14 ad 48 d0 8d 74 26 00 3e 8d 74 26 00 [Mon Jan 1 07:04:44 2018] ---[ end trace 6b4736d811ae42e1 ]--- [Mon Jan 1 07:05:00 2018] [ cut here ] [Mon Jan 1 07:05:00 2018] WARNING: CPU: 1 PID: 443 at /build/linux-SCFPgu/linux-4.14.2/fs/btrfs/ctree.h:1564 btrfs_update_device+0x220/0x230 [btrfs] [Mon Jan 1 07:05:00 2018]
Re: btrfs defrag questions
On 2016-07-01 22:46, Henk Slager wrote: > (email ends up in gmail spamfolder) > On Fri, Jul 1, 2016 at 10:14 PM, Dmitry Katsubo <dm...@mail.ru> wrote: >> Hello everyone, >> >> Question #1: >> >> While doing defrag I got the following message: >> >> # btrfs fi defrag -r /home >> ERROR: defrag failed on /home/user/.dropbox-dist/dropbox: Success >> total 1 failures >> >> I feel that something went wrong, but the message is a bit misleading. >> >> Provided that Dropbox is running in the system, does it mean that it >> cannot be defragmented? > > I think it is a matter of newlines in btrfs-progs and/or stdout/stderr mixup. > > You should run the command with -v and probably also with -f, so that > it gets hopefully clearer what is wrong. Running with "-v -f" (or just "-v") results in the same output: ... /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/select.so /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/grp.so /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/posixffi.libc._posixffi_libcERROR: defrag failed on /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/dropbox: Success .so /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/_functools.so /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/dropbox /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/_csv.so ... This is not a matter of newlines: $ grep -rnH 'defrag failed' btrfs-progs btrfs-progs/cmds-filesystem.c:1021: error("defrag failed on %s: %s", fpath, strerror(e)); btrfs-progs/cmds-filesystem.c:1161: error("defrag failed on %s: %s", argv[i], strerror(e)); > That it fails on dropbox is an error I think, but maybe known: Could > be mount option is compress and that that causes trouble for defrag > although that should not happen. True, compression is enabled. > You can defrag just 1 file, so maybe you could try to make a reproducible > case.
When I run it on that one file, the real error finally shows up: # btrfs fi defrag -r -v /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/dropbox ERROR: cannot open /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/dropbox: Text file busy > What kernel? > What btrfs-progs? kernel v4.4.6 btrfs-tools v4.5.2 >> Question #2: >> >> Suppose that in above example /home/ftp is mounted as another btrfs >> array (not subvolume). Will 'btrfs fi defrag -r /home' defragment it >> (recursively) as well? > > I dont know, I dont think so, but you can simply try. Many thanks, now I see how I can check this. Unfortunately it does not descend into submounted directories. -- With best regards, Dmitry -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
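A note on the confusing ": Success" suffix in the defrag output above: the grep results show the message is built with strerror(e), so one plausible reading is that the saved errno was 0 (or had been clobbered by an intervening call) by the time error() ran, and the real ETXTBSY from the defrag ioctl was lost. A minimal Python illustration of the effect (os.strerror wraps the same C strerror; the exact strerror(0) wording is libc-dependent):

```python
import errno
import os

# The defrag ioctl on a running executable fails with ETXTBSY.
# This is the message defrag should have reported:
print(os.strerror(errno.ETXTBSY))

# If error() is reached with a saved errno of 0 instead, strerror(0)
# produces the misleading "Success" seen in the defrag output
# (exact wording depends on the libc):
print(os.strerror(0))
```

The fix on the btrfs-progs side would be to capture errno immediately after the failing ioctl and pass that saved value to strerror.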
btrfs defrag questions
Hello everyone, Question #1: While doing defrag I got the following message: # btrfs fi defrag -r /home ERROR: defrag failed on /home/user/.dropbox-dist/dropbox: Success total 1 failures I feel that something went wrong, but the message is a bit misleading. Provided that Dropbox is running in the system, does it mean that it cannot be defragmented? Question #2: Suppose that in the above example /home/ftp is mounted as another btrfs array (not subvolume). Will 'btrfs fi defrag -r /home' defragment it (recursively) as well? -- With best regards, Dmitry
btrfs defrag: success or failure?
Hi everyone, I got the following message: # btrfs fi defrag -r /home ERROR: defrag failed on /home/user/.dropbox-dist/dropbox: Success total 1 failures I feel that something went wrong, but the message is a bit misleading. Anyway: Provided that Dropbox is running in the system, does it mean that it cannot be defragmented? -- With best regards, Dmitry
Re: Is "btrfs balance start" truly asynchronous?
On 2016-06-21 15:17, Graham Cobb wrote: > On 21/06/16 12:51, Austin S. Hemmelgarn wrote: >> The scrub design works, but the whole state file thing has some rather >> irritating side effects and other implications, and developed out of >> requirements that aren't present for balance (it might be nice to check >> how many chunks actually got balanced after the fact, but it's not >> absolutely necessary). > > Actually, that would be **really** useful. I have been experimenting > with cancelling balances after a certain time (as part of my > "balance-slowly" script). I have got it working, just using bash > scripting, but it means my script does not know whether any work has > actually been done by the balance run which was cancelled (if no work > was done, but it timed out anyway, there is probably no point trying > again with the same timeout later!). Additionally it would be nice if balance/scrub reported the status via /proc in a human-readable manner (similar to /proc/mdstat). -- With best regards, Dmitry
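A human-readable status line of the kind suggested above (in the spirit of /proc/mdstat) could be little more than a progress bar plus counters. A hypothetical sketch in Python, not btrfs code; the chunk counters are assumed inputs that the kernel would supply:

```python
def progress_line(done, total, width=40):
    """Render an mdstat-style one-line progress bar for N of M chunks."""
    frac = done / total if total else 0.0
    filled = min(int(frac * width), width - 1)
    bar = "=" * filled + ">" + "." * (width - filled - 1)
    return "[%s] %5.1f%% (%d/%d chunks)" % (bar, frac * 100, done, total)

# e.g. a balance that has relocated 50 of 150 chunks:
print(progress_line(50, 150))
```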
Is "btrfs balance start" truly asynchronous?
Dear btrfs community, I have added a drive to an existing raid1 btrfs volume and decided to perform balancing so that data is distributed "fairly" among drives. I have started "btrfs balance start", but it stalled for about 5-10 minutes, intensively doing the work. After that it printed something like "had to relocate 50 chunks" and exited. According to drive I/O, "btrfs balance" did most (if not all) of the work, so after it exited the job was done. Shouldn't "btrfs balance start" do the operation in the background? Thanks for any information. -- With best regards, Dmitry
Re: Process is blocked for more than 120 seconds
On 2015-11-11 12:38, Dmitry Katsubo wrote: > On 2015-11-09 14:25, Austin S Hemmelgarn wrote: >> On 2015-11-07 07:22, Dmitry Katsubo wrote: >>> Hi everyone, >>> >>> I have noticed the following in the log. The system continues to run, >>> but I am not sure for how long it will be stable. Should I start >>> worrying? Thanks in advance for the opinion. >>> >> This just means that a process was stuck in the D state (uninterruptible >> I/O sleep) for more than 120 seconds. Depending on a number of factors, >> this happening could mean: >> 1. Absolutely nothing (if you have low-powered or older hardware, for >> example, I get these regularly on a first generation Raspberry Pi if I >> don't increase the timeout significantly) >> 2. The program is doing a very large chunk of I/O (usually with the >> O_DIRECT flag, although this probably isn't the case here) >> 3. There's a bug in the blocked program (this is rarely the case when >> this type of thing happens) >> 4. There's a bug in the kernel (which is why this dumps a stack trace) >> 5. The filesystem itself is messed up somehow, and the kernel isn't >> handling it properly (technically a bug, but a more specific case of it). >> 6. You're hardware is misbehaving, failing, or experienced a transient >> error. >> >> Assuming you can rule out possibilities 1 and 6, I think that 4 is the >> most likely cause, as all of the listed programs (I'm assuming that >> 'master' is from postfix) are relatively well audited, and all of them >> hit this at the same time. >> >> For what it's worth, if you want you can do: >> echo 0 > /proc/sys/kernel/hung_task_timeout_secs >> like the message says to stop these from appearing in the future, or use >> some arbitrary number to change the timeout before these messages appear >> (I usually use at least 150 on production systems, and more often 300, >> although on something like a Raspberry Pi I often use timeouts as high >> as 1800 seconds). > > Thanks for comments, Austin. 
> > The system is "normal" PC, running Intel Core 2 Duo Mobile @1.66GHz. > "master" is indeed a postfix process. > > I haven't seen anything like that when I was on 3.16 kernel, but after I > have upgraded to 4.2.3, I caught that message. I/O and CPU load are > usually low, but it could be (6) from your list, as the system is > generally very old (5+ years). > > As the problem appeared only once for passed 15 days, I think it is just > a transient error. Thanks for clarifying the possible reasons. The problem (rarely) re-occurs. It does not happen on XFS filesystem (root) but only on btrfs. I will increase timeout up to 300 and see what happens. === cut dmesg === INFO: task fail2ban-server:1747 blocked for more than 120 seconds. Tainted: GW 4.4.0-1-rt-686-pae #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. fail2ban-server D 001f 0 1747 1 0x f1ca1bc0 00200086 f2d24190 001f f79ca4c0 f3d21bc0 c1168726 f1ca1bc0 e9152000 e9151d8c c156075f c0d25a90 f1ca1bc0 e9151db4 c1561ed4 f1ca1bc0 0002 eab98940 c0d25a90 Call Trace: [] ? __filemap_fdatawrite_range+0xb6/0xf0 [] ? schedule+0x3f/0xd0 [] ? __rt_mutex_slowlock+0x74/0x140 [] ? rt_mutex_slowlock+0xf3/0x250 [] ? btrfs_write_marked_extents+0xae/0x190 [btrfs] [] ? rt_mutex_lock+0x45/0x50 [] ? btrfs_sync_log+0x1d5/0x9a0 [btrfs] [] ? pin_current_cpu+0x71/0x1a0 [] ? preempt_count_add+0x8a/0xb0 [] ? unpin_current_cpu+0x13/0x70 [] ? btrfs_sync_file+0x3ce/0x410 [btrfs] [] ? start_ordered_ops+0x40/0x40 [btrfs] [] ? vfs_fsync_range+0x47/0xb0 [] ? do_fsync+0x3c/0x60 [] ? SyS_fdatasync+0x15/0x20 [] ? do_fast_syscall_32+0x8d/0x150 [] ? sysenter_past_esp+0x3d/0x61 [] ? pci_mmcfg_check_reserved+0x90/0xb0 INFO: task cleanup:2093 blocked for more than 120 seconds. Tainted: GW 4.4.0-1-rt-686-pae #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
cleanup D d0f5de0c 0 2093 28135 0x eab99bc0 00200086 c1836760 d0f5de0c 0002 f79ca4c0 f3d21bc0 0001 eab99bc0 d0f5e000 f325d2c4 d0f5ddfc c156075f f325d000 0088 d0f5de30 f8c72f36 0310 f325d290 eab99bc0 c10afd70 f325d2dc Call Trace: [] ? schedule+0x3f/0xd0 [] ? wait_log_commit+0xc6/0xf0 [btrfs] [] ? wake_atomic_t_function+0x70/0x70 [] ? btrfs_sync_log+0x36a/0x9a0 [btrfs] [] ? pin_current_cpu+0x71/0x1a0 [] ? preempt_count_add+0x8a/0xb0 [] ? unpin_current_cpu+0x13/0x70 [] ? btrfs_log_dentry_safe+0x64/0x70 [btrfs] [] ? btrfs_sync_file+0x3ce/0x410 [btrfs] [] ? do_sys_truncate+0xb0/0xb0 [] ? start_ordered_ops+0x40/0x40 [btrfs] []
Re: Hot data tracking / hybrid storage
On 2016-05-29 22:45, Ferry Toth wrote: On Sun, 29 May 2016 12:33:06 -0600, Chris Murphy wrote: On Sun, May 29, 2016 at 12:03 PM, Holger Hoffstätte wrote: On 05/29/16 19:53, Chris Murphy wrote: But I'm skeptical of bcache using a hidden area historically for the bootloader, to put its device metadata. I didn't realize that was the case. Imagine if LVM were to stuff metadata into the MBR gap, or mdadm. Egads. On the matter of bcache in general this seems noteworthy: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4d1034eb7c2f5e32d48ddc4dfce0f1a723d28667 bummer.. Well it doesn't mean no one will take it, just that no one has taken it yet. But the future of SSD caching may only be with LVM. I think all the above posts underline exactly my point: Instead of using a ssd cache (be it bcache or dm-cache) it would be much better to have the btrfs allocator be aware of ssd's in the pool and prioritize allocations to the ssd to maximize performance. This will allow to easily add more ssd's or replace worn out ones, without the mentioned headaches. After all adding/replacing drives to a pool is one of btrfs's biggest advantages. I would certainly vote for this feature. If I understand correctly, the mirror is selected based on the PID of the btrfs worker thread [1], which is simple but not the most effective. I would suggest implementing a queue of read operations per physical device (perhaps reads and writes should be put into the same queue). If a device is fast (and for SSD that is the case), the queue becomes empty quicker, which means it should be loaded more intensively. The allocation logic should simply put the next request into the shortest queue. I think this will guarantee that most operations are served by the SSD (or any other even faster technology that appears in the future).
[1] https://btrfs.wiki.kernel.org/index.php/Project_ideas#Better_data_balancing_over_multiple_devices_for_raid1.2F10_.28read.29 -- With best regards, Dmitry
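The shortest-queue idea can be modeled in a few lines. This is a toy simulation, not btrfs code: each mirror keeps a count of pending reads, dispatch picks the mirror with the fewest, and the device that drains quickly naturally absorbs most of the read load. Device names and timing are invented for illustration (the SSD completes every request immediately; the HDD completes one request every 10 ticks):

```python
class ShortestQueueDispatcher:
    """Toy model: send each read to the mirror with the fewest pending requests."""

    def __init__(self, devices):
        self.pending = {dev: 0 for dev in devices}

    def dispatch(self):
        dev = min(self.pending, key=self.pending.get)  # shortest queue wins
        self.pending[dev] += 1
        return dev

    def complete(self, dev):
        self.pending[dev] -= 1

d = ShortestQueueDispatcher(["hdd", "ssd"])  # ties go to the HDD: worst case
served = {"hdd": 0, "ssd": 0}
for tick in range(100):
    served[d.dispatch()] += 1
    if d.pending["ssd"] > 0:
        d.complete("ssd")                    # the SSD finishes immediately
    if tick % 10 == 9 and d.pending["hdd"] > 0:
        d.complete("hdd")                    # the HDD finishes one request per 10 ticks
print(served)                                # the SSD serves 90 of the 100 reads
```

Even with ties deliberately broken in the HDD's favour, the policy routes the bulk of the reads to the device whose queue drains fastest, which is the behaviour proposed above.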
Re: Some ideas for improvements
On 2016-05-25 21:03, Duncan wrote: > Dmitry Katsubo posted on Wed, 25 May 2016 16:45:41 +0200 as excerpted: >> * Would be nice if 'btrfs scrub status' shows estimated finishing time >> (ETA) and throughput (in Mb/s). > > That might not be so easy to implement. (Caveat, I'm not a dev, just a > btrfs user and list regular, so if a dev says different...) > > Currently, a running scrub simply outputs progress to a file (/var/lib/ > btrfs/scrub.status.), and scrub status is simply a UI to pretty- > print that file. Note that there's nothing in there which lists the > total number of extents or bytes to go -- that's not calculated ahead of > time. > > So implementing some form of percentage done or eta is likely to increase > the processing time dramatically, as it could involve doing a dry-run > first, in order to get the total figures against which to calculate > percentage done. Indeed, this cannot (should not) be done at the user-space level: the kernel module should provide that information. I am not a dev :) but I think the module should know the number of extents; at least something is shown in "btrfs fi usage ..." output. The information shouldn't be 100% exact, but at least some indication would be great. In the worst case the module can remember the last scrub time and estimate based on that (similar to how some CD burning utilities do). >> * Not possible to start scrub for all devices in the volume without >> mounting it. > > Interesting. It's news to me that you can scrub individual devices > without mounting. But given that, this would indeed be a useful feature, > and given that btrfs filesystem show can get the information, scrub > should be able to get and make use of it as well. =:^) Moreover, I got into a trap when I tried to use the "btrfs scrub start /dev/..." syntax, as it only scrubs the given device. When I scrubbed the whole volume after mounting it, the result was different. I understood it only after reading man btrfs-scrub more attentively: start ...
| Start a scrub on all devices of the filesystem identified by or on a single . Other (shorter) forms of help misled me, giving the impression that it does not matter whether I specify a path or device. On 2016-05-26 00:05, Duncan wrote: > Nicholas D Steeves posted on Wed, 25 May 2016 16:36:13 -0400 as excerpted: >> On 25 May 2016 at 15:03, Duncan <1i5t5.dun...@cox.net> wrote: >>> Dmitry Katsubo posted on Wed, 25 May 2016 16:45:41 +0200 as excerpted: >>>> btrfs-restore [needs an o]ption that applies (y) to all questions >>>> (completely unattended recovery) >>> >>> That['s] a known sore spot that a lot of people have complained >>> about. > >> I'm surprised no one has mentioned, in any of these discussions, what I >> believe is the standard method of providing this functionality: >> yes | btrfs-restore -options /dev/disk > > Good point. > > I didn't bring it up because while I've used btrfs restore a few times, > my btrfs are all on relatively small SSD partitions, so I both needed > less y's, and the total time per restore is a few minutes, not hours, so > it wasn't a big deal. As a result, while I know of yes, I didn't need to > think about automation, and as I never used it, it didn't occur to me to > suggest it for others. Thanks for the advice, Nicholas. The last time I tried it, I used the following command: while true; do echo y; done | btrfs restore -voxmSi /dev/sda /mnt/tmp &> btrfs_restore & which presumably is equivalent to what you suggest. The command was in "running" state in "jobs" output for a while, but then turned into "waiting" state and did not progress. I suspect that btrfs-restore somehow reads directly from the terminal, not from stdin. I will try the solution with "yes | btrfs-restore..." once I get a chance. -- With best regards, Dmitry
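On the ETA suggestion discussed above: even without knowing the total extent count up front, the kernel (or even a wrapper script around "btrfs scrub status") could extrapolate remaining time from bytes scrubbed so far versus total allocated bytes. A rough sketch, assuming both figures are available; the function name and inputs are hypothetical:

```python
from datetime import timedelta

GIB = 2 ** 30

def scrub_eta(bytes_scrubbed, bytes_total, elapsed_seconds):
    """Estimate remaining scrub time assuming the average rate so far holds."""
    if bytes_scrubbed == 0:
        return None  # no progress yet: nothing to extrapolate from
    rate = bytes_scrubbed / elapsed_seconds  # bytes per second
    return timedelta(seconds=(bytes_total - bytes_scrubbed) / rate)

# 20 GiB of 55 GiB scrubbed in one hour -> 1:45:00 left at the same rate
print(scrub_eta(20 * GIB, 55 * GIB, 3600))
```

The estimate is only as good as the assumption of a constant rate, but as noted in the thread, a rough indication beats none.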
Some ideas for improvements
Dear btrfs community, I hope btrfs developers are open to suggestions.

btrfs-scrub:
* Would be nice if 'btrfs scrub status' showed the estimated finishing time (ETA) and throughput (in MB/s).
* Not possible to start a scrub for all devices in the volume without mounting it.

btrfs-restore:
* It does not restore special files like named pipes and devices.
* Hard-linked files are not correctly restored (they all turn into independent replicas).
* If a file cannot be read / recovered, it is still created with zero size (I would expect that the file is not created).
* I think the options '-xmS' should be enabled by default (shouldn't the goal be to restore as much as possible?).
* An option that applies (y) to all questions (completely unattended recovery) is missing.

-- With best regards, Dmitry
Re: Copy on write of unmodified data
On 2016-05-25 11:29, Hugo Mills wrote: On Wed, May 25, 2016 at 01:58:15AM -0700, H. Peter Anvin wrote: Hi, I'm looking at using a btrfs with snapshots to implement a generational backup capacity. However, doing it the naïve way would have the side effect that for a file that has been partially modified, after snapshotting the file would be written with *mostly* the same data. How does btrfs' COW algorithm deal with that? If necessary I might want to write some smarter user space utilities for this. Sounds like it might be a job for one of the dedup tools (duperemove, bedup), or, if you're writing your own, the safe deduplication ioctl which underlies those tools. Hugo. Perhaps it really makes sense to delegate de-duplication to third-party software like BackupPC [1]. I am not sure if btrfs can manage it more effectively, as in order to find duplicates it would need to scan / analyse all blocks, so at the very least it would take longer. [1] https://sourceforge.net/projects/backuppc/ -- With best regards, Dmitry
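To make the "scan all blocks" cost concrete: offline dedup boils down to hashing every block and grouping identical hashes, which is inherently a full pass over the data (and an index that grows with it). A simplified sketch, assuming fixed-size blocks and an in-memory index; real tools like duperemove are far more careful:

```python
import hashlib
from collections import defaultdict

def duplicate_blocks(data, block_size=4096):
    """One full pass: hash every fixed-size block, then group identical ones."""
    groups = defaultdict(list)
    for off in range(0, len(data), block_size):
        groups[hashlib.sha256(data[off:off + block_size]).digest()].append(off)
    # keep only hashes seen at more than one offset
    return [offs for offs in groups.values() if len(offs) > 1]

sample = b"A" * 4096 + b"B" * 4096 + b"A" * 4096
print(duplicate_blocks(sample))  # the blocks at offsets 0 and 8192 are identical
```

A deduplicating store would then pass such offset groups to the dedup ioctl (or, in BackupPC's case, keep a single pooled copy) instead of storing each block again.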
Spare volumes and hot auto-replacement feature
Dear btrfs community, I am interested in the spare volumes and hot auto-replacement feature [1]. I have a couple of questions:
* In which kernel version will this feature be included?
* The description says that replacement happens automatically when any write or flush fails. Is it possible to control the ratio / number of such failures? (e.g. in case it was a one-time accidental failure)
* What happens if the spare device is smaller than the (failing) device to be replaced?
* What happens if the spare device itself fails (write error) during the replacement?
* Is it possible for root to be notified in case a drive replacement (successful or unsuccessful) takes place? This question is in fact relevant for me for overall write/flush failures on a btrfs volume (btrfs monitor).
Many thanks! [1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg48209.html -- With best regards, Dmitry
Re: Kernel crash if both devices in raid1 are failing
Hello, If somebody is interested in digging into the problem, I would be happy to provide more information and/or do the testing. On 2016-04-27 04:44, Dmitry Katsubo wrote: > # cat /mnt/tmp/file > /dev/null > [ 11.432059] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 > [ 11.436665] ata3.00: BMDMA stat 0x25 > [ 11.441301] ata3.00: failed command: READ DMA > [ 11.479570] ata3.00: cmd c8/00:20:40:ec:f3/00:00:00:00:00/e3 tag 0 dma > 16384 in > [ 11.479664] res 51/40:1e:42:ec:f3/00:00:00:00:00/e3 Emask 0x9 > (media error) > [ 11.619086] ata3.00: status: { DRDY ERR } > [ 11.619126] ata3.00: error: { UNC } > [ 11.625750] blk_update_request: I/O error, dev sda, sector 66317378 > [ 11.625779] NOHZ: local_softirq_pending 40 > [ 70.969876] [ cut here ] > [ 70.969879] kernel BUG at > /build/linux-SBJFwR/linux-4.4.6/debian/build/source_rt/fs/btrfs/volumes.c:5509! > [ 70.969885] invalid opcode: [#1] PREEMPT SMP > [ 70.969954] Modules linked in: netconsole configfs bridge stp llc arc4 > iTCO_wdt iTCO_vendor_support ppdev coretemp pcspkr serio_raw i2c_i801 ath5k > ath mac80211 cfg80211 sr9700 evdev rfkill dm9601 usbnet lpc_ich mfd_core mii > option usb_wwan usbserial rng_core sg snd_hda_codec_realtek > snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep > snd_pcm_oss snd_mixer_oss snd_pcm acpi_cpufreq snd_timer video 8250_fintek > snd drm_kms_helper soundcore tpm_tis drm tpm parport_pc i2c_algo_bit parport > shpchp button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c > hid_generic usbhid hid crc32c_generic btrfs xor raid6_pq uas usb_storage > sd_mod sr_mod cdrom ata_generic firewire_ohci ata_piix libata scsi_mod > firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common e1000e > ptp pps_core > [ 70.969965] CPU: 0 PID: 114 Comm: kworker/u4:3 Tainted: GW > 4.4.0-1-rt-686-pae #1 Debian 4.4.6-1 > [ 70.969968] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF > R1.01 Mar.02.2007 AOpen Inc. 
03/02/2007 > [ 70.970029] Workqueue: btrfs-endio btrfs_endio_helper [btrfs] > [ 70.970032] task: f3eec0c0 ti: f6a76000 task.ti: f6a76000 > [ 70.970036] EIP: 0060:[] EFLAGS: 00010217 CPU: 0 > [ 70.970076] EIP is at __btrfs_map_block+0x11be/0x15a0 [btrfs] -- With best regards, Dmitry
Re: Kernel crash if both devices in raid1 are failing
On 2016-04-25 09:12, Dmitry Katsubo wrote: > I have run "btrfs check /dev/sda" two times. One time it has completed > OK, actually showing only one error. The 2nd time it has shown many messages > > "parent transid verify failed on NNN wanted AAA found BBB" > > and then asserted :) But I think the 2nd run is not representative as I have > gracefully removed one drive from btrfs array to build a new array. The > "btrfs device remove" completed successfully, but it might have written some > metadata to the remaining drives, which perhaps was not synchronized > correctly. > > What I am going to do next is to recompile btrfs-tools so that "-i" CLI option > applies "(y)" to all questions and run "btrfs restore" again. Hopefully it can > handle transid mismatch correctly... OK, I have recompiled btrfs with necessary fix (attached). It allowed me to capture "btrfs restore" output because due to reads from console it was not possible, even with attempts like this: while true; do echo y; done | btrfs restore -voxmSi /dev/sda /mnt/backup 2>&1 | tee btrfs_restore For the matter of experiment I have upgraded kernel to 4.4.6 and it still crashes on problematic file: # cat /mnt/tmp/file > /dev/null [ 11.432059] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 [ 11.436665] ata3.00: BMDMA stat 0x25 [ 11.441301] ata3.00: failed command: READ DMA [ 11.479570] ata3.00: cmd c8/00:20:40:ec:f3/00:00:00:00:00/e3 tag 0 dma 16384 in [ 11.479664] res 51/40:1e:42:ec:f3/00:00:00:00:00/e3 Emask 0x9 (media error) [ 11.619086] ata3.00: status: { DRDY ERR } [ 11.619126] ata3.00: error: { UNC } [ 11.625750] blk_update_request: I/O error, dev sda, sector 66317378 [ 11.625779] NOHZ: local_softirq_pending 40 [ 70.969876] [ cut here ] [ 70.969879] kernel BUG at /build/linux-SBJFwR/linux-4.4.6/debian/build/source_rt/fs/btrfs/volumes.c:5509! 
[ 70.969885] invalid opcode: [#1] PREEMPT SMP [ 70.969954] Modules linked in: netconsole configfs bridge stp llc arc4 iTCO_wdt iTCO_vendor_support ppdev coretemp pcspkr serio_raw i2c_i801 ath5k ath mac80211 cfg80211 sr9700 evdev rfkill dm9601 usbnet lpc_ich mfd_core mii option usb_wwan usbserial rng_core sg snd_hda_codec_realtek snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm acpi_cpufreq snd_timer video 8250_fintek snd drm_kms_helper soundcore tpm_tis drm tpm parport_pc i2c_algo_bit parport shpchp button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c hid_generic usbhid hid crc32c_generic btrfs xor raid6_pq uas usb_storage sd_mod sr_mod cdrom ata_generic firewire_ohci ata_piix libata scsi_mod firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common e1000e ptp pps_core [ 70.969965] CPU: 0 PID: 114 Comm: kworker/u4:3 Tainted: GW 4.4.0-1-rt-686-pae #1 Debian 4.4.6-1 [ 70.969968] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007 [ 70.970029] Workqueue: btrfs-endio btrfs_endio_helper [btrfs] [ 70.970032] task: f3eec0c0 ti: f6a76000 task.ti: f6a76000 [ 70.970036] EIP: 0060:[] EFLAGS: 00010217 CPU: 0 [ 70.970076] EIP is at __btrfs_map_block+0x11be/0x15a0 [btrfs] Unfortunately I was not able to capture the whole trace, as there seem to be concurrent problem with netconsole: the whole system hangs at the point above. P.S. If debian maintainer of btrfs-progs is on the list: Project packaging fails for me (happens at the very end during binaries installation): # debuild ... 
dpkg-source: info: building btrfs-progs in btrfs-progs_4.4.1-1.1.debian.tar.xz dpkg-source: info: building btrfs-progs in btrfs-progs_4.4.1-1.1.dsc debian/rules build dh build --parallel dh_testdir -O--parallel debian/rules override_dh_auto_configure make[1]: Entering directory '/home/btrfs-progs-4.4.1' dh_auto_configure -- --bindir=/bin make[1]: Leaving directory '/home/btrfs-progs-4.4.1' dh_auto_build -O--parallel fakeroot debian/rules binary dh binary --parallel dh_testroot -O--parallel dh_prep -O--parallel debian/rules override_dh_auto_install make[1]: Entering directory '/home/btrfs-progs-4.4.1' dh_auto_install --destdir=debian/btrfs-progs # Adding initramfs-tools integration install -D -m 0755 debian/local/btrfs.hook debian/btrfs-progs/usr/share/initramfs-tools/hooks/btrfs install -D -m 0755 debian/local/btrfs.local-premount debian/btrfs-progs/usr/share/initramfs-tools/scripts/local-premount/btrfs make[1]: Leaving directory '/home/btrfs-progs-4.4.1' dh_install -O--parallel /home/btrfs-progs-4.4.1/debian/btrfs-progs.install: 1: /home/btrfs-progs-4.4.1/debian/btrfs-progs.install: btrfs-calc-size: not found /home/btrfs-progs-4.4.1/debian/btrfs-progs.install: 2: /home/btrfs-progs-4.4.1/debian/btrfs-progs.install: btrfs-select-su
Re: Kernel crash if both devices in raid1 are failing
On 2016-04-19 09:58, Duncan wrote: > Dmitry Katsubo posted on Tue, 19 Apr 2016 07:45:40 +0200 as excerpted: > >> Actually btrfs restore has recovered many files, however I was not able >> to run in fully unattended mode as it complains about "looping a lot". >> Does it mean that files are corrupted / not correctly restored? > > As long as you tell it to keep going each time, the loop complaints > shouldn't be an issue. The problem is that the loop counter is measuring > loops on a particular directory, because that's what it has available to > measure. But if you had a whole bunch of files in that dir, it's /going/ > to loop a lot, to restore all of them. > > I have one cache directory with over 200K files in it. They're all text > messages from various technical lists and newsgroups (like this list, > which I view as a newsgroup using gmane.org's list2news service) so > they're quite small, about 5 KiB on average by my quick calculation, but > that's still a LOT of files for a single dir, even if they're only using > just over a GiB of space. > > I ended up doing a btrfs restore on that filesystem (/home), because > while I had a backup, restore was getting more recent copies of stuff > back, and that dir looped a *LOT* the first time it happened, now several > years ago, before they actually added the always option. I have the same situation here: there is a backup, but the most recent modifications in files are preferable. > The second time it happened, about a year ago, restore worked much > better, and I was able to use the always option. But AFAIK, always only > applies to that dir. If you have multiple dirs with the problem, you'll > still get asked for the next one. But it did vastly improve the > situation for me, giving me only a handful of prompts instead of the very > many I had before the option was there. Yes, this is exactly the problem discussed a while ago. 
Would be nice if "btrfs restore -i" applied the "(a)lways" option to all questions, or there were a separate option for that ("-y"). For me personally "looping" is too low-level a problem. System administrators (who are going to use this utility) should be able to operate in more meaningful terms. If "looping" is a proxy for time consumption, then I would say that during a restore time does not matter so much: I am ready to wait for 1 minute until a specific file is restored. So I think not the number of loops but the time spent should be measured. Also I have difficulties in finding out what files have not been restored due to uncorrectable errors. As I cannot redirect the output of "btrfs restore" and it does not print final stats, I cannot tell which files have to be restored from backup. > (The main problem triggering the need to run restore for me, turned out > to be hardware. I've had no issues since I replaced that failing ssd, > and with a bit of luck, won't be running restore again for a few years, > now.) I would be happy if I were able to replace the failing drive on the fly, without stopping the system. Unfortunately I cannot do that due to kernel crashes :( btrfs is still not resistant to these corner cases. -- With best regards, Dmitry
Re: Kernel crash if both devices in raid1 are failing
On 2016-04-18 02:19, Chris Murphy wrote: > With two device failure on raid1 volume, the file system is actually > broken. There's a big hole in the metadata, not just missing data, > because there are only two copies of metadata, distributed across > three drives. Thanks, I understand that. Well, the drive has not completely failed, it has intermittent read/write errors. I still wonder what went wrong and why the kernel crashed; I think this should not happen, as it prevents me from accessing the data that can still be read. I am happy to contribute more information if it would help. > btrfs restore might be able to scrape off some files, but I don't > expect it'll get very far. If there were n-way raid1, where every > drive has a complete copy of 100% of the filesystem metadata, what you > suggest would be possible. Actually btrfs restore has recovered many files, however I was not able to run it in fully unattended mode as it complains about "looping a lot". Does it mean that those files are corrupted / not correctly restored? > OK probably the worst thing you can do if you're trying to recover > data from a degraded volume where a 2nd device is also having > problems, is to mount it rw let alone write anything to it. *shrug* > That's just going to make things much worse and more difficult to > recover, assuming anything can be recovered at all. The least number > of changes you make to such a volume, the better. Another option I have thought about is to shrink the failing volume down to some small value. This will cause chunks to be moved to another location. How will btrfs behave if both copies cannot be read? Would be nice to have a strategy to recover without "btrfs restore" in such cases. I wonder because "btrfs restore" requires pausing normal system operation to copy data back and forth.
-- With best regards, Dmitry
Re: Kernel crash if both devices in raid1 are failing
On 2016-04-14 22:30, Dmitry Katsubo wrote: > Dear btrfs community, > > I have the following setup: > > # btrfs fi show /home > Label: none uuid: 865f8cf9-27be-41a0-85a4-6cb4d1658ce3 > Total devices 3 FS bytes used 55.68GiB > devid1 size 52.91GiB used 0.00B path /dev/sdd2 > devid2 size 232.89GiB used 59.03GiB path /dev/sda > devid3 size 111.79GiB used 59.03GiB path /dev/sdc1 > > btrfs volume was created in raid1 mode both for data and metadata and mounted > with compress=lzo option. > > Unfortunately, two drives (sda and sdc1) started to fail at the same time. > This > leads to system crash if I start the system in runlevel 3 (see crash1.log). > > After I have started the system in single mode, volume can be mounted in rw > mode and I can write some data into it. Unfortunately when I tried to read > a certain file, the system crashed (see crash2.log). > > I have started scrub on the volume and here is the report: > > # btrfs scrub status /home > scrub status for 865f8cf9-27be-41a0-85a4-6cb4d1658ce3 > scrub started at Tue Apr 12 20:39:20 2016 and finished after 02:40:09 > total bytes scrubbed: 55.68GiB with 1767 errors > error details: verify=175 csum=1592 > corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0 > > Obviously, some data is lost. However due to above crash, I cannot just copy > the data from the volume. I would assume that I still can access the data, but > the files for which data is lost, should result I/O error (I would then > recover > them from my backup). > > I have decided to attach another drive and remove failing devices one-by-one. 
> However that does not work: > > # btrfs dev delete /dev/sda /home > [ 168.680057] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 > [ 168.684236] ata3.00: BMDMA stat 0x25 > [ 168.688464] ata3.00: failed command: READ DMA > [ 168.692681] ata3.00: cmd c8/00:08:68:4b:84/00:00:00:00:00/e7 tag 0 dma > 4096 in > [ 168.692681] res 51/40:08:68:4b:84/40:08:07:00:00/e7 Emask 0x9 > (media error) > [ 168.701281] ata3.00: status: { DRDY ERR } > [ 168.705600] ata3.00: error: { UNC } > [ 168.724446] blk_update_request: I/O error, dev sda, sector 126110568 > [ 168.728860] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, > flush 0, corrupt 0, gen 0 > [ 172.824043] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 > [ 172.828651] ata3.00: BMDMA stat 0x25 > [ 172.833281] ata3.00: failed command: READ DMA > [ 172.837876] ata3.00: cmd c8/00:08:50:4b:84/00:00:00:00:00/e7 tag 0 dma > 4096 in > [ 172.837876] res 51/40:08:50:4b:84/40:08:07:00:00/e7 Emask 0x9 > (media error) > [ 172.847296] ata3.00: status: { DRDY ERR } > [ 172.852054] ata3.00: error: { UNC } > [ 172.872404] blk_update_request: I/O error, dev sda, sector 126110544 > [ 172.877241] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, > flush 0, corrupt 0, gen 0 > ERROR: error removing device '/dev/sda': Input/output error > > The same happens when I try to delete /dev/sdc1 from the volume. Is there any > btrfs "force" option so that btrfs balances only chunks that are accessible? I > can potentially physically disconnect /dev/sda, but the loss will be greater > I believe. > > How can I proceed except btrfs restore? 
> During scrub operation the following was recorded in the logs:
>
> [Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error at
> logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, inode
> 879324, offset 308256768, length 4096, links 1 (path: lib/mysql/ibdata1)
>
> If I collect all the messages like this, will it give a full picture of
> damaged files?
>
> Many thanks in advance.
>
> P.S. Linux kernel v4.4.2, btrfs-progs v4.4.

I have decided to try "btrfs restore". I have discovered two usability points about it:

1. I cannot run this utility as follows:

btrfs restore -i /dev/sda /mnt/usb &> log

because the command is interactive and may read something from the terminal. It would be nice if there were a flag -y (answer "yes" to all questions) so that no input is required from the user. An example of such a question is:

We seem to be looping a lot on ..., do you want to keep going on? [y/N/a]

In general this question puzzles me. What does it mean? As far as I understood, it prevents btrfs restore from looping forever. Should I consider those files as lost? I have also hit the same problem as discussed in [1]: answering "a" (always) still causes the questions to be asked.

2. btrfs restore does not print final statistics: how many files were successfully restored, and how many failed.

[1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36458.html

-- With best regards, Dmitry
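A possible workaround for the interactivity is to feed answers in with yes(1). This is a sketch only: if btrfs restore reads its answers from /dev/tty rather than stdin, piping will not help, which may be exactly the problem reported above. The stand-in script below just demonstrates the mechanics.

```shell
# Stand-in for an interactive tool that keeps asking [y/N/a] on stdin.
cat > /tmp/ask.sh <<'EOF'
i=0
while [ "$i" -lt 3 ]; do
    printf 'We seem to be looping a lot, do you want to keep going on? [y/N/a] '
    read ans || break
    echo "answered: $ans"
    i=$((i+1))
done
EOF

# 'yes a' answers every prompt with "a"; both streams go to the log file.
# '|| true' absorbs the SIGPIPE exit status 'yes' gets when the reader exits.
yes a | sh /tmp/ask.sh > /tmp/restore.log 2>&1 || true
grep -c 'answered: a' /tmp/restore.log    # prints 3
```

The real invocation would then be something like `yes a | btrfs restore -i /dev/sda /mnt/usb > log 2>&1` (untested here, for the reason above).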
Kernel crash if both devices in raid1 are failing
Dear btrfs community,

I have the following setup:

# btrfs fi show /home
Label: none uuid: 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
Total devices 3 FS bytes used 55.68GiB
devid1 size 52.91GiB used 0.00B path /dev/sdd2
devid2 size 232.89GiB used 59.03GiB path /dev/sda
devid3 size 111.79GiB used 59.03GiB path /dev/sdc1

The btrfs volume was created in raid1 mode for both data and metadata, and is mounted with the compress=lzo option.

Unfortunately, two drives (sda and sdc1) started to fail at the same time. This leads to a system crash if I start the system in runlevel 3 (see crash1.log).

After starting the system in single mode, the volume can be mounted in rw mode and I can write some data to it. Unfortunately, when I tried to read a certain file, the system crashed (see crash2.log).

I have started a scrub on the volume and here is the report:

# btrfs scrub status /home
scrub status for 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
scrub started at Tue Apr 12 20:39:20 2016 and finished after 02:40:09
total bytes scrubbed: 55.68GiB with 1767 errors
error details: verify=175 csum=1592
corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0

Obviously, some data is lost. However, due to the above crash, I cannot just copy the data off the volume. I would assume that I can still access the data, and that the files whose data is lost should result in an I/O error (I would then recover them from my backup).

I have decided to attach another drive and remove the failing devices one by one.
However, that does not work:

# btrfs dev delete /dev/sda /home
[ 168.680057] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 168.684236] ata3.00: BMDMA stat 0x25
[ 168.688464] ata3.00: failed command: READ DMA
[ 168.692681] ata3.00: cmd c8/00:08:68:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
[ 168.692681] res 51/40:08:68:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
[ 168.701281] ata3.00: status: { DRDY ERR }
[ 168.705600] ata3.00: error: { UNC }
[ 168.724446] blk_update_request: I/O error, dev sda, sector 126110568
[ 168.728860] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, flush 0, corrupt 0, gen 0
[ 172.824043] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[ 172.828651] ata3.00: BMDMA stat 0x25
[ 172.833281] ata3.00: failed command: READ DMA
[ 172.837876] ata3.00: cmd c8/00:08:50:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
[ 172.837876] res 51/40:08:50:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
[ 172.847296] ata3.00: status: { DRDY ERR }
[ 172.852054] ata3.00: error: { UNC }
[ 172.872404] blk_update_request: I/O error, dev sda, sector 126110544
[ 172.877241] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, flush 0, corrupt 0, gen 0
ERROR: error removing device '/dev/sda': Input/output error

The same happens when I try to delete /dev/sdc1 from the volume. Is there any btrfs "force" option so that btrfs balances only the chunks that are accessible? I could physically disconnect /dev/sda, but I believe the loss would then be greater.

How can I proceed, other than with btrfs restore?

During the scrub operation the following was recorded in the logs:

[Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error at logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, inode 879324, offset 308256768, length 4096, links 1 (path: lib/mysql/ibdata1)

If I collect all the messages like this, will it give a full picture of the damaged files?

Many thanks in advance.

P.S. Linux kernel v4.4.2, btrfs-progs v4.4.
-- With best regards, Dmitry [ 231.228068] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 [ 231.231255] ata3.00: BMDMA stat 0x25 [ 231.234443] ata3.00: failed command: READ DMA [ 231.237661] ata3.00: cmd c8/00:08:60:f9:99/00:00:00:00:00/e2 tag 0 dma 4096 in [ 231.237661] res 51/40:08:60:f9:99/00:00:00:00:00/e2 Emask 0x9 (media error) [ 231.244022] ata3.00: status: { DRDY ERR } [ 231.247119] ata3.00: error: { UNC } [ 231.264447] blk_update_request: I/O error, dev sda, sector 43645280 [ 231.267817] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 39, flush 0, corrupt 0, gen 0 [ 232.127298] BTRFS error (device sdc1): parent transid verify failed on 65675001856 wanted 480578 found 480435 [ 232.185418] BTRFS error (device sdc1): parent transid verify failed on 65679622144 wanted 480579 found 480435 [ 232.359943] BTRFS error (device sdc1): parent transid verify failed on 65674952704 wanted 480578 found 480435 [ 232.656145] BTRFS error (device sdc1): parent transid verify failed on 65674379264 wanted 480578 found 480435 [ 232.851908] BTRFS error (device sdc1): parent transid verify failed on 65669464064 wanted 480579 found 480577 [ 233.142476] BTRFS error (device sdc1): parent transid verify failed on 65674313728 wanted 480578 found 480435 [ 233.497501] BTRFS error (device sdc1): parent transid verify failed on 65669513216
Re: Kernel 3.19 and still "disk full" even though 'btrfs fi df" reports enough room left?
Many thanks to Duncan for such a thorough clarification.

I am thinking of another parallel, similar to the SimCity one: memory management in virtual machines such as the JVM. If the heap is full, it does not necessarily mean that there is no free memory. In that case the JVM forces a garbage collection, and only if that procedure releases no memory does it signal the condition to the application by raising OutOfMemoryError. I think something similar should happen in btrfs: ENOSPC should be returned to the application only when there is really no space left. If no chunk can be allocated, btrfs should check all "deleted" data chunks and return them to the unallocated pool. I would expect this automatic "cleanup" from a modern filesystem.

This behaviour does not necessarily have to be the default, as one can argue that:

* Such a cleanup procedure may freeze the calling process for a considerable time, as btrfs would need to walk all allocated chunks to find candidates for release.
* The filesystem will perhaps run out of free space soon anyway, so why not fail with the error earlier? (for example, when one process is intensively writing a log)

It would be nice to have the automatic "cleanup" function controlled by some /sys/fs/btrfs/features variable which, if set to 1, forces btrfs to do its best to allocate the chunk before giving up and returning ENOSPC, sacrificing the response time of the process/application.

On 2015-11-20 04:14, Duncan wrote:
> linux-btrfs.tebulin posted on Thu, 19 Nov 2015 18:56:45 + as
> excerpted:
>
> Meta-comment:
>
> Apparently that attribution should actually be to Hugo Mills. I've no
> idea what went wrong, but at least here as received from gmane.org, the
> from header really does say linux-btrfs.tebulin, so something obviously
> bugged out somewhere!
>
> Meanwhile discussing btrfs data/metadata allocation, vs. usage of that
> allocation, Hugo also known here as tebulin, explained...
> >> If you've ever played SimCity, the allocation process is like zoning -- >> you say what kind of thing can go on the space, but it's not actually >> used until something gets built on it. > > Very nice analogy. Thanks. =:^) > > Tho I'd actually put it in terms of the real thing that sim city > simulates, thus eliminating the proprietary comparison. A city zones an > area one way or another, restricting what can be built there -- you can't > put heavy industry in a residential zone. But the lot is still empty > until something's actually built there. > > And if the city has all its area zoned residential, and some company > wants to build a plant there (generally considered a good thing as it'll > provide employment), there's a process that allows rezoning. > > In btrfs, there's four types of allocations aka "zones": > > 1) Unallocated (unzoned) > > Can be used for anything but must be allocated/zoned first > > 2) System > > Critical but limited in size and generally only allocated at mkfs or when > adding a new device. > > 3) Data > > The actual file storage, generally the largest allocation. > > 4) Metadata > > Information /about/ the files, where they are located (the location and > size of individual extents), ownership and permissions, date stamps, > checksums, and for very small files (a few KiB), sometimes the file > content itself. > > 4a) Global reserve > > A small part of metadata reserved for emergency use only. Btrfs is > pretty strict about its use, and will generally error out with ENOSPC if > metadata space other than the global reserve is used up, before actually > using this global reserve. As a result, any of the global reserve used > at all indicates a filesystem in very severe straits, crisis mode. > > > As it happens, btrfs in practice tends to be a bit liberal about > allocating/zoning data chunks, since it's easier to find bigger blocks of > space in unallocated space than it is in busy partly used data space. 
> (Think of a big shopping center. It's easier to build it in outlying > areas that haven't been built up yet, where many whole blocks worth of > space can be allocated/zoned at once, than it is in the city center, > where even finding a single whole block vacant, is near impossible.) > > Over time, therefore, more and more space tends to be allocated to data, > while existing data space, like those blocks near city center, may have > most of its files/buildings deleted, but still have a couple still > occupied. > > Btrfs balancing, then, is comparable to the city functions of > condemnation and rezoning to vacant/unallocated, forcing remaining > occupants, most commonly data zoned, to move out of the way so the area > can be reclaimed for other usage. Then it can be rezoned to data again, > or to metadata, whatever needs it. > > > (FWIW, I played the original sim city, but IIRC it wasn't sophisticated > enough to have zoning yet. Of course I've not played anything recent as > to my knowledge it's not freedomware, and since I no longer agree
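The "reclaim before ENOSPC" behaviour proposed earlier in this thread can be sketched as a toy allocator. This is a hypothetical model, not btrfs code; the reclaim step corresponds roughly to what running `btrfs balance start -dusage=0` does by hand today.

```python
# Toy model: before returning ENOSPC, try to return fully-empty
# ("zoned but unbuilt") chunks to the unallocated pool.
class ChunkAllocator:
    def __init__(self, total_chunks):
        self.unallocated = total_chunks
        self.chunks = []          # per-chunk bytes used (0 = empty)

    def reclaim_empty(self):
        """Return empty chunks to the unallocated pool (the 'GC pass')."""
        empty = sum(1 for used in self.chunks if used == 0)
        self.chunks = [used for used in self.chunks if used != 0]
        self.unallocated += empty
        return empty

    def allocate_chunk(self):
        # Only raise ENOSPC once reclaiming empty chunks has also failed.
        if self.unallocated == 0 and self.reclaim_empty() == 0:
            raise OSError(28, "No space left on device")
        self.unallocated -= 1
        self.chunks.append(0)

alloc = ChunkAllocator(total_chunks=2)
alloc.allocate_chunk(); alloc.allocate_chunk()   # pool exhausted
alloc.chunks[0] = 4096                           # one chunk holds live data
alloc.allocate_chunk()                           # succeeds: empty chunk reclaimed
print(len(alloc.chunks), alloc.unallocated)      # 2 0
```

The first bullet point above shows up directly in this model: `reclaim_empty` walks every allocated chunk, so the caller pays the scan cost at the worst possible moment.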
Re: Kernel 3.19 and still "disk full" even though 'btrfs fi df" reports enough room left?
If I may add: the information for "System"

System, DUP: total=32.00MiB, used=16.00KiB

is also quite technical, since for the end user system = metadata (one could perhaps call it "filesystem metadata"). For simplicity the numbers could be added to "Metadata", thus eliminating that line as well. For those power users who really want to see tiny details like "System" and "GlobalReserve", I suggest implementing a "-v" flag:

# btrfs fi usage -v

On 2015-11-19 03:16, Duncan wrote:
> Qu Wenruo posted on Thu, 19 Nov 2015 08:42:13 +0800 as excerpted:
>
>> Although the metadata output is showing that you still have about 512M
>> available, but the 512M is Global Reserved space, or the unknown one.
>
> Unknown here, as the userspace (btrfs-progs) is evidently too old to show
> it as global reserve, as it does in newer versions...
>
>> The output is really a little confusing. I'd like the change the output
>> by adding global reserved into metadata used space and make it a sub
>> item for metadata.
>
> Thanks for the clarification. It's most helpful, here. =:^)
>
> I've at times wondered if global reserve folded into one of the other
> settings. Apparently it comes from the metadata allocation, but while
> metadata is normally dup (single-device btrfs) or raid1 (multi-device),
> global reserve is single.
>
> It would have been nice if that sort of substructure was described a bit
> better when global reserve first made its appearance, at least in the
> patch descriptions and release announcement, if not then yet in btrfs fi
> df output, first implementations being what they are. But regardless,
> now at least it should be clear for list regulars who read this thread
> anyway, since the above reasonably clarifies things.
> > As for btrfs fi df, making global reserve a metadata subentry there would > be one way to deal with it, preserving the exposure of the additional > data provided by that line (here, the fact that global reserve is > actually being used, underlining the fact that the filesystem is severely > short on space). > > Another way of handling it would be to simply add the global reserve into > the metadata used figure before printing it, eliminating the separate > global reserve line, and changing the upthread posted metadata line from > 8.48 GiB of 9 GiB used, to 8.98 of 9 GiB used, which is effectively the > case if the 512 MiB of global reserve indeed comes from the metadata > allocation. This would more clearly show how full metadata actually is > without the added complexity of an additional global reserve line, but > would lose the fact that global reserve is actually in use, that the > broken out global reserve line exposes. > > I'd actually argue in favor of the latter, directly folding global > reserve allocation into metadata used, since it'd both be simpler, and > more consistent if for instance btrfs fi usage didn't report separate > global reserve in the overall stats, but fail to report it in the per- > device stats and in btrfs dev usage. > > Either way would make much clearer that metadata is actually running out > than the current report layout does, since "metadata used" would then > either explicitly or implicitly include the global reserve. > -- With best regards, Dmitry
Re: Kernel 3.19 and still "disk full" even though 'btrfs fi df" reports enough room left?
On 2015-11-20 14:52, Austin S Hemmelgarn wrote:
> On 2015-11-20 08:27, Hugo Mills wrote:
>> On Fri, Nov 20, 2015 at 08:21:31AM -0500, Austin S Hemmelgarn wrote:
>>> On 2015-11-20 06:39, Dmitry Katsubo wrote:
>>>> For those power users who really want to see the tiny details like
>>>> "System" and "GlobalReserve" I suggest to implement "-v" flag:
>>>>
>>>> # btrfs fi usage -v
>>> Actually, I really like this idea, one of the questions I get asked
>>> when I show people BTRFS is the difference between System and
>>> Metadata, and it's not always easy to explain to somebody who
>>> doesn't have a background in filesystem development. For some
>>> reason, people seem to have trouble with the concept that the system
>>> tree is an index of the other trees.
>>
>> Actually, it's not that in the system chunks. :)
>>
>> System chunks contain the chunk tree, not the tree of tree roots.
>> They're special (and small) because they're listed explicitly by devid
>> and physical offset at the end of the superblock, and allow the FS to
>> read them first so that it can bootstrap the logical:physical mapping
>> table before it starts reading all the other metadata like the tree of
>> tree roots (which is "normal" metadata).
> I guess my understanding was wrong then.

Thanks for the explanation. The size of "System" is anyway very small in comparison with the other allocation types. Adding it to "Metadata" or simply suppressing it does not make a big difference. If shown, it actually raises "dummy" questions (Is the "System" area big enough? Can it run out of free space? How can I add more space to it?).

-- With best regards, Dmitry
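Hugo's bootstrap explanation can be illustrated with a toy logical-to-physical translation. All numbers, sizes, and the single-stripe layout are invented for the illustration and look nothing like the real on-disk format; the point is only that the superblock carries a few chunk-tree entries inline precisely so this translation works before any other tree can be read.

```python
# Hypothetical chunk maps: logical start -> (devid, physical offset, length).
# The superblock's inline entries locate the chunk tree itself...
sys_chunk_array = {0x500000: (1, 0x100000, 0x400000)}
# ...and the chunk tree (readable only via the entries above) maps the rest.
chunk_tree = {0x1400000: (1, 0x2000000, 0x10000000)}

def logical_to_physical(logical):
    """Translate a logical byte address to (devid, physical offset)."""
    for table in (sys_chunk_array, chunk_tree):
        for start, (devid, phys, length) in table.items():
            if start <= logical < start + length:
                return devid, phys + (logical - start)
    raise LookupError("no chunk maps logical address 0x%x" % logical)

# An address inside the hypothetical metadata chunk:
print(logical_to_physical(0x1400000 + 4096))
```

Every metadata read, including the tree of tree roots, goes through a lookup like this, which is why the chunk tree has to be findable first.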
Re: Potential to loose data in case of disk failure
On 2015-11-12 13:47, Austin S Hemmelgarn wrote:
>> That's a pretty unusual setup, so I'm not surprised there's no quick and
>> easy answer. The best solution in my opinion would be to shuffle your
>> partitions around and combine sda3 and sda8 into a single partition.
>> There's generally no reason to present btrfs with two different
>> partitions on the same disk.
>>
>> If there's something that prevents you from doing that, you may be able
>> to use RAID10 or RAID6 somehow. I'm not really sure, though, so I'll
>> defer to others on the list for implementation details.
> RAID10 has the same issue. Assume you have 1 block. This gets stored
> as 2 copies, each with 2 stripes, with the stripes split symmetrically.
> For this, call the first half of the first copy 1a, the second half 1b,
> and likewise for 2a and 2b with the second copy. 1a and 2a have
> identical contents, and 1b and 2b have identical contents. It is fully
> possible that you will end up with this block striped such that 1a and
> 2a are on one disk, and 1b and 2b on the other. Based on this, losing
> one disk would mean losing half the block, which would mean based on how
> BTRFS works that you would lose the whole block (because neither copy
> would be complete).

Does the same apply to RAID1? Namely, if I create

mkfs.btrfs -mraid1 -draid1 /dev/sda3 /dev/sda8

then btrfs will "believe" that these are different drives and mistakenly think that the RAID precondition is satisfied. Am I right? If so, then I think this is a trap, and mkfs.btrfs should at least warn (or require --force) when two partitions in a raid1/raid5/raid10 profile are on the same drive. In other words, the only scenario in which this check should be skipped is:

mkfs.btrfs -mraid0 -draid0 /dev/sda3 /dev/sda8

-- With best regards, Dmitry
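The suggested mkfs.btrfs warning could be sketched like this. Real code would resolve the parent device via sysfs or lsblk; this hypothetical helper just parses conventional /dev names (sda3, nvme0n1p2) and is only an illustration of the check, not anything btrfs-progs actually does.

```python
import re

def parent_disk(dev):
    """Strip a trailing partition number from a device path (heuristic)."""
    name = dev.rsplit("/", 1)[-1]
    m = re.fullmatch(r"(.+?)p?(\d+)", name)  # sda3 -> sda, nvme0n1p2 -> nvme0n1
    return m.group(1) if m else name

def redundancy_is_real(devices):
    """True only if the devices live on at least two distinct disks."""
    return len({parent_disk(d) for d in devices}) > 1

print(redundancy_is_real(["/dev/sda3", "/dev/sda8"]))  # False -> warn/--force
print(redundancy_is_real(["/dev/sda3", "/dev/sdc1"]))  # True
```

mkfs would run this for any redundant profile (raid1/raid10/raid5/raid6) and skip it for single/raid0, matching the scenario described above.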
Re: Process is blocked for more than 120 seconds
On 2015-11-09 14:25, Austin S Hemmelgarn wrote:
> On 2015-11-07 07:22, Dmitry Katsubo wrote:
>> Hi everyone,
>>
>> I have noticed the following in the log. The system continues to run,
>> but I am not sure for how long it will be stable. Should I start
>> worrying? Thanks in advance for the opinion.
>>
> This just means that a process was stuck in the D state (uninterruptible
> I/O sleep) for more than 120 seconds. Depending on a number of factors,
> this happening could mean:
> 1. Absolutely nothing (if you have low-powered or older hardware, for
> example, I get these regularly on a first generation Raspberry Pi if I
> don't increase the timeout significantly)
> 2. The program is doing a very large chunk of I/O (usually with the
> O_DIRECT flag, although this probably isn't the case here)
> 3. There's a bug in the blocked program (this is rarely the case when
> this type of thing happens)
> 4. There's a bug in the kernel (which is why this dumps a stack trace)
> 5. The filesystem itself is messed up somehow, and the kernel isn't
> handling it properly (technically a bug, but a more specific case of it).
> 6. Your hardware is misbehaving, failing, or experienced a transient
> error.
>
> Assuming you can rule out possibilities 1 and 6, I think that 4 is the
> most likely cause, as all of the listed programs (I'm assuming that
> 'master' is from postfix) are relatively well audited, and all of them
> hit this at the same time.
>
> For what it's worth, if you want you can do:
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
> like the message says to stop these from appearing in the future, or use
> some arbitrary number to change the timeout before these messages appear
> (I usually use at least 150 on production systems, and more often 300,
> although on something like a Raspberry Pi I often use timeouts as high
> as 1800 seconds).

Thanks for the comments, Austin. The system is a "normal" PC, running an Intel Core 2 Duo Mobile @ 1.66 GHz.
"master" is indeed a postfix process. I hadn't seen anything like this when I was on the 3.16 kernel, but after upgrading to 4.2.3 I caught that message. I/O and CPU load are usually low, but it could be (6) from your list, as the system is generally very old (5+ years). Since the problem has appeared only once in the past 15 days, I think it was just a transient error. Thanks for clarifying the possible reasons.

-- With best regards, Dmitry
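If the larger timeout is wanted permanently, it can be persisted as a sysctl fragment rather than echoing into /proc at every boot. A sketch; the 300-second value is just the production figure Austin mentions, and the file name is arbitrary:

```
# /etc/sysctl.d/90-hung-task.conf
# Warn about D-state tasks only after 300s instead of the default 120s.
kernel.hung_task_timeout_secs = 300
```

Apply it immediately with `sysctl -p /etc/sysctl.d/90-hung-task.conf` (or reboot).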
Process is blocked for more than 120 seconds
Hi everyone, I have noticed the following in the log. The system continues to run, but I am not sure for how long it will be stable. Should I start worrying? Thanks in advance for the opinion. # uname -a Linux Debian 4.2.3-2~bpo8+1 (2015-10-20) i686 GNU/Linux # mount | grep /var /dev/sdd2 on /var type btrfs (rw,noatime,compress=lzo,space_cache,subvolid=258,subvol=/var) > [Mon Nov 2 06:35:57 2015] INFO: task nscd:859 blocked for more than 120 > seconds. > [Mon Nov 2 06:35:57 2015] Not tainted 4.2.0-0.bpo.1-686-pae #1 > [Mon Nov 2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [Mon Nov 2 06:35:57 2015] nscdD f1c7dd20 0 859 1 > 0x > [Mon Nov 2 06:35:57 2015] f1c7dd40 00200082 f79de900 f1c7dd20 c10bc119 > ffe0 f3aec740 00200246 > [Mon Nov 2 06:35:57 2015] f74ea800 f79e3f40 f77fb800 f1c7e000 f6b381dc > f6b38000 f1c7dd4c c14f1fdb > [Mon Nov 2 06:35:57 2015] d5553960 f1c7dd70 f867672f f77fb800 > c1099250 d0a4be08 d9755e68 > [Mon Nov 2 06:35:57 2015] Call Trace: > [Mon Nov 2 06:35:57 2015] [] ? del_timer_sync+0x49/0x50 > [Mon Nov 2 06:35:57 2015] [] ? schedule+0x2b/0x80 > [Mon Nov 2 06:35:57 2015] [] ? > wait_current_trans.isra.21+0x8f/0xf0 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? wait_woken+0x80/0x80 > [Mon Nov 2 06:35:57 2015] [] ? start_transaction+0x3d0/0x5d0 > [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? > btrfs_delalloc_reserve_metadata+0x32d/0x580 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? btrfs_dirty_inode+0xb0/0xb0 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? btrfs_join_transaction+0x23/0x30 > [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? btrfs_dirty_inode+0x39/0xb0 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? btrfs_dirty_inode+0xb0/0xb0 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? file_update_time+0x7e/0xc0 > [Mon Nov 2 06:35:57 2015] [] ? btrfs_page_mkwrite+0x80/0x3c0 > [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? hrtimer_cancel+0x19/0x20 > [Mon Nov 2 06:35:57 2015] [] ? futex_wait+0x1e1/0x270 > [Mon Nov 2 06:35:57 2015] [] ? 
do_page_mkwrite+0x38/0x90 > [Mon Nov 2 06:35:57 2015] [] ? do_wp_page+0x2e2/0x6d0 > [Mon Nov 2 06:35:57 2015] [] ? futex_wake+0x71/0x140 > [Mon Nov 2 06:35:57 2015] [] ? kmap_atomic_prot+0xe7/0x110 > [Mon Nov 2 06:35:57 2015] [] ? handle_mm_fault+0xd59/0x14d0 > [Mon Nov 2 06:35:57 2015] [] ? __do_page_fault+0x18c/0x480 > [Mon Nov 2 06:35:57 2015] [] ? __do_page_fault+0x480/0x480 > [Mon Nov 2 06:35:57 2015] [] ? error_code+0x67/0x6c > [Mon Nov 2 06:35:57 2015] INFO: task nscd:864 blocked for more than 120 > seconds. > [Mon Nov 2 06:35:57 2015] Not tainted 4.2.0-0.bpo.1-686-pae #1 > [Mon Nov 2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [Mon Nov 2 06:35:57 2015] nscdD f1c87f5c 0 864 1 > 0x > [Mon Nov 2 06:35:57 2015] f1c87ef4 00200082 f1c87f80 f1c87f5c 03e7 > f1c87ee4 f3aec740 ac76c560 > [Mon Nov 2 06:35:57 2015] f74ea800 f79e3f40 f3c7b040 f1c88000 f3c7b040 > 0001 f1c87f00 c14f1fdb > [Mon Nov 2 06:35:57 2015] f3aec77c f1c87f38 c14f4265 f1c87f1c f3aec780 > f3aec788 0125 > [Mon Nov 2 06:35:57 2015] Call Trace: > [Mon Nov 2 06:35:57 2015] [] ? schedule+0x2b/0x80 > [Mon Nov 2 06:35:57 2015] [] ? rwsem_down_write_failed+0x185/0x280 > [Mon Nov 2 06:35:57 2015] [] ? > call_rwsem_down_write_failed+0x6/0x8 > [Mon Nov 2 06:35:57 2015] [] ? down_write+0x25/0x40 > [Mon Nov 2 06:35:57 2015] [] ? vm_mmap_pgoff+0x4a/0xa0 > [Mon Nov 2 06:35:57 2015] [] ? SyS_fstat64+0x28/0x30 > [Mon Nov 2 06:35:57 2015] [] ? SyS_mmap_pgoff+0x110/0x210 > [Mon Nov 2 06:35:57 2015] [] ? sysenter_do_call+0x12/0x12 > [Mon Nov 2 06:35:57 2015] INFO: task nmbd:1330 blocked for more than 120 > seconds. > [Mon Nov 2 06:35:57 2015] Not tainted 4.2.0-0.bpo.1-686-pae #1 > [Mon Nov 2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. 
> [Mon Nov 2 06:35:57 2015] nmbdD 0 1330 1 > 0x > [Mon Nov 2 06:35:57 2015] ef44bd74 00200086 > f3984900 > [Mon Nov 2 06:35:57 2015] f69e1800 f79e3f40 f3a7a800 ef44c000 d17255a0 > d17255a0 ef44bd80 c14f1fdb > [Mon Nov 2 06:35:57 2015] d1725600 ef44bdc8 f86961b5 000d3fff > 1000 000d3000 > [Mon Nov 2 06:35:57 2015] Call Trace: > [Mon Nov 2 06:35:57 2015] [] ? schedule+0x2b/0x80 > [Mon Nov 2 06:35:57 2015] [] ? > btrfs_start_ordered_extent+0xd5/0x100 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? wait_woken+0x80/0x80 > [Mon Nov 2 06:35:57 2015] [] ? > lock_and_cleanup_extent_if_need+0x134/0x260 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? prepare_pages+0xc6/0x150 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? __btrfs_buffered_write+0x17a/0x5e0 > [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? __alloc_pages_nodemask+0x133/0x880 > [Mon Nov 2 06:35:57 2015] [] ?
Process is blocked for more than 120 seconds
Hi everyone, I have noticed the following in the log. The system continues to run, but I am not sure for how long it will be stable. # uname -a Linux Debian 4.2.3-2~bpo8+1 (2015-10-20) i686 GNU/Linux # mount | grep /var /dev/sdd2 on /var type btrfs (rw,noatime,compress=lzo,space_cache,subvolid=258,subvol=/var) > [Mon Nov 2 06:35:57 2015] INFO: task nscd:859 blocked for more than 120 > seconds. > [Mon Nov 2 06:35:57 2015] Not tainted 4.2.0-0.bpo.1-686-pae #1 > [Mon Nov 2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [Mon Nov 2 06:35:57 2015] nscdD f1c7dd20 0 859 1 > 0x > [Mon Nov 2 06:35:57 2015] f1c7dd40 00200082 f79de900 f1c7dd20 c10bc119 > ffe0 f3aec740 00200246 > [Mon Nov 2 06:35:57 2015] f74ea800 f79e3f40 f77fb800 f1c7e000 f6b381dc > f6b38000 f1c7dd4c c14f1fdb > [Mon Nov 2 06:35:57 2015] d5553960 f1c7dd70 f867672f f77fb800 > c1099250 d0a4be08 d9755e68 > [Mon Nov 2 06:35:57 2015] Call Trace: > [Mon Nov 2 06:35:57 2015] [] ? del_timer_sync+0x49/0x50 > [Mon Nov 2 06:35:57 2015] [] ? schedule+0x2b/0x80 > [Mon Nov 2 06:35:57 2015] [] ? > wait_current_trans.isra.21+0x8f/0xf0 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? wait_woken+0x80/0x80 > [Mon Nov 2 06:35:57 2015] [] ? start_transaction+0x3d0/0x5d0 > [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? > btrfs_delalloc_reserve_metadata+0x32d/0x580 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? btrfs_dirty_inode+0xb0/0xb0 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? btrfs_join_transaction+0x23/0x30 > [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? btrfs_dirty_inode+0x39/0xb0 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? btrfs_dirty_inode+0xb0/0xb0 [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? file_update_time+0x7e/0xc0 > [Mon Nov 2 06:35:57 2015] [] ? btrfs_page_mkwrite+0x80/0x3c0 > [btrfs] > [Mon Nov 2 06:35:57 2015] [] ? hrtimer_cancel+0x19/0x20 > [Mon Nov 2 06:35:57 2015] [] ? futex_wait+0x1e1/0x270 > [Mon Nov 2 06:35:57 2015] [] ? do_page_mkwrite+0x38/0x90 > [Mon Nov 2 06:35:57 2015] [] ? 
do_wp_page+0x2e2/0x6d0 > [Mon Nov 2 06:35:57 2015] [] ? futex_wake+0x71/0x140 > [Mon Nov 2 06:35:57 2015] [] ? kmap_atomic_prot+0xe7/0x110 > [Mon Nov 2 06:35:57 2015] [] ? handle_mm_fault+0xd59/0x14d0 > [Mon Nov 2 06:35:57 2015] [] ? __do_page_fault+0x18c/0x480 > [Mon Nov 2 06:35:57 2015] [] ? __do_page_fault+0x480/0x480 > [Mon Nov 2 06:35:57 2015] [] ? error_code+0x67/0x6c > [Mon Nov 2 06:35:57 2015] INFO: task nscd:864 blocked for more than 120 > seconds. > [Mon Nov 2 06:35:57 2015] Not tainted 4.2.0-0.bpo.1-686-pae #1 > [Mon Nov 2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [Mon Nov 2 06:35:57 2015] nscdD f1c87f5c 0 864 1 > 0x > [Mon Nov 2 06:35:57 2015] f1c87ef4 00200082 f1c87f80 f1c87f5c 03e7 > f1c87ee4 f3aec740 ac76c560 > [Mon Nov 2 06:35:57 2015] f74ea800 f79e3f40 f3c7b040 f1c88000 f3c7b040 > 0001 f1c87f00 c14f1fdb > [Mon Nov 2 06:35:57 2015] f3aec77c f1c87f38 c14f4265 f1c87f1c f3aec780 > f3aec788 0125 > [Mon Nov 2 06:35:57 2015] Call Trace: > [Mon Nov 2 06:35:57 2015] [] ? schedule+0x2b/0x80 > [Mon Nov 2 06:35:57 2015] [] ? rwsem_down_write_failed+0x185/0x280 > [Mon Nov 2 06:35:57 2015] [] ? > call_rwsem_down_write_failed+0x6/0x8 > [Mon Nov 2 06:35:57 2015] [] ? down_write+0x25/0x40 > [Mon Nov 2 06:35:57 2015] [] ? vm_mmap_pgoff+0x4a/0xa0 > [Mon Nov 2 06:35:57 2015] [] ? SyS_fstat64+0x28/0x30 > [Mon Nov 2 06:35:57 2015] [] ? SyS_mmap_pgoff+0x110/0x210 > [Mon Nov 2 06:35:57 2015] [] ? sysenter_do_call+0x12/0x12 > [Mon Nov 2 06:35:57 2015] INFO: task nmbd:1330 blocked for more than 120 > seconds. > [Mon Nov 2 06:35:57 2015] Not tainted 4.2.0-0.bpo.1-686-pae #1 > [Mon Nov 2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. 
> [Mon Nov 2 06:35:57 2015] nmbd D 0 1330 1 0x
> [Mon Nov 2 06:35:57 2015] ef44bd74 00200086 f3984900
> [Mon Nov 2 06:35:57 2015] f69e1800 f79e3f40 f3a7a800 ef44c000 d17255a0 d17255a0 ef44bd80 c14f1fdb
> [Mon Nov 2 06:35:57 2015] d1725600 ef44bdc8 f86961b5 000d3fff 1000 000d3000
> [Mon Nov 2 06:35:57 2015] Call Trace:
> [Mon Nov 2 06:35:57 2015] [] ? schedule+0x2b/0x80
> [Mon Nov 2 06:35:57 2015] [] ? btrfs_start_ordered_extent+0xd5/0x100 [btrfs]
> [Mon Nov 2 06:35:57 2015] [] ? wait_woken+0x80/0x80
> [Mon Nov 2 06:35:57 2015] [] ? lock_and_cleanup_extent_if_need+0x134/0x260 [btrfs]
> [Mon Nov 2 06:35:57 2015] [] ? prepare_pages+0xc6/0x150 [btrfs]
> [Mon Nov 2 06:35:57 2015] [] ? __btrfs_buffered_write+0x17a/0x5e0 [btrfs]
> [Mon Nov 2 06:35:57 2015] [] ? __alloc_pages_nodemask+0x133/0x880
> [Mon Nov 2 06:35:57 2015] [] ? btrfs_file_write_iter+0x1e5/0x550 [btrfs]
> [Mon Nov 2
Re: How to remove missing device on RAID1?
On 2015-10-21 00:40, Henk Slager wrote:
> I had a similar issue some time ago, around the time kernel 4.1.6 was just there.
> In case you don't want to wait for a new disk, or decide to just run the filesystem with 1 disk less, or maybe later on replace 1 of the still-healthy disks with a double/bigger-sized one and use current/older kernel+tools, you could do this (assuming the filesystem is not too full, of course):
> - mount degraded
> - btrfs balance start -f -v -sdevid=1 -mdevid=1 -ddevid=1
>   (where the missing disk has devid 1)

Am I right that one can "btrfs dev delete 1" after the balance succeeded?

> After completion the (virtual/missing) device shall be fully unallocated
> - create /dev/loopX with a sparse file of the same size as the missing disk on some other filesystem
> - btrfs replace start 1 /dev/loopX
> - remove /dev/loopX from the filesystem
> - remount the filesystem without degraded
> And remove /dev/loopX

It would be nice if btrfs allowed deleting a device and performing the rebalance automatically (provided that the remaining devices still have enough space to sustain the raidX prerequisite).

--
With best regards,
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
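The recipe quoted above can be sketched as a shell script. The 50G size, the /var mount point, the loop-device name and devid 1 are hypothetical examples; the btrfs/losetup steps require root and an actual degraded filesystem, so only the sparse-file creation is executed here and the rest is shown as comments for reference.

```shell
# Sketch of the loop-device trick for retiring a missing RAID1 member.
# Sizes, paths and devid below are hypothetical, not from a live system.

IMG=/tmp/missing-disk.img

# 1. A sparse file the apparent size of the missing disk costs almost
#    no real space until it is written to.
truncate -s 50G "$IMG"
stat -c 'apparent bytes: %s' "$IMG"

# 2. Remaining steps (root required; not executed in this sketch):
#    losetup /dev/loop0 "$IMG"
#    btrfs replace start 1 /dev/loop0 /var   # 1 = devid of the missing disk
#    btrfs device delete /dev/loop0 /var     # drop the placeholder again
#    mount -o remount /var                   # without the 'degraded' option
#    losetup -d /dev/loop0 && rm "$IMG"
```

The sparse file works as a stand-in precisely because the preceding balance drained all chunks off the missing devid, so `replace` has almost nothing to copy onto it.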
Re: Recover btrfs volume which can only be mounted in read-only mode
On 16/10/2015 10:18, Duncan wrote:
> Dmitry Katsubo posted on Thu, 15 Oct 2015 16:10:13 +0200 as excerpted:
>
>> On 15 October 2015 at 02:48, Duncan <1i5t5.dun...@cox.net> wrote:
>>
>>> [snipped]
>>
>> Thanks for this information. As far as I can see, btrfs-tools v4.1.2 is now in the experimental Debian repo (but you anyway suggest at least 4.2.2, which was released in master git just 10 days ago). Kernel image 3.18 is still not there, perhaps because Debian jessie was frozen before it was released (2014-12-07).
>
> For userspace, as long as it's supporting the features you need at runtime (where it generally simply has to know how to make the call to the kernel, to do the actual work), and you're not running into anything really hairy that you're trying to offline-recover, which is where the latest userspace code becomes critical...
>
> Running a userspace series behind, or even more (as long as it's not /too/ far), isn't all /that/ critical a problem.
>
> It generally becomes a problem in one of three ways: 1) You have a bad filesystem and want the best chance at fixing it, in which case you really want the latest code, including the absolute latest fixups for the most recently discovered possible problems. 2) You want/need a new feature that's simply not supported in your old userspace. 3) The userspace gets so old that the output from its diagnostics commands no longer easily compares with that of current tools, giving people on-list difficulties when trying to compare the output in your posts to the output they get.
>
> As a very general rule, at least try to keep the userspace version comparable to the kernel version you are running.
> Since the userspace version numbering syncs to kernelspace version numbering, and userspace of a particular version is normally released shortly after the similarly numbered kernel series is released, with a couple minor updates before the next kernel-series-synced release, keeping userspace to at least the kernelspace version means you're at least running the userspace release that was made with that kernel series release in mind.
>
> Then, as long as you don't get too far behind on kernel version, you should remain at least /somewhat/ current on userspace as well, since you'll be upgrading to near the same userspace (at least) when you upgrade the kernel.
>
> Using that loose guideline, since you're aiming for the 3.18 stable kernel, you should be running at least a 3.18 btrfs-progs as well.
>
> In that context, btrfs-progs 4.1.2 should be fine, as long as you're not trying to fix any problems that a newer version fixed. And my recommendation of the latest 4.2.2 was in the "fixing problems" context, in which case, yes, getting your hands on 4.2.2, even if it means building from sources to do so, could be critical, depending of course on the problem you're trying to fix. But otherwise, 4.1.2, or even back to the last 3.18.whatever release since that's the kernel version you're targeting, should be fine.
>
> Just be sure that whenever you do upgrade later, you avoid the known-bad mkfs.btrfs in 4.2.0 and/or 4.2.1 -- be sure, if you're doing the btrfs-progs 4.2 series, that you get 4.2.2 or later.
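The guideline above ("keep btrfs-progs at least as new as the kernel") can be checked mechanically with a natural version sort. A minimal sketch; the two version strings are hard-coded samples, where in practice they would come from `btrfs --version` and `uname -r`:

```shell
# Compare btrfs-progs and kernel versions using sort -V (natural
# version ordering). Sample values only, not probed from a live system.
progs_ver=4.1.2
kernel_ver=3.18.21

newest=$(printf '%s\n%s\n' "$progs_ver" "$kernel_ver" | sort -V | tail -n 1)
if [ "$newest" = "$progs_ver" ]; then
    echo "btrfs-progs $progs_ver >= kernel $kernel_ver: OK"
else
    echo "btrfs-progs $progs_ver is older than kernel $kernel_ver: consider upgrading"
fi
```

For these sample values the check prints the "OK" branch, since 4.1.2 sorts after 3.18.21.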
> As for finding a current 3.18 series kernel released for Debian, I'm not a Debian user so my knowledge of the ecosystem around it is limited, but I've been very much under the impression that there are various optional repos available that you can choose to include and update from as well, and I'm quite sure based on previous discussions with others that there's a well-recognized and fairly commonly enabled repo that includes Debian kernel updates through the current release, or close to it.
>
> Of course you could also simply run a mainstream Linus kernel and build it yourself, and it's not too horribly hard to do either, as there's all sorts of places with instructions for doing so out there, and back when I switched from MS to freedomware Linux in late 2001, I learned the skill, at least at the reasonably basic level of mostly taking a working config from my distro's kernel and using it as a basis for my mainstream kernel config as well, within about two months of switching.
>
> Tho of course just because you can doesn't mean you want to, and for many, finding their distro's experimental/current kernel repos and simply installing the packages from it will be far simpler.
>
> But regardless of the method used, finding or building and keeping current with your own copy of at least the latest couple of LTS
Re: Recover btrfs volume which can only be mounted in read-only mode
On 15 October 2015 at 02:48, Duncan <1i5t5.dun...@cox.net> wrote:
> Dmitry Katsubo posted on Wed, 14 Oct 2015 22:27:29 +0200 as excerpted:
>
>> On 14/10/2015 16:40, Anand Jain wrote:
>>>> # mount -o degraded /var
>>>> Oct 11 18:20:15 kernel: BTRFS: too many missing devices, writeable mount is not allowed
>>>>
>>>> # mount -o degraded,ro /var
>>>> # btrfs device add /dev/sdd1 /var
>>>> ERROR: error adding the device '/dev/sdd1' - Read-only file system
>>>>
>>>> Now I am stuck: I cannot add a device to the volume to satisfy the raid pre-requisite.
>>>
>>> This is a known issue. Would you be able to test the below set of patches and update us?
>>>
>>> [PATCH 0/5] Btrfs: Per-chunk degradable check
>>
>> Many thanks for the reply. Unfortunately I have no environment to recompile the kernel, and setting it up will perhaps take a day. Can the latest kernel be pushed to Debian sid?

Duncan, many thanks for the detailed answer. I appreciate it a lot.

> In the way of general information...
>
> While btrfs is no longer entirely unstable (since 3.12, when the experimental tag was removed) and kernel patch backports are generally done where stability is a factor, it's not yet fully stable and mature either. As such, an expectation of true stability such that wishing to remain on kernels more than one LTS series behind the latest LTS kernel series (4.1, with 3.18 the one-LTS-series-back version) can be considered incompatible with wishing to run the still-under-heavy-development and not yet fully stable and mature btrfs, at least as soon as problems are reported. A request to upgrade to current and/or to try various not-yet-mainline-integrated patches is thus to be expected on report of problems.
>
> As for userspace, the division between btrfs kernel and userspace works like this: Under normal operating conditions, userspace simply makes requests of the kernel, which does the actual work. Thus, under normal conditions, updated kernel code is most important.
> However, once a problem occurs and repair/recovery is attempted, it's generally userspace code itself directly operating on the unmounted filesystem, so having the latest userspace code fixes becomes most important once something has gone wrong and you're trying to fix it.
>
> So upgrading to a 3.18 series kernel, at minimum, is very strongly recommended for those running btrfs, with an expectation that an upgrade to 4.1 should be being planned and tested, for deployment as soon as it's passing on-site pre-deployment testing. And an upgrade to current or close-to-current btrfs-progs 4.2.2 userspace is recommended as soon as you need its features, which include the latest patches for repair and recovery, so as soon as you have a filesystem that's not working as expected, if not before. (Note that earlier btrfs-progs 4.2 releases, before 4.2.2, had a buggy mkfs.btrfs, so they should be skipped if you will be doing mkfs.btrfs with them, and any btrfs created with those versions should have what's on them backed up if it's not already, and the filesystems recreated with 4.2.2, as they'll be unstable and are subject to failure.)

Thanks for this information. As far as I can see, btrfs-tools v4.1.2 is now in the experimental Debian repo (but you anyway suggest at least 4.2.2, which was released in master git just 10 days ago). Kernel image 3.18 is still not there, perhaps because Debian jessie was frozen before it was released (2014-12-07).

>> 1. Is there any way to recover btrfs at the moment? Or is the easiest option to mount ro, copy all data to another drive, re-create the btrfs volume and copy back?
> Sysadmin's rule of backups: If data isn't backed up, by definition you value the data less than the cost of the time/hassle/resources to do the backup, so loss of a filesystem is never a big problem, because if the data was of any value, it was backed up and can be restored from that backup, and if it wasn't backed up, then by definition you have already saved the more-important-to-you commodity, the hassle/time/resources you would have spent doing the backup. Therefore, loss of a filesystem is loss of throw-away data in any case, either because it was backed up (and a would-be backup that hasn't been tested restorable isn't yet a completed backup, so doesn't count), or because the data really was throw-away data, not worth the hassle of backing up in the first place, even at risk of loss should the un-backed-up data be lost.
>
> No exceptions. Any after-the-fact protests to the contrary simply put the lie to claims that the value
Recover btrfs volume which can only be mounted in read-only mode
Dear btrfs community,

I am facing several problems regarding btrfs, and I will be very thankful if someone can help me with them. Also, while playing with btrfs I came up with a few suggestions – it would be nice if someone could comment on those.

While starting the system, /var (which is a btrfs volume) failed to be mounted. That btrfs volume was created with the following options:

# mkfs.btrfs -d raid1 -m raid1 /dev/sdc2 /dev/sda /dev/sdd1

Here is what was recorded in the systemd journal during startup:

[2.931097] BTRFS: device fsid 57b828ee-5984-4f50-89ff-4c9be0fd3084 devid 2 transid 394288 /dev/sda
[9.810439] BTRFS: device fsid 57b828ee-5984-4f50-89ff-4c9be0fd3084 devid 1 transid 394288 /dev/sdc2
Oct 11 13:00:22 systemd[1]: Job dev-disk-by\x2duuid-57b828ee\x2d5984\x2d4f50\x2d89ff\x2d4c9be0fd3084.device/start timed out.
Oct 11 13:00:22 systemd[1]: Timed out waiting for device dev-disk-by\x2duuid-57b828ee\x2d5984\x2d4f50\x2d89ff\x2d4c9be0fd3084.device.

After the system started on runlevel 1, I attempted to mount the filesystem:

# mount /var
Oct 11 13:53:55 kernel: BTRFS info (device sdc2): disk space caching is enabled
Oct 11 13:53:55 kernel: BTRFS: failed to read chunk tree on sdc2
Oct 11 13:53:55 kernel: BTRFS: open_ctree failed

When I googled for "failed to read chunk tree", the feedback was that something really bad is happening and it's time to restore the data / give up on btrfs. In fact, this message is misleading because it refers to /dev/sdc2, which is the mount device in fstab, but this is an SSD drive, so it is very unlikely to cause a "read" error. Literally I read the message as "BTRFS: tried to read something from sdc2 and failed". Maybe it would be better to re-phrase the message as "failed to construct chunk tree on /var (sdc2,sda,sdd1)"?
Next I did a check:

# btrfs check /dev/sdc2
warning devid 3 not found already
checking extents
checking free space cache
Error reading 36818145280, -1
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/sdc2
UUID: 57b828ee-5984-4f50-89ff-4c9be0fd3084
failed to load free space cache for block group 36536582144
found 29602081783 bytes used err is 0
total csum bytes: 57681304
total tree bytes: 1047363584
total fs tree bytes: 843694080
total extent tree bytes: 121159680
btree space waste bytes: 207443742
file data blocks allocated: 4524416 referenced 60893913088

The message "devid 3 not found already" does not tell me much. If I understand correctly, btrfs does not store the list of devices in the metadata, but maybe it would be a good idea to save the last-seen information about devices so that I would not need to guess what "devid 3" means?

Next I tried to list all devices in my btrfs volume. I found this is not possible (unless the volume is mounted). It would be nice if "btrfs device scan" printed the detected volumes / devices to stdout (e.g. with a "-v" option), or if there were any other way to do that.

Then I mounted the volume in degraded mode, and only after that could I understand what the error message means:

# mount /var -o degraded
# btrfs device stats /var
[/dev/sdc2].write_io_errs 0
[/dev/sdc2].read_io_errs 0
[/dev/sdc2].flush_io_errs 0
[/dev/sdc2].corruption_errs 0
[/dev/sdc2].generation_errs 0
[/dev/sda].write_io_errs 0
[/dev/sda].read_io_errs 0
[/dev/sda].flush_io_errs 0
[/dev/sda].corruption_errs 0
[/dev/sda].generation_errs 0
[].write_io_errs 3160958
[].read_io_errs 0
[].flush_io_errs 0
[].corruption_errs 0
[].generation_errs 0

Now I can see that the device with devid 3 is actually /dev/sdd1, which btrfs found not ready. Is it possible to improve the btrfs output to list the "last seen device", e.g.

[/dev/sdd1*].write_io_errs 3160958
[/dev/sdd1*].read_io_errs 0
...
where "*" means that the device is missing.

I have listed all partitions and /dev/sdd1 was among them. I have also run

# badblocks /dev/sdd

and it found no bad blocks. Why btrfs considers the device "not ready" – that is a question.

Afterwards I decided to run a scrub:

# btrfs scrub start /var
# btrfs scrub status /var
scrub status for 57b828ee-5984-4f50-89ff-4c9be0fd3084
scrub started at Sun Oct 11 14:55:45 2015 and was aborted after 1365 seconds
total bytes scrubbed: 89.52GiB with 0 errors

I have noticed that btrfs always reports "was aborted after X seconds", even while the scrub is still running (I checked that X and the number of bytes scrubbed keep increasing). That is confusing. After the scrub finished, I had no idea whether it scrubbed everything or was really aborted. And if it was aborted, what was the reason? Also, it would be nice if the status displayed the number of data bytes (without replicas) scrubbed, because the number 89.52GiB includes all replicas (of raid1 in my case):

total bytes scrubbed: 89.52GiB (data 55.03GiB, system 16.00KiB, metadata 998.83MiB) with 0 errors

Then I could compare this number with the "filesystem df" output to answer the question: was all data successfully scrubbed?

# btrfs
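The missing-device situation described above can at least be detected mechanically from the stats output: an empty path between the brackets marks the device btrfs could not find. A hedged sketch, run against a hard-coded, shortened copy of the output rather than a live filesystem:

```shell
# Save a shortened, hard-coded copy of `btrfs device stats` output.
cat > /tmp/stats.txt <<'EOF'
[/dev/sdc2].write_io_errs 0
[/dev/sda].write_io_errs 0
[].write_io_errs 3160958
EOF

# Split each line on the brackets; an empty [] path ($2) means the
# device was not seen, so report its write error count.
awk -F'[][]' '$2 == "" && index($3, "write_io_errs") {
    n = split($3, f, " ")
    print "missing device with " f[n] " write_io_errs"
}' /tmp/stats.txt
```

For the sample above this prints one line flagging the 3160958 write errors; mapping that entry back to a device name (/dev/sdd1 here) still has to be done by hand, which is exactly the gap the "[/dev/sdd1*]" suggestion would close.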
Re: Recover btrfs volume which can only be mounted in read-only mode
On 14/10/2015 16:40, Anand Jain wrote:
>> # mount -o degraded /var
>> Oct 11 18:20:15 kernel: BTRFS: too many missing devices, writeable
>> mount is not allowed
>>
>> # mount -o degraded,ro /var
>> # btrfs device add /dev/sdd1 /var
>> ERROR: error adding the device '/dev/sdd1' - Read-only file system
>>
>> Now I am stuck: I cannot add a device to the volume to satisfy the raid
>> pre-requisite.
>
> This is a known issue. Would you be able to test the below set of patches
> and update us?
>
> [PATCH 0/5] Btrfs: Per-chunk degradable check

Many thanks for the reply. Unfortunately I have no environment in which to recompile the kernel, and setting one up will perhaps take a day. Can the latest kernel be pushed to Debian sid?

1. Is there any way to recover the btrfs volume at the moment? Or is the easiest option to mount it read-only, copy all data to another drive, re-create the btrfs volume and copy the data back?
2. How can I avoid such a trap in the future?
3. How can I know which kernel version the patch "Per-chunk degradable check" is targeting?
4. What is the best way to express/vote for new features or suggestions (wiki page "Project_ideas" / bugzilla)?

Thanks!

--
With best regards,
Dmitry