Re: Kernel crash related to LZO compression

2018-10-26 Thread Dmitry Katsubo

On 2018-10-25 20:49, Chris Murphy wrote:

I would say the first step no matter what if you're using an older
kernel, is to boot a current Fedora or Arch live or install media,
mount the Btrfs and try to read the problem files and see if the
problem still happens. I can't even begin to estimate the tens of
thousands of line changes since kernel 4.9.


Good point, Chris. Indeed, booting a fresh kernel is never a problem.
Actually, I forgot to mention that I've seen the same problem with
kernel 4.12.13 (attached).


What profile are you using for this Btrfs? Is this a raid56? What do
you get for 'btrfs fi us ' ?


It is a RAID1 volume for both metadata and data, but unfortunately I
haven't recorded the actual output before the failure. The configuration
was like this:

# btrfs filesystem show /var/log
Label: none  uuid: 5b45ac8e-fd8c-4759-854a-94e45069959d
Total devices 2 FS bytes used 11.13GiB
devid    3 size 50.00GiB used 14.03GiB path /dev/sda3
devid    4 size 50.00GiB used 14.03GiB path /dev/sdc1

On 2018-10-25 20:49, Chris Murphy wrote:

It should be safe even with that kernel. I'm not sure this is
compression related. There is a corruption bug related to inline
extents that had been fairly elusive, but I think it's fixed now.
I haven't run into it though.


On 2018-10-26 02:09, Qu Wenruo wrote:
Are there any updates / fixes done in that area? Is lzo option safe to use?


Yes, we have commits to harden lzo decompress code in v4.18:

de885e3ee281a88f52283c7e8994e762e3a5f6bd btrfs: lzo: Harden inline lzo
compressed extent decompression
314bfa473b6b6d3efe68011899bd718b349f29d7 btrfs: lzo: Add header length
check to avoid potential out-of-bounds access

As for the root cause: it's compressed data without csum, which scrub
could then corrupt.

It's also fixed in v4.18:

665d4953cde6d9e75c62a07ec8f4f8fd7d396ade btrfs: scrub: Don't use inode
page cache in scrub_handle_errored_block()
ac0b4145d662a3b9e34085dea460fb06ede9b69b btrfs: scrub: Don't use inode
pages for device replace


Thanks, Qu, for this information. Actually, at one point I saw binary
garbage (not zeros) in text log files (/var/log/*.log) and was surprised
that btrfs returned corrupted data instead of signalling an I/O error.
Could it be because of the "compressed data without csum" problem?

Thanks!

--
With best regards,
Dmitry
[Sun Dec  3 19:39:55 2017] BUG: unable to handle kernel paging request at 
f80a3000
[Sun Dec  3 19:39:55 2017] IP: memcpy+0x11/0x20
[Sun Dec  3 19:39:55 2017] *pde = 370bb067 
[Sun Dec  3 19:39:55 2017] *pte =  
[Sun Dec  3 19:39:55 2017] Oops: 0002 [#1] SMP
[Sun Dec  3 19:39:55 2017] Modules linked in: bridge stp llc arc4 iTCO_wdt 
iTCO_vendor_support ppdev ath5k evdev ath mac80211 cfg80211 i915 coretemp 
pcspkr rfkill snd_hda_codec_realtek serio_raw snd_hda_codec_generic video 
snd_hda_intel drm_kms_helper snd_hda_codec lpc_ich drm snd_hda_core snd_hwdep 
i2c_algo_bit snd_pcm_oss snd_mixer_oss fb_sys_fops sg snd_pcm syscopyarea 
snd_timer sysfillrect rng_core snd sysimgblt soundcore parport_pc parport 
shpchp button acpi_cpufreq binfmt_misc w83627hf hwmon_vid ip_tables x_tables 
autofs4 ses enclosure scsi_transport_sas xfs libcrc32c hid_generic usbhid hid 
btrfs crc32c_generic xor raid6_pq uas usb_storage sr_mod cdrom sd_mod 
ata_generic ata_piix i2c_i801 libata scsi_mod firewire_ohci firewire_core 
crc_itu_t ehci_pci e1000e ptp pps_core uhci_hcd ehci_hcd usbcore usb_common
[Sun Dec  3 19:39:55 2017] CPU: 1 PID: 100 Comm: kworker/u4:2 Tainted: G
W   4.12.0-2-686 #1 Debian 4.12.13-1
[Sun Dec  3 19:39:55 2017] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS 
i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[Sun Dec  3 19:39:55 2017] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[Sun Dec  3 19:39:55 2017] task: f7337280 task.stack: f695c000
[Sun Dec  3 19:39:55 2017] EIP: memcpy+0x11/0x20
[Sun Dec  3 19:39:55 2017] EFLAGS: 00010206 CPU: 1
[Sun Dec  3 19:39:55 2017] EAX: f80a2ff8 EBX: 1000 ECX: 03fe EDX: 
ff998000
[Sun Dec  3 19:39:55 2017] ESI: ff998008 EDI: f80a3000 EBP:  ESP: 
f695de88
[Sun Dec  3 19:39:55 2017]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[Sun Dec  3 19:39:55 2017] CR0: 80050033 CR2: f9c00140 CR3: 36bc7000 CR4: 
06d0
[Sun Dec  3 19:39:55 2017] Call Trace:
[Sun Dec  3 19:39:55 2017]  ? lzo_decompress_bio+0x19f/0x2b0 [btrfs]
[Sun Dec  3 19:39:55 2017]  ? end_compressed_bio_read+0x28d/0x360 [btrfs]
[Sun Dec  3 19:39:55 2017]  ? btrfs_scrubparity_helper+0xb6/0x2c0 [btrfs]
[Sun Dec  3 19:39:55 2017]  ? process_one_work+0x135/0x2f0
[Sun Dec  3 19:39:55 2017]  ? worker_thread+0x39/0x3a0
[Sun Dec  3 19:39:55 2017]  ? kthread+0xd7/0x110
[Sun Dec  3 19:39:55 2017]  ? process_one_work+0x2f0/0x2f0
[Sun Dec  3 19:39:55 2017]  ? kthread_create_on_node+0x30/0x30
[Sun Dec  3 19:39:55 2017]  ? ret_from_fork+0x19/0x24
[Sun Dec  3 19:39:55 2017] Code: 43 58 2b 43 50 88 43 4e 5b eb ed 90 90 90 90 
90 90 90 90 90 90 90 90 90 90 90 

Kernel crash related to LZO compression

2018-10-25 Thread Dmitry Katsubo

Dear btrfs community,

My apologies for the dumps from a rather old kernel (4.9.25);
nevertheless I would like your opinion on the kernel crashes reported
below.

As far as I understand the situation (correct me if I am wrong), some
data block became corrupted, which resulted in the following kernel
trace during boot:

kernel BUG at 
/build/linux-fB36Cv/linux-4.9.25/fs/btrfs/extent_io.c:2318!

invalid opcode:  [#1] SMP
Call Trace:
 [] ? end_bio_extent_readpage+0x4e9/0x680 [btrfs]
 [] ? end_compressed_bio_read+0x3b/0x2d0 [btrfs]
 [] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
 [] ? process_one_work+0x141/0x380
 [] ? worker_thread+0x41/0x460
 [] ? kthread+0xb4/0xd0
 [] ? process_one_work+0x380/0x380
 [] ? kthread_park+0x50/0x50
 [] ? ret_from_fork+0x1b/0x28

The problematic file turned out to be the one used by systemd-journald,
/var/log/journal/c496cea41ebc4700a0dfaabf64a21be4/system.journal,
which it was trying to read (or append to) during boot, and that was
causing the system crash (see attached bootN_dmesg.txt).

I rebooted in safe mode and tried to copy the data from this partition
to another location using btrfs-restore; however, the kernel was
crashing as well, with a slightly different symptom (see attached
copyN_dmesg.txt):

Call Trace:
 [] ? lzo_decompress_biovec+0x1b0/0x2b0 [btrfs]
 [] ? vmalloc+0x38/0x40
 [] ? end_compressed_bio_read+0x265/0x2d0 [btrfs]
 [] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
 [] ? process_one_work+0x141/0x380
 [] ? worker_thread+0x41/0x460
 [] ? kthread+0xb4/0xd0
 [] ? ret_from_fork+0x1b/0x28

To steer clear of the problem, I removed this file and also removed the
"compress=lzo" mount option.

Are there any updates / fixes done in that area? Is lzo option safe to use?


P.S. Perhaps a related issue is mentioned in the "Warnings" section:

https://wiki.debian.org/Btrfs#Warnings /
https://www.spinics.net/lists/linux-btrfs/msg56563.html


--
With best regards,
Dmitry
[   13.100666] BTRFS critical (device sda3): stripe index math went horribly 
wrong, got stripe_index=4294936575, num_stripes=2
[   13.100901] BTRFS critical (device sda3): stripe index math went horribly 
wrong, got stripe_index=4294936575, num_stripes=2
[   13.101096] BTRFS critical (device sda3): stripe index math went horribly 
wrong, got stripe_index=4294936575, num_stripes=2
[   13.101178] [ cut here ]
[   13.101182] kernel BUG at 
/build/linux-fB36Cv/linux-4.9.25/fs/btrfs/extent_io.c:2318!
[   13.101185] invalid opcode:  [#1] SMP
[   13.101257] Modules linked in: binfmt_misc bridge stp llc iTCO_wdt 
iTCO_vendor_support arc4 ppdev coretemp ath5k pcspkr ath sr9700 mac80211 dm9601 
serio_raw usbnet cfg80211 snd_hda_codec_realtek snd_hda_codec_generic mii 
rfkill lpc_ich snd_hda_intel i915 mfd_core snd_hda_codec evdev sg snd_hda_core 
snd_hwdep snd_pcm_oss snd_mixer_oss rng_core snd_pcm snd_timer video snd 
drm_kms_helper soundcore drm parport_pc parport i2c_algo_bit shpchp button 
acpi_cpufreq netconsole configfs w83627hf hwmon_vid ip_tables x_tables autofs4 
xfs libcrc32c btrfs crc32c_generic xor raid6_pq ses enclosure 
scsi_transport_sas uas hid_generic usbhid usb_storage hid sd_mod sr_mod cdrom 
i2c_i801 i2c_smbus firewire_ohci ata_generic firewire_core crc_itu_t ehci_pci 
ata_piix libata uhci_hcd ehci_hcd scsi_mod e1000e ptp pps_core
[   13.101261]  usbcore usb_common
[   13.101267] CPU: 0 PID: 96 Comm: kworker/u4:2 Tainted: GW   
4.9.0-3-686-pae #1 Debian 4.9.25-1
[   13.101269] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF 
R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[   13.101326] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[   13.101328] task: f6d409c0 task.stack: f6d46000
[   13.101332] EIP: 0060:[] EFLAGS: 00010203 CPU: 0
[   13.101373] EIP is at btrfs_check_repairable+0x12c/0x130 [btrfs]
[   13.101375] EAX: 8800 EBX: f292dd80 ECX: 8801 EDX: 0002
[   13.101378] ESI: f69c EDI: f678bc5c EBP: f6d47e50 ESP: f6d47e30
[   13.101381]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   13.101383] CR0: 80050033 CR2: b64c6db0 CR3: 36c115a0 CR4: 06f0
[   13.101386] Stack:
[   13.101395]  1000   f292dd80 d04e93d0  f35885d8 
f35885d8
[   13.101402]  f6d47ed8 f8c63739 0001  f6d47ec4 f8c951eb f3bb4800 
0001
[   13.101412]  0009 f678bc00 f35884b0   0001  

[   13.101413] Call Trace:
[   13.101457]  [] ? end_bio_extent_readpage+0x4e9/0x680 [btrfs]
[   13.101497]  [] ? end_compressed_bio_read+0x3b/0x2d0 [btrfs]
[   13.101538]  [] ? btrfs_scrubparity_helper+0xce/0x2d0 [btrfs]
[   13.101548]  [] ? process_one_work+0x141/0x380
[   13.101553]  [] ? worker_thread+0x41/0x460
[   13.101557]  [] ? kthread+0xb4/0xd0
[   13.101561]  [] ? process_one_work+0x380/0x380
[   13.101566]  [] ? kthread_park+0x50/0x50
[   13.101572]  [] ? ret_from_fork+0x1b/0x28
[   13.104547] Modules linked in: binfmt_misc bridge stp llc iTCO_wdt 
iTCO_vendor_support 

Re: Failover for unattached USB device

2018-10-25 Thread Dmitry Katsubo

On 2018-10-24 20:05, Chris Murphy wrote:

I think about the best we can expect in the short term is that Btrfs
goes read-only before the file system becomes corrupted in a way it
can't recover with a normal mount. And I'm not certain it is in this
state of development right now for all cases. And I say the same thing
for other file systems as well.

Running Btrfs on USB devices is fine, so long as they're well behaved.
I have such a setup with USB 3.0 devices. Perhaps I got a bit lucky,
because there are a lot of known bugs with USB controllers, USB bridge
chipsets, and USB hubs.

Having user definable switches for when to go read-only is, I think,
misleading to the user, and very likely will mislead the file system.
The file system needs to go read-only when it gets confused, period.
It doesn't matter what the error rate is.


In general I agree; I just wonder why it couldn't happen sooner. For
example, from the log I originally attached one can see that btrfs
made 1867 attempts to read (perhaps the same) block from both devices
in the RAID1 volume, without success:

BTRFS error (device sdf): bdev /dev/sdh errs: wr 0, rd 1867, flush 0, 
corrupt 0, gen 0
BTRFS error (device sdf): bdev /dev/sdg errs: wr 0, rd 1867, flush 0, 
corrupt 0, gen 0


Attempts lasted for 29 minutes.


The work around is really to do the hard work making the devices
stable. Not asking Btrfs to paper over known unstable hardware.

In my case, I started out with rare disconnects and resets with
directly attached drives. This was a couple years ago. It was a Btrfs
raid1 setup, and the drives would not go missing at the same time, but
both would just drop off from time to time. Btrfs would complain of
dropped writes, I vaguely remember it going read only. But normal
mounts worked, sometimes with scary errors but always finding a good
copy on the other drive, and doing passive fixups. Scrub would always
fix up the rest. I'm still using those same file systems on those
devices, but now they go through a dyconn USB 3.0 hub with a decently
good power supply. I originally thought the drop offs were power
related, so I explicitly looked for a USB hub that could supply at
least 2A, and this one is 12VDC @ 2500mA. A laptop drive will draw
nearly 1 A on spin up, but at that point P = A * V. Laptop drives use
1.5 W to 2.5 W @ 5 VDC during read/write.

1.5-2.5 W = A * 5 V
Therefore A = 0.3-0.5A

And for 4 drives at possibly 0.5 A (although my drives are all at the
1.6 W read/write), that's 2 A @ 5 V, which is easily maintained for
the hub power supply (which by my calculation could do 6 A @ 5 V, not
accounting for any resistance).
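The power-budget arithmetic above can be written out as a quick sketch (a minimal illustration: the 1.6 W per-drive figure, the four-drive count, and the 12 V / 2.5 A hub supply rating are taken from the description above, not measured values):

```python
# Rough USB-hub power-budget check, following the arithmetic above.

def amps_at_5v(watts):
    """Current drawn at 5 VDC for a given power draw (P = V * I)."""
    return watts / 5.0

drive_watts = [1.6] * 4            # four laptop drives, ~1.6 W each at read/write
total_amps = sum(amps_at_5v(w) for w in drive_watts)

psu_watts = 12.0 * 2.5             # hub power supply: 12 VDC @ 2500 mA = 30 W
psu_amps_at_5v = psu_watts / 5.0   # ~6 A available at 5 V, ignoring conversion loss

print(f"drives need {total_amps:.2f} A @ 5 V, supply can give {psu_amps_at_5v:.1f} A")
```

Even at the worst-case 0.5 A per drive this stays well under the supply's budget, which matches the conclusion above that the drop-offs were probably not power related.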

Anyway, as it turns out I don't think it was power related, as the
Intel NUC in question probably had just enough amps per port. And what
it really was, was an incompatibility between the Intel controller and
the bridge chipset in the USB-SATA cases. A USB hub is similar to an
ethernet hub: it actually reads the USB stream and rewrites it out.
So hubs are actually pretty complicated little things, and having a
good one matters.


Thanks for this information. I have a situation similar to yours, with
the only important difference being that my drives are put into a USB
dock with independent power and cooling, like this one:

https://www.ebay.com/itm/Mediasonic-ProBox-4-Bay-3-5-Hard-Drive-Enclosure-USB-3-0-eSATA-Sata-3-6-0Gbps/273161164246

so I don't think I need to worry about amps. This dock is connected
directly to USB port on the motherboard.

However, indeed there could be bugs both on the dock side and in the
south bridge. Moreover, I could imagine that a USB reset happens due to
another USB device, like a wave started in one place turning into a
tsunami for the whole USB subsystem.


There are pending patches for something similar that you can find in
the archives. I think the reason they haven't been merged yet is there
haven't been enough comments and feedback (?). I think Anand Jain is
the author of those patches so you might dig around in the archives.
In a way you have an ideal setup for testing them out. Just make sure
you have backups...


Thanks for the reference. Should I look for this patch here:

https://patchwork.kernel.org/project/linux-btrfs/list/?submitter=34632=-date

or was this patch only floating around on this mailing list?


'btrfs check' without the --repair flag is safe and read only but
takes a long time because it'll read all metadata. The fastest safe
way is to mount it ro and read a directory recently being written to
and see if there are any kernel errors. You could recursively copy
files from a directory to /dev/null and then check kernel messages for
any errors. So long as metadata is DUP, there is a good chance a bad
copy of metadata can be automatically fixed up with a good copy. If
there's only a single copy of metadata, or both copies get corrupt,
then it's difficult. Usually recovery of data is possible, but
depending on what's damaged, repair might not be possible.
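The "read everything back and watch for errors" check can be sketched in a few lines. This is a hedged illustration, not a btrfs tool: it simply streams every file, which is what recursively copying to /dev/null achieves, and on a real system you would point it at the btrfs mount and then inspect dmesg for new BTRFS errors:

```python
# Sketch of a read-only health check: walk a tree and read every file,
# collecting any path that raises an I/O error. Reading is enough to
# trigger checksum verification (and RAID1/DUP fixups) on btrfs.
import os
import tempfile

def read_all_files(root):
    """Read every regular file under root; return (path, errno) for failures."""
    bad = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while f.read(1 << 20):   # stream in 1 MiB chunks
                        pass
            except OSError as e:
                bad.append((path, e.errno))
    return bad

# Demonstration on a throwaway directory (on a real system: the btrfs mount):
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "sample.log"), "wb") as f:
    f.write(b"x" * 4096)
print(read_all_files(demo))   # -> [] when every file reads cleanly
```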


I think "btrfs check" would be too heavy. 

Re: Failover for unattached USB device

2018-10-24 Thread Dmitry Katsubo

On 2018-10-17 00:14, Dmitry Katsubo wrote:

As a workaround I can monitor dmesg output but:

1. It would be nice if I could tell btrfs that I would like to mount
read-only after a certain error rate per minute is reached.
2. It would be nice if btrfs could detect that both drives are not
available and unmount (as mounting read-only won't help much) the
filesystem.

Kernel log for Linux v4.14.2 is attached.


I wonder if somebody could advise further on a workaround. I understand
that running a btrfs volume over USB devices is not good, but I think
btrfs could play some role as well.

In particular, I wonder whether btrfs could detect that all devices in
a RAID1 volume have become inaccessible and, instead of reporting an
ever-increasing "write error" counter to the kernel log, simply render
the volume read-only. "Inaccessible" could mean that the same block
cannot be written back to the minimum number of devices in the RAID
volume, so btrfs gives up.


Maybe someone can advise a sophisticated way of quickly checking that
the filesystem is healthy? Right now the only way I see is to make a
tiny write (like creating a file and instantly removing it) to make it
die faster... Checking for write I/O errors in "btrfs dev stats
/mnt/backups" output could be an option, provided that the delta is
computed over some period of time and the write-error counter increases
for both devices in the volume (as apparently I am not interested in
one failing block which btrfs tries to write again and again,
increasing the write-error counter).
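That delta idea can be sketched as follows. This is an illustrative sketch only: the parser assumes the usual `[/dev/sdX].counter value` line format of `btrfs dev stats`, and the device names and counter values below are made up:

```python
# Sketch: compare two snapshots of `btrfs dev stats` counters and flag
# the volume only when the write-error counter grew on *every* device,
# i.e. the whole RAID1 volume is degraded, not just one failing block.
import re

def parse_stats(text):
    """Map (device, counter) -> value from `btrfs dev stats` output."""
    stats = {}
    for m in re.finditer(r"\[([^\]]+)\]\.(\w+)\s+(\d+)", text):
        stats[(m.group(1), m.group(2))] = int(m.group(3))
    return stats

def all_devices_degraded(before, after, counter="write_io_errs"):
    """True only if the counter increased on every device in the volume."""
    devs = {d for (d, c) in after if c == counter}
    return bool(devs) and all(
        after[(d, counter)] > before.get((d, counter), 0) for d in devs
    )

before = parse_stats("[/dev/sdg].write_io_errs 0\n[/dev/sdh].write_io_errs 0\n")
after  = parse_stats("[/dev/sdg].write_io_errs 12\n[/dev/sdh].write_io_errs 9\n")
print(all_devices_degraded(before, after))   # both counters grew -> True
```

A cron job could take such snapshots a few minutes apart and remount the volume read-only (or alert) only when the condition holds.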

Thanks for any feedback.

--
With best regards,
Dmitry


Failover for unattached USB device

2018-10-16 Thread Dmitry Katsubo
Dear btrfs team / community,

Sometimes it happens that the kernel resets the USB subsystem (looks like a
hardware problem), after which all USB devices are detached and attached back.
After a few hours of struggle btrfs finally comes to the point where a
read-only filesystem mount is necessary. During this time, when I try to
access the mounted filesystem (/mnt/backups), it reports success for some
directories and errors for others:

root@debian:~# ll /mnt/backups/
total 14334
drwxr-xr-x 1 adm users    116 Sep 12 00:35 .
drwxrwxr-x 1 adm users    164 Sep 19 22:44 ..
-rw-r--r-- 1 adm users  79927 Feb  7  2018 contacts.zip
drwxr-xr-x 1 adm users    254 Feb  4  2018 attic
drwxr-xr-x 1 adm users     16 Feb 23  2018 recent
...
root@debian:~# ll /mnt/backups/attic/
ls: reading directory '/mnt/backups/attic/': Input/output error
total 0
drwxr-xr-x 1 adm users 254 Feb  4  2018 .
drwxr-xr-x 1 adm users 116 Sep 12 00:35 ..

It looks like this depends on whether the content is in disk cache...

What is surprising: when I try to create a file, I succeed:

root@debian:~# touch /mnt/backups/.mounted
root@debian:~# ll /mnt/backups/.mounted
-rw-r--r-- 1 root root 0 Sep 20 16:52 /mnt/backups/.mounted
root@debian:~# rm /mnt/backups/.mounted

My btrfs volume consists of two identical drives combined into RAID1 volume:

# btrfs filesystem df /mnt/backups
Data, RAID1: total=880.00GiB, used=878.96GiB
System, RAID1: total=8.00MiB, used=144.00KiB
Metadata, RAID1: total=2.00GiB, used=1.13GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

# btrfs filesystem show /mnt/backups
Label: none  uuid: a657364b-36d2-4c1f-8e5d-dc3d28166190
Total devices 2 FS bytes used 880.09GiB
devid1 size 3.64TiB used 882.01GiB path /dev/sdf
devid2 size 3.64TiB used 882.01GiB path /dev/sde

As a workaround I can monitor dmesg output but:

1. It would be nice if I could tell btrfs that I would like to mount read-only
after a certain error rate per minute is reached.
2. It would be nice if btrfs could detect that both drives are not available and
unmount (as mounting read-only won't help much) the filesystem.

Kernel log for Linux v4.14.2 is attached.

-- 
With best regards,
Dmitry
Jun 29 18:54:56 debian kernel: [1197865.440396] usb 4-2: USB disconnect, device 
number 3
Jun 29 18:54:56 debian kernel: [1197865.440403] usb 4-2.2: USB disconnect, 
device number 5
Jun 29 18:54:56 debian kernel: [1197865.476118] usb 4-2.3: USB disconnect, 
device number 8
Jun 29 18:54:56 debian kernel: [1197865.549379] usb 4-2.4: USB disconnect, 
device number 7
...
Jun 29 18:54:58 debian kernel: [1197867.517728] usb-storage 4-2.3:1.0: USB Mass 
Storage device detected
Jun 29 18:54:58 debian kernel: [1197867.524021] usb-storage 4-2.3:1.0: Quirks 
match for vid 152d pid 0567: 500
Jun 29 18:54:58 debian kernel: [1197867.603859] usb 4-2.4: new full-speed USB 
device number 13 using ehci-pci
Jun 29 18:54:58 debian kernel: [1197867.725595] usb-storage 4-2.4:1.2: USB Mass 
Storage device detected
Jun 29 18:54:58 debian kernel: [1197867.728602] scsi host9: usb-storage 
4-2.4:1.2
Jun 29 18:54:59 debian kernel: [1197868.528737] scsi 7:0:0:0: Direct-Access 
ST4000DM 004-2CV104   0125 PQ: 0 ANSI: 6
Jun 29 18:54:59 debian kernel: [1197868.529310] scsi 7:0:0:1: Direct-Access 
ST4000DM 004-2CV104   0125 PQ: 0 ANSI: 6
Jun 29 18:54:59 debian kernel: [1197868.530093] sd 7:0:0:0: Attached scsi 
generic sg5 type 0
Jun 29 18:54:59 debian kernel: [1197868.530588] sd 7:0:0:1: Attached scsi 
generic sg6 type 0
Jun 29 18:54:59 debian kernel: [1197868.533064] sd 7:0:0:1: [sdh] Very big 
device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.533619] sd 7:0:0:1: [sdh] 7814037168 
512-byte logical blocks: (4.00 TB/3.64 TiB)
Jun 29 18:54:59 debian kernel: [1197868.533626] sd 7:0:0:1: [sdh] 4096-byte 
physical blocks
Jun 29 18:54:59 debian kernel: [1197868.534063] sd 7:0:0:1: [sdh] Write Protect 
is off
Jun 29 18:54:59 debian kernel: [1197868.534069] sd 7:0:0:1: [sdh] Mode Sense: 
67 00 10 08
Jun 29 18:54:59 debian kernel: [1197868.534422] sd 7:0:0:1: [sdh] No Caching 
mode page found
Jun 29 18:54:59 debian kernel: [1197868.534542] sd 7:0:0:1: [sdh] Assuming 
drive cache: write through
Jun 29 18:54:59 debian kernel: [1197868.535563] sd 7:0:0:1: [sdh] Very big 
device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.536702] sd 7:0:0:0: [sdg] Very big 
device. Trying to use READ CAPACITY(16).
Jun 29 18:54:59 debian kernel: [1197868.537454] sd 7:0:0:0: [sdg] 7814037168 
512-byte logical blocks: (4.00 TB/3.64 TiB)
Jun 29 18:54:59 debian kernel: [1197868.537459] sd 7:0:0:0: [sdg] 4096-byte 
physical blocks
Jun 29 18:54:59 debian kernel: [1197868.538327] sd 7:0:0:0: [sdg] Write Protect 
is off
Jun 29 18:54:59 debian kernel: [1197868.538331] sd 7:0:0:0: [sdg] Mode Sense: 
67 00 10 08
...
Jun 29 20:22:35 debian kernel: [1203125.061068] BTRFS error (device sdf): bdev 
/dev/sdh errs: wr 0, rd 1, flush 0, corrupt 0, gen 0

btrfs warning at ctree.h:1564 btrfs_update_device+0x220/0x230

2018-10-16 Thread Dmitry Katsubo
Dear btrfs team,

I often observe kernel traces on linux-4.14.0 (most likely due to background
"btrfs scrub") which contain the following "characterizing" line (for the rest
see attachments):

btrfs_remove_chunk+0x26a/0x7e0 [btrfs]

I wonder if somebody from the developer team knows anything about this problem.
It seems that after such a dump the btrfs volume continues to function OK.

Thanks for any information!

-- 
With best regards,
Dmitry
Jun  7 16:26:31 debian kernel: [1176060.298759] [ cut here 
]
Jun  7 16:26:31 debian kernel: [1176060.298820] WARNING: CPU: 0 PID: 566 at 
/build/linux-SCFPgu/linux-4.14.2/fs/btrfs/ctree.h:1564 
btrfs_update_device+0x220/0x230 [btrfs]
Jun  7 16:26:31 debian kernel: [1176060.298823] Modules linked in: option 
usb_wwan usbserial ipt_REJECT nf_reject_ipv4 xt_multiport iptable_filter 
xt_REDIRECT nf_nat_redirect xt_physdev br_netfilter iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c 
xt_tcpudp iptable_mangle arc4 bridge stp llc iTCO_wdt iTCO_vendor_support ppdev 
coretemp ath5k pcspkr serio_raw ath mac80211 sr9700 dm9601 cfg80211 usbnet mii 
i915 rfkill snd_hda_codec_realtek lpc_ich snd_hda_codec_generic mfd_core evdev 
snd_hda_intel snd_hda_codec sg snd_hda_core snd_hwdep snd_pcm_oss rng_core 
snd_mixer_oss video snd_pcm drm_kms_helper snd_timer drm snd parport_pc 
soundcore i2c_algo_bit parport shpchp button acpi_cpufreq binfmt_misc w83627hf 
hwmon_vid ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb 
crypto_simd cryptd
Jun  7 16:26:31 debian kernel: [1176060.298930]  aes_i586 btrfs crc32c_generic 
xor zstd_decompress zstd_compress xxhash raid6_pq hid_generic usbhid hid uas 
usb_storage sr_mod cdrom sd_mod ata_generic i2c_i801 ata_piix libata 
firewire_ohci scsi_mod firewire_core crc_itu_t e1000e ptp pps_core ehci_pci 
uhci_hcd ehci_hcd usbcore usb_common
Jun  7 16:26:31 debian kernel: [1176060.298981] CPU: 0 PID: 566 Comm: 
btrfs-cleaner Tainted: GW   4.14.0-1-686-pae #1 Debian 4.14.2-1
Jun  7 16:26:31 debian kernel: [1176060.299162] Hardware name: AOpen 
i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
Jun  7 16:26:31 debian kernel: [1176060.299327] task: f287e200 task.stack: 
f24e2000
Jun  7 16:26:31 debian kernel: [1176060.299448] EIP: 
btrfs_update_device+0x220/0x230 [btrfs]
Jun  7 16:26:31 debian kernel: [1176060.299450] EFLAGS: 00010206 CPU: 0
Jun  7 16:26:31 debian kernel: [1176060.299454] EAX:  EBX: f68bee00 
ECX: 000c EDX: 0200
Jun  7 16:26:31 debian kernel: [1176060.299457] ESI: ef0d9320 EDI:  
EBP: f24e3e9c ESP: f24e3e5c
Jun  7 16:26:31 debian kernel: [1176060.299460]  DS: 007b ES: 007b FS: 00d8 GS: 
00e0 SS: 0068
Jun  7 16:26:31 debian kernel: [1176060.299463] CR0: 80050033 CR2: 02aa3000 
CR3: 32b6ece0 CR4: 06f0
Jun  7 16:26:31 debian kernel: [1176060.299467] Call Trace:
Jun  7 16:26:31 debian kernel: [1176060.299561]  btrfs_remove_chunk+0x26a/0x7e0 
[btrfs]
Jun  7 16:26:31 debian kernel: [1176060.299686]  
btrfs_delete_unused_bgs+0x321/0x3f0 [btrfs]
Jun  7 16:26:31 debian kernel: [1176060.299819]  cleaner_kthread+0x13c/0x150 
[btrfs]
Jun  7 16:26:31 debian kernel: [1176060.299907]  kthread+0xf3/0x110
Jun  7 16:26:31 debian kernel: [1176060.33]  ? 
__btree_submit_bio_start+0x20/0x20 [btrfs]
Jun  7 16:26:31 debian kernel: [1176060.300099]  ? 
kthread_create_on_node+0x20/0x20
Jun  7 16:26:31 debian kernel: [1176060.300182]  ret_from_fork+0x19/0x24
Jun  7 16:26:31 debian kernel: [1176060.300249] Code: e9 81 fe ff ff 8d b6 00 
00 00 00 bf f4 ff ff ff e9 78 fe ff ff 8d b6 00 00 00 00 f3 90 eb a8 8d 74 26 
00 f3 90 e9 2b ff ff ff 90 <0f> ff e9 7a ff ff ff e8 14 4d 4c dc 8d 74 26 00 3e 
8d 74 26 00
Jun  7 16:26:31 debian kernel: [1176060.300626] ---[ end trace 32773559e9ec5e68 
]---
Jul  1 07:07:31 debian kernel: [1328228.484772] [ cut here 
]
Jul  1 07:07:31 debian kernel: [1328228.484822] WARNING: CPU: 0 PID: 26193 at 
/build/linux-SCFPgu/linux-4.14.2/fs/btrfs/ctree.h:1564 
btrfs_update_device+0x220/0x230 [btrfs]
Jul  1 07:07:31 debian kernel: [1328228.484824] Modules linked in: cpuid nfs 
lockd grace sunrpc fscache ipt_REJECT nf_reject_ipv4 xt_multiport 
iptable_filter xt_REDIRECT nf_nat_redirect xt_physdev br_netfilter iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c 
xt_tcpudp iptable_mangle option usb_wwan usbserial arc4 bridge stp llc iTCO_wdt 
iTCO_vendor_support ppdev evdev ath5k ath mac80211 coretemp cfg80211 sr9700 
rfkill serio_raw dm9601 i915 usbnet pcspkr snd_hda_codec_realtek mii lpc_ich 
snd_hda_codec_generic mfd_core snd_hda_intel snd_hda_codec snd_hda_core 
snd_hwdep rng_core video snd_pcm_oss sg drm_kms_helper snd_mixer_oss drm 
snd_pcm snd_timer i2c_algo_bit snd soundcore parport_pc parport button shpchp 
acpi_cpufreq binfmt_misc w83627hf hwmon_vid ip_tables x_tables autofs4 ext4 
crc16 mbcache
Jul  1 07:07:31 debian kernel: 

Re: Kernel crash during btrfs scrub

2018-01-03 Thread Dmitry Katsubo
On 2018-01-03 05:58, Qu Wenruo wrote:
> On 2018年01月03日 09:12, Dmitry Katsubo wrote:
>> Dear btrfs team,
>>
>> I send a kernel crash report which I have observed recently during btrfs 
>> scrub.
>> It looks like scrub itself has completed without errors.
> 
> It's not a kernel crash (if I didn't miss anything), but just kernel
> warning.
> 
> The warning is caused by the fact that your fs (mostly created by old
> mkfs.btrfs) has device with unaligned size.
> 
> You could either resize the device down a little (e.g. -4K) and newer
> kernel (the one you're using should be new enough) could handle it well.
> 
> Or you could update your btrfs-progs (I assume you're using Arch, which
> is already shipping btrfs-progs v4.14) and use "btrfs rescue
> fix-device-size" to fix other device related problems offline.
> (Not only the warning, but also potential superblock size mismatch)
> 
> Thanks,
> Qu

Thanks for reply!

Why couldn't the warning be issued as a one-liner, e.g. with a proper
description and without the scary stack trace?

btrfs /dev/sda1 warning: device size is not aligned with FS (mostly created by
old mkfs.btrfs), see https://btrfs.wiki.kernel.org/index.php/FAQ#...

-- 
With best regards,
Dmitry


Kernel crash during btrfs scrub

2018-01-02 Thread Dmitry Katsubo
Dear btrfs team,

I am sending a kernel crash report which I observed recently during btrfs scrub.
It looks like scrub itself completed without errors.

# btrfs scrub status /home
scrub status for 83a3cb60-3334-4d11-9fdf-70b8e8703167
scrub started at Mon Jan  1 06:52:01 2018 and finished after 00:30:47
total bytes scrubbed: 87.55GiB with 0 errors

# btrfs scrub status /var/log
scrub status for 5b45ac8e-fd8c-4759-854a-94e45069959d
scrub started at Mon Jan  1 06:52:01 2018 and finished after 00:15:45
total bytes scrubbed: 23.39GiB with 0 errors

Linux kernel v4.14.2-1
btrfs-progs v4.7.3-1

-- 
With best regards,
Dmitry
[Mon Jan  1 07:04:44 2018] [ cut here ]
[Mon Jan  1 07:04:44 2018] WARNING: CPU: 0 PID: 13583 at 
/build/linux-SCFPgu/linux-4.14.2/fs/btrfs/ctree.h:1564 
btrfs_update_device+0x220/0x230 [btrfs]
[Mon Jan  1 07:04:44 2018] Modules linked in: md4 nls_utf8 cifs ccm 
dns_resolver fscache option usb_wwan usbserial isofs loop ses enclosure 
scsi_transport_sas hid_generic usbhid hid ipt_REJECT nf_reject_ipv4 
xt_multiport iptable_filter xt_REDIRECT nf_nat_redirect xt_physdev br_netfilter 
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack 
libcrc32c xt_tcpudp iptable_mangle bridge stp llc arc4 iTCO_wdt 
iTCO_vendor_support ppdev evdev snd_hda_codec_realtek snd_hda_codec_generic 
ath5k ath mac80211 cfg80211 snd_hda_intel i915 rfkill coretemp snd_hda_codec 
snd_hda_core snd_hwdep serio_raw snd_pcm_oss pcspkr snd_mixer_oss lpc_ich 
snd_pcm mfd_core snd_timer snd video soundcore drm_kms_helper sg drm shpchp 
i2c_algo_bit rng_core parport_pc parport button acpi_cpufreq binfmt_misc 
w83627hf hwmon_vid
[Mon Jan  1 07:04:44 2018]  ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 
fscrypto ecb crypto_simd cryptd aes_i586 btrfs crc32c_generic xor 
zstd_decompress zstd_compress xxhash raid6_pq uas usb_storage sr_mod sd_mod 
cdrom ata_generic ata_piix i2c_i801 libata firewire_ohci scsi_mod firewire_core 
crc_itu_t ehci_pci uhci_hcd ehci_hcd e1000e ptp pps_core usbcore usb_common
[Mon Jan  1 07:04:44 2018] CPU: 0 PID: 13583 Comm: btrfs Tainted: GW
   4.14.0-1-686-pae #1 Debian 4.14.2-1
[Mon Jan  1 07:04:44 2018] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS 
i945GMx-IF R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[Mon Jan  1 07:04:44 2018] task: eba6a000 task.stack: ca216000
[Mon Jan  1 07:04:44 2018] EIP: btrfs_update_device+0x220/0x230 [btrfs]
[Mon Jan  1 07:04:44 2018] EFLAGS: 00210206 CPU: 0
[Mon Jan  1 07:04:44 2018] EAX:  EBX: f6908400 ECX: 000c EDX: 
0200
[Mon Jan  1 07:04:44 2018] ESI: f69e2280 EDI:  EBP: ca217bd8 ESP: 
ca217b98
[Mon Jan  1 07:04:44 2018]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[Mon Jan  1 07:04:44 2018] CR0: 80050033 CR2: b795da00 CR3: 1a2e2460 CR4: 
06f0
[Mon Jan  1 07:04:44 2018] Call Trace:
[Mon Jan  1 07:04:44 2018]  btrfs_finish_chunk_alloc+0xf3/0x480 [btrfs]
[Mon Jan  1 07:04:44 2018]  ? btrfs_free_path.part.26+0x1c/0x20 [btrfs]
[Mon Jan  1 07:04:44 2018]  ? btrfs_insert_item+0x66/0xd0 [btrfs]
[Mon Jan  1 07:04:44 2018]  btrfs_create_pending_block_groups+0x139/0x250 
[btrfs]
[Mon Jan  1 07:04:44 2018]  __btrfs_end_transaction+0x78/0x2e0 [btrfs]
[Mon Jan  1 07:04:44 2018]  btrfs_end_transaction+0xf/0x20 [btrfs]
[Mon Jan  1 07:04:44 2018]  btrfs_inc_block_group_ro+0xea/0x190 [btrfs]
[Mon Jan  1 07:04:44 2018]  scrub_enumerate_chunks+0x215/0x660 [btrfs]
[Mon Jan  1 07:04:44 2018]  btrfs_scrub_dev+0x1e8/0x4e0 [btrfs]
[Mon Jan  1 07:04:44 2018]  btrfs_ioctl+0x1480/0x28b0 [btrfs]
[Mon Jan  1 07:04:44 2018]  ? kmem_cache_alloc+0x30c/0x540
[Mon Jan  1 07:04:44 2018]  ? btrfs_ioctl_get_supported_features+0x30/0x30 
[btrfs]
[Mon Jan  1 07:04:44 2018]  do_vfs_ioctl+0x90/0x650
[Mon Jan  1 07:04:44 2018]  ? do_vfs_ioctl+0x90/0x650
[Mon Jan  1 07:04:44 2018]  ? create_task_io_context+0x78/0xe0
[Mon Jan  1 07:04:44 2018]  ? get_task_io_context+0x3d/0x80
[Mon Jan  1 07:04:44 2018]  SyS_ioctl+0x58/0x70
[Mon Jan  1 07:04:44 2018]  do_fast_syscall_32+0x71/0x1a0
[Mon Jan  1 07:04:44 2018]  entry_SYSENTER_32+0x4e/0x7c
[Mon Jan  1 07:04:44 2018] EIP: 0xb7f81cf9
[Mon Jan  1 07:04:44 2018] EFLAGS: 0246 CPU: 0
[Mon Jan  1 07:04:44 2018] EAX: ffda EBX: 0003 ECX: c400941b EDX: 
092e21b8
[Mon Jan  1 07:04:44 2018] ESI: 092e21b8 EDI: 003d0f00 EBP: b7cff1e8 ESP: 
b7cff188
[Mon Jan  1 07:04:44 2018]  DS: 007b ES: 007b FS:  GS: 0033 SS: 007b
[Mon Jan  1 07:04:44 2018] Code: e9 81 fe ff ff 8d b6 00 00 00 00 bf f4 ff ff 
ff e9 78 fe ff ff 8d b6 00 00 00 00 f3 90 eb a8 8d 74 26 00 f3 90 e9 2b ff ff 
ff 90 <0f> ff e9 7a ff ff ff e8 14 ad 48 d0 8d 74 26 00 3e 8d 74 26 00
[Mon Jan  1 07:04:44 2018] ---[ end trace 6b4736d811ae42e1 ]---
[Mon Jan  1 07:05:00 2018] [ cut here ]
[Mon Jan  1 07:05:00 2018] WARNING: CPU: 1 PID: 443 at 
/build/linux-SCFPgu/linux-4.14.2/fs/btrfs/ctree.h:1564 
btrfs_update_device+0x220/0x230 [btrfs]
[Mon Jan  1 07:05:00 2018] 

Re: btrfs defrag questions

2016-07-04 Thread Dmitry Katsubo
On 2016-07-01 22:46, Henk Slager wrote:
> (email ends up in gmail spamfolder)
> On Fri, Jul 1, 2016 at 10:14 PM, Dmitry Katsubo <dm...@mail.ru> wrote:
>> Hello everyone,
>>
>> Question #1:
>>
>> While doing defrag I got the following message:
>>
>> # btrfs fi defrag -r /home
>> ERROR: defrag failed on /home/user/.dropbox-dist/dropbox: Success
>> total 1 failures
>>
>> I feel that something went wrong, but the message is a bit misleading.
>>
>> Provided that Dropbox is running in the system, does it mean that it
>> cannot be defragmented?
> 
> I think it is a matter of newlines in btrfs-progs and/or stdout/stderr mixup.
> 
> You should run the command with -v and probably also with -f, so that
> it gets hopefully clearer what is wrong.

Running with "-v -f" (or just "-v") results in the same output:

...
/home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/select.so
/home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/grp.so
/home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/posixffi.libc._posixffi_libcERROR:
 defrag failed on /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/dropbox: 
Success
.so
/home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/_functools.so
/home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/dropbox
/home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/_csv.so
...

This is not a matter of newlines:

$ grep -rnH 'defrag failed' btrfs-progs
btrfs-progs/cmds-filesystem.c:1021:   error("defrag failed on %s: %s", 
fpath, strerror(e));
btrfs-progs/cmds-filesystem.c:1161:   error("defrag failed 
on %s: %s", argv[i], strerror(e));
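The stray "Success" suffix suggests that strerror() was handed an errno of 0:
error() formats whatever errno holds, and if the failing defrag path never set
it (or it was cleared in between), error code 0 renders as "Success". That is
an inference from the code above, not a confirmed btrfs-progs bug. A quick
illustration — Python's os.strerror wraps the same C function:

```python
import os

# Error code 0 means "no error" to the C library, so formatting a
# failure message while errno == 0 produces the confusing
# "defrag failed on ...: Success".
message = os.strerror(0)
print(message)  # on glibc this prints 'Success'
```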

> That it fails on dropbox is an error I think, but maybe known: Could
> be mount option is compress and that that causes trouble for defrag
> although that should not happen.

True, compression is enabled.

> You can defrag just 1 file, so maybe you could try to make a reproducible 
> case.

When I run it on one file, it works as expected:

# btrfs fi defrag -r -v /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/dropbox
ERROR: cannot open /home/user/.dropbox-dist/dropbox-lnx.x86-5.4.24/dropbox: 
Text file busy

> What kernel?
> What btrfs-progs?

kernel v4.4.6
btrfs-tools v4.5.2

>> Question #2:
>>
>> Suppose that in above example /home/ftp is mounted as another btrfs
>> array (not subvolume). Will 'btrfs fi defrag -r /home' defragment it
>> (recursively) as well?
> 
> I dont know, I dont think so, but you can simply try.
 
Many thanks, now I see how I can check this. Unfortunately it does not
descend into submounted directories.

-- 
With best regards,
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs defrag questions

2016-07-01 Thread Dmitry Katsubo
Hello everyone,

Question #1:

While doing defrag I got the following message:

# btrfs fi defrag -r /home
ERROR: defrag failed on /home/user/.dropbox-dist/dropbox: Success
total 1 failures

I feel that something went wrong, but the message is a bit misleading.

Provided that Dropbox is running in the system, does it mean that it
cannot be defragmented?

Question #2:

Suppose that in above example /home/ftp is mounted as another btrfs
array (not subvolume). Will 'btrfs fi defrag -r /home' defragment it
(recursively) as well?

-- 
With best regards,
Dmitry


btrfs defrag: success or failure?

2016-06-21 Thread Dmitry Katsubo
Hi everyone,

I got the following message:

# btrfs fi defrag -r /home
ERROR: defrag failed on /home/user/.dropbox-dist/dropbox: Success
total 1 failures

I feel that something went wrong, but the message is a bit misleading.

Anyway: Provided that Dropbox is running in the system, does it mean
that it cannot be defragmented?

-- 
With best regards,
Dmitry


Re: Is "btrfs balance start" truly asynchronous?

2016-06-21 Thread Dmitry Katsubo
On 2016-06-21 15:17, Graham Cobb wrote:
> On 21/06/16 12:51, Austin S. Hemmelgarn wrote:
>> The scrub design works, but the whole state file thing has some rather
>> irritating side effects and other implications, and developed out of
>> requirements that aren't present for balance (it might be nice to check
>> how many chunks actually got balanced after the fact, but it's not
>> absolutely necessary).
> 
> Actually, that would be **really** useful.  I have been experimenting
> with cancelling balances after a certain time (as part of my
> "balance-slowly" script).  I have got it working, just using bash
> scripting, but it means my script does not know whether any work has
> actually been done by the balance run which was cancelled (if no work
> was done, but it timed out anyway, there is probably no point trying
> again with the same timeout later!).

Additionally, it would be nice if balance/scrub reported its status via
/proc in a human-readable manner (similar to /proc/mdstat).

-- 
With best regards,
Dmitry


Is "btrfs balance start" truly asynchronous?

2016-06-20 Thread Dmitry Katsubo

Dear btrfs community,

I have added a drive to an existing raid1 btrfs volume and decided to 
perform a balance so that data is distributed "fairly" among the drives. 
I started "btrfs balance start", but it blocked for about 5-10 minutes, 
intensively doing the work. After that it printed something like "had to 
relocate 50 chunks" and exited. Judging by drive I/O, "btrfs balance" did 
most (if not all) of the work before exiting, so by the time it returned 
the job was done.


Shouldn't "btrfs balance start" do the operation in the background?

Thanks for any information.

--
With best regards,
Dmitry


Re: Process is blocked for more than 120 seconds

2016-06-15 Thread Dmitry Katsubo
On 2015-11-11 12:38, Dmitry Katsubo wrote:
> On 2015-11-09 14:25, Austin S Hemmelgarn wrote:
>> On 2015-11-07 07:22, Dmitry Katsubo wrote:
>>> Hi everyone,
>>>
>>> I have noticed the following in the log. The system continues to run,
>>> but I am not sure for how long it will be stable. Should I start
>>> worrying? Thanks in advance for the opinion.
>>>
>> This just means that a process was stuck in the D state (uninterruptible
>> I/O sleep) for more than 120 seconds.  Depending on a number of factors,
>> this happening could mean:
>> 1. Absolutely nothing (if you have low-powered or older hardware, for
>> example, I get these regularly on a first generation Raspberry Pi if I
>> don't increase the timeout significantly)
>> 2. The program is doing a very large chunk of I/O (usually with the
>> O_DIRECT flag, although this probably isn't the case here)
>> 3. There's a bug in the blocked program (this is rarely the case when
>> this type of thing happens)
>> 4. There's a bug in the kernel (which is why this dumps a stack trace)
>> 5. The filesystem itself is messed up somehow, and the kernel isn't
>> handling it properly (technically a bug, but a more specific case of it).
>> 6. You're hardware is misbehaving, failing, or experienced a transient
>> error.
>>
>> Assuming you can rule out possibilities 1 and 6, I think that 4 is the
>> most likely cause, as all of the listed programs (I'm assuming that
>> 'master' is from postfix) are relatively well audited, and all of them
>> hit this at the same time.
>>
>> For what it's worth, if you want you can do:
>> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
>> like the message says to stop these from appearing in the future, or use
>> some arbitrary number to change the timeout before these messages appear
>> (I usually use at least 150 on production systems, and more often 300,
>> although on something like a Raspberry Pi I often use timeouts as high
>> as 1800 seconds).
> 
> Thanks for comments, Austin.
> 
> The system is "normal" PC, running Intel Core 2 Duo Mobile @1.66GHz.
> "master" is indeed a postfix process.
> 
> I haven't seen anything like that when I was on 3.16 kernel, but after I
> have upgraded to 4.2.3, I caught that message. I/O and CPU load are
> usually low, but it could be (6) from your list, as the system is
> generally very old (5+ years).
> 
> As the problem appeared only once for passed 15 days, I think it is just
> a transient error. Thanks for clarifying the possible reasons.

The problem (rarely) reoccurs. It does not happen on the XFS filesystem (root),
only on btrfs. I will increase the timeout to 300 seconds and see what happens.

=== cut dmesg ===
INFO: task fail2ban-server:1747 blocked for more than 120 seconds.
  Tainted: GW   4.4.0-1-rt-686-pae #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
fail2ban-server D 001f 0  1747  1 0x
 f1ca1bc0 00200086 f2d24190 001f  f79ca4c0 f3d21bc0 
 c1168726 f1ca1bc0 e9152000  e9151d8c c156075f c0d25a90 f1ca1bc0
 e9151db4 c1561ed4   f1ca1bc0 0002 eab98940 c0d25a90
Call Trace:
 [] ? __filemap_fdatawrite_range+0xb6/0xf0
 [] ? schedule+0x3f/0xd0
 [] ? __rt_mutex_slowlock+0x74/0x140
 [] ? rt_mutex_slowlock+0xf3/0x250
 [] ? btrfs_write_marked_extents+0xae/0x190 [btrfs]
 [] ? rt_mutex_lock+0x45/0x50
 [] ? btrfs_sync_log+0x1d5/0x9a0 [btrfs]
 [] ? pin_current_cpu+0x71/0x1a0
 [] ? preempt_count_add+0x8a/0xb0
 [] ? unpin_current_cpu+0x13/0x70
 [] ? btrfs_sync_file+0x3ce/0x410 [btrfs]
 [] ? start_ordered_ops+0x40/0x40 [btrfs]
 [] ? vfs_fsync_range+0x47/0xb0
 [] ? do_fsync+0x3c/0x60
 [] ? SyS_fdatasync+0x15/0x20
 [] ? do_fast_syscall_32+0x8d/0x150
 [] ? sysenter_past_esp+0x3d/0x61
 [] ? pci_mmcfg_check_reserved+0x90/0xb0
INFO: task cleanup:2093 blocked for more than 120 seconds.
  Tainted: GW   4.4.0-1-rt-686-pae #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cleanup D d0f5de0c 0  2093  28135 0x
 eab99bc0 00200086 c1836760 d0f5de0c 0002 f79ca4c0 f3d21bc0 
 0001 eab99bc0 d0f5e000 f325d2c4 d0f5ddfc c156075f f325d000 0088
 d0f5de30 f8c72f36 0310 f325d290  eab99bc0 c10afd70 f325d2dc
Call Trace:
 [] ? schedule+0x3f/0xd0
 [] ? wait_log_commit+0xc6/0xf0 [btrfs]
 [] ? wake_atomic_t_function+0x70/0x70
 [] ? btrfs_sync_log+0x36a/0x9a0 [btrfs]
 [] ? pin_current_cpu+0x71/0x1a0
 [] ? preempt_count_add+0x8a/0xb0
 [] ? unpin_current_cpu+0x13/0x70
 [] ? btrfs_log_dentry_safe+0x64/0x70 [btrfs]
 [] ? btrfs_sync_file+0x3ce/0x410 [btrfs]
 [] ? do_sys_truncate+0xb0/0xb0
 [] ? start_ordered_ops+0x40/0x40 [btrfs]
 [] 

Re: Hot data tracking / hybrid storage

2016-06-01 Thread Dmitry Katsubo

On 2016-05-29 22:45, Ferry Toth wrote:

Op Sun, 29 May 2016 12:33:06 -0600, schreef Chris Murphy:


On Sun, May 29, 2016 at 12:03 PM, Holger Hoffstätte
 wrote:

On 05/29/16 19:53, Chris Murphy wrote:

But I'm skeptical of bcache using a hidden area historically for the
bootloader, to put its device metadata. I didn't realize that was the
case. Imagine if LVM were to stuff metadata into the MBR gap, or
mdadm. Egads.


On the matter of bcache in general this seems noteworthy:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=4d1034eb7c2f5e32d48ddc4dfce0f1a723d28667

bummer..


Well it doesn't mean no one will take it, just that no one has taken it
yet. But the future of SSD caching may only be with LVM.



I think all the above posts underline exactly my point:

Instead of using an SSD cache (be it bcache or dm-cache), it would be much
better to have the btrfs allocator be aware of SSDs in the pool and
prioritize allocations to the SSDs to maximize performance.

This would make it easy to add more SSDs or replace worn-out ones,
without the mentioned headaches. After all, adding/replacing drives in a
pool is one of btrfs's biggest advantages.


I would certainly vote for this feature. If I understand correctly, the
mirror is currently selected based on the PID of the btrfs worker thread [1],
which is simple but not the most effective approach. I would suggest
implementing a queue of read operations per physical device (perhaps reads
and writes should go into the same queue). If a device is fast (and for an
SSD that is the case), its queue empties more quickly, which means it should
be loaded more intensively. The allocation logic would simply put the next
request into the shortest queue. I think this would guarantee that most
operations are served by the SSD (or any other even faster technology that
appears in the future).

[1] 
https://btrfs.wiki.kernel.org/index.php/Project_ideas#Better_data_balancing_over_multiple_devices_for_raid1.2F10_.28read.29
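The shortest-queue idea above can be sketched in a few lines. This is a toy
user-space model under the assumption that per-device queue depths are known;
btrfs exposes no such interface today, so the names and structure here are
purely illustrative:

```python
import heapq

def dispatch_reads(requests, devices):
    """Toy model of the shortest-queue policy: `devices` maps a device
    name to its current queue depth, and each read request is sent to
    the least-loaded device. This illustrates the idea only; it is not
    the btrfs allocator.
    """
    # Min-heap of (queue_depth, device_name): the least-loaded device
    # is always at the top.
    heap = [(depth, name) for name, depth in devices.items()]
    heapq.heapify(heap)
    assignment = []
    for req in requests:
        depth, name = heapq.heappop(heap)        # shortest queue wins
        assignment.append((req, name))
        heapq.heappush(heap, (depth + 1, name))  # request now sits in its queue
    return assignment
```

With a fast SSD (short queue) and a loaded HDD, reads naturally gravitate to
the SSD until the queue depths even out, which is exactly the behaviour
described above.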



--
With best regards,
Dmitry


Re: Some ideas for improvements

2016-05-29 Thread Dmitry Katsubo
On 2016-05-25 21:03, Duncan wrote:
> Dmitry Katsubo posted on Wed, 25 May 2016 16:45:41 +0200 as excerpted:
>> * Would be nice if 'btrfs scrub status' shows estimated finishing time
>> (ETA) and throughput (in Mb/s).
> 
> That might not be so easy to implement.  (Caveat, I'm not a dev, just a 
> btrfs user and list regular, so if a dev says different...)
> 
> Currently, a running scrub simply outputs progress to a file (/var/lib/
> btrfs/scrub.status.), and scrub status is simply a UI to pretty-
> print that file.  Note that there's nothing in there which lists the 
> total number of extents or bytes to go -- that's not calculated ahead of 
> time.
> 
> So implementing some form of percentage done or eta is likely to increase 
> the processing time dramatically, as it could involve doing a dry-run 
> first, in ordered to get the total figures against which to calculate 
> percentage done.

Indeed, this cannot (and should not) be done at the user-space level: the
kernel module should provide that information. I am not a dev :) but I think
the module should know the number of extents; at least something like that
is shown in the "btrfs fi usage ..." output.

The information needn't be 100% exact, but at least some indication
would be great. In the worst case the module could remember the duration
of the last scrub and base its estimate on that (similar to how some CD
burning utilities do).
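The estimate itself is simple linear extrapolation. Sketched here with
hypothetical inputs (bytes scrubbed so far, e.g. from the scrub status file,
and total allocated bytes, e.g. from `btrfs fi usage`); as discussed above,
no existing btrfs interface hands you these two numbers together:

```python
def scrub_eta(bytes_scrubbed, bytes_total, elapsed_seconds):
    """Linear ETA estimate: returns (remaining seconds, throughput in MB/s).

    The inputs are hypothetical values an admin would gather by hand;
    this illustrates the arithmetic only, not an existing btrfs API.
    """
    if bytes_scrubbed == 0:
        return None, 0.0                      # no progress yet, no estimate
    rate = bytes_scrubbed / elapsed_seconds   # bytes per second so far
    remaining = (bytes_total - bytes_scrubbed) / rate
    return remaining, rate / 1e6              # seconds left, MB/s
```

As noted, the result is only an indication: scrub throughput varies with
extent layout, so a run that is 50% done by bytes is not necessarily 50%
done by time.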

>> * Not possible to start scrub for all devices in the volume without
>> mounting it.
> 
> Interesting.  It's news to me that you can scrub individual devices 
> without mounting.  But given that, this would indeed be a useful feature, 
> and given that btrfs filesystem show can get the information, scrub 
> should be able to get and make use of it as well. =:^)

Moreover, I fell into a trap when I tried the "btrfs scrub start /dev/..."
syntax, as it only scrubs the given device. When I scrubbed the whole
volume after mounting it, the result was different. I understood this only
after reading man btrfs-scrub more attentively:

  start ... |

  Start a scrub on all devices of the filesystem identified by 
  or on a single .

The other (shorter) forms of the help output misled me, giving the impression
that it does not matter whether I specify a path or a device.

On 2016-05-26 00:05, Duncan wrote:
> Nicholas D Steeves posted on Wed, 25 May 2016 16:36:13 -0400 as excerpted:
>> On 25 May 2016 at 15:03, Duncan <1i5t5.dun...@cox.net> wrote:
>>> Dmitry Katsubo posted on Wed, 25 May 2016 16:45:41 +0200 as excerpted:
>>>> btrfs-restore [needs an o]ption that applies (y) to all questions
>>>> (completely unattended recovery)
>>>
>>> That['s] a known sore spot that a lot of people have complained
>>> about.
> 
>> I'm surprised no one has mentioned, in any of these discussions, what I
>> believe is the standard method of providing this functionality:
>> yes | btrfs-restore -options /dev/disk
> 
> Good point.
> 
> I didn't bring it up because while I've used btrfs restore a few times, 
> my btrfs are all on relatively small SSD partitions, so I both needed 
> less y's, and the total time per restore is a few minutes, not hours, so 
> it wasn't a big deal.  As a result, while I know of yes, I didn't need to 
> think about automation, and as I never used it, it didn't occur to me to 
> suggest it for others.

Thanks for the advice, Nicholas. Last time I tried it, I used the following
command:

while true; do echo y; done | btrfs restore -voxmSi /dev/sda /mnt/tmp &> 
btrfs_restore &

which is presumably equivalent to what you suggest. The command was in the
"running" state in the "jobs" output for a while, but then turned into the
"waiting" state and made no progress. I suspect that btrfs-restore
somehow reads directly from the terminal, not from stdin. I will try the
"yes | btrfs-restore ..." solution once I get a chance.

-- 
With best regards,
Dmitry


Some ideas for improvements

2016-05-25 Thread Dmitry Katsubo

Dear btrfs community,

I hope btrfs developers are open for suggestions.

btrfs-scrub:

* Would be nice if 'btrfs scrub status' shows estimated finishing time 
(ETA) and throughput (in Mb/s).
* Not possible to start scrub for all devices in the volume without 
mounting it.


btrfs-restore:

* It does not restore special files like named pipes and devices.
* Hard-linked files are not correctly restored (they all turn into 
independent replicas).
* If the file cannot be read / recovered, it is still created with zero 
size (I would expect that the file is not created).
* I think that the options '-xmS' should be enabled by default 
(shouldn't it be a goal to restore as much as possible?).
* Option that applies (y) to all questions (completely unattended 
recovery) is missing.


--
With best regards,
Dmitry


Re: Copy on write of unmodified data

2016-05-25 Thread Dmitry Katsubo

On 2016-05-25 11:29, Hugo Mills wrote:

On Wed, May 25, 2016 at 01:58:15AM -0700, H. Peter Anvin wrote:

Hi,

I'm looking at using a btrfs with snapshots to implement a 
generational

backup capacity.  However, doing it the naïve way would have the side
effect that for a file that has been partially modified, after
snapshotting the file would be written with *mostly* the same data.  
How
does btrfs' COW algorithm deal with that?  If necessary I might want 
to

write some smarter user space utilities for this.


Sounds like it might be a job for one of the dedup tools
(deupremove, bedup), or, if you're writing your own, the safe
deduplication ioctl which underlies those tools.

Hugo.


Perhaps it really makes sense to delegate de-duplication to third-party
software like BackupPC [1]. I am not sure btrfs could manage it more
effectively, as in order to find duplicates it would need to scan / analyse
all blocks, so at the very least it would take longer.

[1] https://sourceforge.net/projects/backuppc/

--
With best regards,
Dmitry


Spare volumes and hot auto-replacement feature

2016-05-04 Thread Dmitry Katsubo
Dear btrfs community,

I am interested in the spare volumes and hot auto-replacement feature [1]. I have 
a couple of questions:

* In which kernel version will this feature be included?
* The description says that replacement happens automatically when any write or 
flush fails. Is it possible to control the ratio / number of such failures that 
triggers it? (e.g. in case it was a one-time accidental failure)
* What happens if the spare device is smaller than the (failing) device to be 
replaced?
* What happens if the spare device itself fails (write error) during the 
replacement?
* Is it possible for root to be notified when a drive replacement (successful or 
unsuccessful) has taken place? This question is actually relevant to me for 
write/flush failures on a btrfs volume in general (a btrfs monitor).

Many thanks!

[1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg48209.html

-- 
With best regards,
Dmitry


Re: Kernel crash if both devices in raid1 are failing

2016-05-02 Thread Dmitry Katsubo
Hello,

If somebody is interested in digging into the problem, I would be happy to 
provide
more information and/or do the testing.

On 2016-04-27 04:44, Dmitry Katsubo wrote:
> # cat /mnt/tmp/file > /dev/null
> [   11.432059] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [   11.436665] ata3.00: BMDMA stat 0x25
> [   11.441301] ata3.00: failed command: READ DMA
> [   11.479570] ata3.00: cmd c8/00:20:40:ec:f3/00:00:00:00:00/e3 tag 0 dma 
> 16384 in
> [   11.479664]  res 51/40:1e:42:ec:f3/00:00:00:00:00/e3 Emask 0x9 
> (media error)
> [   11.619086] ata3.00: status: { DRDY ERR }
> [   11.619126] ata3.00: error: { UNC }
> [   11.625750] blk_update_request: I/O error, dev sda, sector 66317378
> [   11.625779] NOHZ: local_softirq_pending 40
> [   70.969876] [ cut here ]
> [   70.969879] kernel BUG at 
> /build/linux-SBJFwR/linux-4.4.6/debian/build/source_rt/fs/btrfs/volumes.c:5509!
> [   70.969885] invalid opcode:  [#1] PREEMPT SMP 
> [   70.969954] Modules linked in: netconsole configfs bridge stp llc arc4 
> iTCO_wdt iTCO_vendor_support ppdev coretemp pcspkr serio_raw i2c_i801 ath5k 
> ath mac80211 cfg80211 sr9700 evdev rfkill dm9601 usbnet lpc_ich mfd_core mii 
> option usb_wwan usbserial rng_core sg snd_hda_codec_realtek 
> snd_hda_codec_generic i915 snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep 
> snd_pcm_oss snd_mixer_oss snd_pcm acpi_cpufreq snd_timer video 8250_fintek 
> snd drm_kms_helper soundcore tpm_tis drm tpm parport_pc i2c_algo_bit parport 
> shpchp button processor binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c 
> hid_generic usbhid hid crc32c_generic btrfs xor raid6_pq uas usb_storage 
> sd_mod sr_mod cdrom ata_generic firewire_ohci ata_piix libata scsi_mod 
> firewire_core crc_itu_t ehci_pci uhci_hcd ehci_hcd usbcore usb_common e1000e 
> ptp pps_core
> [   70.969965] CPU: 0 PID: 114 Comm: kworker/u4:3 Tainted: GW   
> 4.4.0-1-rt-686-pae #1 Debian 4.4.6-1
> [   70.969968] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF 
> R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
> [   70.970029] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
> [   70.970032] task: f3eec0c0 ti: f6a76000 task.ti: f6a76000
> [   70.970036] EIP: 0060:[] EFLAGS: 00010217 CPU: 0
> [   70.970076] EIP is at __btrfs_map_block+0x11be/0x15a0 [btrfs]


-- 
With best regards,
Dmitry


Re: Kernel crash if both devices in raid1 are failing

2016-04-26 Thread Dmitry Katsubo
On 2016-04-25 09:12, Dmitry Katsubo wrote:
> I have run "btrfs check /dev/sda" two times. One time it has completed
> OK, actually showing only one error. The 2nd time it has shown many messages
> 
> "parent transid verify failed on NNN wanted AAA found BBB"
> 
> and then asserted :) But I think the 2nd run is not representative as I have
> gracefully removed one drive from btrfs array to build a new array. The
> "btrfs device remove" completed successfully, but it might have written some
> metadata to the remaining drives, which perhaps was not synchronized
> correctly.
> 
> What I am going to do next is to recompile btrfs-tools so that "-i" CLI option
> applies "(y)" to all questions and run "btrfs restore" again. Hopefully it can
> handle transid mismatch correctly...

OK, I have recompiled btrfs-progs with the necessary fix (attached). It allowed 
me to capture the output of "btrfs restore", which was otherwise not possible 
because the tool reads from the console, even with attempts like this:

while true; do echo y; done | btrfs restore -voxmSi /dev/sda /mnt/backup 2>&1 | 
tee btrfs_restore

As an experiment I upgraded the kernel to 4.4.6, and it still crashes
on the problematic file:

# cat /mnt/tmp/file > /dev/null
[   11.432059] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   11.436665] ata3.00: BMDMA stat 0x25
[   11.441301] ata3.00: failed command: READ DMA
[   11.479570] ata3.00: cmd c8/00:20:40:ec:f3/00:00:00:00:00/e3 tag 0 dma 16384 
in
[   11.479664]  res 51/40:1e:42:ec:f3/00:00:00:00:00/e3 Emask 0x9 
(media error)
[   11.619086] ata3.00: status: { DRDY ERR }
[   11.619126] ata3.00: error: { UNC }
[   11.625750] blk_update_request: I/O error, dev sda, sector 66317378
[   11.625779] NOHZ: local_softirq_pending 40
[   70.969876] [ cut here ]
[   70.969879] kernel BUG at 
/build/linux-SBJFwR/linux-4.4.6/debian/build/source_rt/fs/btrfs/volumes.c:5509!
[   70.969885] invalid opcode:  [#1] PREEMPT SMP 
[   70.969954] Modules linked in: netconsole configfs bridge stp llc arc4 
iTCO_wdt iTCO_vendor_support ppdev coretemp pcspkr serio_raw i2c_i801 ath5k ath 
mac80211 cfg80211 sr9700 evdev rfkill dm9601 usbnet lpc_ich mfd_core mii option 
usb_wwan usbserial rng_core sg snd_hda_codec_realtek snd_hda_codec_generic i915 
snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss 
snd_pcm acpi_cpufreq snd_timer video 8250_fintek snd drm_kms_helper soundcore 
tpm_tis drm tpm parport_pc i2c_algo_bit parport shpchp button processor 
binfmt_misc w83627hf hwmon_vid autofs4 xfs libcrc32c hid_generic usbhid hid 
crc32c_generic btrfs xor raid6_pq uas usb_storage sd_mod sr_mod cdrom 
ata_generic firewire_ohci ata_piix libata scsi_mod firewire_core crc_itu_t 
ehci_pci uhci_hcd ehci_hcd usbcore usb_common e1000e ptp pps_core
[   70.969965] CPU: 0 PID: 114 Comm: kworker/u4:3 Tainted: GW   
4.4.0-1-rt-686-pae #1 Debian 4.4.6-1
[   70.969968] Hardware name: AOpen i945GMx-IF/i945GMx-IF, BIOS i945GMx-IF 
R1.01 Mar.02.2007 AOpen Inc. 03/02/2007
[   70.970029] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[   70.970032] task: f3eec0c0 ti: f6a76000 task.ti: f6a76000
[   70.970036] EIP: 0060:[] EFLAGS: 00010217 CPU: 0
[   70.970076] EIP is at __btrfs_map_block+0x11be/0x15a0 [btrfs]

Unfortunately I was not able to capture the whole trace, as there seems to be a
concurrent problem with netconsole: the whole system hangs at the point above.

P.S. If the Debian maintainer of btrfs-progs is on the list: packaging fails
for me (it happens at the very end, while installing the binaries):

# debuild
...
dpkg-source: info: building btrfs-progs in btrfs-progs_4.4.1-1.1.debian.tar.xz
dpkg-source: info: building btrfs-progs in btrfs-progs_4.4.1-1.1.dsc
 debian/rules build
dh build --parallel
   dh_testdir -O--parallel
   debian/rules override_dh_auto_configure
make[1]: Entering directory '/home/btrfs-progs-4.4.1'
dh_auto_configure -- --bindir=/bin
make[1]: Leaving directory '/home/btrfs-progs-4.4.1'
   dh_auto_build -O--parallel
 fakeroot debian/rules binary
dh binary --parallel
   dh_testroot -O--parallel
   dh_prep -O--parallel
   debian/rules override_dh_auto_install
make[1]: Entering directory '/home/btrfs-progs-4.4.1'
dh_auto_install --destdir=debian/btrfs-progs
# Adding initramfs-tools integration
install -D -m 0755 debian/local/btrfs.hook 
debian/btrfs-progs/usr/share/initramfs-tools/hooks/btrfs
install -D -m 0755 debian/local/btrfs.local-premount 
debian/btrfs-progs/usr/share/initramfs-tools/scripts/local-premount/btrfs
make[1]: Leaving directory '/home/btrfs-progs-4.4.1'
   dh_install -O--parallel
/home/btrfs-progs-4.4.1/debian/btrfs-progs.install: 1: 
/home/btrfs-progs-4.4.1/debian/btrfs-progs.install: btrfs-calc-size: not found
/home/btrfs-progs-4.4.1/debian/btrfs-progs.install: 2: 
/home/btrfs-progs-4.4.1/debian/btrfs-progs.install: btrfs-select-su

Re: Kernel crash if both devices in raid1 are failing

2016-04-20 Thread Dmitry Katsubo
On 2016-04-19 09:58, Duncan wrote:
> Dmitry Katsubo posted on Tue, 19 Apr 2016 07:45:40 +0200 as excerpted:
> 
>> Actually btrfs restore has recovered many files, however I was not able
>> to run in fully unattended mode as it complains about "looping a lot".
>> Does it mean that files are corrupted / not correctly restored?
> 
> As long as you tell it to keep going each time, the loop complaints 
> shouldn't be an issue.  The problem is that the loop counter is measuring 
> loops on a particular directory, because that's what it has available to 
> measure.  But if you had a whole bunch of files in that dir, it's /going/ 
> to loop a lot, to restore all of them.
> 
> I have one cache directory with over 200K files in it.  They're all text 
> messages from various technical lists and newsgroups (like this list, 
> which I view as a newsgroup using gmane.org's list2news service) so 
> they're quite small, about 5 KiB on average by my quick calculation, but 
> that's still a LOT of files for a single dir, even if they're only using 
> just over a GiB of space.
> 
> I ended up doing a btrfs restore on that filesystem (/home), because 
> while I had a backup, restore was getting more recent copies of stuff 
> back, and that dir looped a *LOT* the first time it happened, now several 
> years ago, before they actually added the always option.

I have the same situation here: there is a backup, but the most recent
modifications in files are preferable.

> The second time it happened, about a year ago, restore worked much 
> better, and I was able to use the always option.  But AFAIK, always only 
> applies to that dir.  If you have multiple dirs with the problem, you'll 
> still get asked for the next one.  But it did vastly improve the 
> situation for me, giving me only a handful of prompts instead of the very 
> many I had before the option was there.

Yes, this is exactly the problem discussed a while ago. It would be nice if
"btrfs restore -i" applied the "(a)lways" answer to all questions, or if there
were a separate option for that ("-y").

For me personally, "looping" is too low-level a concept. System administrators
(who are going to use this utility) should be able to think in more meaningful
terms. If "looping" is a proxy for "time consumption", then I would say that
during a restore time does not matter so much: I am ready to wait for a
minute until a specific file is restored. So I think the time spent, not the
number of loops, should be measured.

Also I have difficulties in finding out what files have not been restored
due to uncorrectable errors. As I cannot redirect the output of
"btrfs restore" and it does not print the final stats, I cannot tell what
files have to be restored from backup.
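Since the restore log is hard to capture, one workaround is to diff the restored tree against the backup afterwards. A minimal sketch (the two directory paths at the bottom are hypothetical placeholders):

```python
import os

def relative_files(root):
    """Collect all file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found

def missing_from_restore(backup_root, restored_root):
    """Files present in the backup but absent from the restored tree,
    i.e. candidates to copy back from the backup."""
    return sorted(relative_files(backup_root) - relative_files(restored_root))

if __name__ == "__main__":
    for path in missing_from_restore("/backup/home", "/mnt/usb/home"):
        print(path)
```

Files that changed after the last backup but failed to restore would still be missed; only a saved restore log could reveal those.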

> (The main problem triggering the need to run restore for me, turned out 
> to be hardware.  I've had no issues since I replaced that failing ssd, 
> and with a bit of luck, won't be running restore again for a few years, 
> now.)

I would be happy if I were able to replace the failing drive on the fly, without
stopping the system. Unfortunately I cannot do that due to kernel crashes :(
btrfs is still not resilient to these corner cases.

-- 
With best regards,
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Kernel crash if both devices in raid1 are failing

2016-04-18 Thread Dmitry Katsubo
On 2016-04-18 02:19, Chris Murphy wrote:
> With two device failure on raid1 volume, the file system is actually
> broken. There's a big hole in the metadata, not just missing data,
> because there are only two copies of metadata, distributed across
> three drives.

Thanks, I understand that. Well, the drive has not failed completely;
it has occasional read/write errors. I still wonder what went wrong
and why the kernel crashed: I think this should not happen, as it
prevents me from accessing the data that can still be read.
I am happy to contribute more information if it would help.

> btrfs restore might be able to scrape off some files, but I don't
> expect it'll get very far. If there were n-way raid1, where every
> drive has a complete copy of 100% of the filesystem metadata, what you
> suggest would be possible.

Actually btrfs restore has recovered many files; however, I was not
able to run it in fully unattended mode, as it complains about "looping a lot".
Does that mean the affected files are corrupted / not correctly restored?

> OK probably the worst thing you can do if you're trying to recover
> data from a degraded volume where a 2nd device is also having
> problems, is to mount it rw let alone write anything to it. *shrug*
> That's just going to make things much worse and more difficult to
> recover, assuming anything can be recovered at all. The least number
> of changes you make to such a volume, the better.

Another option I have thought about is shrinking the failing volume
to some small size. This will cause chunks to be moved to another
location. How will btrfs behave if both copies of a chunk cannot be read?
It would be nice to have a recovery strategy that avoids "btrfs restore"
in such a case, because "btrfs restore" assumes pausing normal system
operation to copy the data back and forth.

-- 
With best regards,
Dmitry


Re: Kernel crash if both devices in raid1 are failing

2016-04-17 Thread Dmitry Katsubo
On 2016-04-14 22:30, Dmitry Katsubo wrote:
> Dear btrfs community,
> 
> I have the following setup:
> 
> # btrfs fi show /home
> Label: none  uuid: 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
>   Total devices 3 FS bytes used 55.68GiB
>   devid1 size 52.91GiB used 0.00B path /dev/sdd2
>   devid2 size 232.89GiB used 59.03GiB path /dev/sda
>   devid3 size 111.79GiB used 59.03GiB path /dev/sdc1
> 
> btrfs volume was created in raid1 mode both for data and metadata and mounted
> with compress=lzo option.
> 
> Unfortunately, two drives (sda and sdc1) started to fail at the same time. 
> This
> leads to system crash if I start the system in runlevel 3 (see crash1.log).
> 
> After I have started the system in single mode, volume can be mounted in rw
> mode and I can write some data into it. Unfortunately when I tried to read
> a certain file, the system crashed (see crash2.log).
> 
> I have started scrub on the volume and here is the report:
> 
> # btrfs scrub status /home
> scrub status for 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
>   scrub started at Tue Apr 12 20:39:20 2016 and finished after 02:40:09
>   total bytes scrubbed: 55.68GiB with 1767 errors
>   error details: verify=175 csum=1592
>   corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0
> 
> Obviously, some data is lost. However due to above crash, I cannot just copy
> the data from the volume. I would assume that I still can access the data, but
> the files for which data is lost, should result I/O error (I would then 
> recover
> them from my backup).
> 
> I have decided to attach another drive and remove failing devices one-by-one.
> However that does not work:
> 
> # btrfs dev delete /dev/sda /home
> [  168.680057] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [  168.684236] ata3.00: BMDMA stat 0x25
> [  168.688464] ata3.00: failed command: READ DMA
> [  168.692681] ata3.00: cmd c8/00:08:68:4b:84/00:00:00:00:00/e7 tag 0 dma 
> 4096 in
> [  168.692681]  res 51/40:08:68:4b:84/40:08:07:00:00/e7 Emask 0x9 
> (media error)
> [  168.701281] ata3.00: status: { DRDY ERR }
> [  168.705600] ata3.00: error: { UNC }
> [  168.724446] blk_update_request: I/O error, dev sda, sector 126110568
> [  168.728860] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, 
> flush 0, corrupt 0, gen 0
> [  172.824043] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [  172.828651] ata3.00: BMDMA stat 0x25
> [  172.833281] ata3.00: failed command: READ DMA
> [  172.837876] ata3.00: cmd c8/00:08:50:4b:84/00:00:00:00:00/e7 tag 0 dma 
> 4096 in
> [  172.837876]  res 51/40:08:50:4b:84/40:08:07:00:00/e7 Emask 0x9 
> (media error)
> [  172.847296] ata3.00: status: { DRDY ERR }
> [  172.852054] ata3.00: error: { UNC }
> [  172.872404] blk_update_request: I/O error, dev sda, sector 126110544
> [  172.877241] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, 
> flush 0, corrupt 0, gen 0
> ERROR: error removing device '/dev/sda': Input/output error
> 
> The same happens when I try to delete /dev/sdc1 from the volume. Is there any
> btrfs "force" option so that btrfs balances only chunks that are accessible? I
> can potentially physically disconnect /dev/sda, but the loss will be greater
> I believe.
> 
> How can I proceed except btrfs restore?
> 
> During scrub operation the following was recorded in the logs:
> 
> [Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error at 
> logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, inode 
> 879324, offset 308256768, length 4096, links 1 (path: lib/mysql/ibdata1)
> 
> If I collect all the messages like this, will it give a full picture of 
> damaged files?
> 
> Many thanks in advance.
> 
> P.S. Linux kernel v4.4.2, btrfs-progs v4.4.

I have decided to try "btrfs restore". Actually I have discovered two usability
issues with it:

1. I cannot run this utility as follows:

btrfs -i restore /dev/sda /mnt/usb &> log

because this command is interactive and may read something from the terminal.
It would be nice if there were a flag -y (answer "yes" to all questions) so that
no input is required from the user. An example of such a question is:

We seem to be looping a lot on ..., do you want to keep going on? [y/N/a]

In general this question puzzles me. What does it mean? As far as I understand,
it prevents btrfs restore from looping forever. Should I consider those files
as lost? I have also hit the same problem as discussed in [1]: answering
"a" (always) still causes the questions to be asked.

2. btrfs restore does not print final statistics: how many files were
successfully restored, and how many failed.

[1] https://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg36458.html

-- 
With best regards,
Dmitry


Kernel crash if both devices in raid1 are failing

2016-04-14 Thread Dmitry Katsubo
Dear btrfs community,

I have the following setup:

# btrfs fi show /home
Label: none  uuid: 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
Total devices 3 FS bytes used 55.68GiB
devid1 size 52.91GiB used 0.00B path /dev/sdd2
devid2 size 232.89GiB used 59.03GiB path /dev/sda
devid3 size 111.79GiB used 59.03GiB path /dev/sdc1

The btrfs volume was created in raid1 mode for both data and metadata and
mounted with the compress=lzo option.

Unfortunately, two drives (sda and sdc1) started to fail at the same time. This
leads to a system crash if I start the system in runlevel 3 (see crash1.log).

After I start the system in single-user mode, the volume can be mounted in rw
mode and I can write some data to it. Unfortunately, when I tried to read
a certain file, the system crashed (see crash2.log).

I have started a scrub on the volume and here is the report:

# btrfs scrub status /home
scrub status for 865f8cf9-27be-41a0-85a4-6cb4d1658ce3
scrub started at Tue Apr 12 20:39:20 2016 and finished after 02:40:09
total bytes scrubbed: 55.68GiB with 1767 errors
error details: verify=175 csum=1592
corrected errors: 1110, uncorrectable errors: 657, unverified errors: 0

Obviously, some data is lost. However, due to the above crash, I cannot simply
copy the data off the volume. I would assume that I can still access the data,
but that files whose data is lost should produce an I/O error (I would then
recover them from my backup).

I have decided to attach another drive and remove the failing devices one by
one. However, that does not work:

# btrfs dev delete /dev/sda /home
[  168.680057] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  168.684236] ata3.00: BMDMA stat 0x25
[  168.688464] ata3.00: failed command: READ DMA
[  168.692681] ata3.00: cmd c8/00:08:68:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
[  168.692681]  res 51/40:08:68:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
[  168.701281] ata3.00: status: { DRDY ERR }
[  168.705600] ata3.00: error: { UNC }
[  168.724446] blk_update_request: I/O error, dev sda, sector 126110568
[  168.728860] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 43, flush 0, corrupt 0, gen 0
[  172.824043] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  172.828651] ata3.00: BMDMA stat 0x25
[  172.833281] ata3.00: failed command: READ DMA
[  172.837876] ata3.00: cmd c8/00:08:50:4b:84/00:00:00:00:00/e7 tag 0 dma 4096 in
[  172.837876]  res 51/40:08:50:4b:84/40:08:07:00:00/e7 Emask 0x9 (media error)
[  172.847296] ata3.00: status: { DRDY ERR }
[  172.852054] ata3.00: error: { UNC }
[  172.872404] blk_update_request: I/O error, dev sda, sector 126110544
[  172.877241] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 44, flush 0, corrupt 0, gen 0
ERROR: error removing device '/dev/sda': Input/output error

The same happens when I try to delete /dev/sdc1 from the volume. Is there any
btrfs "force" option so that btrfs balances only the chunks that are accessible?
I could physically disconnect /dev/sda, but I believe the loss would be greater.

How can I proceed, other than with btrfs restore?

During scrub operation the following was recorded in the logs:

[Tue Apr 12 23:10:20 2016] BTRFS warning (device sdc1): checksum error at logical 126952947712 on dev /dev/sdc1, sector 126150176, root 258, inode 879324, offset 308256768, length 4096, links 1 (path: lib/mysql/ibdata1)

If I collect all messages like this, will they give a full picture of the
damaged files?
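One way to collect these paths automatically is to scan the kernel log for the warning format shown above; a sketch (reading dmesg at the end is an assumption about where the messages land):

```python
import re
import subprocess

# Matches "BTRFS warning ...: checksum error at ... (path: <file>)"
CSUM_RE = re.compile(r"checksum error at .*\(path: ([^)]+)\)")

def damaged_paths(log_text):
    """Return the sorted set of file paths named in checksum-error warnings."""
    return sorted({m.group(1) for m in CSUM_RE.finditer(log_text)})

if __name__ == "__main__":
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for path in damaged_paths(log):
        print(path)
```

This only covers data-checksum errors that the kernel could resolve to a path; uncorrectable metadata errors would not show up this way.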

Many thanks in advance.

P.S. Linux kernel v4.4.2, btrfs-progs v4.4.

-- 
With best regards,
Dmitry
[  231.228068] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[  231.231255] ata3.00: BMDMA stat 0x25
[  231.234443] ata3.00: failed command: READ DMA
[  231.237661] ata3.00: cmd c8/00:08:60:f9:99/00:00:00:00:00/e2 tag 0 dma 4096 in
[  231.237661]  res 51/40:08:60:f9:99/00:00:00:00:00/e2 Emask 0x9 (media error)
[  231.244022] ata3.00: status: { DRDY ERR }
[  231.247119] ata3.00: error: { UNC }
[  231.264447] blk_update_request: I/O error, dev sda, sector 43645280
[  231.267817] BTRFS error (device sdc1): bdev /dev/sda errs: wr 0, rd 39, flush 0, corrupt 0, gen 0
[  232.127298] BTRFS error (device sdc1): parent transid verify failed on 65675001856 wanted 480578 found 480435
[  232.185418] BTRFS error (device sdc1): parent transid verify failed on 65679622144 wanted 480579 found 480435
[  232.359943] BTRFS error (device sdc1): parent transid verify failed on 65674952704 wanted 480578 found 480435
[  232.656145] BTRFS error (device sdc1): parent transid verify failed on 65674379264 wanted 480578 found 480435
[  232.851908] BTRFS error (device sdc1): parent transid verify failed on 65669464064 wanted 480579 found 480577
[  233.142476] BTRFS error (device sdc1): parent transid verify failed on 65674313728 wanted 480578 found 480435
[  233.497501] BTRFS error (device sdc1): parent transid verify failed on 65669513216 

Re: Kernel 3.19 and still "disk full" even though 'btrfs fi df" reports enough room left?

2015-11-20 Thread Dmitry Katsubo
Many thanks to Duncan for such a detailed clarification. I am thinking of
another parallel, similar to the SimCity one: memory management in virtual
machines such as the JVM. If the heap is full, it does not necessarily mean
that there is no free memory. In this case the JVM forces a garbage
collection, and only if that procedure releases no memory does it signal the
application by raising OutOfMemoryError.

I think something similar should happen in btrfs: ENOSPC is returned to the
application only when there is really no space left. If no chunk can be
allocated, btrfs should check all "deleted" data chunks and return them to
the unallocated pool. I would expect such automatic "cleanup" from a modern
filesystem.

This behaviour does not necessarily have to be the default, as one can argue
that:
* Such a cleanup procedure may freeze the calling process for a
considerable time, as btrfs would need to walk all allocated chunks to
find candidates for release.
* The filesystem will perhaps run out of free space soon anyway, so why not
fail with the error earlier? (for example, when one process is intensively
writing a log)

It would be nice to have the automatic "cleanup" function controlled by
some /sys/fs/btrfs/features variable which, if set to 1, forces btrfs to
do its best to allocate a chunk before giving up and returning ENOSPC,
sacrificing the response time of the process/application.
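As a toy illustration of the proposed behaviour (a simulation only, not btrfs code): an allocator that reclaims fully-empty chunks as a last resort before reporting ENOSPC:

```python
class ChunkAllocator:
    """Toy model of lazy chunk reclaim. `chunks` holds the used-byte count
    of each allocated chunk; a count of 0 means the chunk only contains
    deleted data and can be returned to the unallocated pool."""

    def __init__(self, total_chunks):
        self.free_chunks = total_chunks
        self.chunks = []

    def _reclaim_empty(self):
        """Return fully-unused chunks to the unallocated pool."""
        before = len(self.chunks)
        self.chunks = [used for used in self.chunks if used > 0]
        self.free_chunks += before - len(self.chunks)

    def allocate_chunk(self):
        """Allocate a chunk, attempting reclaim before giving up."""
        if self.free_chunks == 0:
            self._reclaim_empty()  # the proposed last-ditch cleanup
        if self.free_chunks == 0:
            raise OSError("ENOSPC")  # really no space left
        self.free_chunks -= 1
        self.chunks.append(0)
        return len(self.chunks) - 1  # index of the new chunk
```

In this model ENOSPC is only raised after reclaim has been tried, and the trade-off mentioned above (walking all allocated chunks) shows up as the cost of `_reclaim_empty()`.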

On 2015-11-20 04:14, Duncan wrote:
> linux-btrfs.tebulin posted on Thu, 19 Nov 2015 18:56:45 + as
> excerpted:
> 
> Meta-comment:
> 
> Apparently that attribution should actually be to Hugo Mills.  I've no 
> idea what went wrong, but at least here as received from gmane.org, the 
> from header really does say linux-btrfs.tebulin, so something obviously 
> bugged out somewhere!
> 
> 
> Meanwhile discussing btrfs data/metadata allocation, vs. usage of that 
> allocation, Hugo also known here as tebulin, explained...
> 
>> If you've ever played SimCity, the allocation process is like zoning --
>> you say what kind of thing can go on the space, but it's not actually
>> used until something gets built on it.
> 
> Very nice analogy.  Thanks. =:^)
> 
> Tho I'd actually put it in terms of the real thing that sim city 
> simulates, thus eliminating the proprietary comparison.  A city zones an 
> area one way or another, restricting what can be built there -- you can't 
> put heavy industry in a residential zone.  But the lot is still empty 
> until something's actually built there.
> 
> And if the city has all its area zoned residential, and some company 
> wants to build a plant there (generally considered a good thing as it'll 
> provide employment), there's a process that allows rezoning.
> 
> In btrfs, there's four types of allocations aka "zones":
> 
> 1) Unallocated (unzoned)
> 
> Can be used for anything but must be allocated/zoned first
> 
> 2) System
> 
> Critical but limited in size and generally only allocated at mkfs or when 
> adding a new device.
> 
> 3) Data
> 
> The actual file storage, generally the largest allocation.
> 
> 4) Metadata
> 
> Information /about/ the files, where they are located (the location and 
> size of individual extents), ownership and permissions, date stamps, 
> checksums, and for very small files (a few KiB), sometimes the file 
> content itself.
> 
> 4a) Global reserve
> 
> A small part of metadata reserved for emergency use only.  Btrfs is 
> pretty strict about its use, and will generally error out with ENOSPC if 
> metadata space other than the global reserve is used up, before actually 
> using this global reserve.  As a result, any of the global reserve used 
> at all indicates a filesystem in very severe straits, crisis mode.
> 
> 
> As it happens, btrfs in practice tends to be a bit liberal about 
> allocating/zoning data chunks, since it's easier to find bigger blocks of 
> space in unallocated space than it is in busy partly used data space.  
> (Think of a big shopping center.  It's easier to build it in outlying 
> areas that haven't been built up yet, where many whole blocks worth of 
> space can be allocated/zoned at once, than it is in the city center, 
> where even finding a single whole block vacant, is near impossible.)
> 
> Over time, therefore, more and more space tends to be allocated to data, 
> while existing data space, like those blocks near city center, may have 
> most of its files/buildings deleted, but still have a couple still 
> occupied.
> 
> Btrfs balancing, then, is comparable to the city functions of 
> condemnation and rezoning to vacant/unallocated, forcing remaining 
> occupants, most commonly data zoned, to move out of the way so the area 
> can be reclaimed for other usage.  Then it can be rezoned to data again, 
> or to metadata, whatever needs it.
> 
> 
> (FWIW, I played the original sim city, but IIRC it wasn't sophisticated 
> enough to have zoning yet.  Of course I've not played anything recent as 
> to my knowledge it's not freedomware, and since I no longer agree 

Re: Kernel 3.19 and still "disk full" even though 'btrfs fi df" reports enough room left?

2015-11-20 Thread Dmitry Katsubo
If I may add: the information for "System"

  System, DUP: total=32.00MiB, used=16.00KiB

is also quite technical; for the end user, system = metadata (one could call
it "filesystem metadata", perhaps). For simplicity the numbers could be
added to "Metadata", thus eliminating that line as well.
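The folding could be prototyped in userspace before touching btrfs-progs; a sketch that merges the System numbers into Metadata when summarizing "btrfs fi df" output (the line format follows the sample above; parsing is deliberately simplified):

```python
import re

LINE_RE = re.compile(r"^(\w+), (\w+): total=([\d.]+)(\w+), used=([\d.]+)(\w+)")
TO_GIB = {"B": 1 / 1024 ** 3, "KiB": 1 / 1024 ** 2, "MiB": 1 / 1024, "GiB": 1.0}

def fold_system(df_output):
    """Parse 'btrfs fi df' output, adding the System numbers into Metadata.
    Returns {kind: (total_gib, used_gib)} with no separate System entry."""
    totals = {}
    for line in df_output.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        kind, _profile, total, t_unit, used, u_unit = m.groups()
        key = "Metadata" if kind == "System" else kind
        t, u = totals.get(key, (0.0, 0.0))
        totals[key] = (t + float(total) * TO_GIB[t_unit],
                       u + float(used) * TO_GIB[u_unit])
    return totals
```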

For those power users who really want to see tiny details like
"System" and "GlobalReserve", I suggest implementing a "-v" flag:

# btrfs fi usage -v

On 2015-11-19 03:16, Duncan wrote:
> Qu Wenruo posted on Thu, 19 Nov 2015 08:42:13 +0800 as excerpted:
> 
>> Although the metadata output is showing that you still have about 512M
>> available, but the 512M is Global Reserved space, or the unknown one.
> 
> Unknown here, as the userspace (btrfs-progs) is evidently too old to show 
> it as global reserve, as it does in newer versions...
> 
>> The output is really a little confusing. I'd like the change the output
>> by adding global reserved into metadata used space and make it a sub
>> item for metadata.
> 
> Thanks for the clarification.  It's most helpful, here. =:^)
> 
> I've at times wondered if global reserve folded into one of the other 
> settings.  Apparently it comes from the metadata allocation, but while 
> metadata is normally dup (single-device btrfs) or raid1 (multi-device), 
> global reserve is single.
> 
> It would have been nice if that sort of substructure was described a bit 
> better when global reserve first made its appearance, at least in the 
> patch descriptions and release announcement, if not then yet in btrfs fi 
> df output, first implementations being what they are.  But regardless, 
> now at least it should be clear for list regulars who read this thread 
> anyway, since the above reasonably clarifies things.
> 
> As for btrfs fi df, making global reserve a metadata subentry there would 
> be one way to deal with it, preserving the exposure of the additional 
> data provided by that line (here, the fact that global reserve is 
> actually being used, underlining the fact that the filesystem is severely 
> short on space).
> 
> Another way of handling it would be to simply add the global reserve into 
> the metadata used figure before printing it, eliminating the separate 
> global reserve line, and changing the upthread posted metadata line from 
> 8.48 GiB of 9 GiB used, to 8.98 of 9 GiB used, which is effectively the 
> case if the 512 MiB of global reserve indeed comes from the metadata 
> allocation.  This would more clearly show how full metadata actually is 
> without the added complexity of an additional global reserve line, but 
> would lose the fact that global reserve is actually in use, that the 
> broken out global reserve line exposes.
> 
> I'd actually argue in favor of the latter, directly folding global 
> reserve allocation into metadata used, since it'd both be simpler, and 
> more consistent if for instance btrfs fi usage didn't report separate 
> global reserve in the overall stats, but fail to report it in the per-
> device stats and in btrfs dev usage.
> 
> Either way would make much clearer that metadata is actually running out 
> than the current report layout does, since "metadata used" would then 
> either explicitly or implicitly include the global reserve.
> 


-- 
With best regards,
Dmitry


Re: Kernel 3.19 and still "disk full" even though 'btrfs fi df" reports enough room left?

2015-11-20 Thread Dmitry Katsubo
On 2015-11-20 14:52, Austin S Hemmelgarn wrote:
> On 2015-11-20 08:27, Hugo Mills wrote:
>> On Fri, Nov 20, 2015 at 08:21:31AM -0500, Austin S Hemmelgarn wrote:
>>> On 2015-11-20 06:39, Dmitry Katsubo wrote:
>>>> For those power users who really want to see the tiny details like
>>>> "System" and "GlobalReserve" I suggest to implement "-v" flag:
>>>>
>>>> # btrfs fi usage -v
>>> Actually, I really like this idea, one of the questions I get asked
>>> when I show people BTRFS is the difference between System and
>>> Metadata, and it's not always easy to explain to somebody who
>>> doesn't have a background in filesystem development.  For some
>>> reason, people seem to have trouble with the concept that the system
>>> tree is an index of the other trees.
>>
>> Actually, it's not that in the system chunks. :)
>>
>> System chunks contain the chunk tree, not the tree of tree roots.
>> They're special (and small) because they're listed explicitly by devid
>> and physical offset at the end of the superblock, and allow the FS to
>> read them first so that it can bootstrap the logical:physical mapping
>> table before it starts reading all the other metadata like the tree of
>> tree roots (which is "normal" metadata).
> I guess my understanding was wrong then.  Thanks for the explanation.

The size of "System" is anyway very small in comparison to the other types of
allocation. Adding it to "Metadata" or simply suppressing it does not make a
big difference. If shown, it actually raises "dummy" questions (Is the
"System" area big enough? Can it run out of free space? How can I add
more space to it?).

-- 
With best regards,
Dmitry


Re: Potential to loose data in case of disk failure

2015-11-12 Thread Dmitry Katsubo
On 2015-11-12 13:47, Austin S Hemmelgarn wrote:
>> That's a pretty unusual setup, so I'm not surprised there's no quick and
>> easy answer. The best solution in my opinion would be to shuffle your
>> partitions around and combine sda3 and sda8 into a single partition.
>> There's generally no reason to present btrfs with two different
>> partitions on the same disk.
>>
>> If there's something that prevents you from doing that, you may be able
>> to use RAID10 or RAID6 somehow. I'm not really sure, though, so I'll
>> defer to others on the list for implementation details.
> RAID10 has the same issue.  Assume you have 1 block.  This gets stored
> as 2 copies, each with 2 stripes, with the stripes split symmetrically.
>  For this, call the first half of the first copy 1a, the second half 1b,
> and likewise for 2a and 2b with the second copy.  1a and 2a have
> identical contents, and 1b and 2b have identical contents.  It is fully
> possible that you will end up with this block striped such that 1a and
> 2a are on one disk, and 1b and 2b on the other.  Based on this, losing
> one disk would mean losing half the block, which would mean based on how
> BTRFS works that you would lose the whole block (because neither copy
> would be complete).

Does the same apply to RAID1? Namely, if I create

mkfs.btrfs -mraid1 -draid1 /dev/sda3 /dev/sda8

then btrfs will "believe" that these are different drives and mistakenly
think that the RAID precondition is satisfied. Am I right? If so, then I
think this is a trap, and mkfs.btrfs should at least warn (or require
--force) if two partitions are on the same drive for raid1/raid5/raid10.
In other words, the only scenario in which this check should be skipped is:

mkfs.btrfs -mraid0 -draid0 /dev/sda3 /dev/sda8
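The proposed warning could even be approximated outside mkfs.btrfs. A sketch that flags raid members living on the same physical disk (deriving the parent disk from the device name is a simplification; a robust tool would ask the kernel via lsblk or sysfs):

```python
import re

def parent_disk(dev):
    """Strip the partition suffix from a block-device name.
    /dev/sda3 -> /dev/sda, /dev/nvme0n1p2 -> /dev/nvme0n1."""
    m = re.match(r"^(/dev/nvme\d+n\d+)p\d+$", dev)
    if m:
        return m.group(1)
    m = re.match(r"^(/dev/[a-z]+)\d+$", dev)
    if m:
        return m.group(1)
    return dev  # whole disk, no partition suffix

def shared_disks(devices):
    """Return parent disks that appear more than once in the device list."""
    seen = {}
    for dev in devices:
        seen.setdefault(parent_disk(dev), []).append(dev)
    return {disk: parts for disk, parts in seen.items() if len(parts) > 1}

if __name__ == "__main__":
    dup = shared_disks(["/dev/sda3", "/dev/sda8"])
    if dup:
        print("warning: raid members share physical disks:", dup)
```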

-- 
With best regards,
Dmitry


Re: Process is blocked for more than 120 seconds

2015-11-11 Thread Dmitry Katsubo
On 2015-11-09 14:25, Austin S Hemmelgarn wrote:
> On 2015-11-07 07:22, Dmitry Katsubo wrote:
>> Hi everyone,
>>
>> I have noticed the following in the log. The system continues to run,
>> but I am not sure for how long it will be stable. Should I start
>> worrying? Thanks in advance for the opinion.
>>
> This just means that a process was stuck in the D state (uninterruptible
> I/O sleep) for more than 120 seconds.  Depending on a number of factors,
> this happening could mean:
> 1. Absolutely nothing (if you have low-powered or older hardware, for
> example, I get these regularly on a first generation Raspberry Pi if I
> don't increase the timeout significantly)
> 2. The program is doing a very large chunk of I/O (usually with the
> O_DIRECT flag, although this probably isn't the case here)
> 3. There's a bug in the blocked program (this is rarely the case when
> this type of thing happens)
> 4. There's a bug in the kernel (which is why this dumps a stack trace)
> 5. The filesystem itself is messed up somehow, and the kernel isn't
> handling it properly (technically a bug, but a more specific case of it).
> 6. You're hardware is misbehaving, failing, or experienced a transient
> error.
> 
> Assuming you can rule out possibilities 1 and 6, I think that 4 is the
> most likely cause, as all of the listed programs (I'm assuming that
> 'master' is from postfix) are relatively well audited, and all of them
> hit this at the same time.
> 
> For what it's worth, if you want you can do:
> echo 0 > /proc/sys/kernel/hung_task_timeout_secs
> like the message says to stop these from appearing in the future, or use
> some arbitrary number to change the timeout before these messages appear
> (I usually use at least 150 on production systems, and more often 300,
> although on something like a Raspberry Pi I often use timeouts as high
> as 1800 seconds).

Thanks for the comments, Austin.

The system is a "normal" PC, running an Intel Core 2 Duo Mobile @ 1.66GHz.
"master" is indeed a postfix process.

I had not seen anything like that when I was on the 3.16 kernel, but after I
upgraded to 4.2.3 I caught that message. I/O and CPU load are usually low,
but it could be (6) from your list, as the system is generally very old
(5+ years).

As the problem has appeared only once in the past 15 days, I think it is just
a transient error. Thanks for clarifying the possible reasons.
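For reference, the timeout knob mentioned in Austin's reply can also be raised persistently; a sketch (the sysctl.d path assumes a systemd-style distribution, and 300 s is just an example value):

```shell
# Raise the hung-task warning threshold for the running kernel (needs root)
echo 300 > /proc/sys/kernel/hung_task_timeout_secs

# Make it persistent across reboots via sysctl.d
echo 'kernel.hung_task_timeout_secs = 300' > /etc/sysctl.d/90-hung-task.conf
sysctl --system   # reload all sysctl configuration
```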

-- 
With best regards,
Dmitry


Process is blocked for more than 120 seconds

2015-11-07 Thread Dmitry Katsubo
Hi everyone,

I have noticed the following in the log. The system continues to run,
but I am not sure how long it will stay stable. Should I start
worrying? Thanks in advance for your opinion.

# uname -a
Linux Debian 4.2.3-2~bpo8+1 (2015-10-20) i686 GNU/Linux

# mount | grep /var
/dev/sdd2 on /var type btrfs
(rw,noatime,compress=lzo,space_cache,subvolid=258,subvol=/var)

> [Mon Nov  2 06:35:57 2015] INFO: task nscd:859 blocked for more than 120 
> seconds.
> [Mon Nov  2 06:35:57 2015]   Not tainted 4.2.0-0.bpo.1-686-pae #1
> [Mon Nov  2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [Mon Nov  2 06:35:57 2015] nscdD f1c7dd20 0   859  1 
> 0x
> [Mon Nov  2 06:35:57 2015]  f1c7dd40 00200082 f79de900 f1c7dd20 c10bc119 
> ffe0 f3aec740 00200246
> [Mon Nov  2 06:35:57 2015]  f74ea800 f79e3f40 f77fb800 f1c7e000 f6b381dc 
> f6b38000 f1c7dd4c c14f1fdb
> [Mon Nov  2 06:35:57 2015]  d5553960 f1c7dd70 f867672f  f77fb800 
> c1099250 d0a4be08 d9755e68
> [Mon Nov  2 06:35:57 2015] Call Trace:
> [Mon Nov  2 06:35:57 2015]  [] ? del_timer_sync+0x49/0x50
> [Mon Nov  2 06:35:57 2015]  [] ? schedule+0x2b/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? 
> wait_current_trans.isra.21+0x8f/0xf0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? wait_woken+0x80/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? start_transaction+0x3d0/0x5d0 
> [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? 

Process is blocked for more than 120 seconds

2015-11-02 Thread Dmitry Katsubo
Hi everyone,

I have noticed the following in the log. The system continues to run,
but I am not sure for how long it will be stable.

# uname -a
Linux Debian 4.2.3-2~bpo8+1 (2015-10-20) i686 GNU/Linux

# mount | grep /var
/dev/sdd2 on /var type btrfs
(rw,noatime,compress=lzo,space_cache,subvolid=258,subvol=/var)

> [Mon Nov  2 06:35:57 2015] INFO: task nscd:859 blocked for more than 120 
> seconds.
> [Mon Nov  2 06:35:57 2015]   Not tainted 4.2.0-0.bpo.1-686-pae #1
> [Mon Nov  2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [Mon Nov  2 06:35:57 2015] nscdD f1c7dd20 0   859  1 
> 0x
> [Mon Nov  2 06:35:57 2015]  f1c7dd40 00200082 f79de900 f1c7dd20 c10bc119 
> ffe0 f3aec740 00200246
> [Mon Nov  2 06:35:57 2015]  f74ea800 f79e3f40 f77fb800 f1c7e000 f6b381dc 
> f6b38000 f1c7dd4c c14f1fdb
> [Mon Nov  2 06:35:57 2015]  d5553960 f1c7dd70 f867672f  f77fb800 
> c1099250 d0a4be08 d9755e68
> [Mon Nov  2 06:35:57 2015] Call Trace:
> [Mon Nov  2 06:35:57 2015]  [] ? del_timer_sync+0x49/0x50
> [Mon Nov  2 06:35:57 2015]  [] ? schedule+0x2b/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? 
> wait_current_trans.isra.21+0x8f/0xf0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? wait_woken+0x80/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? start_transaction+0x3d0/0x5d0 
> [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? 
> btrfs_delalloc_reserve_metadata+0x32d/0x580 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_dirty_inode+0xb0/0xb0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_join_transaction+0x23/0x30 
> [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_dirty_inode+0x39/0xb0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_dirty_inode+0xb0/0xb0 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? file_update_time+0x7e/0xc0
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_page_mkwrite+0x80/0x3c0 
> [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? hrtimer_cancel+0x19/0x20
> [Mon Nov  2 06:35:57 2015]  [] ? futex_wait+0x1e1/0x270
> [Mon Nov  2 06:35:57 2015]  [] ? do_page_mkwrite+0x38/0x90
> [Mon Nov  2 06:35:57 2015]  [] ? do_wp_page+0x2e2/0x6d0
> [Mon Nov  2 06:35:57 2015]  [] ? futex_wake+0x71/0x140
> [Mon Nov  2 06:35:57 2015]  [] ? kmap_atomic_prot+0xe7/0x110
> [Mon Nov  2 06:35:57 2015]  [] ? handle_mm_fault+0xd59/0x14d0
> [Mon Nov  2 06:35:57 2015]  [] ? __do_page_fault+0x18c/0x480
> [Mon Nov  2 06:35:57 2015]  [] ? __do_page_fault+0x480/0x480
> [Mon Nov  2 06:35:57 2015]  [] ? error_code+0x67/0x6c
> [Mon Nov  2 06:35:57 2015] INFO: task nscd:864 blocked for more than 120 
> seconds.
> [Mon Nov  2 06:35:57 2015]   Not tainted 4.2.0-0.bpo.1-686-pae #1
> [Mon Nov  2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [Mon Nov  2 06:35:57 2015] nscdD f1c87f5c 0   864  1 
> 0x
> [Mon Nov  2 06:35:57 2015]  f1c87ef4 00200082 f1c87f80 f1c87f5c 03e7 
> f1c87ee4 f3aec740 ac76c560
> [Mon Nov  2 06:35:57 2015]  f74ea800 f79e3f40 f3c7b040 f1c88000 f3c7b040 
> 0001 f1c87f00 c14f1fdb
> [Mon Nov  2 06:35:57 2015]  f3aec77c f1c87f38 c14f4265 f1c87f1c f3aec780 
> f3aec788  0125
> [Mon Nov  2 06:35:57 2015] Call Trace:
> [Mon Nov  2 06:35:57 2015]  [] ? schedule+0x2b/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? rwsem_down_write_failed+0x185/0x280
> [Mon Nov  2 06:35:57 2015]  [] ? 
> call_rwsem_down_write_failed+0x6/0x8
> [Mon Nov  2 06:35:57 2015]  [] ? down_write+0x25/0x40
> [Mon Nov  2 06:35:57 2015]  [] ? vm_mmap_pgoff+0x4a/0xa0
> [Mon Nov  2 06:35:57 2015]  [] ? SyS_fstat64+0x28/0x30
> [Mon Nov  2 06:35:57 2015]  [] ? SyS_mmap_pgoff+0x110/0x210
> [Mon Nov  2 06:35:57 2015]  [] ? sysenter_do_call+0x12/0x12
> [Mon Nov  2 06:35:57 2015] INFO: task nmbd:1330 blocked for more than 120 
> seconds.
> [Mon Nov  2 06:35:57 2015]   Not tainted 4.2.0-0.bpo.1-686-pae #1
> [Mon Nov  2 06:35:57 2015] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
> disables this message.
> [Mon Nov  2 06:35:57 2015] nmbdD  0  1330  1 
> 0x
> [Mon Nov  2 06:35:57 2015]  ef44bd74 00200086    
>  f3984900 
> [Mon Nov  2 06:35:57 2015]  f69e1800 f79e3f40 f3a7a800 ef44c000 d17255a0 
> d17255a0 ef44bd80 c14f1fdb
> [Mon Nov  2 06:35:57 2015]  d1725600 ef44bdc8 f86961b5 000d3fff  
> 1000  000d3000
> [Mon Nov  2 06:35:57 2015] Call Trace:
> [Mon Nov  2 06:35:57 2015]  [] ? schedule+0x2b/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? 
> btrfs_start_ordered_extent+0xd5/0x100 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? wait_woken+0x80/0x80
> [Mon Nov  2 06:35:57 2015]  [] ? 
> lock_and_cleanup_extent_if_need+0x134/0x260 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? prepare_pages+0xc6/0x150 [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? __btrfs_buffered_write+0x17a/0x5e0 
> [btrfs]
> [Mon Nov  2 06:35:57 2015]  [] ? __alloc_pages_nodemask+0x133/0x880
> [Mon Nov  2 06:35:57 2015]  [] ? btrfs_file_write_iter+0x1e5/0x550 
> [btrfs]
> [Mon Nov  2 

Re: How to remove missing device on RAID1?

2015-10-21 Thread Dmitry Katsubo
On 2015-10-21 00:40, Henk Slager wrote:
> I had a similar issue some time ago, around the time kernel 4.1.6 was
> just there.
> In case you don't want to wait for new disk or decide to just run the
> filesystem with 1 disk less or maybe later on replace 1 of the still
> healthy disks with a double/bigger sized one and use current/older
> kernel+tools, you could do this (assuming the filesystem is not too
> full of course):
> - mount degraded
> - btrfs balance start -f -v -ddevid=1 -mdevid=1 -sdevid=1 
>   (where missing disk has devid 1)

Am I right that one can run "btrfs dev delete 1" after the balance succeeded?

> After completion the (virtual/missing) device shall be fully unallocated
> - create /dev/loopX with sparse file of same size as missing disk on
> some other filesystem
> - btrfs replace start 1 /dev/loopX 
> - remove /dev/loopX from the filesystem
> - remount filesystem without degraded
> And remove /dev/loopX
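Strung together, Henk's steps might look like the following (a sketch, not a tested recipe: /dev/sdc2, the 50GiB size and devid 1 come from this thread, /tmp/missing.img and /dev/loop0 are arbitrary names, and the balance filters are assumed to mean the data/metadata/system chunks of the missing devid):

```shell
# mount -o degraded /dev/sdc2 /var
# btrfs balance start -f -v -ddevid=1 -mdevid=1 -sdevid=1 /var
# truncate -s 50G /tmp/missing.img
# losetup /dev/loop0 /tmp/missing.img
# btrfs replace start -B 1 /dev/loop0 /var
# btrfs device delete /dev/loop0 /var
# losetup -d /dev/loop0
# umount /var && mount /var
```

The sparse file costs no real disk space until written to, which is why the balance (which drains the missing devid first) has to come before the replace.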

It would be nice if btrfs allowed deleting a device and performing the
rebalance automatically (provided the remaining devices still have
enough space to sustain the raidX prerequisite).
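Part of this exists already: "btrfs device delete" relocates the removed device's chunks as it runs, and a physically absent device can be referred to by the literal keyword "missing" (a sketch, assuming a degraded-mounted raid1 on /var as elsewhere in this thread; availability depends on the kernel and progs versions):

```shell
# mount -o degraded /dev/sdc2 /var
# btrfs device delete missing /var
# btrfs filesystem show /var
```

If the remaining devices lack the space for the relocated copies, the delete should fail (typically with ENOSPC) rather than silently break the raid1 guarantee.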

-- 
With best regards,
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Recover btrfs volume which can only be mounted in read-only mode

2015-10-18 Thread Dmitry Katsubo
On 16/10/2015 10:18, Duncan wrote:
> Dmitry Katsubo posted on Thu, 15 Oct 2015 16:10:13 +0200 as excerpted:
> 
>> On 15 October 2015 at 02:48, Duncan <1i5t5.dun...@cox.net> wrote:
>>
>>> [snipped] 
>>
>> Thanks for this information. As far as I can see, btrfs-tools v4.1.2 is
>> now in the experimental Debian repo (but you anyway suggest at least 4.2.2,
>> which was released in master git just 10 days ago). Kernel image 3.18 is
>> still not there, perhaps because Debian jessie was frozen before it was
>> released (2014-12-07).
> 
> For userspace, as long as it's supporting the features you need at 
> runtime (where it generally simply has to know how to make the call to 
> the kernel, to do the actual work), and you're not running into anything 
> really hairy that you're trying to offline-recover, which is where the 
> latest userspace code becomes critical...
> 
> Running a userspace series behind, or even more (as long as it's not 
> /too/ far), isn't all /that/ critical a problem.
> 
> It generally becomes a problem in one of three ways: 1) You have a bad 
> filesystem and want the best chance at fixing it, in which case you 
> really want the latest code, including the absolute latest fixups for the 
> most recently discovered possible problems. 2) You want/need a new 
> feature that's simply not supported in your old userspace.  3) The 
> userspace gets so old that the output from its diagnostics commands no 
> longer easily compares with that of current tools, giving people on-list 
> difficulties when trying to compare the output in your posts to the 
> output they get.
> 
> As a very general rule, at least try to keep the userspace version 
> comparable to the kernel version you are running.  Since the userspace 
> version numbering syncs to kernelspace version numbering, and userspace 
> of a particular version is normally released shortly after the similarly 
> numbered kernel series is released, with a couple minor updates before 
> the next kernel-series-synced release, keeping userspace to at least the 
> kernel space version, means you're at least running the userspace release 
> that was made with that kernel series release in mind.
> 
> Then, as long as you don't get too far behind on kernel version, you 
> should remain at least /somewhat/ current on userspace as well, since 
> you'll be upgrading to near the same userspace (at least), when you 
> upgrade the kernel.
> 
> Using that loose guideline, since you're aiming for the 3.18 stable 
> kernel, you should be running at least a 3.18 btrfs-progs as well.
> 
> In that context, btrfs-progs 4.1.2 should be fine, as long as you're not 
> trying to fix any problems that a newer version fixed.  And, my 
> recommendation of the latest 4.2.2 was in the "fixing problems" context, 
> in which case, yes, getting your hands on 4.2.2, even if it means 
> building from sources to do so, could be critical, depending of course on 
> the problem you're trying to fix.  But otherwise, 4.1.2, or even back to 
> the last 3.18.whatever release since that's the kernel version you're 
> targeting, should be fine.
> 
> Just be sure that whenever you do upgrade to later, you avoid the known-
> bad-mkfs.btrfs in 4.2.0 and/or 4.2.1 -- be sure if you're doing the btrfs-
> progs-4.2 series, that you get 4.2.2 or later.
> 
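The guideline reduces to a quick check on any box (a sketch; the fallback message below is ours, not btrfs output):

```shell
# Print the running kernel series and, when installed, the btrfs-progs
# version; the progs major.minor should be >= the kernel's.
uname -r
command -v btrfs >/dev/null && btrfs --version || echo "btrfs-progs not installed"
```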
> As for finding a current 3.18 series kernel released for Debian, I'm not 
> a Debian user so my knowledge of the ecosystem around it is limited, 
> but I've been very much under the impression that there are various 
> optional repos available that you can choose to include and update from 
> as well, and I'm quite sure based on previous discussions with others 
> that there's a well recognized and fairly commonly enabled repo that 
> includes debian kernel updates thru current release, or close to it.
> 
> Of course you could also simply run a mainstream Linus kernel and build 
> it yourself, and it's not too horribly hard to do either, as there's all 
> sorts of places with instructions for doing so out there, and back when I 
> switched from MS to freedomware Linux in late 2001, I learned the skill, 
> at least at the reasonably basic level of mostly taking a working config 
> from my distro's kernel and using it as a basis for my mainstream kernel 
> config as well, within about two months of switching.
> 
> Tho of course just because you can doesn't mean you want to, and for 
> many, finding their distro's experimental/current kernel repos and simply 
> installing the packages from it, will be far simpler.
> 
> But regardless of the method used, finding or building and keeping 
> current with your own copy of at least the latest couple of LTS 
> 

Re: Recover btrfs volume which can only be mounted in read-only mode

2015-10-15 Thread Dmitry Katsubo
On 15 October 2015 at 02:48, Duncan <1i5t5.dun...@cox.net> wrote:
> Dmitry Katsubo posted on Wed, 14 Oct 2015 22:27:29 +0200 as excerpted:
>
>> On 14/10/2015 16:40, Anand Jain wrote:
>>>> # mount -o degraded /var
>>>> Oct 11 18:20:15 kernel: BTRFS: too many missing devices, writeable
>>>> mount is not allowed
>>>>
>>>> # mount -o degraded,ro /var
>>>> # btrfs device add /dev/sdd1 /var
>>>> ERROR: error adding the device '/dev/sdd1' - Read-only file system
>>>>
>>>> Now I am stuck: I cannot add device to the volume to satisfy raid
>>>> pre-requisite.
>>>
>>>  This is a known issue. Would you be able to test below set of patches
>>>  and update us..
>>>
>>>[PATCH 0/5] Btrfs: Per-chunk degradable check
>>
>> Many thanks for the reply. Unfortunately I have no environment to
>> recompile the kernel, and setting it up will perhaps take a day. Can the
>> latest kernel be pushed to Debian sid?

Duncan, many thanks for verbose answer. I appreciate a lot.

> In the way of general information...
>
> While btrfs is no longer entirely unstable (since 3.12 when the
> experimental tag was removed) and kernel patch backports are generally
> done where stability is a factor, it's not yet fully stable and mature,
> either.  As such, an expectation of true stability such that wishing to
> remain on kernels more than one LTS series behind the latest LTS kernel
> series (4.1, with 3.18 the one LTS series back version) can be considered
> incompatible with wishing to run the still under heavy development and
> not yet fully stable and mature btrfs, at least as soon as problems are
> reported.  A request to upgrade to current and/or to try various not yet
> mainline integrated patches is thus to be expected on report of problems.
>
> As for userspace, the division between btrfs kernel and userspace works
> like this:  Under normal operating conditions, userspace simply makes
> requests of the kernel, which does the actual work.  Thus, under normal
> conditions, updated kernel code is most important.  However, once a
> problem occurs and repair/recovery is attempted, it's generally userspace
> code itself directly operating on the unmounted filesystem, so having the
> latest userspace code fixes becomes most important once something has
> gone wrong and you're trying to fix it.
>
> So upgrading to a 3.18 series kernel, at minimum, is very strongly
> recommended for those running btrfs, with an expectation that an upgrade
> to 4.1 should be being planned and tested, for deployment as soon as it's
> passing on-site pre-deployment testing.  And an upgrade to current or
> close to current btrfs-progs 4.2.2 userspace is recommended as soon as
> you need its features, which include the latest patches for repair and
> recovery, so as soon as you have a filesystem that's not working as
> expected, if not before.  (Note that earlier btrfs-progs 4.2 releases,
> before 4.2.2, had a buggy mkfs.btrfs, so they should be skipped if you
> will be doing mkfs.btrfs with them, and any btrfs created with those
> versions should have what's on them backed up if it's not already, and
> the filesystems recreated with 4.2.2, as they'll be unstable and are
> subject to failure.)

Thanks for this information. As far as I can see, btrfs-tools v4.1.2
is now in the experimental Debian repo (but you anyway suggest at
least 4.2.2, which was released in master git just 10 days ago).
Kernel image 3.18 is still not there, perhaps because Debian jessie
was frozen before it was released (2014-12-07).

>> 1. Is there any way to recover btrfs at the moment? Or the easiest I can
>> do is to mount ro, copy all data to another drive, re-create btrfs
>> volume and copy back?
>
> Sysadmin's rule of backups:  If data isn't backed up, by definition you
> value the data less than the cost of time/hassle/resources to do the
> backup, so loss of a filesystem is never a big problem, because if the
> data was of any value, it was backed up and can be restored from that
> backup, and if it wasn't backed up, then by definition you have already
> saved the more important to you commodity, the hassle/time/resources you
> would have spent doing the backup.  Therefore, loss of a filesystem is
> loss of throw-away data in any case, either because it was backed up (and
> a would-be backup that hasn't been tested restorable isn't yet a
> completed backup, so doesn't count), or because the data really was throw-
> away data, not worth the hassle of backing up in the first place, even at
> risk of loss should the un-backed-up data be lost.
>
> No exceptions.  Any after-the-fact protests to the contrary simply put
> the lie to claims that the value 

Recover btrfs volume which can only be mounted in read-only mode

2015-10-14 Thread Dmitry Katsubo
Dear btrfs community,

I am facing several problems with btrfs, and I will be very thankful
if someone can help me with them. Also, while playing with btrfs I
came up with a few suggestions – it would be nice if someone could
comment on those.

While starting the system, /var (which is btrfs volume) failed to be
mounted. That btrfs volume was created with the following options:

# mkfs.btrfs -d raid1 -m raid1 /dev/sdc2 /dev/sda /dev/sdd1

Here comes what is recorded in systemd journal during the startup:

[2.931097] BTRFS: device fsid 57b828ee-5984-4f50-89ff-4c9be0fd3084
devid 2 transid 394288 /dev/sda
[9.810439] BTRFS: device fsid 57b828ee-5984-4f50-89ff-4c9be0fd3084
devid 1 transid 394288 /dev/sdc2
Oct 11 13:00:22 systemd[1]: Job
dev-disk-by\x2duuid-57b828ee\x2d5984\x2d4f50\x2d89ff\x2d4c9be0fd3084.device/start
timed out.
Oct 11 13:00:22 systemd[1]: Timed out waiting for device
dev-disk-by\x2duuid-57b828ee\x2d5984\x2d4f50\x2d89ff\x2d4c9be0fd3084.device.

After the system started on runlevel 1, I attempted to mount the filesystem:

# mount /var
Oct 11 13:53:55 kernel: BTRFS info (device sdc2): disk space caching is enabled
Oct 11 13:53:55 kernel: BTRFS: failed to read chunk tree on sdc2
Oct 11 13:53:55 kernel: BTRFS: open_ctree failed

When I googled for "failed to read chunk tree", the feedback was that
something really bad is happening and it's time to restore the data /
give up on btrfs. In fact, this message is misleading because it
refers to /dev/sdc2, which is the mount device in fstab; but that is
an SSD drive, so a "read" error is very unlikely there. Literally I
read the message as "BTRFS: tried to read something from sdc2 and
failed". Maybe it would be better to re-phrase the message as "failed
to construct chunk tree on /var (sdc2,sda,sdd1)"?

Next I did a check:

# btrfs check /dev/sdc2
warning devid 3 not found already
checking extents
checking free space cache
Error reading 36818145280, -1
checking fs roots
checking csums
checking root refs
Checking filesystem on /dev/sdc2
UUID: 57b828ee-5984-4f50-89ff-4c9be0fd3084
failed to load free space cache for block group 36536582144
found 29602081783 bytes used err is 0
total csum bytes: 57681304
total tree bytes: 1047363584
total fs tree bytes: 843694080
total extent tree bytes: 121159680
btree space waste bytes: 207443742
file data blocks allocated: 4524416
 referenced 60893913088

The message "devid 3 not found already" does not tell me much. If I
understand correctly, btrfs does not store the list of devices in the
metadata, but maybe it would be a good idea to save the last-seen
information about devices, so that I would not need to guess what
"devid 3" means.

Next I tried to list all devices in my btrfs volume. I found this is
not possible (unless the volume is mounted). It would be nice if
"btrfs device scan" printed the detected volumes / devices to stdout
(e.g. with a "-v" option), or if there were some other way to do that.

Then I mounted the volume in degraded mode, and only after that could
I understand what the error message means:

# mount /var -o degraded
# btrfs device stats /var
[/dev/sdc2].write_io_errs   0
[/dev/sdc2].read_io_errs0
[/dev/sdc2].flush_io_errs   0
[/dev/sdc2].corruption_errs 0
[/dev/sdc2].generation_errs 0
[/dev/sda].write_io_errs   0
[/dev/sda].read_io_errs0
[/dev/sda].flush_io_errs   0
[/dev/sda].corruption_errs 0
[/dev/sda].generation_errs 0
[].write_io_errs   3160958
[].read_io_errs0
[].flush_io_errs   0
[].corruption_errs 0
[].generation_errs 0

Now I can see that the device with devid 3 is actually /dev/sdd1,
which btrfs found not ready. Is it possible to improve btrfs output
and to list "last seen device" in that output, e.g.

[/dev/sdd1*].write_io_errs   3160958
[/dev/sdd1*].read_io_errs0
...

where "*" means that device is missing.

I have listed all partitions and /dev/sdd1 was among them. I have also run

# badblocks /dev/sdd

and it found no bad blocks. Why btrfs considers the device "not ready"
remains a question.

Afterwards I have decided to run scrub:

# btrfs scrub start /var
# btrfs scrub status /var
scrub status for 57b828ee-5984-4f50-89ff-4c9be0fd3084
scrub started at Sun Oct 11 14:55:45 2015 and was aborted after 1365 seconds
total bytes scrubbed: 89.52GiB with 0 errors

I have noticed that btrfs always reports "was aborted after X
seconds", even while the scrub is still running (I checked that X and
the number of bytes scrubbed keep increasing). That is confusing.
After the scrub finished, I had no idea whether it scrubbed everything
or was really aborted, and if it was aborted, for what reason. Also it
would be nice if the status displayed the number of data bytes
scrubbed without replicas, because the number 89.52GiB includes all
replicas (of raid1 in my case):

total bytes scrubbed: 89.52GiB (data 55.03GiB, system 16.00KiB,
metadata 998.83MiB) with 0 errors

Then I can compare this number with "filesystem df" output to answer
the question: was all data successfully scrubbed?
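With per-profile numbers available, that comparison is simple arithmetic; for example, with the raid1 figures suggested above (taken from this message, not from a live system):

```shell
# On raid1 a complete scrub reads both copies, so the scrubbed total
# should be about twice the logical "used" figures from filesystem df.
data_gib=55.03   # Data, used (one copy)
meta_gib=0.98    # Metadata, used (998.83MiB, one copy)
expected=$(awk "BEGIN { printf \"%.2f\", 2 * ($data_gib + $meta_gib) }")
echo "a complete raid1 scrub should read about ${expected}GiB"
```

A reported total clearly below that estimate would mean the scrub did not in fact cover every copy.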

# btrfs 

Re: Recover btrfs volume which can only be mounded in read-only mode

2015-10-14 Thread Dmitry Katsubo
On 14/10/2015 16:40, Anand Jain wrote:
>> # mount -o degraded /var
>> Oct 11 18:20:15 kernel: BTRFS: too many missing devices, writeable
>> mount is not allowed
>>
>> # mount -o degraded,ro /var
>> # btrfs device add /dev/sdd1 /var
>> ERROR: error adding the device '/dev/sdd1' - Read-only file system
>>
>> Now I am stuck: I cannot add device to the volume to satisfy raid
>> pre-requisite.
> 
>  This is a known issue. Would you be able to test below set of patches
>  and update us..
> 
>[PATCH 0/5] Btrfs: Per-chunk degradable check

Many thanks for the reply. Unfortunately I have no environment to
recompile the kernel, and setting it up will perhaps take a day. Can the
latest kernel be pushed to Debian sid?

1. Is there any way to recover btrfs at the moment? Or the easiest I can
do is to mount ro, copy all data to another drive, re-create btrfs
volume and copy back?

2. How to avoid such a trap in the future?

3. How can I know what version of kernel the patch "Per-chunk degradable
check" is targeting?

4. What is the best way to express/vote for new features or suggestions
(wikipage "Project_ideas" / bugzilla)?

Thanks!

-- 
With best regards,
Dmitry
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html