Re: Size reported differently between profiles

2016-07-16 Thread Andrei Borzenkov
On 17.07.2016 05:09, Sébastien Luttringer wrote:
> Hello,
> 
> «btrfs fi usage» reports sizes differently between the single, RAID0,
> RAID1, RAID5, RAID6 and RAID10 profiles.
> 
> The test was done with 2 files of 1.4GiB each on 4x10GiB devices. I used
> balance to convert between profiles and get the sizes below.
> 
> Data,single: Size:4.00GiB, Used:2.85GiB
>    /dev/sdb    1.00GiB
>    /dev/sdc    1.00GiB
>    /dev/sdd    1.00GiB
>    /dev/sde    1.00GiB
> 
> Data,RAID0: Size:8.00GiB, Used:2.85GiB
>    /dev/sdb    2.00GiB
>    /dev/sdc    2.00GiB
>    /dev/sdd    2.00GiB
>    /dev/sde    2.00GiB
> 
> Data,RAID1: Size:4.00GiB, Used:2.85GiB
>    /dev/sdb    2.00GiB
>    /dev/sdc    2.00GiB
>    /dev/sdd    2.00GiB
>    /dev/sde    2.00GiB
> 
> Data,RAID5: Size:6.00GiB, Used:2.85GiB
>    /dev/sdb    2.00GiB
>    /dev/sdc    2.00GiB
>    /dev/sdd    2.00GiB
>    /dev/sde    2.00GiB
> 
> Data,RAID6: Size:6.00GiB, Used:2.85GiB
>    /dev/sdb    3.00GiB
>    /dev/sdc    3.00GiB
>    /dev/sdd    3.00GiB
>    /dev/sde    3.00GiB
> 
> Data,RAID10: Size:4.00GiB, Used:2.85GiB
>    /dev/sdb    1.00GiB
>    /dev/sdc    1.00GiB
>    /dev/sdd    1.00GiB
>    /dev/sde    1.00GiB
> 
> For the single, RAID0 and RAID10 profiles, the sum of the device sizes
> equals the total size, as if the total were the raw bytes allocated
> across all devices.
> 
> For the RAID1, RAID5 and RAID6 profiles, the sum of the device sizes is
> more than the total size. It looks like the redundancy was subtracted,
> as if the total were the usable size the profile provides.
> 
> So why do RAID1 and RAID10 report their sizes differently? This
> confuses me.
> 

In your example only RAID10 is the odd one out. RAID1 shows 4GiB with
8GiB allocated across the disks, which matches "redundancy subtracted".
But yes, I also wonder why RAID10 does not follow the same rule.
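
To spell out the arithmetic from the quoted numbers (raw = sum of the
per-device allocations; usable = raw after the profile's redundancy):

single:  4 x 1GiB =  4GiB raw, Size 4GiB  -> raw == usable, ambiguous
RAID0:   4 x 2GiB =  8GiB raw, Size 8GiB  -> raw == usable, ambiguous
RAID1:   4 x 2GiB =  8GiB raw, Size 4GiB  -> usable (raw / 2)
RAID5:   4 x 2GiB =  8GiB raw, Size 6GiB  -> usable (raw * 3/4)
RAID6:   4 x 3GiB = 12GiB raw, Size 6GiB  -> usable (raw * 2/4)
RAID10:  4 x 1GiB =  4GiB raw, Size 4GiB  -> raw (usable would be 2GiB)

So only RAID10 reports raw bytes where the other redundant profiles
report usable space.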





Re: Status of SMR with BTRFS

2016-07-16 Thread Tomasz Kusmierz
Please don't take this as nit-picking or anything:

> It's a Seagate Expansion Desktop 5TB (USB3). It is probably a ST5000DM000.

this is a TGMR disk, not SMR:
http://www.seagate.com/www-content/product-content/desktop-hdd-fam/en-us/docs/100743772a.pdf
So it still conforms to a standard recording strategy ...
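
(If in doubt about what is actually inside a USB enclosure, smartmontools
can usually ask the drive directly; a sketch with a hypothetical device
name, noting that USB bridges often need an explicit "-d sat":

# smartctl -d sat -i /dev/sdX

This prints the real model and firmware, which you can then look up in
the vendor's datasheet.)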


>> There are two types:
>> 1. SMR managed by device firmware. BTRFS sees that as a normal block
>> device … problems you get are not related to BTRFS itself …
>
> That's for sure. But the way BTRFS uses/writes data could still cause
> problems in conjunction with these devices, no?
I'm sorry, but I'm confused now: what "magical way of using/writing
data" do you actually mean? AFAIK btrfs sees the disk as a block device
... for example, devices have varying physical sector layouts, typically
512 bytes + some CRC + maybe ECC ... btrfs does not access this data,
the drive does ... to be honest, drives lie to you continuously! They
use this ECC to magically bail out of a bad sector, give you the data,
and silently switch to a spare sector ...

Now think it through slowly and thoroughly: who would write (and
maintain) file system code that accesses device-specific data for X
vendors, each with Y model-specific
configurations/caveats/firmwares/protocols ... S.M.A.R.T. emerged to
give a unified interface to device statistics ... that is how bad it
was ...


FYI,
in 2009 I was building a product on Linux that booted from a
flash-based FS ... some people required that the data would still boot
up unchanged after 20 years ... my answer was: "HOW?" Yes, I could
ensure a certain file's integrity on readout by checking its md5, but I
could not guarantee the integrity of the whole FS ... especially at a
time when JFFS2 was the only option on flash memory (yeah, it had to be
RW as well @#$*@#$) ... so btrfs comes along and takes away most of
those problems ... if you care about your data, do some research ... if
not ... maybe ReiserFS is for you :)


Size reported differently between profiles

2016-07-16 Thread Sébastien Luttringer
Hello,

«btrfs fi usage» reports sizes differently between the single, RAID0, RAID1,
RAID5, RAID6 and RAID10 profiles.

The test was done with 2 files of 1.4GiB each on 4x10GiB devices. I used
balance to convert between profiles and get the sizes below.
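
For reference, the conversions between profiles were done with balance
filters along these lines (a sketch; the mount point is assumed):

# btrfs balance start -dconvert=raid1 /mnt
# btrfs balance start -dconvert=raid10 /mnt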

Data,single: Size:4.00GiB, Used:2.85GiB
   /dev/sdb    1.00GiB
   /dev/sdc    1.00GiB
   /dev/sdd    1.00GiB
   /dev/sde    1.00GiB

Data,RAID0: Size:8.00GiB, Used:2.85GiB
   /dev/sdb    2.00GiB
   /dev/sdc    2.00GiB
   /dev/sdd    2.00GiB
   /dev/sde    2.00GiB

Data,RAID1: Size:4.00GiB, Used:2.85GiB
   /dev/sdb    2.00GiB
   /dev/sdc    2.00GiB
   /dev/sdd    2.00GiB
   /dev/sde    2.00GiB

Data,RAID5: Size:6.00GiB, Used:2.85GiB
   /dev/sdb    2.00GiB
   /dev/sdc    2.00GiB
   /dev/sdd    2.00GiB
   /dev/sde    2.00GiB

Data,RAID6: Size:6.00GiB, Used:2.85GiB
   /dev/sdb    3.00GiB
   /dev/sdc    3.00GiB
   /dev/sdd    3.00GiB
   /dev/sde    3.00GiB

Data,RAID10: Size:4.00GiB, Used:2.85GiB
   /dev/sdb    1.00GiB
   /dev/sdc    1.00GiB
   /dev/sdd    1.00GiB
   /dev/sde    1.00GiB

For the single, RAID0 and RAID10 profiles, the sum of the device sizes equals
the total size, as if the total were the raw bytes allocated across all
devices.

For the RAID1, RAID5 and RAID6 profiles, the sum of the device sizes is more
than the total size. It looks like the redundancy was subtracted, as if the
total were the usable size the profile provides.

So why do RAID1 and RAID10 report their sizes differently? This confuses me.

Cheers,

-- 
Sébastien "Seblu" Luttringer
https://seblu.net | Twitter: @seblu42
GPG: 0x2072D77A



Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

2016-07-16 Thread Qu Wenruo



On 07/15/2016 07:29 PM, Christian Rohmann wrote:

Hey Qu, all

On 07/15/2016 05:56 AM, Qu Wenruo wrote:


The good news is, we have a patch to slightly speed up the mount, by
avoiding reading out unrelated tree blocks.

In our test environment, it takes 15% less time to mount a fs filled
with 16K files (2T used space).

https://patchwork.kernel.org/patch/9021421/


I have a 30TB RAID6 filesystem with compression on and I've seen mount
times of up to 20 minutes (!).

I don't want to sound unfair: a 15% improvement is good, but not in the
league where BTRFS needs to be.
Do I understand your comments correctly that further improvement would
require a change of the on-disk format?


Yes, that's the case.

The problem is, we put BLOCK_GROUP_ITEMs into the extent tree, along
with tons of EXTENT_ITEMs/METADATA_ITEMs.


This makes the search for BLOCK_GROUP_ITEM very, very slow when the
extent tree is really big.


On the other hand, we can search for CHUNK_ITEM very fast, because
CHUNK_ITEMs are in their own tree.

(CHUNK_ITEM and BLOCK_GROUP_ITEM are 1:1 mapped)
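
You can see the mixing for yourself by dumping the extent tree; a
sketch, assuming a btrfs-progs new enough to have the dump-tree
subcommand (older versions ship it as the standalone btrfs-debug-tree,
and may want the tree id 2 instead of the name):

# btrfs inspect-internal dump-tree -t extent /dev/sdX | grep -c BLOCK_GROUP_ITEM

The relatively few BLOCK_GROUP_ITEMs come out buried among the huge
number of EXTENT_ITEM/METADATA_ITEM entries in the one extent tree.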

So to completely fix it, btrfs needs an on-disk format change to put
BLOCK_GROUP_ITEMs into their own tree.


IMHO there may be some objection from other devs, though.

Anyway, I've added the three maintainers to Cc, and I hope we can get a
better idea of how to fix it.


Thanks,
Qu




Thanks and with regards

Christian


Re: [BUG] Btrfs scrub sometime recalculate wrong parity in raid5

2016-07-16 Thread Jarkko Lavinen
On 07/12/2016 05:50 PM, Goffredo Baroncelli wrote:
> Using "btrfs insp phy" I developed a script to trigger the bug.

Thank you for the script, and thanks to all for sharing the raid5 and
scrubbing issues. I have been using two raid5 arrays, ran scrub
occasionally without any problems, and had a false sense of confidence.
I successfully converted the raid5 arrays into raid10 without any glitch.

I tried to modify the shell script so that, instead of corrupting data
with dd, a simulated bad block is created with device mapper. Modern
disks are likely to either return the correct data or return an error
if they cannot.

The modified script behaves very much like the original dd version.
With the dd version I see wrong data instead of the expected data. With
the simulated bad block I see no data at all, since dd quits on the
read error.
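
For anyone who wants to reproduce this, the core of the change is the
device-mapper "error" target; a minimal sketch with a hypothetical
backing device and a made-up bad range of 8 sectors:

# SECTORS=$(blockdev --getsz /dev/sdX)
# dmsetup create badblock <<EOF
0 1000000 linear /dev/sdX 0
1000000 8 error
1000008 $((SECTORS - 1000008)) linear /dev/sdX 1000008
EOF

Reads of those 8 sectors on /dev/mapper/badblock then fail with an I/O
error instead of returning wrong data, which is how a modern disk is
assumed to behave above.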

Jarkko Lavinen


h.sh
Description: Bourne shell script


Kernel bug detected in journalctl

2016-07-16 Thread Francesco Turco
I discovered error messages in my systemd log after my computer froze a
couple of times.

This is the relevant part from journalctl:

Jul 16 12:24:01 desktop kernel: BTRFS error (device dm-3): err add
delayed dir index item(index: 71778) into the deletion tree of the
delayed node(root id: 5, inode id: 1050941, errno: -17)
Jul 16 12:24:01 desktop kernel: [ cut here ]
Jul 16 12:24:01 desktop kernel: kernel BUG at fs/btrfs/delayed-inode.c:1579!
Jul 16 12:24:01 desktop kernel: invalid opcode:  [#1] PREEMPT SMP
Jul 16 12:24:01 desktop kernel: Modules linked in: fuse mei_wdt coretemp
gpio_ich iTCO_wdt iTCO_vendor_support nouveau ppdev psmouse serio_raw
mxm_wmi wmi video ttm drm_kms_helper evdev drm syscopyarea input_leds
mousedev sysfillrect sysimgblt kvm_intel fb_sys_fops i2c_algo_bit kvm
irqbypass led_class pcspkr mac_hid i2c_i801 acpi_cpufreq
snd_hda_codec_hdmi lpc_ich tpm_tis tpm parport_pc parport e1000e fjes
snd_hda_codec_realtek snd_hda_codec_generic ptp snd_hda_intel
snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer snd soundcore
shpchp pps_core mei_me mei intel_agp intel_gtt processor button
sch_fq_codel ip_tables x_tables xts gf128mul algif_skcipher af_alg
dm_crypt dm_mod hid_generic usbhid hid uas usb_storage crc32c_generic
btrfs xor ata_generic pata_acpi raid6_pq atkbd libps2 uhci_hcd
firewire_ohci firewire_core
Jul 16 12:24:01 desktop kernel:  crc_itu_t pata_marvell ehci_pci
ehci_hcd usbcore usb_common i8042 serio ext4 crc16 jbd2 mbcache sd_mod
ahci libahci libata scsi_mod
Jul 16 12:24:01 desktop kernel: CPU: 1 PID: 12815 Comm: QThread Not
tainted 4.6.3-gnu-1 #1
Jul 16 12:24:01 desktop kernel: Hardware name:  /DQ35JO,
BIOS JOQ3510J.86A.1143.2010.1209.0048 12/09/2010
Jul 16 12:24:01 desktop kernel: task: 88019601 ti:
880069dec000 task.ti: 880069dec000
Jul 16 12:24:01 desktop kernel: RIP: 0010:[]
[] btrfs_delete_delayed_dir_index+0x1f5/0x200 [btrfs]
Jul 16 12:24:01 desktop kernel: RSP: 0018:880069defd38  EFLAGS: 00010246
Jul 16 12:24:01 desktop kernel: RAX:  RBX:
880067139560 RCX: 
Jul 16 12:24:01 desktop kernel: RDX:  RSI:
88022bc8db98 RDI: 88022bc8db98
Jul 16 12:24:01 desktop kernel: RBP: 880069defd88 R08:
0389 R09: 0005
Jul 16 12:24:01 desktop kernel: R10:  R11:
81a746ad R12: 8800671395a8
Jul 16 12:24:01 desktop kernel: R13: 88021dbd4000 R14:
00011862 R15: 8801980e3f80
Jul 16 12:24:01 desktop kernel: FS:  7f71abfff700()
GS:88022bc8() knlGS:
Jul 16 12:24:01 desktop kernel: CS:  0010 DS:  ES:  CR0:
80050033
Jul 16 12:24:01 desktop kernel: CR2: 7f95cc00 CR3:
6a078000 CR4: 000406e0
Jul 16 12:24:01 desktop kernel: Stack:
Jul 16 12:24:01 desktop kernel:  880221a23be0 3dff8800b692da78
60001009 00011862
Jul 16 12:24:01 desktop kernel:  1f383142 88006715ba38
8800b692da78 002c3a16
Jul 16 12:24:01 desktop kernel:  0010093d 880067206070
880069defe10 a02d6f81
Jul 16 12:24:01 desktop kernel: Call Trace:
Jul 16 12:24:01 desktop kernel:  []
__btrfs_unlink_inode+0x1b1/0x470 [btrfs]
Jul 16 12:24:01 desktop kernel:  []
btrfs_unlink_inode+0x1c/0x40 [btrfs]
Jul 16 12:24:01 desktop kernel:  []
btrfs_unlink+0x6b/0xc0 [btrfs]
Jul 16 12:24:01 desktop kernel:  [] vfs_unlink+0x117/0x1a0
Jul 16 12:24:01 desktop kernel:  []
do_unlinkat+0x27c/0x2f0
Jul 16 12:24:01 desktop kernel:  [] SyS_unlink+0x16/0x20
Jul 16 12:24:01 desktop kernel:  []
entry_SYSCALL_64_fastpath+0x1a/0xa4
Jul 16 12:24:01 desktop kernel: Code: ff ff 0f 0b 48 8b 53 10 49 8b bd
f0 01 00 00 41 89 c1 4c 8b 03 48 c7 c6 50 cc 35 a0 48 8b 8a 48 03 00 00
4c 89 f2 e8 1b 56 f7 ff <0f> 0b e8 14 e7 d4 e0 0f 1f 40 00 66 66 66 66
90 55 48 89 e5 53
Jul 16 12:24:01 desktop kernel: RIP  []
btrfs_delete_delayed_dir_index+0x1f5/0x200 [btrfs]
Jul 16 12:24:01 desktop kernel:  RSP 
Jul 16 12:24:01 desktop kernel: ---[ end trace 86b4a59cf9b82060 ]---

I checked my btrfs partitions with "btrfs check" and fortunately they
seem OK.

Device dm-3 is my home partition.

# btrfs filesystem show /home
Label: none  uuid: f5392b4f-5d1c-4c7d-89e6-0324c2860a73
Total devices 1 FS bytes used 219.82GiB
devid    1 size 400.00GiB used 228.06GiB path /dev/mapper/Desktop-home

I'm using Linux 4.6.3 and btrfs-progs 4.6.1 on a Parabola
GNU/Linux-libre system. Please note that I formatted my btrfs
partitions with a live USB that may have had slightly different (older)
package versions.

Please tell me if you need other details.

-- 
Website: http://www.fturco.net/
GPG key: 6712 2364 B2FE 30E1 4791 EB82 7BB1 1F53 29DE CD34


Re: Status of SMR with BTRFS

2016-07-16 Thread Hendrik Friedel

Hello Tomasz,

thanks for your reply.

What disk are you using?


It's a Seagate Expansion Desktop 5TB (USB3). It is probably a ST5000DM000.


There are two types:
1. SMR managed by device firmware. BTRFS sees that as a normal block
device … problems you get are not related to BTRFS itself …
That's for sure. But the way BTRFS uses/writes data could still cause
problems in conjunction with these devices, no?



2. SMR managed by the host system. BTRFS still sees this as a block device …
just emulated by the host system to look normal.

I am not sure what I am using. How can I find out?
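
(One way to check, as a sketch, assuming a kernel new enough to expose
the zoned attribute in sysfs and a hypothetical device name:

# cat /sys/block/sdX/queue/zoned

"host-managed" or "host-aware" means the host can see the zones; "none"
means either a conventional drive or a drive-managed SMR disk that hides
the shingling behind its firmware.)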


In case of funky technologies like that, I would research how exactly data is
stored in terms of “bands” and experiment with setting leaf & sector sizes to
match a band.

Sorry, but I have no idea where to start.
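
(For what it's worth, the knobs referred to above are presumably the
mkfs-time ones; a hypothetical sketch, not a tested recommendation:

# mkfs.btrfs --sectorsize 4096 --nodesize 65536 /dev/sdX

The sector size has to match the page size for the filesystem to stay
mountable, so in practice only the node/leaf size is free to experiment
with.)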

It seems to me that, although the drive is a pure consumer drive, SMR is a
'pro' feature and I should avoid it with BTRFS. I am just surprised that
there is no hint in the wiki in that regard.


Greetings,
Hendrik



> On 15 Jul 2016, at 19:29, Hendrik Friedel  wrote:
>
> Hello,
>
> I have a 5TB Seagate drive that uses SMR.
>
> I was wondering if BTRFS is usable with this hard drive technology. So
> first I searched the BTRFS wiki - nothing. Then Google.
>
> * I found this: https://bbs.archlinux.org/viewtopic.php?id=203696
> But this turned out to be an issue not related to BTRFS.
>
> * Then this:
> http://www.snia.org/sites/default/files/SDC15_presentations/smr/HannesReinecke_Strategies_for_running_unmodified_FS_SMR.pdf
> "BTRFS operation matches SMR parameters very closely [...]
>
> High number of misaligned write accesses; points to an issue with btrfs
> itself
>
>
> * Then this:
> http://superuser.com/questions/962257/fastest-linux-filesystem-on-shingled-disks
> The BTRFS performance seemed good.
>
>
> * Finally this: http://www.spinics.net/lists/linux-btrfs/msg48072.html
> "So you can get mixed results when trying to use the SMR devices but I'd say 
it will mostly not work.
> But, btrfs has all the fundamental features in place, we'd have to make
> adjustments to follow the SMR constraints:"
> [...]
> I have some notes at
> https://github.com/kdave/drafts/blob/master/btrfs/smr-mode.txt;
>
>
> So now I am wondering what the state is today. "We" (I am happy to do
> that, but I am not sure of the access rights) should also summarize this
> in the wiki.
> My use-case, by the way, is back-ups. I am thinking of using some of the
> interesting BTRFS features for this (send/receive, deduplication).
>
> Greetings,
> Hendrik
>
>







Re: Btrfs uuid snapshots: orphaned parent_uuid after deleting intermediate subvol

2016-07-16 Thread Hugo Mills
On Sat, Jul 16, 2016 at 12:18:18PM +0200, Kai Krakow wrote:
> On Fri, 15 Jul 2016 16:12:51 -0700 (PDT),
> Eric Wheeler  wrote:
> 
> > Hello all,
> > 
> > If I create three subvolumes like so:
> > 
> > # btrfs subvolume create a
> > # btrfs subvolume snapshot a b
> > # btrfs subvolume snapshot b c
> > 
> > I get a parent-child relationship which can be determined like so:
> > 
> > # btrfs subvolume list -uq /home/ |grep [abc]$
> > parent_uuid - uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad path a
> > parent_uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 path b
> > parent_uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 uuid 5ee8de35-2bab-d642-b5c2-f619e46f65c2 path c
> > 
> > Now if I delete 'b', the parent_uuid of 'c' doesn't change to point
> > at 'a':

   Correct -- the parent is the subvol that the snapshot was made
from. After you make a snapshot, the snapshot and its original
subvolume are entirely equal partners, so there's no parent-child
relationship between them for purposes of data storage or anything
like that. The parent_uuid field is only used by send -p to ensure
correctness, and for nothing else.
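
   (For illustration, a sketch of that incremental-send path, with
hypothetical paths, assuming a and c are read-only snapshots and
/backup is another btrfs filesystem:

# btrfs send -p /mnt/a /mnt/c | btrfs receive /backup

The stream names the parent by UUID, which is what receive uses to find
the matching snapshot on the destination.)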

> > # btrfs subvolume delete b
> > # btrfs subvolume list -uq /home/ |grep [abc]$
> > parent_uuid - uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad path a
> > parent_uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 uuid 5ee8de35-2bab-d642-b5c2-f619e46f65c2 path c
> 
> It cannot do that because b may have diverged from a.
> 
> > Notice that 'c' still points at b's UUID, but 'b' is missing and the 
> > parent_uuid for 'c' wasn't set to '-' as if it were a root node (like
> > 'a').
> > 
> > Is this an inconsistency? Should child parent_uuids be updated on
> > delete?

   It's not inconsistent. I think it just doesn't mean what you
thought it did. :)

> I think this is by design. This "missing" UUID is no longer a file
> system tree; it is just a reference to the blocks that differed
> between a and c at the time b was deleted.

> > It would be nice to know that 'c' is actually a descendant of 'a',
> > even after having deleted 'b'. Is there a way to look that up somehow?
> 
> Actually, it would also be interesting to know what happens to the
> blocks of the deleted b once the blocks in c are unshared. Are they
> garbage collected, or do we have some orphan subvolume lying around
> which you cannot get rid of?

   They're GCed. Extents are simply reference counted -- it doesn't
matter how those references got there, or in what order. When the
reference count drops to zero, the extent is cleaned up.

   Hugo.

-- 
Hugo Mills | Great oxymorons of the world, no. 9:
hugo@... carfax.org.uk | Standard Deviation
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: Btrfs uuid snapshots: orphaned parent_uuid after deleting intermediate subvol

2016-07-16 Thread Kai Krakow
On Fri, 15 Jul 2016 16:12:51 -0700 (PDT),
Eric Wheeler  wrote:

> Hello all,
> 
> If I create three subvolumes like so:
> 
> # btrfs subvolume create a
> # btrfs subvolume snapshot a b
> # btrfs subvolume snapshot b c
> 
> I get a parent-child relationship which can be determined like so:
> 
> # btrfs subvolume list -uq /home/ |grep [abc]$
> parent_uuid - uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad path a
> parent_uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 path b
> parent_uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 uuid 5ee8de35-2bab-d642-b5c2-f619e46f65c2 path c
> 
> Now if I delete 'b', the parent_uuid of 'c' doesn't change to point
> at 'a':
> 
> # btrfs subvolume delete b
> # btrfs subvolume list -uq /home/ |grep [abc]$
> parent_uuid - uuid 0e5f473a-d9e5-144a-8f49-1899af7320ad path a
> parent_uuid cb4768eb-98e3-5e4c-935d-14f1b97b0de2 uuid 5ee8de35-2bab-d642-b5c2-f619e46f65c2 path c

It cannot do that because b may have diverged from a.

> Notice that 'c' still points at b's UUID, but 'b' is missing and the 
> parent_uuid for 'c' wasn't set to '-' as if it were a root node (like
> 'a').
> 
> Is this an inconsistency? Should child parent_uuids be updated on
> delete?

I think this is by design. This "missing" UUID is no longer a file
system tree; it is just a reference to the blocks that differed
between a and c at the time b was deleted.

> It would be nice to know that 'c' is actually a descendant of 'a',
> even after having deleted 'b'. Is there a way to look that up somehow?

Actually, it would also be interesting to know what happens to the
blocks of the deleted b once the blocks in c are unshared. Are they
garbage collected, or do we have some orphan subvolume lying around
which you cannot get rid of?


-- 
Regards,
Kai

Replies to list-only preferred.
