Re: compression disk space saving - what are your results?

2015-12-02 Thread Austin S Hemmelgarn

On 2015-12-02 08:53, Tomasz Chmielewski wrote:

On 2015-12-02 22:03, Austin S Hemmelgarn wrote:


 From these numbers (124 GB used where data size is 153 GB), it appears
that we save around 20% with zlib compression enabled.
Is 20% a reasonable saving for zlib? Typically text compresses much better
with that algorithm, although I understand that we have several
limitations when applying that on a filesystem level.


This is actually an excellent question.  A couple of things to note
before I share what I've seen:
1. Text compresses better with any compression algorithm.  It is by
nature highly patterned and moderately redundant data, which is what
benefits the most from compression.


It looks like compress=zlib does not compress very well. Following
Duncan's suggestion, I've changed it to compress-force=zlib, and
re-copied the data to make sure the files are compressed.
For future reference, if you run 'btrfs filesystem defrag -r -czlib' on 
the top level directory, you can achieve the same effect without having 
to deal with the copy overhead.  This has a side effect of breaking 
reflinks, but copying the files off and back onto the filesystem does so 
too, and in any case I doubt that you're using reflinks.  There probably 
wouldn't be much difference in the time it takes, but at least you 
wouldn't be hitting another disk in the process.


The compression ratio is much, much better now (on a slightly changed data set):

# df -h
/dev/xvdb   200G   24G  176G  12% /var/log/remote


# du -sh /var/log/remote/
138G    /var/log/remote/


So, 138 GB files use just 24 GB on disk - nice!

However, I would still expect that compress=zlib has almost the same
effect as compress-force=zlib, for 100% text files/logs.

That's better than 80% space savings (it works out to about 83%), so I 
doubt that you'd manage to get anything better than that even with only 
plain text files.  It's interesting that there's such a big discrepancy 
though; it indicates that BTRFS really needs some work WRT deciding 
what to compress.







Re: compression disk space saving - what are your results?

2015-12-02 Thread Austin S Hemmelgarn

On 2015-12-02 09:03, Imran Geriskovan wrote:

What are your disk space savings when using btrfs with compression?



* There's the compress vs. compress-force option and discussion.  A
number of posters have reported that for mostly text, compress didn't
give them expected compression results and they needed to use compress-
force.


"compress-force" option compresses regardless of the "compressibility"
of the file.

"compress" option makes some inference about the "compressibility"
and decides to compress or not.

I wonder how that inference is done?
Can anyone provide some pseudo code for it?
I'm not certain how BTRFS does it, but my guess would be trying to 
compress the block, then storing the uncompressed version if the 
compressed one is bigger.
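
As a rough illustration of that guess (this is NOT the actual btrfs code, 
just a userspace sketch of the "store whichever version is smaller" idea, 
using zlib):

#include <stdlib.h>
#include <zlib.h>

/* Returns 1 and hands back a malloc'd buffer if compressing the block
 * actually made it smaller; returns 0 if the original should be kept. */
int worth_compressing(const unsigned char *block, uLong len,
                      unsigned char **out, uLongf *out_len)
{
    uLongf clen = compressBound(len);
    unsigned char *cbuf = malloc(clen);

    if (!cbuf)
        return 0;                 /* no memory: just store it uncompressed */

    if (compress2(cbuf, &clen, block, len, Z_DEFAULT_COMPRESSION) == Z_OK &&
        clen < len) {
        *out = cbuf;              /* compression won: keep the smaller form */
        *out_len = clen;
        return 1;
    }

    free(cbuf);                   /* compression lost or broke even */
    return 0;
}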


The program lrzip has an option to do per-block compression checks kind 
of like this, but its method is to try LZO compression on the block 
(which is fast), and only use the selected compression method (bzip2 by 
default I think, but it can also do zlib and xz) if the LZO compression 
ratio is good enough.  If we went with a similar method, I'd say we 
should integrate LZ4 support first, and use that for the test.  I think 
NTFS compression on Windows might do something similar, but they use an 
old LZ77 derivative for their compression (I think it's referred to as 
LZNT1; it's designed for speed and usually doesn't get much better than 
a 30% compression ratio).
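
For illustration, a minimal userspace sketch of that lrzip-style pretest 
(again not btrfs code; it uses liblz4, and the ~10% threshold is just an 
arbitrary example value):

#include <stdlib.h>
#include <lz4.h>

/* Quick check: run the fast compressor over the block and only call the
 * data "compressible" if it shaved off at least roughly 10%. */
int looks_compressible(const char *block, int len)
{
    int cap = LZ4_compressBound(len);
    char *tmp = malloc(cap);
    int clen, ok;

    if (!tmp)
        return 1;                 /* can't test, assume it's compressible */

    clen = LZ4_compress_default(block, tmp, len, cap);
    ok = clen > 0 && clen < len - len / 10;

    free(tmp);
    return ok;
}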


On a side note, I really wish BTRFS would just add LZ4 support.  It's a 
lot more deterministic WRT decompression time than LZO, gets a similar 
compression ratio, and runs faster on most processors for both 
compression and decompression.






Re: utils version and convert crash

2015-12-02 Thread Austin S Hemmelgarn

On 2015-12-02 08:45, Duncan wrote:

Austin S Hemmelgarn posted on Wed, 02 Dec 2015 07:25:13 -0500 as
excerpted:


On 2015-12-02 05:01, Duncan wrote:


[on unverified errors returned by scrub]


Unverified errors are, I believe[1], errors where a metadata block
holding checksums itself has an error, so the blocks its checksums in
turn covered are not checksum-verified.

What that means in practice is that once the first metadata block error
has been corrected in a first scrub run, a second scrub run can now
check the blocks that were recorded as unverified errors in the first
run, potentially finding and hopefully fixing additional errors[.]



---
[1] I'm not a dev and am not absolutely sure of the technical accuracy
of this description, but from an admin's viewpoint it seems to be
correct at least in practice, based on the fact that further scrubs as
long as there were unverified errors often did find additional errors,
while once the unverified count dropped to zero and the last read
errors were corrected, further scrubs turned up no further errors.


AFAICT from reading the code, that is a correct assessment.  It would be
kind of nice though if there was some way to tell scrub to recheck up to
X many times if there are unverified errors...


Yes.  For me as explained it wasn't that big a deal as another scrub was
another minute or less, but definitely on terabyte-scale filesystems on
spinning rust, where scrubs take hours, having scrub be able to
automatically track just the corrected errors along with their
unverifieds, and rescan just those, should only take a matter of a few
minutes more, while a full rescan of /everything/ would take the same
number of hours yet again... and again if there's a third scan required,
etc.

I'd say just make it automatic on corrected metadata errors as I can't
think of a reason people wouldn't want it, given the time it would save
over rerunning a full scrub over and over again, but making it an option
would be fine with me too.

I was thinking an option to do a full re-scrub, but having an automatic 
reparse of the metadata in a fixed metadata block would be a lot more 
efficient than what I was thinking :)






Re: vfs: move btrfs clone ioctls to common code

2015-12-02 Thread Steve French
On Wed, Dec 2, 2015 at 1:27 AM, Christoph Hellwig  wrote:
> Hi Steve,
>
> we have two APIs in Linux:
>
>  - the copy_file_range syscall which just is a "do a copy by any means"
>  - the btrfs clone ioctls which have stricter semantics that very much
>expect a reflink-like operation

If the copy_file_range is allowed to use any offload mechanism then
cifs.ko could be changed as follows, to fallback among the three
possible mechanisms depending on what the target supports.

- send the fastest one of the three choices,
 the "reflink-like" FSCTL_DUPLICATE_EXTENTS (there is a
server fs capability that we check at mount time that indicates whether
it is supported).  If it is not supported, or if the source and target are on
different shares (exports), then fall back to
- send the ODX style copy offload (when implemented).  This is the only
one that could in theory support cross-server copies (rather than requiring the
copy source and target to be on the same server)
- (if the above aren't supported) send the FSCTL_COPYCHUNK (currently
called via CIFS_IOC_COPYCHUNK_FILE)
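
To make the fallback order concrete, a rough sketch of how such a chain
might be structured (the try_*() helpers here are hypothetical placeholders,
not real cifs.ko functions):

#include <linux/fs.h>
#include <linux/errno.h>

/* Hypothetical helpers, each returning -EOPNOTSUPP when its mechanism
 * is unavailable for this source/target pair. */
ssize_t try_duplicate_extents(struct file *src, loff_t soff,
                              struct file *dst, loff_t doff, size_t len);
ssize_t try_odx_copy(struct file *src, loff_t soff,
                     struct file *dst, loff_t doff, size_t len);
ssize_t try_copychunk(struct file *src, loff_t soff,
                      struct file *dst, loff_t doff, size_t len);

ssize_t cifs_copy_range_sketch(struct file *src, loff_t src_off,
                               struct file *dst, loff_t dst_off, size_t len)
{
    ssize_t ret;

    /* 1. reflink-like FSCTL_DUPLICATE_EXTENTS, only if the server fs
     *    advertised support and both files are on the same share */
    ret = try_duplicate_extents(src, src_off, dst, dst_off, len);
    if (ret != -EOPNOTSUPP)
        return ret;

    /* 2. ODX-style copy offload (the only one that could in theory
     *    span servers), once implemented */
    ret = try_odx_copy(src, src_off, dst, dst_off, len);
    if (ret != -EOPNOTSUPP)
        return ret;

    /* 3. FSCTL_COPYCHUNK, today reachable via CIFS_IOC_COPYCHUNK_FILE */
    return try_copychunk(src, src_off, dst, dst_off, len);
}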

For the btrfs_ioc_clone_range (or similar), FSCTL_DUPLICATE_EXTENTS could
probably stay the same since it is the only one of the three that
guarantees using reflinks.

If we want to, for Linux->Samba we could probably add a whole-file
clone (similar to hardlinks on the wire) to Samba and cifs.ko if that
is useful (as opposed to the three mechanisms above, which are copy
ranges).

In addition, I noticed that the cp command has added various
optimizations for sparse file enablement.  I need to test those on
cifs.ko and update the ioctls for retrieving sparse ranges to make
sure that they work over SMB3 mounts, for optimizing the case where
the source file is sparse and mostly empty.

> I plan to also wire up copy_file_range to try the clone_file_range method
> first if available to make life easier for file systems, but as there isn't
> any test coverage for that I don't dare to actually submit it yet.  I'll
> send a compile tested only RFC for it when resending this series.



-- 
Thanks,

Steve


Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Eric Sandeen
On 12/2/15 3:23 AM, Qu Wenruo wrote:
> 
> 
> Qu Wenruo wrote on 2015/12/02 17:06 +0800:
>>
>>
>> Russell Coker wrote on 2015/12/02 17:25 +1100:
>>> On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:
 yes, xfs does; we have "-o norecovery" if you don't want that, or need
 to mount a filesystem with a dirty log on a readonly device.
>>>
>>> That option also works with Ext3/4 so it seems to be a standard way of
>>> dealing
>>> with this.  I think that BTRFS should do what Ext3/4 and XFS do in this
>>> regard.
>>>
>> BTW, does -o norecovery imply -o ro?
>>
>> If not, how does it keep the filesystem consistent?
>>
>> I'd like to follow that ext2/xfs behavior, but I'm not familiar with
>> those filesystems.
>>
>> Thanks,
>> Qu
>>
> 
> OK, norecovery implies ro.

For XFS, it doesn't imply it, it requires it; i.e. both must be stated 
explicitly:

/*
 * no recovery flag requires a read-only mount
 */
if ((mp->m_flags & XFS_MOUNT_NORECOVERY) &&
    !(mp->m_flags & XFS_MOUNT_RDONLY)) {
        xfs_warn(mp, "no-recovery mounts must be read-only.");
        return -EINVAL;
}

ext4 is the same, I believe:

} else if (test_opt(sb, NOLOAD) && !(sb->s_flags & MS_RDONLY) &&
           ext4_has_feature_journal_needs_recovery(sb)) {
        ext4_msg(sb, KERN_ERR, "required journal recovery "
                 "suppressed and not mounted read-only");
        goto failed_mount_wq;

so if you'd like btrfs to be consistent with these, I would not make
norecovery imply ro; rather, I would make it require an explicit ro, i.e.

mount -o ro,norecovery
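
For btrfs, a minimal sketch of the corresponding check might look something
like this (the BTRFS_MOUNT_NOLOGREPLAY flag, the option name, and the exact
helper spellings are placeholders to illustrate the pattern; this is not
existing btrfs code):

        /*
         * suppressed log/tree replay requires a read-only mount
         * (hypothetical sketch only)
         */
        if (btrfs_test_opt(root, NOLOGREPLAY) &&
            !(sb->s_flags & MS_RDONLY)) {
                btrfs_err(root->fs_info,
                          "no-log-replay mounts must be read-only");
                return -EINVAL;
        }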

-Eric

> So I think it's possible to do the same thing for btrfs.
> I'll try to do it soon.
> 
> Thanks,
> Qu
> 
>>
> 
> 



Re: kernel BUG at fs/btrfs/extent-tree.c:1833! [btrfs]

2015-12-02 Thread Filipe Manana
On Wed, Dec 2, 2015 at 9:18 AM, Михаил Гаврилов
 wrote:
> kernel BUG at fs/btrfs/extent-tree.c:1833!
> invalid opcode:  [#1] SMP
> Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4
> tun nls_utf8 isofs rfcomm fuse nf_conntrack_netbios_ns
> nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
> xt_conntrack ebtable_nat ebtable_broute bridge ebtable_filter ebtables
> ip6table_mangle ip6table_raw ip6table_nat nf_conntrack_ipv6
> nf_defrag_ipv6 nf_nat_ipv6 ip6table_security ip6table_filter
> ip6_tables iptable_mangle iptable_raw iptable_nat nf_conntrack_ipv4
> nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security bnep
> snd_usb_audio snd_usbmidi_lib snd_rawmidi hid_logitech_hidpp btusb
> btrtl btbcm btintel bluetooth gspca_zc3xx gspca_main videodev uas
> media joydev usb_storage hid_logitech_dj rfkill btrfs xor intel_rapl
> iosf_mbi x86_pkg_temp_thermal snd_hda_codec_hdmi snd_hda_codec_realtek
> coretemp
>  snd_hda_codec_generic snd_hda_codec_ca0132 iTCO_wdt
> iTCO_vendor_support kvm_intel vfat ppdev fat kvm snd_hda_intel
> snd_hda_codec snd_hda_core snd_hwdep snd_seq crct10dif_pclmul
> snd_seq_device crc32_pclmul crc32c_intel snd_pcm raid6_pq snd_timer
> snd mei_me mei soundcore shpchp i2c_i801 lpc_ich parport_pc
> tpm_infineon parport tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace
> sunrpc binfmt_misc i915 i2c_algo_bit drm_kms_helper drm 8021q garp stp
> llc mrp serio_raw r8169 mii video
> CPU: 7 PID: 10775 Comm: kworker/u16:15 Not tainted 4.2.6-301.fc23.x86_64 #1
> Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS
> F11 08/12/2014
> Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
> task: 8805c042bb00 ti: 88074d748000 task.ti: 88074d748000
> RIP: 0010:[]  []
> insert_inline_extent_backref+0xe7/0xf0 [btrfs]
> RSP: 0018:88074d74baa8  EFLAGS: 00010293
> RAX:  RBX:  RCX: 
> RDX: 8800 RSI: 0001 RDI: 
> RBP: 88074d74bb28 R08: 4000 R09: 88074d74b9a0
> R10:  R11: 0003 R12: 8807ec932800
> R13: 88048185d090 R14:  R15: 
> FS:  () GS:88081e3c() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 13dd12294020 CR3: 01c0b000 CR4: 001406e0
> Stack:
>   0005  
>  0001 81200206 88074d74bb08 a05dcd4a
>  33e8 4e7e87f3 8807ea905800 
> Call Trace:
>  [] ? kmem_cache_alloc+0x1d6/0x210
>  [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
>  [] __btrfs_inc_extent_ref.isra.52+0xa9/0x270 [btrfs]
>  [] __btrfs_run_delayed_refs+0xc84/0x1080 [btrfs]
>  [] ? mempool_free_slab+0x17/0x20
>  [] ? kmem_cache_alloc+0x193/0x210
>  [] btrfs_run_delayed_refs.part.73+0x74/0x270 [btrfs]
>  [] delayed_ref_async_start+0x7e/0x90 [btrfs]
>  [] btrfs_scrubparity_helper+0xc2/0x260 [btrfs]
>  [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
>  [] process_one_work+0x19e/0x3f0
>  [] worker_thread+0x4e/0x450
>  [] ? process_one_work+0x3f0/0x3f0
>  [] kthread+0xd8/0xf0
>  [] ? kthread_worker_fn+0x160/0x160
>  [] ret_from_fork+0x3f/0x70
>  [] ? kthread_worker_fn+0x160/0x160
> Code: 10 49 89 d9 48 8b 55 c0 4c 89 7c 24 10 4c 89 f1 4c 89 ee 4c 89
> e7 89 44 24 08 48 8b 45 20 48 89 04 24 e8 5d d5 ff ff 31 c0 eb ac <0f>
> 0b e8 92 47 ab e0 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41
> RIP  [] insert_inline_extent_backref+0xe7/0xf0 [btrfs]
>  RSP 

We got this fixed in 4.4-rc1:

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=2c3cf7d5f6105bb957df125dfce61d4483b8742d
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b06c4bf5c874a57254b197f53ddf588e7a24a2bf

Patches were tagged for stable releases but weren't yet backported to
4.2 and 4.3 (the issue affects all 4.2 and 4.3 releases at the
moment).

cheers

>
>
>
> Report from downstream: https://bugzilla.redhat.com/show_bug.cgi?id=1287508
>
>
> --
> Best Regards,
> Mike Gavrilov.



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."


Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Austin S Hemmelgarn

On 2015-12-02 11:54, Eric Sandeen wrote:

On 12/2/15 3:23 AM, Qu Wenruo wrote:



Qu Wenruo wrote on 2015/12/02 17:06 +0800:



Russell Coker wrote on 2015/12/02 17:25 +1100:

On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:

yes, xfs does; we have "-o norecovery" if you don't want that, or need
to mount a filesystem with a dirty log on a readonly device.


That option also works with Ext3/4 so it seems to be a standard way of
dealing
with this.  I think that BTRFS should do what Ext3/4 and XFS do in this
regard.


BTW, does -o norecovery imply -o ro?

If not, how does it keep the filesystem consistent?

I'd like to follow that ext2/xfs behavior, but I'm not familiar with
those filesystems.

Thanks,
Qu



OK, norecovery implies ro.


For XFS, it doesn't imply it, it requires it; i.e. both must be stated 
explicitly:

 /*
  * no recovery flag requires a read-only mount
  */
 if ((mp->m_flags & XFS_MOUNT_NORECOVERY) &&
 !(mp->m_flags & XFS_MOUNT_RDONLY)) {
 xfs_warn(mp, "no-recovery mounts must be read-only.");
 return -EINVAL;
 }

ext4 is the same, I believe:

 } else if (test_opt(sb, NOLOAD) && !(sb->s_flags & MS_RDONLY) &&
ext4_has_feature_journal_needs_recovery(sb)) {
 ext4_msg(sb, KERN_ERR, "required journal recovery "
"suppressed and not mounted read-only");
 goto failed_mount_wq;

so if you'd like btrfs to be consistent with these, I would not make
norecovery imply ro; rather, I would make it require an explicit ro, i.e.

mount -o ro,norecovery
Agreed, with something like that, it should be as blatantly obvious as 
possible that you can't write to the FS.


On a side note, do either XFS or ext4 support removing the norecovery 
option from the mount flags through mount -o remount?  Even if they 
don't, that might be a nice feature to have in BTRFS if we can safely 
support it.







Re: [RFC] Btrfs device and pool management (wip)

2015-12-02 Thread Goffredo Baroncelli
On 2015-12-02 00:43, Qu Wenruo wrote:
[...]
> 
> And block layer provides its own listen interface, reporting errors
> like ATA error.
Could you point me to this kind of interface?



-- 
gpg @keyserver.linux.it: Goffredo Baroncelli 
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5


Subvolume UUID

2015-12-02 Thread S.J.



Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Hugo Mills
On Wed, Dec 02, 2015 at 12:48:39PM -0500, Austin S Hemmelgarn wrote:
> On 2015-12-02 11:54, Eric Sandeen wrote:
> >On 12/2/15 3:23 AM, Qu Wenruo wrote:
> >>Qu Wenruo wrote on 2015/12/02 17:06 +0800:
> >>>Russell Coker wrote on 2015/12/02 17:25 +1100:
> On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:
> >yes, xfs does; we have "-o norecovery" if you don't want that, or need
> >to mount a filesystem with a dirty log on a readonly device.
> 
> That option also works with Ext3/4 so it seems to be a standard way of
> dealing
> with this.  I think that BTRFS should do what Ext3/4 and XFS do in this
> regard.
[snip]
> >so if you'd like btrfs to be consistent with these, I would not make
> >norecovery imply ro; rather, I would make it require an explicit ro, 
> >i.e.
> >
> >mount -o ro,norecovery
> Agreed, with something like that, it should be as blatantly obvious as
> possible that you can't write to the FS.
> 
> On a side note, do either XFS or ext4 support removing the
> norecovery option from the mount flags through mount -o remount?
> Even if they don't, that might be a nice feature to have in BTRFS if
> we can safely support it.

   One minor awkwardness with "norecovery", I've just realised: we
already have a "recovery" mount option. That's going to make things
really confusing if we stick to that name.

   Hugo.

-- 
Hugo Mills | Reintarnation: Coming back from the dead as a
hugo@... carfax.org.uk | hillbilly
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Qu Wenruo



Russell Coker wrote on 2015/12/02 17:25 +1100:

On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:

yes, xfs does; we have "-o norecovery" if you don't want that, or need
to mount a filesystem with a dirty log on a readonly device.


That option also works with Ext3/4 so it seems to be a standard way of dealing
with this.  I think that BTRFS should do what Ext3/4 and XFS do in this
regard.


BTW, does -o norecovery imply -o ro?

If not, how does it keep the filesystem consistent?

I'd like to follow that ext2/xfs behavior, but I'm not familiar with 
those filesystems.


Thanks,
Qu




kernel BUG at fs/btrfs/extent-tree.c:1833! [btrfs]

2015-12-02 Thread Михаил Гаврилов
kernel BUG at fs/btrfs/extent-tree.c:1833!
invalid opcode:  [#1] SMP
Modules linked in: xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4
tun nls_utf8 isofs rfcomm fuse nf_conntrack_netbios_ns
nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack ebtable_nat ebtable_broute bridge ebtable_filter ebtables
ip6table_mangle ip6table_raw ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_security ip6table_filter
ip6_tables iptable_mangle iptable_raw iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security bnep
snd_usb_audio snd_usbmidi_lib snd_rawmidi hid_logitech_hidpp btusb
btrtl btbcm btintel bluetooth gspca_zc3xx gspca_main videodev uas
media joydev usb_storage hid_logitech_dj rfkill btrfs xor intel_rapl
iosf_mbi x86_pkg_temp_thermal snd_hda_codec_hdmi snd_hda_codec_realtek
coretemp
 snd_hda_codec_generic snd_hda_codec_ca0132 iTCO_wdt
iTCO_vendor_support kvm_intel vfat ppdev fat kvm snd_hda_intel
snd_hda_codec snd_hda_core snd_hwdep snd_seq crct10dif_pclmul
snd_seq_device crc32_pclmul crc32c_intel snd_pcm raid6_pq snd_timer
snd mei_me mei soundcore shpchp i2c_i801 lpc_ich parport_pc
tpm_infineon parport tpm_tis tpm nfsd auth_rpcgss nfs_acl lockd grace
sunrpc binfmt_misc i915 i2c_algo_bit drm_kms_helper drm 8021q garp stp
llc mrp serio_raw r8169 mii video
CPU: 7 PID: 10775 Comm: kworker/u16:15 Not tainted 4.2.6-301.fc23.x86_64 #1
Hardware name: Gigabyte Technology Co., Ltd. Z87M-D3H/Z87M-D3H, BIOS
F11 08/12/2014
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs]
task: 8805c042bb00 ti: 88074d748000 task.ti: 88074d748000
RIP: 0010:[]  []
insert_inline_extent_backref+0xe7/0xf0 [btrfs]
RSP: 0018:88074d74baa8  EFLAGS: 00010293
RAX:  RBX:  RCX: 
RDX: 8800 RSI: 0001 RDI: 
RBP: 88074d74bb28 R08: 4000 R09: 88074d74b9a0
R10:  R11: 0003 R12: 8807ec932800
R13: 88048185d090 R14:  R15: 
FS:  () GS:88081e3c() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 13dd12294020 CR3: 01c0b000 CR4: 001406e0
Stack:
  0005  
 0001 81200206 88074d74bb08 a05dcd4a
 33e8 4e7e87f3 8807ea905800 
Call Trace:
 [] ? kmem_cache_alloc+0x1d6/0x210
 [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
 [] __btrfs_inc_extent_ref.isra.52+0xa9/0x270 [btrfs]
 [] __btrfs_run_delayed_refs+0xc84/0x1080 [btrfs]
 [] ? mempool_free_slab+0x17/0x20
 [] ? kmem_cache_alloc+0x193/0x210
 [] btrfs_run_delayed_refs.part.73+0x74/0x270 [btrfs]
 [] delayed_ref_async_start+0x7e/0x90 [btrfs]
 [] btrfs_scrubparity_helper+0xc2/0x260 [btrfs]
 [] btrfs_extent_refs_helper+0xe/0x10 [btrfs]
 [] process_one_work+0x19e/0x3f0
 [] worker_thread+0x4e/0x450
 [] ? process_one_work+0x3f0/0x3f0
 [] kthread+0xd8/0xf0
 [] ? kthread_worker_fn+0x160/0x160
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_worker_fn+0x160/0x160
Code: 10 49 89 d9 48 8b 55 c0 4c 89 7c 24 10 4c 89 f1 4c 89 ee 4c 89
e7 89 44 24 08 48 8b 45 20 48 89 04 24 e8 5d d5 ff ff 31 c0 eb ac <0f>
0b e8 92 47 ab e0 66 90 0f 1f 44 00 00 55 48 89 e5 41 57 41
RIP  [] insert_inline_extent_backref+0xe7/0xf0 [btrfs]
 RSP 



Report from downstream: https://bugzilla.redhat.com/show_bug.cgi?id=1287508


--
Best Regards,
Mike Gavrilov.


Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Qu Wenruo



Qu Wenruo wrote on 2015/12/02 17:06 +0800:



Russell Coker wrote on 2015/12/02 17:25 +1100:

On Wed, 2 Dec 2015 06:05:09 AM Eric Sandeen wrote:

yes, xfs does; we have "-o norecovery" if you don't want that, or need
to mount a filesystem with a dirty log on a readonly device.


That option also works with Ext3/4 so it seems to be a standard way of
dealing
with this.  I think that BTRFS should do what Ext3/4 and XFS do in this
regard.


BTW, does -o norecovery imply -o ro?

If not, how does it keep the filesystem consistent?

I'd like to follow that ext2/xfs behavior, but I'm not familiar with
those filesystems.

Thanks,
Qu



OK, norecovery implies ro.

So I think it's possible to do the same thing for btrfs.
I'll try to do it soon.

Thanks,
Qu








Bug Report: btrfs hangs / freezes on 4.3

2015-12-02 Thread Martin Tippmann
Hi,

just saw this in the logs on a few machines, Kernel 4.3.0, Mount options:

/dev/sda /media/storage1 btrfs
rw,noatime,compress=lzo,space_cache,subvolid=5,subvol=/ 0 0

[414675.258270] INFO: task java:19267 blocked for more than 120 seconds.
[414675.258312]   Not tainted 4.3.0-040300-generic #201511020846
[414675.258353] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[414675.258429] javaD 88081f996980 0 19267  18965 0x
[414675.258436]  880337d67bb8 0082 88081c2f3800
880019be9c00
[414675.258438]  880337d68000 88101b7627bc 880019be9c00

[414675.258440]  88101b7627c0 880337d67bd0 817af3f3
88101b7627b8
[414675.258442] Call Trace:
[414675.258445]  [] schedule+0x33/0x80
[414675.258450]  [] schedule_preempt_disabled+0xe/0x10
[414675.258454]  [] __mutex_lock_slowpath+0x95/0x110
[414675.258458]  [] mutex_lock+0x1f/0x30
[414675.258470]  []
btrfs_wait_ordered_roots+0x42/0x1d0 [btrfs]
[414675.258480]  []
btrfs_check_data_free_space+0x1ff/0x2f0 [btrfs]
[414675.258491]  [] __btrfs_buffered_write+0x13f/0x5a0 [btrfs]
[414675.258495]  [] ? tcp_recvmsg+0x3d1/0xb70
[414675.258505]  [] btrfs_file_write_iter+0x171/0x560 [btrfs]
[414675.258510]  [] ? set_next_entity+0x9c/0xb0
[414675.258515]  [] __vfs_write+0xa7/0xf0
[414675.258520]  [] vfs_write+0xa6/0x1a0
[414675.258524]  [] SyS_write+0x46/0xa0
[414675.258528]  [] entry_SYSCALL_64_fastpath+0x16/0x75
[414675.258533] INFO: task java:19268 blocked for more than 120 seconds.
[414675.258576]   Not tainted 4.3.0-040300-generic #201511020846
[414675.258616] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[414675.258684] javaD 88081f996980 0 19268  18965 0x
[414675.258692]  880634f23bb8 0082 88081c2f3800
880019beaa00
[414675.258704]  880634f24000 88101b7627bc 880019beaa00

[414675.258715]  88101b7627c0 880634f23bd0 817af3f3
88101b7627b8
[414675.258727] Call Trace:
[414675.258732]  [] schedule+0x33/0x80
[414675.258736]  [] schedule_preempt_disabled+0xe/0x10
[414675.258741]  [] __mutex_lock_slowpath+0x95/0x110
[414675.258745]  [] mutex_lock+0x1f/0x30
[414675.258756]  []
btrfs_wait_ordered_roots+0x42/0x1d0 [btrfs]
[414675.258764]  []
btrfs_check_data_free_space+0x1ff/0x2f0 [btrfs]
[414675.258772]  [] __btrfs_buffered_write+0x13f/0x5a0 [btrfs]
[414675.258774]  [] ? tcp_recvmsg+0x3d1/0xb70
[414675.258783]  [] btrfs_file_write_iter+0x171/0x560 [btrfs]
[414675.258785]  [] ? sock_recvmsg+0x3b/0x50
[414675.258787]  [] __vfs_write+0xa7/0xf0
[414675.258790]  [] vfs_write+0xa6/0x1a0
[414675.258791]  [] ? ktime_get_ts64+0x45/0xf0
[414675.258793]  [] SyS_write+0x46/0xa0
[414675.258795]  [] entry_SYSCALL_64_fastpath+0x16/0x75
[414675.258812] INFO: task java:19067 blocked for more than 120 seconds.
[414675.258856]   Not tainted 4.3.0-040300-generic #201511020846
[414675.258898] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[414675.258964] javaD 88081f8d6980 0 19067   2602 0x
[414675.258967]  880056957bb8 0082 88081c2c6200
8810188e
[414675.258968]  880056958000 88101b7627bc 8810188e

[414675.258970]  88101b7627c0 880056957bd0 817af3f3
88101b7627b8
[414675.258972] Call Trace:
[414675.258975]  [] schedule+0x33/0x80
[414675.258977]  [] schedule_preempt_disabled+0xe/0x10
[414675.258979]  [] __mutex_lock_slowpath+0x95/0x110
[414675.258981]  [] mutex_lock+0x1f/0x30
[414675.258992]  []
btrfs_wait_ordered_roots+0x42/0x1d0 [btrfs]
[414675.259003]  []
btrfs_check_data_free_space+0x1ff/0x2f0 [btrfs]
[414675.259013]  [] __btrfs_buffered_write+0x13f/0x5a0 [btrfs]
[414675.259019]  [] ? security_inode_need_killpriv+0x33/0x50
[414675.259029]  [] btrfs_file_write_iter+0x171/0x560 [btrfs]
[414675.259035]  [] __vfs_write+0xa7/0xf0
[414675.259040]  [] vfs_write+0xa6/0x1a0
[414675.259046]  [] ? __do_page_fault+0x1b4/0x400
[414675.259051]  [] SyS_write+0x46/0xa0
[414675.259055]  [] entry_SYSCALL_64_fastpath+0x16/0x75

regards
Martin


Re: utils version and convert crash

2015-12-02 Thread Gareth Pye
Yeah having a scrub take 9 hours instead of 24 (+ latency of human
involvement) would be really nice.

On Thu, Dec 3, 2015 at 1:32 AM, Austin S Hemmelgarn
 wrote:
> On 2015-12-02 08:45, Duncan wrote:
>>
>> Austin S Hemmelgarn posted on Wed, 02 Dec 2015 07:25:13 -0500 as
>> excerpted:
>>
>>> On 2015-12-02 05:01, Duncan wrote:
>>
>>
>> [on unverified errors returned by scrub]


 Unverified errors are, I believe[1], errors where a metadata block
 holding checksums itself has an error, so the blocks its checksums in
 turn covered are not checksum-verified.

 What that means in practice is that once the first metadata block error
 has been corrected in a first scrub run, a second scrub run can now
 check the blocks that were recorded as unverified errors in the first
 run, potentially finding and hopefully fixing additional errors[.]
>>
>>
 ---
 [1] I'm not a dev and am not absolutely sure of the technical accuracy
 of this description, but from an admin's viewpoint it seems to be
 correct at least in practice, based on the fact that further scrubs as
 long as there were unverified errors often did find additional errors,
 while once the unverified count dropped to zero and the last read
 errors were corrected, further scrubs turned up no further errors.

>>> AFAICT from reading the code, that is a correct assessment.  It would be
>>> kind of nice though if there was some way to tell scrub to recheck up to
>>> X many times if there are unverified errors...
>>
>>
>> Yes.  For me as explained it wasn't that big a deal as another scrub was
>> another minute or less, but definitely on terabyte-scale filesystems on
>> spinning rust, where scrubs take hours, having scrub be able to
>> automatically track just the corrected errors along with their
>> unverifieds, and rescan just those, should only take a matter of a few
>> minutes more, while a full rescan of /everything/ would take the same
>> number of hours yet again... and again if there's a third scan required,
>> etc.
>>
>> I'd say just make it automatic on corrected metadata errors as I can't
>> think of a reason people wouldn't want it, given the time it would save
>> over rerunning a full scrub over and over again, but making it an option
>> would be fine with me too.
>>
> I was thinking an option to do a full re-scrub, but having an automatic
> reparse of the metadata in a fixed metadata block would be a lot more
> efficient that what I was thinking :)
>



-- 
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would like to file a bug report"


Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Eric Sandeen
On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:

> On a side note, do either XFS or ext4 support removing the norecovery
> option from the mount flags through mount -o remount?  Even if they
> don't, that might be a nice feature to have in BTRFS if we can safely
> support it.

It's not remountable today on xfs:

/* ro -> rw */
if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
xfs_warn(mp,
"ro->rw transition prohibited on norecovery mount");
return -EINVAL;
}

not sure about ext4.

-Eric


Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Duncan
Hugo Mills posted on Wed, 02 Dec 2015 23:51:55 + as excerpted:

> On Thu, Dec 03, 2015 at 07:40:08AM +0800, Qu Wenruo wrote:
>> 
>> Not remountable is very good to implement it.
>> Makes things super easy to do.
>> 
>> Or we will need to add log replay for remount time.
>> 
>> I'd like to implement it first for non-remountable case as a try. And
>> for the option name, I prefer something like "notreereplay", but I
>> don't consider it the best one yet
> 
>Thinking out loud...
> 
> no-log-replay, no-log, hard-ro, ro-log,
> really-read-only-i-mean-it-this-time-honest-guvnor
> 
> Delete hyphens at your pleasure.

I want the bikeshed green with black polkadots! =:^)

More seriously, ro-noreplay ?

As Hugo says, norecovery clashes with the recovery option we already 
have, so unless we _really_ want to maintain cross-filesystem mount 
option compatibility, that's not going to work.

I'm not sure we want to encourage thinking of it as a log, since it's not 
a log in the journalling-filesystem sense but much more limited.

And I think ro needs to be in there for clarity.

hard-ro strikes my fancy as well, but ro-noreplay seems clearer to me.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: compression disk space saving - what are your results?

2015-12-02 Thread Duncan
Austin S Hemmelgarn posted on Wed, 02 Dec 2015 09:39:08 -0500 as
excerpted:

> On 2015-12-02 09:03, Imran Geriskovan wrote:
 What are your disk space savings when using btrfs with compression?
>>
>>> [Some] posters have reported that for mostly text, compress didn't
>>> give them expected compression results and they needed to use
>>> compress-force.
>>
>> "compress-force" option compresses regardless of the "compressibility"
>> of the file.
>>
>> "compress" option makes some inference about the "compressibility" and
>> decides to compress or not.
>>
>> I wonder how that inference is done?
>> Can anyone provide some pseudo code for it?

> I'm not certain how BTRFS does it, but my guess would be trying to
> compress the block, then storing the uncompressed version if the
> compressed one is bigger.

No pseudocode as I'm not a dev and wouldn't want to give the wrong 
impression, but as I believe I replied recently in another thread, based 
on comments the devs have made...

With compress, btrfs does a(n intended to be fast) trial compression of 
the first 128 KiB block or two and uses the result of that to decide 
whether to compress the entire file.

Compress-force simply bypasses that first decision point, processing the 
file as if the test always succeeded and compression was chosen.

If the decision to compress is made, the file is (evidently, again, not a 
dev, but filefrag results support) compressed a 128 KiB block at a time 
with the resulting size compared against the uncompressed version, with 
the smaller version stored.

(Filefrag doesn't understand btrfs compression and reports individual 
extents for each 128 KiB compression block, if compressed.  However, for 
many files processed with compress-force, filefrag doesn't report the 
expected size/128-KiB extents, but rather something lower.  If
filefrag -v is used, details of each "extent" are listed, and some show 
up as multiples of 128 KiB, indicating runs of uncompressable blocks that 
unlike actually compressed blocks, filefrag can and does report correctly 
as single extents.  The conclusion is thus as above, that btrfs is 
testing the compression result of each block, and not compressing if the 
"compression" ends up being negative, that is, if the "compressed" size 
is larger than the uncompressed size.)

> On a side note, I really wish BTRFS would just add LZ4 support.  It's a
> lot more deterministic WRT decompression time than LZO, gets a similar
> compression ratio, and runs faster on most processors for both
> compression and decompression.

There were patches (at least RFC level, IIRC) floating around years ago 
to add lz4... I wonder what happened to them?  My impression was that a 
large deployment somewhere may actually be running them as well, making 
them well tested (and obviously well beyond preliminary RFC level) by 
now, altho that impression could well be wrong.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Hugo Mills
On Thu, Dec 03, 2015 at 07:40:08AM +0800, Qu Wenruo wrote:
> 
> 
> On 12/03/2015 06:48 AM, Eric Sandeen wrote:
> >On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:
> >
> >>On a side note, do either XFS or ext4 support removing the norecovery
> >>option from the mount flags through mount -o remount?  Even if they
> >>don't, that might be a nice feature to have in BTRFS if we can safely
> >>support it.
> >
> >It's not remountable today on xfs:
> >
> > /* ro -> rw */
> > if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
> > if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
> > xfs_warn(mp,
> > "ro->rw transition prohibited on norecovery mount");
> > return -EINVAL;
> > }
> >
> >not sure about ext4.
> >
> >-Eric
> 
> Not remountable is very good to implement it.
> Makes things super easy to do.
> 
> Or we will need to add log replay for remount time.
> 
> I'd like to implement it first for non-remountable case as a try.
> And for the option name, I prefer something like "notreereplay", but
> I don't consider it the best one yet

   Thinking out loud...

no-log-replay, no-log, hard-ro, ro-log,
really-read-only-i-mean-it-this-time-honest-guvnor

Delete hyphens at your pleasure.

   Hugo.

-- 
Hugo Mills | ORLY? IÄ! R'LYH!
hugo@... carfax.org.uk |
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: Bug/regression: Read-only mount not read-only

2015-12-02 Thread Qu Wenruo



On 12/03/2015 06:48 AM, Eric Sandeen wrote:

On 12/2/15 11:48 AM, Austin S Hemmelgarn wrote:


On a side note, do either XFS or ext4 support removing the norecovery
option from the mount flags through mount -o remount?  Even if they
don't, that might be a nice feature to have in BTRFS if we can safely
support it.


It's not remountable today on xfs:

 /* ro -> rw */
 if ((mp->m_flags & XFS_MOUNT_RDONLY) && !(*flags & MS_RDONLY)) {
 if (mp->m_flags & XFS_MOUNT_NORECOVERY) {
 xfs_warn(mp,
 "ro->rw transition prohibited on norecovery mount");
 return -EINVAL;
 }

not sure about ext4.

-Eric


Not being remountable is very good for implementing this.
Makes things super easy to do.

Or we will need to add log replay for remount time.

I'd like to implement it first for the non-remountable case as a try.
And for the option name, I prefer something like "notreereplay", but I 
don't consider it the best one yet.


Thanks,
Qu






Re: [RFC] Btrfs device and pool management (wip)

2015-12-02 Thread Qu Wenruo



On 12/03/2015 03:07 AM, Goffredo Baroncelli wrote:

On 2015-12-02 00:43, Qu Wenruo wrote:
[...]


And block layer provides its own listen interface, reporting errors
like ATA error.

Could you point me to this kind of interface?




Not yet, and that's the problem...

Thanks,
Qu


compression disk space saving - what are your results?

2015-12-02 Thread Tomasz Chmielewski

What are your disk space savings when using btrfs with compression?

I have a 200 GB btrfs filesystem which uses compress=zlib, only stores 
text files (logs), mostly multi-gigabyte files.



It's a "single" filesystem, so "df" output matches "btrfs fi df":

# df -h
Filesystem  Size  Used Avail Use% Mounted on
(...)
/dev/xvdb   200G  124G   76G  62% /var/log/remote


# du -sh /var/log/remote/
153G    /var/log/remote/


From these numbers (124 GB used where data size is 153 GB), it appears 
that we save around 20% with zlib compression enabled.
Is 20% a reasonable saving for zlib? Typically text compresses much better 
with that algorithm, although I understand that we have several 
limitations when applying that on a filesystem level.



Tomasz Chmielewski
http://wpkg.org



Re: utils version and convert crash

2015-12-02 Thread Duncan
Gareth Pye posted on Wed, 02 Dec 2015 18:07:48 +1100 as excerpted:

> Output from scrub:
> sudo btrfs scrub start -Bd /data

[Omitted no-error device reports.]

> scrub device /dev/sdh (id 6) done
>scrub started at Wed Dec  2 07:04:08 2015 and finished after 06:47:22
>total bytes scrubbed: 1.09TiB with 2 errors
>error details: read=2
>corrected errors: 2, uncorrectable errors: 0, unverified errors: 30

Also note those unverified errors...

I have quite a bit of experience with btrfs scrub as I ran with a failing 
ssd for awhile, using btrfs scrub on the multiple btrfs raid1 filesystems 
on parallel partitions on the failing ssd and another good one to correct 
the errors and continue operations.

Unverified errors are, I believe[1], errors where a metadata block 
holding checksums itself has an error, so the blocks its checksums in 
turn covered are not checksum-verified.

What that means in practice is that once the first metadata block error 
has been corrected in a first scrub run, a second scrub run can now check 
the blocks that were recorded as unverified errors in the first run, 
potentially finding and hopefully fixing additional errors, tho unless 
the problem's extreme, most of the unverifieds should end up being 
correct once they can be verified, with only a few possible further 
errors found.

Of course if some of these previously unverified blocks are themselves 
metadata blocks with further checksums, yet another run may be required.

Fortunately, these trees are quite wide (121 items according to an old 
post from Hugo I found myself rereading a few hours ago) and thus don't 
tend to be very deep -- I think I ended up rerunning scrub four times at 
one point, before both read and unverified errors went to zero, tho 
that's on relatively small partitioned-up ssd filesystems of under 50 gig 
usable capacity (pair-raid1, 50 gig per device), so I could see terabyte-
scale filesystems going to 6-7 levels.

And, again on a btrfs raid1 with a known failing device -- several 
thousand redirected sectors by the time I gave up and btrfs replaced -- 
generally each successive scrub run would return an order of magnitude or 
so fewer errors (corrected and unverified both) than the previous run, 
tho occasionally I'd hit a bad spot and the number would go up a bit in 
one run, before dropping an order of magnitude or so again on the next 
run.

So with only two corrected read-errors and 30 unverified, I'd expect 
maybe another one or two corrected read-errors on a second run, and 
probably no unverifieds, in which case a third run shouldn't be necessary 
unless you just want the peace of mind of seeing that no errors found 
message.  Tho of course if you're unlucky, one of those 30 will turn out 
to be a read error on a full 121-item metadata block, so your 
unverifieds will go up for that run, before going down again in 
subsequent runs.

Of course with filesystems of under 50 gig capacity on fast ssds, a 
typical scrub ran in under a minute, so repeated scrubs to find and 
correct all errors wasn't a big deal, generally under 10 minutes 
including human response time.  On terabyte-scale spinning rust with 
scrubs taking hours, multiple scrubs could easily take a full 24-hour day 
or more! =:^(

So now that you did one scrub and did find errors, you do probably want 
to trace them down and correct the problem if possible, before running 
further scrubs to find and exterminate any errors still hiding behind 
unverified in the first run.  But once you're reasonably confident you're 
running a reliable system again, you probably do want to run further 
scrubs until that unverified count goes to zero (assuming no 
uncorrectable errors in the mean time).

---
[1] I'm not a dev and am not absolutely sure of the technical accuracy of 
this description, but from an admin's viewpoint it seems to be correct at 
least in practice, based on the fact that further scrubs as long as there 
were unverified errors often did find additional errors, while once the 
unverified count dropped to zero and the last read errors were corrected, 
further scrubs turned up no further errors.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: compression disk space saving - what are your results?

2015-12-02 Thread Duncan
Tomasz Chmielewski posted on Wed, 02 Dec 2015 18:46:30 +0900 as excerpted:

> What are your disk space savings when using btrfs with compression?
> 
> I have a 200 GB btrfs filesystem which uses compress=zlib, only stores
> text files (logs), mostly multi-gigabyte files.
> 
> 
> It's a "single" filesystem, so "df" output matches "btrfs fi df":
> 
> # df -h
> Filesystem  Size  Used Avail Use% Mounted on
> (...)
> /dev/xvdb   200G  124G   76G  62% /var/log/remote
> 
> 
> # du -sh /var/log/remote/
> 153G    /var/log/remote/
> 
> 
>  From these numbers (124 GB used where data size is 153 GB), it appears
> that we save around 20% with zlib compression enabled.
> Is 20% a reasonable saving for zlib? Typically text compresses much better
> with that algorithm, although I understand that we have several
> limitations when applying that on a filesystem level.

Here, just using compress=lzo, no compress-force and lzo not zlib, I'm 
mostly just happy to see lower usage than I was getting on reiserfs.  
Between that and no longer needing to worry whether copying a sparse file 
is going to end up sparse or not, because even if not the compression 
should effectively collapse the sparse areas, I've been happy /enough/ 
with it.


There's at least three additional factors to consider, for your case.

* There is of course metadata to consider as well as data, and on
single-device btrfs, metadata normally defaults to dup, 2X the space.  
You did say single, but didn't specify whether that was for metadata also 
(and for that matter, didn't specify whether it was a single-device 
filesystem or not, tho I assume it is).  And of course btrfs does 
checksumming that other filesystems don't do, and even puts small files 
in metadata too, all of which will be dup by default, taking even more 
space.

A btrfs fi df will of course give you separate data/metadata/system 
values, and you can take the data used value and compare that against the 
du -sh value to get a more accurate read on how well your compression 
really is working.  (Tho as noted, small files, a few KiB max, are often 
stored in the metadata, so if you have lots of those, you'd probably need 
to adjust for that, but you mentioned mostly GiB-scale files, so...)
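
For example, with the numbers you posted: du reports 153 GB of file data 
while df shows 124 GB used, which works out to a saving of roughly 
1 - 124/153, or about 19%.  Part of those 124 GB is dup metadata and 
checksums rather than file data, so the data-only compression ratio should 
be a little better than that naive 20% figure suggests.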

* There's the compress vs. compress-force option and discussion.  A 
number of posters have reported that for mostly text, compress didn't 
give them expected compression results and they needed to use compress-
force.

Of course, changing the option now won't change how existing files are 
stored.  You'd have to either rewrite them, or wait for log rotation to 
rotate out the old files, to see the full effect.  Also see the btrfs fi 
defrag -c option.

* Talking about defrag, it's not snapshot aware, which brings up the 
question of whether you're using btrfs snapshots on this filesystem and 
the effect that would have if you do. 

I'll presume not, as that would seem to be important enough to mention in 
a discussion of this sort, if you were, and also because that allows me 
to simply handwave further discussion of this point away. =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: utils version and convert crash

2015-12-02 Thread Austin S Hemmelgarn

On 2015-12-02 05:01, Duncan wrote:

Gareth Pye posted on Wed, 02 Dec 2015 18:07:48 +1100 as excerpted:


Output from scrub:
sudo btrfs scrub start -Bd /data


[Omitted no-error device reports.]


scrub device /dev/sdh (id 6) done
scrub started at Wed Dec  2 07:04:08 2015 and finished after 06:47:22
total bytes scrubbed: 1.09TiB with 2 errors
error details: read=2
corrected errors: 2, uncorrectable errors: 0, unverified errors: 30


Also note those unverified errors...

I have quite a bit of experience with btrfs scrub as I ran with a failing
ssd for awhile, using btrfs scrub on the multiple btrfs raid1 filesystems
on parallel partitions on the failing ssd and another good one to correct
the errors and continue operations.

Unverified errors are, I believe[1], errors where a metadata block
holding checksums itself has an error, so the blocks its checksums in
turn covered are not checksum-verified.

What that means in practice is that once the first metadata block error
has been corrected in a first scrub run, a second scrub run can now check
the blocks that were recorded as unverified errors in the first run,
potentially finding and hopefully fixing additional errors, tho unless
the problem's extreme, most of the unverifieds should end up being
correct once they can be verified, with only a few possible further
errors found.

Of course if some of these previously unverified blocks are themselves
metadata blocks with further checksums, yet another run may be required.

Fortunately, these trees are quite wide (121 items according to an old
post from Hugo I found myself rereading a few hours ago) and thus don't
tend to be very deep -- I think I ended up rerunning scrub four times at
one point, before both read and unverified errors went to zero, tho
that's on relatively small partitioned-up ssd filesystems of under 50 gig
usable capacity (pair-raid1, 50 gig per device), so I could see terabyte-
scale filesystems going to 6-7 levels.

And, again on a btrfs raid1 with a known failing device -- several
thousand redirected sectors by the time I gave up and btrfs replaced --
generally each successive scrub run would return an order of magnitude or
so fewer errors (corrected and unverified both) than the previous run,
tho occasionally I'd hit a bad spot and the number would go up a bit in
one run, before dropping an order of magnitude or so again on the next
run.

So with only two corrected read-errors and 30 unverified, I'd expect
maybe another one or two corrected read-errors on a second run, and
probably no unverifieds, in which case a third run shouldn't be necessary
unless you just want the peace of mind of seeing that no errors found
message.  Tho of course if you're unlucky, one of those 30 will turn out
to be a read error on a full 121-item metadata block, so your
unverifieds will go up for that run, before going down again in
subsequent runs.

Of course with filesystems of under 50 gig capacity on fast ssds, a
typical scrub ran in under a minute, so repeated scrubs to find and
correct all errors wasn't a big deal, generally under 10 minutes
including human response time.  On terabyte-scale spinning rust with
scrubs taking hours, multiple scrubs could easily take a full 24-hour day
or more! =:^(

So now that you did one scrub and did find errors, you do probably want
to trace them down and correct the problem if possible, before running
further scrubs to find and exterminate any errors still hiding behind
unverified in the first run.  But once you're reasonably confident you're
running a reliable system again, you probably do want to run further
scrubs until that unverified count goes to zero (assuming no
uncorrectable errors in the mean time).

---
[1] I'm not a dev and am not absolutely sure of the technical accuracy of
this description, but from an admin's viewpoint it seems to be correct at
least in practice, based on the fact that further scrubs as long as there
were unverified errors often did find additional errors, while once the
unverified count dropped to zero and the last read errors were corrected,
further scrubs turned up no further errors.

AFAICT from reading the code, that is a correct assessment.  It would be 
kind of nice though if there was some way to tell scrub to recheck up to 
X many times if there are unverified errors...






Re: utils version and convert crash

2015-12-02 Thread Gareth Pye
Thanks for that info. RAM appears to be checking out fine, and smartctl
reported that the drives are old but one had some form of elevated
error. Looks like I might be buying a new drive.

On Wed, Dec 2, 2015 at 9:01 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Gareth Pye posted on Wed, 02 Dec 2015 18:07:48 +1100 as excerpted:
>
>> Output from scrub:
>> sudo btrfs scrub start -Bd /data
>
> [Omitted no-error device reports.]
>
>> scrub device /dev/sdh (id 6) done
>>scrub started at Wed Dec  2 07:04:08 2015 and finished after 06:47:22
>>total bytes scrubbed: 1.09TiB with 2 errors
>>error details: read=2
>>corrected errors: 2, uncorrectable errors: 0, unverified errors: 30
>
> Also note those unverified errors...
>
> I have quite a bit of experience with btrfs scrub as I ran with a failing
> ssd for awhile, using btrfs scrub on the multiple btrfs raid1 filesystems
> on parallel partitions on the failing ssd and another good one to correct
> the errors and continue operations.
>
> Unverified errors are, I believe[1], errors where a metadata block
> holding checksums itself has an error, so the blocks its checksums in
> turn covered are not checksum-verified.
>
> What that means in practice is that once the first metadata block error
> has been corrected in a first scrub run, a second scrub run can now check
> the blocks that were recorded as unverified errors in the first run,
> potentially finding and hopefully fixing additional errors, tho unless
> the problem's extreme, most of the unverifieds should end up being
> correct once they can be verified, with only a few possible further
> errors found.
>
> Of course if some of these previously unverified blocks are themselves
> metadata blocks with further checksums, yet another run may be required.
>
> Fortunately, these trees are quite wide (121 items according to an old
> post from Hugo I found myself rereading a few hours ago) and thus don't
> tend to be very deep -- I think I ended up rerunning scrub four times at
> one point, before both read and unverified errors went to zero, tho
> that's on relatively small partitioned-up ssd filesystems of under 50 gig
> usable capacity (pair-raid1, 50 gig per device), so I could see terabyte-
> scale filesystems going to 6-7 levels.
>
> And, again on a btrfs raid1 with a known failing device -- several
> thousand redirected sectors by the time I gave up and btrfs replaced --
> generally each successive scrub run would return an order of magnitude or
> so fewer errors (corrected and unverified both) than the previous run,
> tho occasionally I'd hit a bad spot and the number would go up a bit in
> one run, before dropping an order of magnitude or so again on the next
> run.
>
> So with only two corrected read-errors and 30 unverified, I'd expect
> maybe another one or two corrected read-errors on a second run, and
> probably no unverifieds, in which case a third run shouldn't be necessary
> unless you just want the peace of mind of seeing that no errors found
> message.  Tho of course if you're unlucky, one of those 30 will turn out
> to be a read error on a full 121-item metadata block, so your
> unverifieds will go up for that run, before going down again in
> subsequent runs.
>
> Of course with filesystems of under 50 gig capacity on fast ssds, a
> typical scrub ran in under a minute, so repeated scrubs to find and
> correct all errors wasn't a big deal, generally under 10 minutes
> including human response time.  On terabyte-scale spinning rust with
> scrubs taking hours, multiple scrubs could easily take a full 24-hour day
> or more! =:^(
>
> So now that you did one scrub and did find errors, you do probably want
> to trace them down and correct the problem if possible, before running
> further scrubs to find and exterminate any errors still hiding behind
> unverified in the first run.  But once you're reasonably confident you're
> running a reliable system again, you probably do want to run further
> scrubs until that unverified count goes to zero (assuming no
> uncorrectable errors in the mean time).
>
> ---
> [1] I'm not a dev and am not absolutely sure of the technical accuracy of
> this description, but from an admin's viewpoint it seems to be correct at
> least in practice, based on the fact that further scrubs as long as there
> were unverified errors often did find additional errors, while once the
> unverified count dropped to zero and the last read errors were corrected,
> further scrubs turned up no further errors.
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>



-- 
Gareth Pye - blog.cerberos.id.au
Level 2 MTG Judge, Melbourne, Australia
"Dear God, I would 

Re: compression disk space saving - what are your results?

2015-12-02 Thread Austin S Hemmelgarn

On 2015-12-02 04:46, Tomasz Chmielewski wrote:

What are your disk space savings when using btrfs with compression?

I have a 200 GB btrfs filesystem which uses compress=zlib, only stores
text files (logs), mostly multi-gigabyte files.


It's a "single" filesystem, so "df" output matches "btrfs fi df":

# df -h
Filesystem  Size  Used Avail Use% Mounted on
(...)
/dev/xvdb   200G  124G   76G  62% /var/log/remote


# du -sh /var/log/remote/
153G    /var/log/remote/


 From these numbers (124 GB used where data size is 153 GB), it appears
that we save around 20% with zlib compression enabled.
Is 20% reasonable saving for zlib? Typically text compresses much better
with that algorithm, although I understand that we have several
limitations when applying that on a filesystem level.


This is actually an excellent question.  A couple of things to note 
before I share what I've seen:
1. Text compresses better with any compression algorithm.  It is by 
nature highly patterned and moderately redundant data, which is what 
benefits the most from compression.
2. When BTRFS does in-line compression, it uses 128k blocks.  Because of 
this, there are diminishing returns for smaller files when using 
compression.
3. The best compression ratio I've ever seen from zlib on real data is 
about 65-70%, and that was using SquashFS, which is designed to take up 
as little room as possible.
4. LZO gets a worse compression ratio than zlib (around 40-50% if you're 
lucky), but is a _lot_ faster.
5. By playing around with the -c option for defrag, you can compress or 
uncompress different parts of the filesystem, and get a rough idea of 
what compresses best.
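
As a rough example of (5), something like this is one way to compare used
space before and after recompressing a subtree (the paths are just
examples, and it only looks at statvfs used-space on the mount point, so
treat the numbers as approximate):

import os
import subprocess

MOUNT = "/usr/src"        # example mount point
SUBTREE = "/usr/src"      # example subtree to (re)compress

def used_bytes(path):
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize

before = used_bytes(MOUNT)
# Rewrite extents with zlib compression while defragmenting (-r recurses).
subprocess.run(["btrfs", "filesystem", "defragment", "-r", "-czlib", SUBTREE],
               check=True)
subprocess.run(["sync"], check=True)
after = used_bytes(MOUNT)
print("used before: {:.1f} GiB, after: {:.1f} GiB".format(
    before / 2**30, after / 2**30))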


Now, to my results.  These are all from my desktop system, with no 
deduplication, and the data for zlib is somewhat outdated (I've not used 
it since LZO support stabilized).


For the filesystems I have on traditional hard disks:
1. For /home (mostly text files, some SQLite databases, and a couple of 
git repositories), I get about 15-20% space savings with zlib, and about 
a 2-4% performance hit.  I get about 5-10% space savings with lzo, but 
performance is about 5-8% better than uncompressed.
2. For /usr/src (50/50 mix of text and executable code), I get about 25% 
space savings with zlib with a 5-7% hit to performance, and about 10% 
with lzo with a 7% boost in performance relative to uncompressed.
3. For /usr/portage and /var/lib/layman (lots of small text files, a 
number of VCS repos, and about 2000 compressed source archives), I get 
about 25% space savings with zlib, with a 15% performance hit (yes, 
seriously 15%), and with lzo I get about 25% space savings with no 
measurable performance difference relative to uncompressed.


For the filesystems I have on SSD's:
1. For /var/tmp (huge assortment of different things, but usually 
similar to /usr/src because this is where packages get built), I get 
almost no space savings with either type of compression, and see a 
performance reduction of about 5% for both.
2. For /var/log (lots of text files; notably, I don't compress rotated 
logs, and I don't have systemd's insane binary log files), I get about 
30% space savings with zlib, but it makes the _whole_ system run about 
5% slower, and I get about 20% space savings with lzo, with no 
measurable performance difference relative to uncompressed.
3. For /var/spool (Lots of really short text files, mostly stuff from 
postfix and CUPS), I actually see higher disk usage with both types of 
compression, but almost zero performance impact from either of them.
4. For /boot (a couple of big binary files that already have built-in 
compression), I see no net space savings, and don't have any numbers 
regarding performance impact.
5. For / (everything that isn't on one of the other filesystems I listed 
above), I see about 10-20% space savings from zlib, with a roughly 5% 
performance hit, and about 5-15% space savings with lzo, with no 
measurable performance difference.






Re: LWN mention

2015-12-02 Thread Duncan
Russell Coker posted on Wed, 02 Dec 2015 17:42:15 +1100 as excerpted:

> On Mon, 9 Nov 2015 08:10:13 AM Duncan wrote:
>> Russell Coker posted on Sun, 08 Nov 2015 17:38:32 +1100 as excerpted:
>> > https://lwn.net/Articles/663474/
>> > http://thread.gmane.org/gmane.comp.file-systems.btrfs/49500
>> > 
>> > Above is a BTRFS issue that is mentioned in a LWN comment.  Has this
>> > one been fixed yet?
>> 
>> Good job linking to a subscription-only article, without using a link
>> that makes it available for non-subscriber list readers to read.[1]
> 
> Well it's only subscription-only for a week, so it's available for
> everyone now.

Thanks for the reminder.  I just finished reading TFA and all the 
comments.  Wow!  PaXTeam arguably knows his security stuff but he either 
doesn't know much about actually playing on a team or he has incentive 
not to play well with the kernel team.  And he seemed to be "in vehement 
agreement" with Jon's article.  Oh, well...

As for the question of the thread, no real answers on-thread, but I 
believe I've seen patches posted mentioning that the problem fixed was 
found by the grsec overflow checker.  It may be that the devs are a bit 
wary of posting direct answers to the threads themselves, in case PaXTeam 
replies and it turns into a flame fest.  Better for all concerned just to 
do the patch and not directly reply to the thread, and given PaXTeam's 
behavior in the comments of that article I can definitely agree with the 
wisdom of such a strategy, if there's any chance at all that he might be 
drawn by the mention to the thread.

> While LWN has a feature for "subscribers" (which includes me because HP
> sponsors LWN access for all DDs) to send free links for other people I
> don't believe it would be appropriate to use that on a mailing list.  If
> anyone had asked by private mail I'd have been happy to send them a
> personal link for that.

FWIW, I was still a subscriber when that feature was first presented, and 
Jon actually seemed fine with posting the special links not just to, for 
instance, the various kernel lists, but to other more technical or 
limited-readership Linux-related lists as well, the idea being that, like 
the public-access months many subscriber TV channels have, a sample of the 
content would bring in new subscribers.  And the "sponsored article 
access, please consider subscribing" notices on such articles seem to 
encourage that as well.

So I think it would have been fine, but of course I respect your 
differing opinion as well, as it's clearly a personal call. =:^)

But it would have been nice to have added a "subscriber-only until xx 
date" note, in that case, instead of simply posting a link with no 
reference to the (temporary) pay-wall in place.  Had I known, I'd not have 
bothered, since I know that under the current circumstances I can't 
subscribe (as explained).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: utils version and convert crash

2015-12-02 Thread Duncan
Austin S Hemmelgarn posted on Wed, 02 Dec 2015 07:25:13 -0500 as
excerpted:

> On 2015-12-02 05:01, Duncan wrote:

[on unverified errors returned by scrub]
>>
>> Unverified errors are, I believe[1], errors where a metadata block
>> holding checksums itself has an error, so the blocks its checksums in
>> turn covered are not checksum-verified.
>>
>> What that means in practice is that once the first metadata block error
>> has been corrected in a first scrub run, a second scrub run can now
>> check the blocks that were recorded as unverified errors in the first
>> run, potentially finding and hopefully fixing additional errors[.]

>> ---
>> [1] I'm not a dev and am not absolutely sure of the technical accuracy
>> of this description, but from an admin's viewpoint it seems to be
>> correct at least in practice, based on the fact that further scrubs as
>> long as there were unverified errors often did find additional errors,
>> while once the unverified count dropped to zero and the last read
>> errors were corrected, further scrubs turned up no further errors.
>>
> AFAICT from reading the code, that is a correct assessment.  It would be
> kind of nice though if there was some way to tell scrub to recheck up to
> X many times if there are unverified errors...

Yes.  For me, as explained, it wasn't that big a deal, as another scrub was 
another minute or less.  But definitely on terabyte-scale filesystems on 
spinning rust, where scrubs take hours, having scrub be able to 
automatically track just the corrected errors along with their 
unverifieds, and rescan just those, should only take a matter of a few 
minutes more, while a full rescan of /everything/ would take the same 
number of hours yet again... and again if there's a third scan required, 
etc.

I'd say just make it automatic on corrected metadata errors as I can't 
think of a reason people wouldn't want it, given the time it would save 
over rerunning a full scrub over and over again, but making it an option 
would be fine with me too.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



Re: compression disk space saving - what are your results?

2015-12-02 Thread Tomasz Chmielewski

On 2015-12-02 22:03, Austin S Hemmelgarn wrote:

From these numbers (124 GB used where data size is 153 GB), it appears
that we save around 20% with zlib compression enabled.
Is 20% reasonable saving for zlib? Typically text compresses much better
with that algorithm, although I understand that we have several
limitations when applying that on a filesystem level.


This is actually an excellent question.  A couple of things to note
before I share what I've seen:
1. Text compresses better with any compression algorithm.  It is by
nature highly patterned and moderately redundant data, which is what
benefits the most from compression.


It looks like compress=zlib does not compress very well. Following 
Duncan's suggestion, I've changed it to compress-force=zlib, and 
re-copied the data to make sure the files are compressed.


Compression ratio is much much better now (on a slightly changed data 
set):


# df -h
/dev/xvdb   200G   24G  176G  12% /var/log/remote


# du -sh /var/log/remote/
138G    /var/log/remote/


So, 138 GB files use just 24 GB on disk - nice!

However, I would still expect that compress=zlib has almost the same 
effect as compress-force=zlib, for 100% text files/logs.



Tomasz Chmielewski
http://wpkg.org



Re: compression disk space saving - what are your results?

2015-12-02 Thread Imran Geriskovan
>> What are your disk space savings when using btrfs with compression?

> * There's the compress vs. compress-force option and discussion.  A
> number of posters have reported that for mostly text, compress didn't
> give them expected compression results and they needed to use compress-
> force.

"compress-force" option compresses regardless of the "compressibility"
of the file.

"compress" option makes some inference about the "compressibility"
and decides to compress or not.

I wonder how that inference is done?
Can anyone provide some pseudo code for it?
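
A naive guess at the shape of it would be something along these lines
(made-up pseudo code, not the actual kernel logic; the "nocompress" flag
name is just for illustration):

import zlib

CHUNK = 128 * 1024    # btrfs compresses data in chunks of up to 128K

def write_chunk(inode_flags, data, force=False):
    # plain "compress": skip files already judged incompressible earlier
    if not force and "nocompress" in inode_flags:
        return data                      # store uncompressed

    compressed = zlib.compress(data[:CHUNK])
    if len(compressed) >= len(data[:CHUNK]):
        # No gain: store uncompressed, and with plain "compress" remember
        # that so this file is not tried again on later writes.
        if not force:
            inode_flags.add("nocompress")
        return data
    return compressed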

Regards,
Imran


Re: compression disk space saving - what are your results?

2015-12-02 Thread Tomasz Chmielewski

On 2015-12-02 23:03, Wang Shilong wrote:

Compression ratio is much much better now (on a slightly changed data 
set):


# df -h
/dev/xvdb   200G   24G  176G  12% /var/log/remote


# du -sh /var/log/remote/
138G    /var/log/remote/


So, 138 GB files use just 24 GB on disk - nice!

However, I would still expect that compress=zlib has almost the same effect
as compress-force=zlib, for 100% text files/logs.


btw, what is your kernel version? There was a bug that detected the inode
compression ratio wrongly.

http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=68bb462d42a963169bf7acbe106aae08c17129a5
http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=4bcbb33255131adbe481c0467df26d654ce3bc78


Linux 4.3.0.


Tomasz Chmielewski
http://wpkg.org/



Re: compression disk space saving - what are your results?

2015-12-02 Thread Wang Shilong
On Wed, Dec 2, 2015 at 9:53 PM, Tomasz Chmielewski  wrote:
> On 2015-12-02 22:03, Austin S Hemmelgarn wrote:
>
>>>  From these numbers (124 GB used where data size is 153 GB), it appears
>>> that we save around 20% with zlib compression enabled.
>>> Is 20% reasonable saving for zlib? Typically text compresses much better
>>> with that algorithm, although I understand that we have several
>>> limitations when applying that on a filesystem level.
>>
>>
>> This is actually an excellent question.  A couple of things to note
>> before I share what I've seen:
>> 1. Text compresses better with any compression algorithm.  It is by
>> nature highly patterned and moderately redundant data, which is what
>> benefits the most from compression.
>
>
> It looks like compress=zlib does not compress very well. Following Duncan's
> suggestion, I've changed it to compress-force=zlib, and re-copied the data
> to make sure the files are compressed.
>
> Compression ratio is much much better now (on a slightly changed data set):
>
> # df -h
> /dev/xvdb   200G   24G  176G  12% /var/log/remote
>
>
> # du -sh /var/log/remote/
> 138G    /var/log/remote/
>
>
> So, 138 GB files use just 24 GB on disk - nice!
>
> However, I would still expect that compress=zlib has almost the same effect
> as compress-force=zlib, for 100% text files/logs.

btw, what is your kernel version? There was a bug that detected the inode
compression ratio wrongly.

http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=68bb462d42a963169bf7acbe106aae08c17129a5
http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=4bcbb33255131adbe481c0467df26d654ce3bc78

Regards,
Shilong

>
>
> Tomasz Chmielewski
> http://wpkg.org
>