Re: btrfs device ready non-success exit code

2015-06-04 Thread Roger Binns
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 06/04/2015 04:25 PM, Anand Jain wrote:
> typically ready cli is to check disk pool status in an unmounted
> state.

On my desktop with btrfs RAID0 on two partitions, the exit code of
ready is always 1.  On my laptop with btrfs RAID0 on two LUKS/dmcrypt
partitions the exit code is 0.  (In both cases there are / and /home
already mounted as subvolumes.)

The documentation says:

  Check device to see if it has all of its devices in
  cache for mounting

The semantics implied by that are that a successful return means you
can mount the device(s).  Consequently if already mounted, surely it
should return success?

Roger
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iEYEARECAAYFAlVxEakACgkQmOOfHg372QTIKwCgwKJIqjNWJH/FMfVtm1Ktesxz
E5YAnjxMP8VyfuLPM2fmfxU8UuuyDpGi
=Axi1
-----END PGP SIGNATURE-----

--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


btrfs device ready non-success exit code

2015-06-04 Thread Roger Binns
This is on current Ubuntu 15.04.  What is device ready unhappy with?
Note that the root filesystem and /home (both subvolumes of this
filesystem) are mounted and working perfectly - indeed they are where
these commands are being run from.

$ sudo btrfs fi show /dev/sda1
Label: 'main'  uuid: 3ff68715-0daa-4e44-8de2-0997f36d8ab6
Total devices 2 FS bytes used 417.22GiB
devid  2 size 894.25GiB used 362.03GiB path /dev/sdb1
devid  3 size 894.25GiB used 362.03GiB path /dev/sda1

Btrfs v3.17
$ sudo btrfs fi show /
Label: 'main'  uuid: 3ff68715-0daa-4e44-8de2-0997f36d8ab6
Total devices 2 FS bytes used 417.22GiB
devid  2 size 894.25GiB used 362.03GiB path /dev/sdb1
devid  3 size 894.25GiB used 362.03GiB path /dev/sda1

Btrfs v3.17
$ sudo btrfs device ready /dev/sda1 ; echo $?
1
$ sudo btrfs device ready /dev/sdb1 ; echo $?
1
$ uname -a
Linux workstation 3.19.0-18-generic #18-Ubuntu SMP Tue May 19 18:31:35
UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Roger


Re: price to pay for nocow file bit?

2015-01-08 Thread Roger Binns
On 01/08/2015 08:53 AM, Lennart Poettering wrote:
> this will help little if we change things in the beginning of the
> file,

Have you considered changing the format so that those pointers are
stored at the end of the file, making writes always append-only?

While it is traditional to put such things at the beginning as headers,
there are formats like zip where the metadata is stored at the end
instead, which brings other benefits.
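The zip-style layout is easy to sketch.  Below is a toy append-only
container in Python - records are only ever appended, and the index
locating them is written at the end behind a fixed-size pointer.  Purely
illustrative; not a format systemd or btrfs actually uses.

```python
import io
import json
import struct

def append_record(buf, payload, index):
    """Append a data record; bytes already written are never touched again."""
    index.append({"offset": buf.tell(), "size": len(payload)})
    buf.write(payload)

def finalize(buf, index):
    """Write the index at the end, then a fixed-size pointer to it, so a
    reader can find the metadata without any header at the front."""
    index_offset = buf.tell()
    buf.write(json.dumps(index).encode())
    buf.write(struct.pack("<Q", index_offset))  # last 8 bytes locate the index

def load_index(buf):
    """Read the trailing pointer, then the index it points at."""
    end = buf.seek(0, io.SEEK_END)
    buf.seek(end - 8)
    (index_offset,) = struct.unpack("<Q", buf.read(8))
    buf.seek(index_offset)
    return json.loads(buf.read(end - 8 - index_offset))
```

Updating a record then means appending a new copy plus a new index -
exactly the append-only write pattern that avoids rewriting the start of
the file.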

Roger



scrub wedged (both running and not running at the same time)

2015-01-02 Thread Roger Binns
I can't start a scrub because it is running, and can't cancel it
because it isn't running!  How do I get out of this state?  OS is
Ubuntu 14.10.

$ uname -r
3.16.0-28-generic

# btrfs scrub start .
ERROR: scrub is already running.
To cancel use 'btrfs scrub cancel .'.
To see the status use 'btrfs scrub status [-d] .'.
# btrfs scrub cancel .
ERROR: scrub cancel failed on .: not running
# btrfs scrub status .
scrub status for b02cc605-dd78-40bc-98a5-8f5543d83b66
scrub started at Mon Nov 17 20:27:17 2014, running for 64491 seconds
total bytes scrubbed: 3.43GiB with 1 errors
error details: read=1
corrected errors: 1, uncorrectable errors: 0, unverified errors: 0

Even a reboot doesn't make this go away.

Roger


Re: Crazy idea of cleanup the inode_record btrfsck things with SQL?

2014-12-11 Thread Roger Binns
On 11/30/2014 05:58 PM, Qu Wenruo wrote:
> 2. Heavy dependency: if we use it, btrfs-progs will include an RDBMS
> as a build and runtime dependency.  Having such low-level progs depend
> on a high-level program like sqlite3 may be very strange.

BTW SQLite is designed as a library.  It is shipped as a single file
with the deliberate intention that you add the sqlite3.c file to your
project.  For a private internal tool you don't need to depend on or
use the system SQLite in any way.

  https://www.sqlite.org/amalgamation.html

SQLite also lets you easily define collation sequences, your own
functions, and virtual tables, all of which would make this easier.
Using pragmas you can control how much memory and disk space it uses.
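For a flavour of those hooks, Python's bundled sqlite3 module (a binding
to the same library) can register a collation and a scalar function in a
couple of lines; the table and names below are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Pragma-level resource control: cap the page cache at ~2MiB
# (negative values are in KiB).
conn.execute("PRAGMA cache_size = -2048")

# A custom collation: order by string length, then lexicographically.
def by_length(a, b):
    return (len(a) - len(b)) or ((a > b) - (a < b))

conn.create_collation("by_length", by_length)

# A custom scalar function callable from SQL.
conn.create_function("rev", 1, lambda s: s[::-1])

conn.execute("CREATE TABLE names (n TEXT)")
conn.executemany("INSERT INTO names VALUES (?)",
                 [("btrfs",), ("ext4",), ("xfs",)])
rows = conn.execute(
    "SELECT n, rev(n) FROM names ORDER BY n COLLATE by_length"
).fetchall()
# rows come back ordered by length: xfs, ext4, btrfs
```

Virtual tables work the same way: you expose your in-memory structures
(inode records, say) as tables and let the SQL engine do the joins.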

Roger


Re: What to do about df and btrfs fi df

2014-02-11 Thread Roger Binns
On 10/02/14 19:13, cwillu wrote:
> But the answer changes dramatically depending on whether it's large
> numbers of small files or a small number of large files, and the
> conservative worst-case choice means we report a number that is half
> what is probably expected.

Perfect is the enemy of good.

We aren't talking about a billion zero-byte files and expecting them
to take no space.  It is things like a user with a file manager
grabbing some files and eyeballing whether they will fit in the
destination, or the file manager itself warning before the copy starts
that they might not fit.

In both cases the sum of the source file sizes is compared to the free
space df reports on the destination.

Roger



Re: What to do about df and btrfs fi df

2014-02-10 Thread Roger Binns
On 10/02/14 10:24, cwillu wrote:
> The regular df data used number should be the amount of space
> required to hold a backup of that content (assuming that the backup
> maintains reflinks and compression and so forth).
>
> There's no good answer for available space;

I think the flipside of the above works well: how large a group of
files can you expect to create before you get ENOSPC?

That, for example, is the check code does when it looks at df - I need
to put in X GB of files; will it fit?  It is also what users do.

This is also what NTFS under Windows does with compression.  If it
says you have 5GB of space left then you will be able to put in 5GB of
incompressible files.  Of course if they are compressible then you
don't end up consuming all the free space.

Roger



Re: [PATCH] Btrfs: disable snapshot aware defrag for now

2014-02-03 Thread Roger Binns
On 03/02/14 09:27, Josef Bacik wrote:
> It is so totally broken that I don't want it being turned on by
> anybody who can't edit this and change it themselves.

The symptoms I saw were huge amounts of kernel memory consumption,
possibly until swap was exhausted.  Are there other ways in which it is
broken (e.g. corruption)?

Also, is this patch making its way to the various stable trees?

Roger



Re: Working on Btrfs as topic for master thesis

2014-01-23 Thread Roger Binns
On 23/01/14 10:36, David Sterba wrote:
> 'Theoretical best' seems too vaguely defined,

It seems like a good thing for someone to tackle as part of a master's
thesis :-)

> with compression it's always some trade-off and compromise

Which you can put in context against the theoretical best.  The links you
gave are a good example of trying to do that.

> This is a bit different usecase, defrag is triggered by user at the
> time he knows the resources are available

I'm a user and I use autodefrag :-)  As a developer you are more
interested in making users aware of what they do and when they do it,
and in having them carefully select the optimum conditions and
configuration.

As a user I just want to point to a pile of storage and have btrfs do the
right thing, without me having to babysit it or play admin.  Computers
have billions of processor cycles per second, gigabytes of memory etc.
They should just figure this stuff out and not require me to be versed in
lots of intricate details!

> Keeping the dictionary implies more data to be read/written, with
> small chunks there's a low chance of actual dictionary reuse for
> other files.

I'm willing to bet that there is a good chance of reuse between files
with the same extension, as one example.  And it is highly likely for
Maildir files.

> Also, thinking about the implementation, it would become too complex
> to do in kernel for this particular usecase.

A thesis could study if it is worth doing first.  If it found that was a
good idea, then figuring out how to implement it is a second step.

> To Zip or Not to Zip: Effective Resource Usage for Real-Time
> Compression

Except for systems that are 100% busy all the time, there is no need
for perfect real-time compression.  IMHO it is fine to come back later
and do a better job of it.  Again, this assumes there is a sufficiently
large difference between what real-time compression achieves and what
a later recompression achieves.

Roger


Re: Working on Btrfs as topic for master thesis

2014-01-22 Thread Roger Binns
On 22/01/14 04:12, David Sterba wrote:
> I have done some work here, so far it's stalled due to more important
> work.
>
> https://btrfs.wiki.kernel.org/index.php/Project_ideas#Compression_enhancements
>
> Do you have other suggestions beyond what's proposed there?

There was the theoretical side - i.e. coming up with a way of defining
perfection that we can then measure against.  For example you are going
up to a 128K block size, but without knowing the theoretical best we
don't know whether that is a stopgap or already very good.

That also feeds into questions like whether it would be a good idea to
go back afterwards (perhaps as part of defrag) and spend more effort on
(re)compression.

Another consideration is keeping the compression dictionary separate
from the compressed blocks, thereby allowing it to be shared across
blocks and potentially files.  Compressors like smaz (very good on
short pieces of text) work by having a precomputed dictionary -
perhaps those could be used too.
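zlib already supports preset dictionaries, so the idea is easy to try in
userspace.  A sketch - the dictionary contents here are invented,
standing in for text mined from files of the same type:

```python
import zlib

# Hypothetical shared dictionary: phrases expected to recur across many
# small files (the cross-file reuse a smaz-style precomputed table gets).
SHARED = b"Content-Type: text/plain; charset=utf-8\r\nReceived: from "

def pack(data, zdict=b""):
    """Compress with an optional preset dictionary kept outside the stream."""
    c = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return c.compress(data) + c.flush()

def unpack(blob, zdict=b""):
    """Decompress; the same dictionary must be supplied out of band."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()
```

For inputs that share text with the dictionary, the compressed block
shrinks noticeably versus dictionary-less compression of the same data.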

Roger


Re: btrfs-convert destroyed my system

2014-01-19 Thread Roger Binns
On 18/01/14 17:13, Marc MERLIN wrote:
> For what it's worth I also tried a btrfs convert on ubuntu precise
> with their stock kernel and old btrfs-tools and it mostly destroyed
> the filesystem too,

Just in case some folks think btrfs-convert never works: I had no
problems at all on two filesystems - a 128GB OS install and home
directory on SSD, and a 2TB media disk (mainly ~7GB ISO files) on HDD.

The system was also Ubuntu 12.04 (Precise) running kernel 3.2.

Roger



Re: Working on Btrfs as topic for master thesis

2014-01-19 Thread Roger Binns
On 16/01/14 11:23, Toggenburger Lukas wrote:
> One of my ideas was to work on Btrfs.

One thing I would like to see is automatic backup copies of data.  For
example, if you are only using 10% of the total space then make an
additional 9 copies of the data in the free space (at low priority, in
the background), reclaiming that duplicate space as more real data is
written.  The principle especially applies when you have more than one
device.

The goal is to ensure that all the space is used, and that there is the
maximum probability of data recovery.

If you are more interested in the theoretical side then looking into
compression would be interesting, i.e. how close to the theoretical
best compression are we?  Various filesystems like btrfs and NTFS make
all sorts of compromises in algorithm choice, and especially in the
size of the blocks they compress.  How much better could be done?

Roger



Re: btrfs-transaction blocked for more than 120 seconds

2014-01-04 Thread Roger Binns
On 03/01/14 09:25, Marc MERLIN wrote:
> Is there even a reason for this not to become a default mount option
> in newer kernels?

autodefrag can go insane because it is unbounded.  For example I have
a 4GB RAM system (3.12, no GUI) that kept hanging.  I eventually worked
out that the cause was a MySQL database (about 750MB of data, only
being used by tt-rss refreshing RSS feeds every 4 hours).

autodefrag would eventually consume all the RAM and 20GB of swap,
kicking off the OOM killer and leaving so little RAM for anything else
that the only recourse was the sysrq keys.

What I'd love to see is some sort of background worker that does sensible
things.  For example it could defragment files, but pick the ones that
need it the most, and I'd love to see extra copies of (meta)data in
currently unused space that is freed as needed.  deduping is another
worthwhile option.  So is recompressing data that hasn't changed recently
but using larger block sizes to get more effective ratios.  Some of these
happen at the moment but they are independent and you have to be aware of
the caveats.

Roger


Crash in submit_extent_page.isra (3.12.6)

2014-01-02 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

This occurs sporadically, and the machine is somewhat useless as a
result since most filesystem operations then hang.  For example even
reboot fails because it does filesystem operations.

The kernel is a regular kernel.org 3.12.6 compiled with Ubuntu's
kernel config.  The OS is Ubuntu 13.10 AMD64.

I have 3 separate btrfs filesystems each with a few subvolumes, and with
subvolume snapshots (hourly, daily, weekly etc).  No RAID, crypto etc.

Kernel BUG at a01268e8 [verbose debug info unavailable]
[86961.217522] invalid opcode:  [#1] SMP 
[86961.217536] Modules linked in: pci_stub vboxpci(OF) vboxnetadp(OF) 
vboxnetflt(OF) vboxdrv(OF) 8021q garp mrp parport_pc ppdev bnep rfcomm 
bluetooth binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel bridge aesni_intel stp aes_x86_64 llc lrw 
gf128mul glue_helper ablk_helper cryptd eeepc_wmi asus_wmi mxm_wmi 
sparse_keymap joydev snd_hda_intel snd_hda_codec arc4 snd_usb_audio 
snd_seq_midi snd_usbmidi_lib snd_hwdep snd_seq_midi_event uvcvideo snd_rawmidi 
videobuf2_vmalloc ath9k videobuf2_memops snd_pcm videobuf2_core ath9k_common 
videodev ath9k_hw snd_page_alloc ath mac80211 microcode psmouse serio_raw 
cfg80211 i915 snd_seq lpc_ich snd_seq_device drm_kms_helper snd_timer drm snd 
wmi video i2c_algo_bit mac_hid mei_me soundcore mei lp parport hid_generic 
usbhid hid btrfs libcrc32c raid10 raid456 async_raid6_recov async_memcpy 
async_pq async_xor async_tx xor raid6_pq e1000e raid1 ahci raid0 ptp libahci pps_core multipath linear
[86961.217905] CPU: 2 PID: 6825 Comm: btrfs-endio-wri Tainted: GF   W  O 
3.12.6-rogerb #1
[86961.217926] Hardware name: System manufacturer System Product Name/P8Z77-V 
PRO, BIOS 2003 05/10/2013
[86961.217954] task: 8800a2374710 ti: 88003fbb8000 task.ti: 
88003fbb8000
[86961.217985] RIP: 0010:[a01268e8]  [a01268e8] 
btrfs_merge_bio_hook+0x78/0x80 [btrfs]
[86961.218031] RSP: :88003fbb9638  EFLAGS: 00010282
[86961.218044] RAX: ffea RBX: 1000 RCX: 0006
[86961.218070] RDX: 0007 RSI: 21382136 RDI: 0246
[86961.218108] RBP: 88003fbb9650 R08: 0092 R09: 078f
[86961.218126] R10:  R11: 88003fbb91e6 R12: 1000
[86961.218156] R13: 8801044e0ba8 R14: 8807f3580040 R15: 1988
[86961.218174] FS:  () GS:88081fa8() 
knlGS:
[86961.218193] CS:  0010 DS:  ES:  CR0: 80050033
[86961.218208] CR2: 7f5b389be000 CR3: 01a3 CR4: 001407e0
[86961.218240] Stack:
[86961.218250]  1000 1000 88003fbb9870 
88003fbb96a0
[86961.218283]  a0141c67 0020  
ea0019b1d7c0
[86961.218306]  1988   
1000
[86961.218333] Call Trace:
[86961.218356]  [a0141c67] submit_extent_page.isra.35+0xb7/0x1e0 
[btrfs]
[86961.218386]  [a0141fd5] __do_readpage+0x245/0x760 [btrfs]
[86961.218408]  [a01436f0] ? repair_eb_io_failure+0xb0/0xb0 [btrfs]
[86961.218432]  [a011c240] ? free_root_pointers+0x190/0x190 [btrfs]
[86961.218457]  [a013d987] ? btrfs_lookup_ordered_extent+0x27/0x1e0 
[btrfs]
[86961.218482]  [a01425b5] __extent_read_full_page+0xc5/0xe0 [btrfs]
[86961.218508]  [a011c240] ? free_root_pointers+0x190/0x190 [btrfs]
[86961.218544]  [a011c240] ? free_root_pointers+0x190/0x190 [btrfs]
[86961.218580]  [a0145ceb] read_extent_buffer_pages+0x21b/0x300 
[btrfs]
[86961.218603]  [a011c240] ? free_root_pointers+0x190/0x190 [btrfs]
[86961.218625]  [a011d373] 
btree_read_extent_buffer_pages.constprop.55+0xb3/0x120 [btrfs]
[86961.218651]  [a011fc16] read_tree_block+0x46/0x80 [btrfs]
[86961.218671]  [a0101038] read_block_for_search.isra.34+0x148/0x380 
[btrfs]
[86961.218702]  [81061ccf] ? warn_slowpath_common+0x8f/0xa0
[86961.218724]  [a01039f7] btrfs_search_old_slot+0x2c7/0x900 [btrfs]
[86961.218747]  [a017d04c] __resolve_indirect_refs+0x11c/0x5c0 [btrfs]
[86961.218771]  [a017de2a] find_parent_nodes+0x5ca/0xe70 [btrfs]
[86961.218793]  [a01287f0] ? btrfs_submit_bio_hook+0x1e0/0x1e0 [btrfs]
[86961.218816]  [a017f03f] iterate_extent_inodes+0xdf/0x250 [btrfs]
[86961.218838]  [a01287f0] ? btrfs_submit_bio_hook+0x1e0/0x1e0 [btrfs]
[86961.218861]  [a017f237] iterate_inodes_from_logical+0x87/0xa0 
[btrfs]
[86961.218883]  [a0126dab] record_extent_backrefs+0x7b/0xf0 [btrfs]
[86961.218906]  [a013116f] btrfs_finish_ordered_io+0x1ef/0xaf0 [btrfs]
[86961.218929]  [a0131cf5] finish_ordered_fn+0x15/0x20 [btrfs]
[86961.218950]  [a0152cca] worker_loop+0x15a/0x5c0 [btrfs]

Re: Samba strict allocate = yes stops btrfs compression working

2013-08-23 Thread Roger Binns
On 22/08/13 07:07, Josef Bacik wrote:
> Not sure what strict allocate = yes does,

I've worked on SMB servers before and can answer that.  Historically the
way Windows apps (right back into the 16 bit days) have made sure there is
space for a file about to be written is to ask the OS to allocate all the
space for it.  (Unix by default leaves holes making a sparse file.)

For example if a 10MB file is going to be written then an allocation
of 10MB will be done.  (The exact underlying protocol commands vary,
but originally they were similar to the Unix seek-to-end-and-write.)
After that, seeks and writes are done.  Because the allocation
succeeded, the app knows that it won't get an out of space error.

Separately from that, it turns out that some filesystems do benefit from
preallocating the file to the expected size, and then writing the contents
in dribs and drabs into the allocated space.

Consequently Samba gives you the option of really allocating all the file,
either for Windows semantics compatibility, or because it results in
improved performance on the Unix filesystem.

However I can't see it being of any benefit on a COW filesystem like btrfs.

Roger





Re: Questions about multi-device behavior

2013-07-18 Thread Roger Binns
On 18/07/13 13:05, Chris Murphy wrote:
> Sounds like if I have a degraded 'single' volume, I can simply cp or
> rsync everything from that volume to another, and I'll end up with a
> successful copy of the surviving data. True?

Not quite.  I did it with cp -a.  Because all the metadata survived,
cp would create the target file, but then get an I/O error on
opening/reading the source file.  It would print an error message but
not delete the empty target file.  Consequently I ended up with loads
of zero-length files that I had to go in and delete afterwards.

I briefly looked for an rsync option to keep going on source i/o errors
but didn't find one.

Roger


Re: Questions about multi-device behavior

2013-07-17 Thread Roger Binns
On 17/07/13 14:24, Florian Lindner wrote:
> metadata is mirrored on each device, data chunks are scattered more
> or less randomly on one disk.
>
> a) If one disk fails, is there any chance of data recovery?  b) If
> not, is there any advantage over a raid0 configuration.

I was using that exact configuration when one disk failed (2 x 2TB Seagate
drives).  The data was backed up in multiple ways, a lot of it was in
source control systems and the remainder was generated information.
Essentially the risk was worth taking since nothing would be lost.

One drive gave up mechanically - the controller still worked and it was
fun running SMART tests and having huge amounts of red text show up in
response.  The initial symptoms were that various programs crashed or
didn't launch with no diagnostics.  That is typical behaviour for Linux
apps when they get I/O errors on reads and writes.

Eventually I figured out the problem, and bought a new 4TB drive to
replace both originals and started recovery.  Out of ~750GB of original
data I could recover just over 2GB which represented files whose entire
contents were on the unfailed drive.

Having the metadata duplicated was however immensely helpful and I could
easily get a list of all directories and filenames, and used that to guide
what data I recovered/regenerated/reinstalled/checked out.

Meanwhile the performance improvement by having the data scattered across
both drives was noticeable.  I would often see it in iostat roughly evenly
balanced.

Roger


Re: unclean shutdown and space cache rebuild

2013-06-30 Thread Roger Binns
On 30/06/13 10:53, Garry T. Williams wrote:
> ~/.cache/chromium/Default/Cache ~/.cache/chromium/Default/Media\ Cache

I've taken to making ~/.cache a tmpfs, and all the apps have been fine
with that.  It also means I don't have to worry about my btrfs
snapshots being full of transient web junk.

Roger


Re: [RFC 0/5] BTRFS hot relocation support

2013-05-09 Thread Roger Binns
On 08/05/13 16:13, Zhi Yong Wu wrote:
> i want to know if btrfs hot relocation support is still meaningful

It is to me.  The problem with bcache is that it is a cache: if you
have a 256GB SSD and a 500GB HDD then you have total storage of 500GB.
Hot relocation was described as providing 750GB in the same scenario.

I'd also expect btrfs level support to be more friendly such as being able
to mount -o degraded if one of the devices is missing.

Roger


Re: data DUP

2013-04-28 Thread Roger Binns
On 27/04/13 19:53, Alex Elsayed wrote:
> When using btrfs, run a recent kernel :P.

Every software developer says that of what they produce.  Newer is
almost always better along many different axes.

> Honestly, even leaving aside the lack of backporting, there are other
> benefits to a recent kernel - things like cross-subvolume reflinks,
> btrfs device replace support being far more efficient than
> add/balance/remove/balance, and a bunch more.

Those are all features, none of which I use or have had to use yet.

If it will make you feel better I did upgrade some systems today to the
most recent Ubuntu release which meant going from kernel 3.5 to 3.8.

Roger


Re: Btrfs performance problem; metadata size to blame?

2013-04-28 Thread Roger Binns
On 28/04/13 12:57, Harald Glatt wrote:
> If you want better answers ...

There is a lot of good information at the wiki and it does see regular
updates.  For example the performance mount options are on this page:

  https://btrfs.wiki.kernel.org/index.php/Mount_options

Roger


Re: [PATCH] btrfs: move leak debug code to functions

2013-04-21 Thread Roger Binns
On 20/04/13 23:32, Eric Sandeen wrote:
> +#define btrfs_leak_list_add(new, head)  do {} while (0);
> +#define btrfs_leak_list_del(entry)      do {} while (0);

Shouldn't the trailing semi-colons be omitted?

Roger



data DUP

2013-04-20 Thread Roger Binns
Is there any particular reason why I can't use DUP for data?

When I try to set it with balance there is a kernel message:

  btrfs: dup for data is not allowed

The glossary at https://btrfs.wiki.kernel.org/index.php/Glossary says:

  Regular data cannot be assigned DUP level.

It is somewhat baffling that code and documentation exist to prevent this!

My current use case is an older hard drive I am putting backups on.
Since my data fits in less than half of the drive, and drives get bad
sectors (this one has had several reallocated), using DUP would be
useful.

(I realise making two partitions and RAID-1 with them would work, which
makes the DUP restriction even sillier.)

Roger


Re: data DUP

2013-04-20 Thread Roger Binns
On 20/04/13 13:48, Hugo Mills wrote:
> On Sat, Apr 20, 2013 at 01:17:06PM -0700, Roger Binns wrote:
>> Is there any particular reason why I can't use DUP for data?
>
> Technically, no.  Performance is likely to suck if you use rotational
> disks, and you may find some SSDs deduplicate blocks, making it fairly
> pointless for those devices.

I want DUP because I care about resilience in the face of errors more than
performance, so I don't actually care how badly performance sucks.  I only
put data on SSDs that I can afford to lose (usually backed up to
Dropbox/github/spinning disks).

 When I try to set it with balance there is a kernel message:
 
 btrfs: dup for data is not allowed
 
 What kernel and userspace versions are you using? I thought the 
 restriction had been removed at some point (but possibly I'm just 
 misremembering it).

Whatever Ubuntu 12.10 ships with.  Kernel package is 3.5.0.27.43 and
btrfs-tools is 0.19+20120328-7ubuntu1.  Note the message came from the
kernel so it would appear to be solely to blame for refusing my request.

 it can be quite hard to find every single implication of a feature when
 that feature gets changed/updated.

I'm more amused that someone went to the trouble of putting in kernel
detection and messages plus updating the documentation in order to prevent
using DUP for data!

Roger


Re: data DUP

2013-04-20 Thread Roger Binns
On 20/04/13 14:23, Hugo Mills wrote:
 You should upgrade anyway -- there's been a number of serious bugs in
 btrfs fixed since then.

13.04 is imminent so I'll pick up a newer kernel as part of that anyway.
(Also Tanglu, which I hope to move to, intends to use the Ubuntu kernel.)

In any event I am not worried.  Bug fixes get backported.  The probability
of hitting any serious bug is low (others would likely be victims first)
and worst case I have backups (snapshots, other machines, Google/Dropbox,
DVDs and hard drives at other people's houses, etc.).

There are two major reasons I switched from ext4.  The first is that
everything is online, including adding and removing devices, checking data
integrity etc.

The second is that data is not silently lost.  I had some bad sectors
develop on an ext4 spinning disk and the only way to properly recover was
to do offline checks that would have taken ~24 hours!  Finding out which
filenames were involved was far too much effort.

My philosophy is that we have machines with billions of processor cycles
per second.  They can figure things out for themselves without requiring
me to baby sit them!

I also run btrfs in a variety of configurations - raid0, raid1, sata, usb,
hdd, ssd, single device, multi-device, bare, dmcrypt, machines on all the
time, laptop, frequent suspend/resume, frequent power on/off.  I've never
experienced any problems with btrfs and use scrub for reassurance.

Roger



Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-26 Thread Roger Binns
On 26/03/13 21:27, Brendan Hide wrote:

 On 11/03/13 02:21, Roger Binns wrote:
 Why does all data have to be rewritten?  Why does every piece of data
 have to have exactly the same storage parameters in terms of 
 non-redundancy/performance/striping options?

 This is a good point. You don't necessarily have to rewrite everything
 all at once so the performance penalty is not necessarily that bad.
 More importantly, some restripe operations actually don't need much
 change on-disk (in theory).

Note that is not what I was describing.  What I meant was that if I put
10GB of data onto 100GB of space then btrfs is free to go above and beyond
the minimums, and to do so differently for different pieces of data.  For
example btrfs could make 6 copies of files beginning with 'a', 10 of
files beginning with 'c' and 274 of all others.  Obviously that is a bad
heuristic, but anything it deems useful for all that unused space is fine
by me, and there is absolutely no need for every block to have exactly the
same parameters as all the others.

Roger





Re: [PATCH] btrfs: document mount options in Documentation/fs/btrfs.txt

2013-03-23 Thread Roger Binns
On 23/03/13 10:48, Eric Sandeen wrote:
 Btrfs is a new copy on write filesystem for Linux aimed at

How much longer does "new" get to stay there, given the filesystem has
been going for well over half a decade?

 +  autodefrag
 +	Detect small random writes into files and queue them up for the
 +	defrag process.  Works best for small files; Not well suited for
 +	large database workloads.

What is large?  One man's large database is another's trivial database!
 Same applies to small.  Virtual machines are also in the category of
large files with small random writes.

Quantification would help a lot.  A suggested starting point: more than 10
random writes an hour to files larger than a gigabyte.

Roger


Re: [PATCH] btrfs: document mount options in Documentation/fs/btrfs.txt

2013-03-23 Thread Roger Binns
On 23/03/13 15:40, Eric Sandeen wrote:
 I imagine it depends on the details of the workload and storage as well.

If the people who write btrfs can't come up with some measures to deem
appropriateness, then how can the administrators who have even less
information :-)

I suspect file size has nothing to do with it, and it is entirely about
the volume of random writes.  (But as a correlator smaller files are
unlikely to get many random writes because they contain less useful
information than larger files.)

Roger


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-10 Thread Roger Binns
On 10/03/13 15:04, Hugo Mills wrote:
 On Sat, Mar 09, 2013 at 09:41:50PM -0800, Roger Binns wrote:
 The only constraints that matter are surviving N device failures, and
 data not lost if at least N devices are still present.  Under the
 hood the best way of meeting those can be heuristically determined,
 and I'd expect things like overhead to dynamically adjust as storage
 fills up or empties.
 
 That's really not going to work happily -- you'd have to run the 
 restriper in the background automatically as the device fills up.

Which is the better approach: the administrator sitting there adjusting
various parameters after doing some difficult calculations, redoing it all
as data and devices increase or decrease - or a computer with billions of
bytes of memory and billions of CPU cycles per second just figuring it out
based on experience? :-)

 Given that this is going to end up rewriting *all* of the data on the 
 FS,

Why does all data have to be rewritten?  Why does every piece of data have
to have exactly the same storage parameters in terms of
non-redundancy/performance/striping options?

I can easily imagine the final implementation being informed by hot data
tracking.  There is absolutely no need for data that is rarely read to be
using the maximum striping/performance/overhead options.

There is no need to rewrite everything anyway - if a filesystem with 1GB
of data is heading towards 2GB of data then only enough readjustments need
to be made to release that additional 1GB of overhead.

I also assume that the probability of all devices being exactly the same
size with exactly the same performance characteristics is going to
decrease.  Many will expect that they can add an SSD to the soup, and over
time add/update devices - i.e. the homogeneous case that regular RAID
implicitly assumes will become increasingly rare.

 If you want maximum storage (with some given redundancy), regardless of
 performance, then you might as well start with the parity-based levels
 and just leave it at that.

In the short term it would certainly make sense to have an online
calculator or mkfs helper where you specify the device sizes and
redundancy requirements together with how much data you have, and it then
spits out the string of numbers and letters to use for mkfs/balance.
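Until such a helper exists, the manual equivalent is picking the profiles
yourself; a rough sketch (device paths and goals illustrative):

```shell
# Goal: survive one device failure, across two devices.
sudo mkfs.btrfs -d raid1 -m raid1 /dev/sdX /dev/sdY

# Relax the data goal later, online, via the restriper:
sudo btrfs balance start -dconvert=single /mnt
```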

 Thinking about it, specifying a (redundancy, acceptable_wastage) pair
 is fairly pointless in controlling the performance levels,

I don't think there is merit in specifying acceptable wastage - the answer
is obvious in that any unused space is acceptable for use.  That also
means it changes over time as storage is used/freed.

 There's not much else a heuristic can do, without effectively exposing
 all the config options to the admin, in some obfuscated form.

There is lots that heuristics can do.  At the simplest level btrfs can monitor
device performance characteristics and use that as a first pass.  One
database that I use has an interesting approach for queries - rather than
trying to work out the single best perfect execution strategy (eg which
indices in which order) it actually tries them all out concurrently and
picks the quickest.  That is then used for future similar queries with the
performance being monitored.  Once responses times no longer match the
strategy it tries them all again to pick a new winner.

There is no reason btrfs can't try a similar approach.  When presented
with a pile of heterogenous storage with different sizes and performance
characteristics, use all reasonable approaches and monitor resulting
read/write performance.  Then start biasing towards what works best.  Use
hot data tracking to determine which data would most benefit from its
approach being changed to more optimal values.

Roger


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-09 Thread Roger Binns
On 09/03/13 12:31, Hugo Mills wrote:
 Some time ago, and occasionally since, we've discussed altering the 
 RAID-n terminology to change it to an nCmSpP format, where n is
 the number of copies, m is the number of (data) devices in a stripe per
 copy, and p is the number of parity devices in a stripe.

I despise both terminologies because they mix up administrator goals with
how those goals are provided by the filesystem.

Using RAID0 as an example, what is actually desired is maximum performance
and there is no need to survive the failure of even a single disk. I don't
actually care if it uses striping, parity, hot data tracking, moving
things to faster outside edges of spinning disks, hieroglyphics, rot13
encoding, all of the above or anything else.

Maximum performance is always desired and RAID settings really track to
data must survive the failure of N disks and/or data must be accessible
if at least N disks are present.  As an administrator that is what I
would like to set and let the filesystem do whatever is necessary to meet
those goals  (I'd love to be able to set this on a per directory/file
basis too.)

Roger


Re: [PATCH 0/5] [RFC] RAID-level terminology change

2013-03-09 Thread Roger Binns
On 09/03/13 17:44, Hugo Mills wrote:
 You've got at least three independent parameters to the system in order
 to make that choice, though, and it's a fairly fuzzy decision problem.
 You've got:
 
 - Device redundancy - Storage overhead - Performance

Overhead and performance aren't separate goals.  More accurately the goal
is best performance given the devices available and constrained by redundancy.

If I have 1GB of unique data and 10GB of underlying space available then
feel free to make 9 additional copies of each piece of data if that helps
performance.  As I increase the unique data the overhead available will
decrease, but I doubt anyone has a goal of micromanaging overhead usage.
Why can't the filesystem just figure it out and do the best job available
given minimal constraints?

 I definitely want to report the results in nCmSpP form, which tells you
 what it's actually done. The internal implementation, while not 
 expressing the full gamut of possibilities, maps directly from the 
 internal configuration to that form, and so it should at least be an 
 allowable input for configuration (e.g. mkfs.btrfs and the restriper).

Agreed on that for the micromanagers :-)

 If you'd like to suggest a usable set of configuration axes [say, 
 (redundancy, overhead) ], and a set of rules for converting those
 requirements to the internal representation, then there's no reason we
 can't add them as well in a later set of patches.

The only constraints that matter are surviving N device failures, and data
not lost if at least N devices are still present.  Under the hood the best
way of meeting those can be heuristically determined, and I'd expect
things like overhead to dynamically adjust as storage fills up or empties.

Roger


RAID 0 across SSD and HDD

2013-01-30 Thread Roger Binns
I've been unable to find anything definitive about what happens if I use
RAID0 to join an SSD and HDD together with respect to performance
(latency, throughput).  The future is obvious (hot data tracking, using
most appropriate device for the data, data migration).

In my specific case I have a 250GB SSD and a 500GB HDD, and about 250GB of
files (constantly growing).  One message I saw said that new blocks are
allocated on the device with the most free space which implies the SSD
would be virtually unused in my case, except for metadata which would only
be used half the time.

At the moment I have two independent filesystems (one per device) and
manually move data files between them using symlinks to keep pathnames the
same.  This requires keeping lots of slop free space on the SSD as well as
administration whenever it runs out of space.

My hope would be overall performance between that of the two devices, and
closer to that of the SSD.

Roger


Re: RAID 0 across SSD and HDD

2013-01-30 Thread Roger Binns
On 30/01/13 02:02, Hugo Mills wrote:
 On Wed, Jan 30, 2013 at 01:27:37AM -0800, Roger Binns wrote:
 In my specific case I have a 250GB SSD and a 500GB HDD, and about
 250GB of files (constantly growing).  One message I saw said that new
 blocks are allocated on the device with the most free space which
 implies the SSD would be virtually unused in my case, except for
 metadata which would only be used half the time.
 
 That would be the case with single mode, not with RAID-0.

Ah, I hadn't realised there was a major difference.

 With RAID-0, you'd get data striped equally across all (in this case,
 both) the devices, up to the size of the second-largest one, at which
 point it'll stop allocating space.

By "stop allocating space" I assume you mean it will return out-of-space
errors, even though there is technically 250GB of unused space.  I presume
there is no way to say that RAID-0 should be used where possible, falling
back to single for the remaining space.

It looks like my choices are:

* RAID 0 and getting 500GB of usable space, with performance 50% of the
accesses at HDD levels and 50% at SSD levels

* Single and getting 750GB of usable space with performance and usage
mostly on the HDD
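Expressed as mkfs invocations, the two choices are roughly (device paths
illustrative):

```shell
# ~500GB usable; data striped across both devices, metadata mirrored:
sudo mkfs.btrfs -d raid0 -m raid1 /dev/ssd /dev/hdd

# ~750GB usable; new data lands on whichever device has most free space:
sudo mkfs.btrfs -d single -m raid1 /dev/ssd /dev/hdd
```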

 We don't have any kind of hot-data management yet, but it's on the list
 of things we'd like to have at some point.

I'm happy to wait till it is available.  btrfs has been beneficial to me
in so many other respects (eg checksums, compression, online everything,
not having to deal with LVM and friends).  I was just hoping that joining
an SSD and HDD would be somewhat worthwhile now even if it isn't close to
what hot data will deliver in the future.

Roger


Re: RAID 0 across SSD and HDD

2013-01-30 Thread Roger Binns
On 30/01/13 04:01, Sander wrote:
 Do you know about bcache and EnhanceIO ?

Yes, but there are two reasons I don't use them.  One is that the capacity
of your cache is not included in the filesystem - i.e. with a 250GB SSD and
a 500GB HDD the filesystem capacity will be 500GB, not 750GB.

The second is that I use btrfs for my root filesystem so I'd have to get
bcache/EnhanceIO integrated into the distributor's initramfs build
mechanism, as well as worry about livecd/network boots without it.  This
is a lot of unnecessary work and worry.

Roger


Re: RAID 0 across SSD and HDD

2013-01-30 Thread Roger Binns
On 30/01/13 11:10, Filipe Brandenburger wrote:
 You could try something like -l=linear on md-raid or something 
 similar on LVM to build a 750GB volume

That would also require wiping the filesystems and starting again(*).  One
of the joys of btrfs has been not dealing with LVM.  On my workstation I
have two 2GB disks, but on one there is a sizeable Windows partition.
Getting LVM to stripe across the common sized space and then just use the
rest took quite a while to work out, requires running several different
commands and was something I had to write down.  There was nothing
intuitive.  It was a happy day when I could wipe and replace with btrfs.

Contrast with btrfs where 'btrfs --help' is almost always sufficient and
adding/removing/resizing is trivial (and online).
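By contrast the btrfs equivalents of those LVM gymnastics are one-liners,
all while the filesystem stays mounted (paths illustrative):

```shell
# Grow the pool online:
sudo btrfs device add /dev/sdc1 /mnt
# Shrink it again - data migrates off the device automatically:
sudo btrfs device delete /dev/sdb1 /mnt
# Resize to fill whatever space is available:
sudo btrfs filesystem resize max /mnt
```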

(*) I realise I could do things like add an external disk, btrfs add that
and then btrfs delete the internals, redo the internal storage, btrfs add
those back and then btrfs delete the external.  It would take a long time,
and is a reminder as to why I would prefer to be all btrfs everywhere
rather than also dealing with LVM and similar.

Roger