Re: SSD caching an existing btrfs raid1

2017-09-21 Thread Psalle

On 20/09/17 22:45, Kai Krakow wrote:

On Wed, 20 Sep 2017 17:51:15 +0200,
Psalle <psalleets...@gmail.com> wrote:


On 19/09/17 17:47, Austin S. Hemmelgarn wrote:
(...)

A better option if you can afford to remove a single device from
that array temporarily is to use bcache.  Bcache has one specific
advantage in this case: multiple backend devices can share the same
cache device. This means you don't have to carve out dedicated
cache space for each disk on the SSD and leave some unused space so
that you can add new devices if needed.  The downside is that you
can't convert each device in-place, but because you're using BTRFS,
you can still convert the volume as a whole in-place.  The
procedure for doing so looks like this:

1. Format the SSD as a bcache cache.
2. Use `btrfs device delete` to remove a single hard drive from the
array.
3. Set up the drive you just removed as a bcache backing device
bound to the cache you created in step 1.
4. Add the new bcache device to the array.
5. Repeat from step 2 until the whole array is converted.
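
For illustration, the bcache side of that procedure might look roughly
like this with bcache-tools (device names and the mount point are
placeholders, not taken from the thread):

  # Step 1: format the SSD as a cache device; note the cset UUID it
  # prints (bcache-super-show shows it again later).
  make-bcache -C /dev/sdc

  # Step 2: free one hard drive from the btrfs array.
  btrfs device delete /dev/sda /mnt/array

  # Step 3: turn the freed drive into a backing device and attach it to
  # the cache set from step 1 (substitute the real UUID).
  make-bcache -B /dev/sda
  echo <cset-uuid> > /sys/block/bcache0/bcache/attach

  # Step 4: add the resulting bcache device back to the array.
  btrfs device add /dev/bcache0 /mnt/array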

A similar procedure can actually be used to do almost any
underlying storage conversion (for example, switching to whole disk
encryption, or adding LVM underneath BTRFS) provided all your data
can fit on one less disk than you have.
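
For the dm-crypt case mentioned above, the same loop might be sketched
like this, again with placeholder names and assuming a recent cryptsetup:

  btrfs device delete /dev/sda /mnt/array
  cryptsetup luksFormat /dev/sda
  cryptsetup open /dev/sda crypt_sda
  btrfs device add /dev/mapper/crypt_sda /mnt/array
  # ...and repeat for each remaining drive in the array.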

Thanks Austin, that's just great. For some reason I had discarded
bcache, thinking that it would force me to rebuild from scratch, but
this kind of incremental migration is exactly what I hoped was
possible. I have plenty of space to replace the devices one by one.

I will report back my experience in a few days, I hope.

I've done it exactly that way in the past and it worked flawlessly,
though it took 24+ hours. It was easy for me because I was also adding
a third disk to the pool, so the existing data could easily move.

I suggest initializing bcache in writearound mode while converting, so
that your possibly terabytes of data don't all pass through the SSD.

If you might later decide to remove bcache, or aren't sure about future
bcache usage, you can still wrap any partition into a bcache container -
just don't connect it to a cache and it will behave like a normal
partition.
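
As a rough sketch of both suggestions (bcache0 and the partition name
are placeholders; the sysfs knobs are the ones documented for bcache):

  # During conversion: writes bypass the SSD, so the rebalanced
  # terabytes don't churn the cache.
  echo writearound > /sys/block/bcache0/bcache/cache_mode

  # Once the conversion is finished, switch to a caching mode.
  echo writethrough > /sys/block/bcache0/bcache/cache_mode

  # A backing device that is never attached to any cache set simply
  # passes I/O through to the underlying partition.
  make-bcache -B /dev/sdd1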


That's good advice. I've finished now and it seems to have gone off
without a hitch. Thanks!


Re: SSD caching an existing btrfs raid1

2017-09-20 Thread Psalle



On 19/09/17 17:47, Austin S. Hemmelgarn wrote:
(...)


A better option if you can afford to remove a single device from that 
array temporarily is to use bcache.  Bcache has one specific advantage 
in this case: multiple backend devices can share the same cache 
device. This means you don't have to carve out dedicated cache space 
for each disk on the SSD and leave some unused space so that you can 
add new devices if needed.  The downside is that you can't convert 
each device in-place, but because you're using BTRFS, you can still 
convert the volume as a whole in-place.  The procedure for doing so 
looks like this:


1. Format the SSD as a bcache cache.
2. Use `btrfs device delete` to remove a single hard drive from the 
array.
3. Set up the drive you just removed as a bcache backing device bound 
to the cache you created in step 1.

4. Add the new bcache device to the array.
5. Repeat from step 2 until the whole array is converted.

A similar procedure can actually be used to do almost any underlying 
storage conversion (for example, switching to whole disk encryption, 
or adding LVM underneath BTRFS) provided all your data can fit on one 
less disk than you have.


Thanks Austin, that's just great. For some reason I had discarded bcache,
thinking that it would force me to rebuild from scratch, but this kind
of incremental migration is exactly what I hoped was possible. I have
plenty of space to replace the devices one by one.


I will report back my experience in a few days, I hope.


Deadlock while removing device, kernel 4.4.1

2016-02-16 Thread Psalle
This is a test system, so I'm reporting in case this is an unknown
issue; no data is at risk.


This filesystem was created with a single device (well, actually a
partition), /dev/sdb3; then /dev/sdc{2,3,4} were added, and finally I
attempted to remove /dev/sdb3. No profiles were passed at any point.
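
For reference, that sequence boils down to roughly the following (the
mount point is my reconstruction, not part of the original report):

  mkfs.btrfs /dev/sdb3
  mount /dev/sdb3 /mnt/test
  btrfs device add /dev/sdc2 /dev/sdc3 /dev/sdc4 /mnt/test
  btrfs device delete /dev/sdb3 /mnt/test   # the step that was running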


Shortly after starting the remove, which seemed to proceed fine
according to fi show, I started an rsync of around 8 GB from another fs
into the one being reshaped. Not sure if this could have been related;
rsync never transferred anything. The source was a degraded raid5 with
six devices, one of them missing.


Soon everything requiring disk access froze. This was with the latest
upstream stable kernel as packaged for Ubuntu, i.e. 4.4.1-040401-generic.


After rebooting I was able to mount the filesystems without problems.
As I write, I'm repeating the same process with the latest 15.10 kernel,
4.2.0-27-generic; for the moment things are going smoothly.


Logged in as root, I captured the dmesg output. Here is the final bit:

[  600.114436] INFO: task D-Bus thread:7692 blocked for more than 120 
seconds.
[  600.114438]   Tainted: P   OE   4.4.1-040401-generic 
#201601311534
[  600.114440] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  600.114442] D-Bus threadD 88007e4bfde8 0  7692   2842 
0x
[  600.114446]  88007e4bfde8  81e11500 
8800b0b65940
[  600.114450]  88007e4c 8800bf509e68 8800bf509e80 
88007e4bff58
[  600.114454]  8800b0b65940 88007e4bfe00 817f9b15 
8800b0b65940

[  600.114458] Call Trace:
[  600.114461]  [] schedule+0x35/0x80
[  600.114464]  [] rwsem_down_read_failed+0xe0/0x140
[  600.114467]  [] ? 
schedule_hrtimeout_range_clock+0x19/0x40

[  600.114471]  [] call_rwsem_down_read_failed+0x14/0x30
[  600.114474]  [] ? down_read+0x20/0x30
[  600.114477]  [] __do_page_fault+0x375/0x400
[  600.114480]  [] do_page_fault+0x22/0x30
[  600.114483]  [] page_fault+0x28/0x30
[  600.114487] INFO: task BrowserBlocking:7697 blocked for more than 120 
seconds.
[  600.114489]   Tainted: P   OE   4.4.1-040401-generic 
#201601311534
[  600.114491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[  600.114493] BrowserBlocking D 88003565fbe0 0  7697   2842 
0x
[  600.114497]  88003565fbe0 0058c6dff62b 88011abf8000 
8800ae816600
[  600.114501]  88003566 ff00 8800ae816600 
8800ae816600
[  600.114505]  8800a35c1c70 88003565fbf8 817f9b15 
8800a35c1cd8

[  600.114509] Call Trace:
[  600.114512]  [] schedule+0x35/0x80
[  600.114534]  [] btrfs_tree_read_lock+0xe6/0x140 [btrfs]
[  600.114538]  [] ? wake_atomic_t_function+0x60/0x60
[  600.114554]  [] btrfs_read_lock_root_node+0x34/0x50 
[btrfs]

[  600.114569]  [] btrfs_search_slot+0x73f/0x9f0 [btrfs]
[  600.114574]  [] ? crypto_shash_update+0x30/0xe0
[  600.114593]  [] 
btrfs_check_dir_item_collision+0x77/0x120 [btrfs]

[  600.114614]  [] btrfs_rename2+0x130/0x7b0 [btrfs]
[  600.114618]  [] ? generic_permission+0x110/0x190
[  600.114622]  [] vfs_rename+0x54a/0x870
[  600.114626]  [] ? security_path_rename+0x20/0xd0
[  600.114630]  [] SyS_rename+0x38b/0x3d0
[  600.114634]  [] entry_SYSCALL_64_fastpath+0x16/0x75

There's more before this but it looks similar.

Known issue?


Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions

2016-02-10 Thread Psalle

On 05/02/16 20:36, Mackenzie Meyer wrote:

RAID 6 stability?
I'll say more: btrfs is currently in a state of flux where, if you don't
have a very recent kernel, upgrading is the first recommendation you're
going to receive in case of problems. This means going outside the
stable packages in most distros.


Once you're on the bleeding edge of kernels, you are obviously more
likely to run into undiscovered bugs. I even see people here who have to
patch the kernel with not-yet-mainlined patches when trying to recover.


So don't use it for anything but testing.


Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions

2016-02-09 Thread Psalle



On 05/02/16 20:36, Mackenzie Meyer wrote:

Hello,

I've tried checking around on google but can't find information
regarding the RAM requirements of BTRFS and most of the topics on
stability seem quite old.


To keep my answer short: every time I've tried (offline) deduplication
or raid5 pools I've ended up with borked filesystems. My last attempt
was about a year ago. Given that the pages you mention looked the same
back then, I'd stay away from raid56 for anything but testing purposes.
I haven't read anything recently (i.e. for post-3.19 kernels) about
raid5 that increases my confidence in it. Dedup, OTOH, I don't know:
what I used were third-party (I think?) tools, so the fault may have
rested with them and not btrfs (does that make sense?).


I'm building a new small raid5 pool as we speak, though, for throw-away 
data, so I hope to be favourably impressed.


Cheers.


So first would be memory requirements: my goal is to use deduplication
and compression. Approximately how many GB of RAM per TB of storage
would be recommended?

RAID 6 write holes?
The BTRFS wiki states that parity might be inconsistent after a crash.
That said, the wiki page for RAID 5/6 doesn't look like it has much
recent information on it. Has this issue been addressed, and if not,
are there plans to address the RAID write hole issue? What would be a
recommended workaround to resolve inconsistent parity, should an
unexpected power down happen during write operations?

RAID 6 stability?
Any articles I've tried looking for online seem to be from early 2014,
I can't find anything recent discussing the stability of RAID 5 or 6.
Are there or have there recently been any data corruption bugs which
impact RAID 6? Would you consider RAID 6 safe/stable enough for
production use?

Do you still strongly recommend backups, or has stability reached a
point where backups aren't as critical? I'm thinking from a data
consistency standpoint, not a hardware failure standpoint.

I plan to start with a small array and add disks over time. That said,
currently I have mostly 2TB disks and some 3TB disks. If I replace all
2TB disks with 3TB disks, would BTRFS then start utilizing the full
3TB capacity of each disk, or would I need to destroy and rebuild my
array to benefit from the larger disks?


Thanks!


Re: Purposely using btrfs RAID1 in degraded mode ?

2016-01-05 Thread Psalle

Hello Alphazo,

I am a mere btrfs user, but given the discussions I regularly see here
about difficulties with degraded filesystems, I wouldn't rely on this
(yet?) as a regular working strategy, even if it's supposed to work.


If you're familiar with git, perhaps git-annex could be an alternative.

-Psalle.

On 04/01/16 18:00, Alphazo wrote:

Hello,

My picture library today lives on an external hard drive that I sync on
a regular basis with a couple of servers and other external drives.
I'm interested in the on-the-fly checksumming provided by btrfs and
would like to get your opinion on the following unusual use case that I
have tested:
- Create a btrfs filesystem across the two drives with RAID1.
- When at home I work with both drives connected, so I can enjoy the
self-healing feature if a bit goes bad and only back up perfect copies
to my backup servers.
- When not at home I only bring one external drive and manually mount
it in degraded mode, so I can continue working on my pictures while
still having checksum error detection (but not correction).
- When coming back home I plug the second drive back in and initiate
a scrub or balance to get the data duplicated onto it again.
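
For reference, that workflow boils down to roughly the following
commands (device names and mount point are placeholders; whether this
is safe to repeat routinely is exactly the question being asked):

  # Create the two-device RAID1 filesystem.
  mkfs.btrfs -d raid1 -m raid1 /dev/sdx /dev/sdy

  # On the road: mount a single member read-write.
  mount -o degraded /dev/sdx /mnt/photos

  # Back home, with both drives connected: resync the copies.
  mount /dev/sdx /mnt/photos
  btrfs scrub start /mnt/photos   # or a balance, as described above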

I have tested the above use case with a couple of USB flash drives, and
even used btrfs over dm-crypt partitions, and it seemed to work fine,
but I wanted to get some advice from the community on whether this is
really a bad practice that should not be used in the long run. Is there
any limitation/risk in reading from or writing to a degraded filesystem,
knowing it will be re-synced later?

Thanks
alphazo

PS: I have also investigated RAID1 on a single drive with two
partitions, but I cannot afford the halved capacity resulting from that
approach.


raid1 vs raid5

2016-01-05 Thread Psalle
Hello all and excuse me if this is a silly question. I looked around in 
the wiki and list archives but couldn't find any in-depth discussion 
about this:


I just realized that, since raid1 in btrfs is special (meaning only two
copies, on different devices), the resilience achieved with raid1 and
raid5 is the same: you can lose one drive and not lose data.


So!, presuming that raid5 were at the same level of maturity, what would 
be the pros/cons of each mode?
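
As a back-of-the-envelope framing of the capacity side of that
trade-off (example figures of my own, not from the thread), take four
2 TB devices:

  # btrfs raid1 keeps exactly two copies of every chunk: usable ~ raw / 2
  echo "raid1: $(( 4 * 2 / 2 )) TB usable out of $(( 4 * 2 )) TB raw"

  # raid5 spends one device's worth of space on parity: usable ~ raw * (N-1)/N
  echo "raid5: $(( 4 * 2 * (4 - 1) / 4 )) TB usable out of $(( 4 * 2 )) TB raw"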


As a corollary, I guess that if raid1 is considered a good compromise,
then functional equivalents to raid6 and beyond could simply be
implemented as "storing n copies on different devices", dropping any
complex parity computations and making this mode entirely generic. Since
this seems pretty obvious, I'd welcome your insights on what I'm
missing, given that this doesn't exist (and isn't planned, AFAIK). I can
foresee consistency difficulties, but that seems hardly insurmountable
if it's being done for raid1?


Thanks in advance,
Psalle.