Re: SSD caching an existing btrfs raid1
On 20/09/17 22:45, Kai Krakow wrote:
> On Wed, 20 Sep 2017 17:51:15 +0200, Psalle <psalleets...@gmail.com> wrote:
>> On 19/09/17 17:47, Austin S. Hemmelgarn wrote:
>>> (...)
>>> A better option, if you can afford to remove a single device from
>>> that array temporarily, is to use bcache. Bcache has one specific
>>> advantage in this case: multiple backing devices can share the same
>>> cache device. This means you don't have to carve out dedicated cache
>>> space for each disk on the SSD and leave some unused space so that
>>> you can add new devices if needed. The downside is that you can't
>>> convert each device in place, but because you're using BTRFS, you can
>>> still convert the volume as a whole in place. The procedure for doing
>>> so looks like this:
>>>
>>> 1. Format the SSD as a bcache cache.
>>> 2. Use `btrfs device delete` to remove a single hard drive from the
>>>    array.
>>> 3. Set up the drive you just removed as a bcache backing device bound
>>>    to the cache you created in step 1.
>>> 4. Add the new bcache device to the array.
>>> 5. Repeat from step 2 until the whole array is converted.
>>>
>>> A similar procedure can actually be used to do almost any underlying
>>> storage conversion (for example, switching to whole-disk encryption,
>>> or adding LVM underneath BTRFS), provided all your data can fit on
>>> one less disk than you have.
>>
>> Thanks Austin, that's just great. For some reason I had discarded
>> bcache thinking that it would force me to rebuild from scratch, but
>> this kind of incremental migration is exactly what I hoped was
>> possible. I have plenty of space to replace the devices one by one.
>> I will report back my experience in a few days, I hope.
>
> I've done it exactly that way in the past and it worked flawlessly
> (though it took 24+ hours). It was easy for me because I was also
> adding a third disk to the pool, so existing data could easily move.
>
> I suggest initializing bcache in writearound mode while converting, so
> your possibly terabytes of disk don't go through the SSD.
>
> If you later decide to remove bcache, or you're not sure about future
> bcache usage, you can wrap any partition into a bcache container: just
> don't connect it to a cache and it will work like a normal partition.

That's good advice. I've finished now and it seems to have gone without
a hitch. Thanks!

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
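Kai's writearound suggestion comes down to a one-line sysfs write. A minimal guarded helper, sketched under assumptions: `bcache0` is a placeholder for whichever bcacheN your backing device registered as, and the valid mode names can be read back from the `cache_mode` file itself.

```shell
#!/bin/sh
# set_cache_mode <bcacheN> <mode>: guarded sysfs write. bcache0 below is
# a placeholder device name; this fails cleanly if the device (or root
# permission) is missing instead of erroring out mid-script.
set_cache_mode() {
    f=/sys/block/$1/bcache/cache_mode
    [ -w "$f" ] || { echo "no writable bcache device $1" >&2; return 1; }
    echo "$2" > "$f"
}

# While the conversion migrates data, bypass the SSD on writes:
set_cache_mode bcache0 writearound || true
# Once converted, switch back to a caching mode, e.g.:
# set_cache_mode bcache0 writethrough
```

The guard matters because the bcacheN numbering is assigned at registration time and is not stable across reboots.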
Re: SSD caching an existing btrfs raid1
On 19/09/17 17:47, Austin S. Hemmelgarn wrote:
> (...)
> A better option, if you can afford to remove a single device from that
> array temporarily, is to use bcache. Bcache has one specific advantage
> in this case: multiple backing devices can share the same cache device.
> This means you don't have to carve out dedicated cache space for each
> disk on the SSD and leave some unused space so that you can add new
> devices if needed. The downside is that you can't convert each device
> in place, but because you're using BTRFS, you can still convert the
> volume as a whole in place. The procedure for doing so looks like this:
>
> 1. Format the SSD as a bcache cache.
> 2. Use `btrfs device delete` to remove a single hard drive from the
>    array.
> 3. Set up the drive you just removed as a bcache backing device bound
>    to the cache you created in step 1.
> 4. Add the new bcache device to the array.
> 5. Repeat from step 2 until the whole array is converted.
>
> A similar procedure can actually be used to do almost any underlying
> storage conversion (for example, switching to whole-disk encryption, or
> adding LVM underneath BTRFS), provided all your data can fit on one
> less disk than you have.

Thanks Austin, that's just great. For some reason I had discarded bcache
thinking that it would force me to rebuild from scratch, but this kind
of incremental migration is exactly what I hoped was possible. I have
plenty of space to replace the devices one by one. I will report back my
experience in a few days, I hope.
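For reference, the five steps above can be sketched as a shell dry run. Everything here is a placeholder (/dev/sdx as the SSD, the member disks, /mnt/array, the bcacheN numbering, the cache-set UUID), and the script only prints the commands rather than touching any device:

```shell
#!/bin/sh
# Dry-run sketch of the conversion loop: it only PRINTS the commands so
# they can be reviewed and run by hand. All device names are placeholders.
plan_conversion() {
    ssd=$1; disks=$2; mnt=$3
    # Step 1: format the SSD once as the shared bcache cache.
    echo "make-bcache -C $ssd"
    for d in $disks; do
        # Step 2: remove one member; btrfs migrates its data to the rest.
        echo "btrfs device delete $d $mnt"
        # Step 3: wrap the freed drive as a bcache backing device, then
        # attach it to the cache set (UUID from 'bcache-super-show').
        echo "make-bcache -B $d"
        echo "echo <cset-uuid> > /sys/block/bcacheN/bcache/attach"
        # Step 4: add the resulting bcache device back to the array.
        echo "btrfs device add /dev/bcacheN $mnt"
    done
    # Step 5 is the loop itself: repeat until every member is wrapped.
}

plan_conversion /dev/sdx "/dev/sda /dev/sdb" /mnt/array
```

Note the ordering constraint Austin mentions: the `btrfs device delete` in each pass only succeeds while the remaining members can hold all the data.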
Deadlock while removing device, kernel 4.4.1
This is a test system, so I'm reporting in case this is unknown; no data
is at risk.

This filesystem was created with a single device (well, actually a
partition), /dev/sdb3; then /dev/sdc{2,3,4} were added, and finally I
attempted to remove /dev/sdb3. No profiles were passed at any point.

Briefly after starting the remove, which seemed to proceed fine
according to fi show, I started an rsync involving around 8GB from
another fs into the one being reshaped. Not sure if this could have been
related; rsync never transferred anything. The source was a degraded
raid5 with six devices, one of them missing. Soon everything requiring
disk access froze.

This was with the latest ubuntu stable upstream kernel, i.e.
4.4.1-040401-generic. I rebooted and could mount the filesystems without
problems. As I write, I'm repeating the same process with the latest
15.10 kernel, 4.2.0-27-generic; for the moment things are going
smoothly.

Logged in as root, I captured the dmesg. Here is the final bit:

[ 600.114436] INFO: task D-Bus thread:7692 blocked for more than 120 seconds.
[ 600.114438] Tainted: P OE 4.4.1-040401-generic #201601311534
[ 600.114440] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 600.114442] D-Bus threadD 88007e4bfde8 0 7692 2842 0x
[ 600.114446] 88007e4bfde8 81e11500 8800b0b65940
[ 600.114450] 88007e4c 8800bf509e68 8800bf509e80 88007e4bff58
[ 600.114454] 8800b0b65940 88007e4bfe00 817f9b15 8800b0b65940
[ 600.114458] Call Trace:
[ 600.114461] [] schedule+0x35/0x80
[ 600.114464] [] rwsem_down_read_failed+0xe0/0x140
[ 600.114467] [] ? schedule_hrtimeout_range_clock+0x19/0x40
[ 600.114471] [] call_rwsem_down_read_failed+0x14/0x30
[ 600.114474] [] ? down_read+0x20/0x30
[ 600.114477] [] __do_page_fault+0x375/0x400
[ 600.114480] [] do_page_fault+0x22/0x30
[ 600.114483] [] page_fault+0x28/0x30
[ 600.114487] INFO: task BrowserBlocking:7697 blocked for more than 120 seconds.
[ 600.114489] Tainted: P OE 4.4.1-040401-generic #201601311534
[ 600.114491] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 600.114493] BrowserBlocking D 88003565fbe0 0 7697 2842 0x
[ 600.114497] 88003565fbe0 0058c6dff62b 88011abf8000 8800ae816600
[ 600.114501] 88003566 ff00 8800ae816600 8800ae816600
[ 600.114505] 8800a35c1c70 88003565fbf8 817f9b15 8800a35c1cd8
[ 600.114509] Call Trace:
[ 600.114512] [] schedule+0x35/0x80
[ 600.114534] [] btrfs_tree_read_lock+0xe6/0x140 [btrfs]
[ 600.114538] [] ? wake_atomic_t_function+0x60/0x60
[ 600.114554] [] btrfs_read_lock_root_node+0x34/0x50 [btrfs]
[ 600.114569] [] btrfs_search_slot+0x73f/0x9f0 [btrfs]
[ 600.114574] [] ? crypto_shash_update+0x30/0xe0
[ 600.114593] [] btrfs_check_dir_item_collision+0x77/0x120 [btrfs]
[ 600.114614] [] btrfs_rename2+0x130/0x7b0 [btrfs]
[ 600.114618] [] ? generic_permission+0x110/0x190
[ 600.114622] [] vfs_rename+0x54a/0x870
[ 600.114626] [] ? security_path_rename+0x20/0xd0
[ 600.114630] [] SyS_rename+0x38b/0x3d0
[ 600.114634] [] entry_SYSCALL_64_fastpath+0x16/0x75

There's more before this, but it looks similar. Known issue?
Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
On 05/02/16 20:36, Mackenzie Meyer wrote:
> RAID 6 stability?

I'll say more: currently, btrfs is in such a state of flux that if you
don't have a very recent kernel, getting one is the first recommendation
you're going to receive in case of problems. That means going outside
the stable packages of most distros. And once you're on the bleeding
edge of kernels, you are obviously more likely to run into undiscovered
bugs. I even see people here who have to patch their kernel with
not-yet-mainlined patches when trying to recover. So don't use it for
anything but testing.
Re: BTRFS RAM requirements, RAID 6 stability/write holes and expansion questions
On 05/02/16 20:36, Mackenzie Meyer wrote:
> Hello,
>
> I've tried checking around on google but can't find information
> regarding the RAM requirements of BTRFS, and most of the topics on
> stability seem quite old.

To keep my answer short: every time I've tried (offline) deduplication
or raid5 pools I've ended up with borked filesystems. My last attempt
was about a year ago. Given that the pages you mention looked the same
back then, I'd stay away from raid56 for anything but testing purposes.
I haven't read anything about raid5 recently (i.e. post-3.19 kernels)
that increases my confidence in it.

Dedup, OTOH, I don't know. What I used were third-party (I think?)
tools, so the fault may have rested with them and not btrfs (does that
make sense?).

I'm building a new small raid5 pool as we speak, though, for throw-away
data, so I hope to be favourably impressed.

Cheers.

> So first would be memory requirements: my goal is to use deduplication
> and compression. Approximately how many GB of RAM per TB of storage
> would be recommended?
>
> RAID 6 write holes?
> The BTRFS wiki states that parity might be inconsistent after a crash.
> That said, the wiki page for RAID 5/6 doesn't look like it has much
> recent information on there. Has this issue been addressed, and if
> not, are there plans to address the RAID write hole issue? What would
> be a recommended workaround to resolve inconsistent parity, should an
> unexpected power down happen during write operations?
>
> RAID 6 stability?
> Any articles I've tried looking for online seem to be from early 2014;
> I can't find anything recent discussing the stability of RAID 5 or 6.
> Are there, or have there recently been, any data corruption bugs which
> impact RAID 6? Would you consider RAID 6 safe/stable enough for
> production use? Do you still strongly recommend backups, or has
> stability reached a point where backups aren't as critical? I'm
> thinking from a data consistency standpoint, not a hardware failure
> standpoint.
>
> I plan to start with a small array and add disks over time. That said,
> currently I have mostly 2TB disks and some 3TB disks. If I replace all
> 2TB disks with 3TB disks, would BTRFS then start utilizing the full
> 3TB capacity of each disk, or would I need to destroy and rebuild my
> array to benefit from the larger disks?
>
> Thanks!
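On the last point, the usual sequence is a replace followed by an explicit grow, since btrfs keeps using only the old size until told otherwise. A dry-run sketch (it only prints commands; devid 1, /dev/new3tb and /mnt/array are invented placeholders):

```shell
#!/bin/sh
# Dry-run sketch (prints commands only) of swapping one 2TB member for a
# 3TB one. The devid, device name and mountpoint are placeholders; find
# real devids with 'btrfs filesystem show'.
plan_grow() {
    devid=$1; newdev=$2; mnt=$3
    # Copy the old member's contents onto the new disk:
    echo "btrfs replace start $devid $newdev $mnt"
    # btrfs still only uses the old 2TB until the device is grown:
    echo "btrfs filesystem resize $devid:max $mnt"
}

plan_grow 1 /dev/new3tb /mnt/array
```

Repeated per disk, this grows the array in place with no destroy/rebuild.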
Re: Purposely using btrfs RAID1 in degraded mode ?
Hello Alphazo,

I am a mere btrfs user, but given the discussions I regularly see here
about difficulties with degraded filesystems, I wouldn't rely on this
(yet?) as a regular work strategy, even if it's supposed to work.

If you're familiar with git, perhaps git-annex could be an alternative.

-Psalle.

On 04/01/16 18:00, Alphazo wrote:
> Hello,
>
> My picture library today lies on an external hard drive that I sync on
> a regular basis with a couple of servers and other external drives.
> I'm interested in the on-the-fly checksumming brought by btrfs and
> would like to get your opinion on the following unusual use case that
> I have tested:
>
> - Create a btrfs filesystem over the two drives with RAID1.
> - When at home I can work with the two drives connected, so I can
>   enjoy the self-healing feature if a bit goes mad, and I only back up
>   perfect copies to my backup servers.
> - When not at home I only bring one external drive and manually mount
>   it in degraded mode, so I can continue working on my pictures while
>   still having checksum error detection (but not correction).
> - When coming back home I can plug back the second drive and initiate
>   a scrub or balance to get the second drive duplicated.
>
> I have tested the above use case with a couple of USB flash drives and
> even used btrfs over dm-crypt partitions, and it seemed to work fine,
> but I wanted to get some advice from the community on whether this is
> really a bad practice that should not be used in the long run. Is
> there any limitation/risk in reading from and writing to a degraded
> filesystem, knowing it will be re-synced later?
>
> Thanks
> alphazo
>
> PS: I have also investigated RAID1 on a single drive with two
> partitions, but I cannot afford the half capacity resulting from that
> approach.
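For concreteness, the away/home cycle being proposed looks like this as a dry-run sketch (it only prints commands; /dev/diskA and /mnt/photos are invented names):

```shell
#!/bin/sh
# Dry-run sketch (prints commands only) of the degraded-RAID1 cycle.
# Device and mountpoint are placeholders.
plan_cycle() {
    dev=$1; mnt=$2
    # Away: mount the single drive degraded.
    echo "mount -o degraded $dev $mnt"
    # Back home with both drives present: mount normally, then let a
    # scrub (or balance) re-duplicate whatever was written while away.
    echo "mount $dev $mnt"
    echo "btrfs scrub start $mnt"
}

plan_cycle /dev/diskA /mnt/photos
```

One caveat worth checking on your kernel: data written while degraded may land in single-profile chunks, in which case a balance with convert filters, not just a scrub, is needed to get it back to two copies.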
raid1 vs raid5
Hello all, and excuse me if this is a silly question. I looked around in
the wiki and list archives but couldn't find any in-depth discussion
about this.

I just realized that, since raid1 in btrfs is special (meaning only two
copies, on different devices), the resilience achieved with raid1 and
raid5 is the same: you can lose one drive and not lose data. So,
presuming that raid5 were at the same level of maturity, what would be
the pros/cons of each mode?

As a corollary, I guess that if raid1 is considered a good compromise,
then functional equivalents to raid6 and beyond could simply be
implemented as "storing n copies on different devices", dropping any
complex parity computations and making this mode entirely generic. Since
this seems pretty obvious, I'd welcome your insights on what I'm
missing, given that this doesn't exist (and isn't planned to be this
way, AFAIK). I can foresee consistency difficulties, but that seems
hardly insurmountable if it's being done for raid1?

Thanks in advance,
Psalle.
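The space trade-off behind the question can be made concrete with back-of-envelope arithmetic. This is a rough illustration only (whole-TB integer math, ignoring btrfs chunk allocation and uneven-fill effects); the "n copies" profile is the hypothetical one from the corollary, not something btrfs implements:

```shell
#!/bin/sh
# Rough usable capacity in TB. usable_ncopies generalizes raid1 (n=2) to
# a hypothetical "n copies on different devices" profile; usable_raid5
# approximates raid5 as total minus the largest member (one device's
# worth of parity). Integer math, so results round down.
usable_ncopies() {
    n=$1; shift; total=0
    for d in "$@"; do total=$((total + d)); done
    echo $((total / n))
}
usable_raid5() {
    total=0; max=0
    for d in "$@"; do
        total=$((total + d))
        [ "$d" -gt "$max" ] && max=$d
    done
    echo $((total - max))
}

usable_ncopies 2 2 2 2 3   # raid1 over 2+2+2+3 TB disks -> prints 4
usable_raid5 2 2 2 3       # raid5 over the same disks   -> prints 6
```

So the resilience is the same (one lost drive survivable), but raid5's capacity cost stays at one device as the array grows, while two-copy raid1 always costs half: that gap is the main pro of parity, paid for in write-hole and rebuild complexity.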