Re: cancel btrfs delete job
Hi Franziska,

(2014/06/27 14:00), Franziska Näpelt wrote:
> Hi!
> After about 12 hours of booting, the system runs now

Congratulations!

> The fifth harddrive is still in the btrfs-pool. Here is the log from the
> crash while the btrfs delete job was running:
>
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957248] ------------[ cut here ]------------
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957268] WARNING: CPU: 3 PID: 31131 at fs/btrfs/super.c:259 __btrfs_abort_transaction+0x46/0xf8 [btrfs]()
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957270] Modules linked in: xts gf128mul tun parport_pc ppdev lp parport bnep rfcomm bluetooth rfkill pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) cpufreq_powersave cpufreq_userspace cpufreq_stats cpufreq_conservative vboxdrv(O) binfmt_misc fuse nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc ext2 dm_crypt hwmon_vid loop firewire_sbp2 snd_hda_codec_hdmi snd_hda_intel joydev radeon ttm drm_kms_helper iTCO_wdt iTCO_vendor_support snd_hda_controller drm i2c_algo_bit snd_hda_codec snd_hwdep snd_pcm i7core_edac snd_timer edac_core snd soundcore psmouse acpi_cpufreq coretemp processor kvm_intel kvm microcode lpc_ich mfd_core pcspkr asus_atk0110 ehci_pci mxm_wmi i2c_i801 i2c_core wmi serio_raw thermal_sys evdev button ext4 crc16 jbd2 mbcache btrfs xor raid6_pq dm_mod raid1 md_mod sg sd_mod crct10dif_generic crc_t10dif crct10dif_common hid_generic usbhid hid crc32c_intel firewire_ohci r8169 firewire_core mii crc_itu_t sata_sil ahci libahci sata_mv uhci_hcd ehci_hcd
> Jun 25 20:34:59 hsad-srv-03 kernel: libata xhci_hcd scsi_mod usbcore usb_common
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957309] CPU: 3 PID: 31131 Comm: find Tainted: G O 3.15.0 #1
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957310] Hardware name: System manufacturer System Product Name/SABERTOOTH X58, BIOS 1304 08/02/2011
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957311] 0009 8138b54a 880001593b58
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957313] 81039583 a01c8123 00b0
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957315] ffe4 880625fb9000 8801077e8e80 a0247ac0
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957317] Call Trace:
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957321] [8138b54a] ? dump_stack+0x41/0x51
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957324] [81039583] ? warn_slowpath_common+0x78/0x90
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957331] [a01c8123] ? __btrfs_abort_transaction+0x46/0xf8 [btrfs]
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957333] [81039633] ? warn_slowpath_fmt+0x45/0x4a
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957340] [a01c8123] ? __btrfs_abort_transaction+0x46/0xf8 [btrfs]
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957348] [a01d648a] ? __btrfs_free_extent+0x80a/0x84d [btrfs]
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957351] [8138db1c] ? mutex_trylock+0x10/0x29
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957359] [a01dabfe] ? __btrfs_run_delayed_refs+0xae4/0xc2b [btrfs]
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957368] [a01dc86c] ? btrfs_run_delayed_refs+0x7b/0x17e [btrfs]
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957378] [a01ea1d7] ? __btrfs_end_transaction+0xe5/0x2c0 [btrfs]
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957389] [a01ee9bb] ? btrfs_dirty_inode+0x8c/0xa7 [btrfs]
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957391] [8111f12d] ? touch_atime+0xe3/0x11c
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957393] [81119843] ? iterate_dir+0x7c/0xa2
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957395] [81119949] ? SyS_getdents+0x74/0xca
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957397] [811196ee] ? filldir64+0xdd/0xdd
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957399] [81394522] ? system_call_fastpath+0x16/0x1b
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957400] ---[ end trace 8392ac15dafb7de4 ]---
> Jun 25 20:34:59 hsad-srv-03 kernel: [614028.957422] BTRFS info (device sdh): forced readonly

Your delete job seems to have failed in __btrfs_abort_transaction(), which resulted in a read-only remount at that time.
In addition, if you encounter this kind of situation, setting the skip_balance mount option would help you: it skips resuming the balance at mount time. Please also see the following thread; it's about a case where Marc tried to balance btrfs and a hang happened:

http://comments.gmane.org/gmane.comp.file-systems.btrfs/35791

> After that event, there are further entries in the messages log, but
> nothing interesting, only some dhcp infos. Three minutes later, the log
> stopped without any message.
> Does someone need further logs?

I have no idea.

Thanks,
Satoru

> best regards,
> Franziska
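As a sketch of the skip_balance suggestion above (the label and mount point are taken from the fstab quoted later in this thread; the commands assume a filesystem with an interrupted balance and must be run as root):

```
# Mount without resuming the interrupted balance:
mount -o skip_balance,compress=lzo LABEL=btrfs-pool /mnt/btrfs

# The balance is left paused; cancel it before doing anything else:
btrfs balance cancel /mnt/btrfs
```

With skip_balance the filesystem becomes usable immediately instead of spending hours resuming the relocation at mount time.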
Re: cancel btrfs delete job
Hi Franziska,

(2014/06/26 20:34), Franziska Näpelt wrote:
> Hi Satoru,
> I'm sorry, but the boot process is still running (I hope so); I can't
> log in until now. So I currently have no logs. I don't want to
> interrupt the process, because there is a lot of file activity on the
> harddrive (the LED is blinking).
> I'm not sure about the mkfs.btrfs options, because the system was set
> up more than one year ago.
> Mount options in fstab:
> LABEL=btrfs-pool /mnt/btrfs btrfs compress=lzo,degraded 0 1
> Kernel version is 3.15 on a Debian Wheezy with current btrfs-tools
> installed.
> Can you estimate how long the boot process (repairing btrfs?) could
> take? The boot process has been running for five hours now.

To do so, I'll try to follow your steps on a system as similar to your environment as possible. Unfortunately I don't have plenty of disks.

Although you've already succeeded in mounting your btrfs now, I'll share how long Franziska's operations would take anyway. Please note that I measured not the balance resumed after a reset during a delete-triggered balance, but the balance triggered by delete itself. Since most of the time in both cases is spent on balance work, the result for the former should be similar to the latter.

Environment:
- x86_64 fedora20 KVM guest on x86_64 fedora20 host
- RAM: 4GiB
- kernel: 3.15.2
- Storage: 50GB virt-io disk
  - small devices: /dev/vd[d-g]
  - large devices: /dev/vd[hi]
  # All of these virtual devices are backed by
  # files on a real HDD in the host.

Operations:
1. Make a Btrfs filesystem.
2. Make a junk file in the filesystem.
3. Add a large device.
4. Remove a small device and measure how long it takes.
Script:
===
#!/bin/bash
MOUNTPOINT=/home/sat/mnt
MEGABYTES=4096
mkfs.btrfs -f /dev/vdd /dev/vde /dev/vdf /dev/vdg
mount -o compress=lzo /dev/vdd /home/sat/mnt
dd if=/dev/urandom of=/home/sat/mnt/junk oflag=direct bs=1MiB count=$MEGABYTES
btrfs dev add -f /dev/vdh /home/sat/mnt
time btrfs dev del /dev/vdg /home/sat/mnt
umount /home/sat/mnt
===

Test factors:
- Device size
  - small: 2GB, large: 3GB
  - small: 4GB, large: 6GB
- The size of the junk file # MEGABYTES parameter of the script
  - 1/2 GB
  - 1 GB
  - 2 GB
  - 4 GB

Result (*1):

device size [GB] | junk file |         | time/junk file
small |  large   | size [GB] | time[s] | size [s/GB]
======+==========+===========+=========+===============
  2   |    3     |    1/2    |   5.3   |   10.6
      |          |     1     |   9.6   |    9.6
      |          |     2     |  19.0   |    9.5
------+----------+-----------+---------+---------------
  4   |    6     |    1/2    |   5.1   |   10.2
      |          |     1     |   9.4   |    9.4
      |          |     2     |  17.0   |    8.5
      |          |     4     |  39.3   |    9.8

*1) This data is the average of three tries.

So, it seems that how long delete (and balance) takes is proportional to the used size (the size of the junk file here). In my case, the delete work takes about 10 s/GB. If the storage sizes are 2TB for the small devices and 3TB for the large devices, and the junk file size is 2TB, this operation would take about 5.4 hours.

Of course, it's a much simplified case and it wouldn't apply to your case cleanly. However, this kind of measurement helps to estimate the time required for your next balance operation.

Thanks,
Satoru

> best regards,
> Franziska

Hi Franziska,

(2014/06/26 19:05), Franziska Näpelt wrote:
> Hello Satoru,
> here are your requested informations:
> Environment:
> - four 2 TB disks: /dev/sd[c-f]
> - two 3 TB disks: /dev/sdg (but until now, only one is connected)
> The filesystem consists of /dev/sd[c-f].
> I wanted to replace /dev/sdc by /dev/sdg with the commands add and,
> after that, delete. In the second step, I wanted to replace the next
> disk. But it hung during the btrfs delete command (after a successful
> add). The delete process was still in progress, but from iotop it
> seemed to me that there was no data transfer.

Hm, then something bad would happen on Btrfs.
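The 5.4-hour figure in Satoru's estimate above follows directly from the measured rate; a quick sanity check (the 2000 GB used size and the 9.8 s/GB rate are the numbers from the measurement in this thread):

```shell
# Back-of-the-envelope check of the delete/balance time estimate above:
# ~9.8 s per GB of used data (the 4GB-junk-file case), applied to 2 TB
# (taken as 2000 GB) of used data.
used_gb=2000
secs_per_gb=9.8
awk -v gb="$used_gb" -v rate="$secs_per_gb" \
    'BEGIN { printf "%.1f hours\n", gb * rate / 3600 }'   # prints "5.4 hours"
```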
> Today in the morning the whole computer hung and there was no other
> possibility than a reset :(

So, unfortunately, debug info like sysrq-w can't be obtained.

> Until now, it tries to boot with a lot of errors. But I can see that
> there is file activity on the harddrive. There are a lot of the
> following messages:
> btrfs free space inode generation (0) did not match free space cache generation

And it doesn't finish the mount process? Your filesystem is in an inconsistent state since you reset during the rebalancing triggered by the device deletion. The following link would help you, but I'm not sure whether your data can be restored or not:

https://btrfs.wiki.kernel.org/index.php/Btrfsck

Could you tell me your mkfs.btrfs options, mount options, and kernel version, if possible? I'd like to try to reproduce your problem anyway.

Thanks,
Satoru
Re: Blocked tasks on 3.15.1
> I've been getting blocked tasks on 3.15.1, generally at times when the
> filesystem is somewhat busy (such as doing a backup via scp/clonezilla
> writing to the disk). A week ago I had enabled snapper for a day, which
> resulted in a daily cleanup of about 8 snapshots at once, which might
> have contributed, but I've been limping along since.

I've started seeing similar on several servers, after upgrading to 3.15 or 3.15.1. With 3.16-rc1 it was even crashing for me.

I've rolled back to the latest 3.14.x, and it's still behaving fine.

I've signalled it before on the list in the "btrfs filesystem hang with 3.15-rc3 when doing rsync" thread.

-- 
Tomasz Chmielewski
http://www.sslrack.com

-- 
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] btrfs-progs: Add uninstall targets to Makefiles.
On Wed, Jun 25, 2014 at 09:40:40PM +0200, Nils Steinger wrote:
> On Mon, Jun 23, 2014 at 05:04:42PM +0200, David Sterba wrote:
>> On Mon, Jun 23, 2014 at 04:23:48AM +0200, Nils Steinger wrote:
>>> +	rmdir -p --ignore-fail-on-non-empty $(DESTDIR)$(man8dir)
>>> +	rmdir -p --ignore-fail-on-non-empty $(DESTDIR)$(libdir)
>>> +	rmdir -p --ignore-fail-on-non-empty $(DESTDIR)$(bindir)
>>
>> I don't think it's right to remove the systemwide directories: bindir,
>> libdir and man8dir. The rest are btrfs subdirs (eg. incdir), that's
>> fine.
>
> On my system, man8dir didn't exist prior to the installation, so I
> thought it would be reasonable to have the uninstallation routine
> remove it.

According to the FHS [1] the manX directories do not have to exist, so this part shall stay.

> bindir and libdir will exist by default on most systems, so that's a
> different case…
>
> So, should we really keep the directories around, even if they were
> created by the installation and are now empty (if they aren't, they
> won't be removed anyway)?

But we don't track whether the directories were created by the installation or not. Normally the directories would exist anyway (/usr or /usr/local as prefix) and are expected to exist at those locations. Installation to an arbitrary directory works, but managing the directories is IMO up to the user.

So are you ok with keeping bindir and libdir only (ie. removing only man8dir)?

Thanks.

[1] http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREMANMANUALPAGES and then note #32
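For reference, the rmdir invocation discussed above behaves like this: -p removes the target directory and then each now-empty parent, and --ignore-fail-on-non-empty makes rmdir stop silently at the first non-empty ancestor instead of erroring out. The directory names below are made up for the demonstration:

```shell
# Demonstrate rmdir -p --ignore-fail-on-non-empty (GNU coreutils).
tmp=$(mktemp -d)
cd "$tmp"
mkdir -p share/man/man8
touch share/keep                       # keeps "share" non-empty
rmdir -p --ignore-fail-on-non-empty share/man/man8
ls share                               # prints "keep"; the man/ subtree is gone
```

This is why the Makefile hunk is safe on populated prefixes: a non-empty bindir or libdir simply survives the uninstall.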
Re: [PATCH 5/6] btrfs-progs: limit minimal num of args for btrfs-image
On Thu, Jun 26, 2014 at 10:53:05AM +0800, Gui Hecheng wrote:
> @@ -2521,6 +2521,9 @@ int main(int argc, char *argv[])
>  	}
>  	argc = argc - optind;
> +	if (argc < 2)

Please use the check_argc_min helper instead. Thanks.

> +		print_usage();
> +
>  	dev_cnt = argc - 1;
>  	if (create) {
Re: Blocked tasks on 3.15.1
Tomasz Chmielewski posted on Fri, 27 Jun 2014 12:02:43 +0200 as excerpted:

> I've been getting blocked tasks on 3.15.1, generally at times when the
> filesystem is somewhat busy (such as doing a backup via scp/clonezilla
> writing to the disk).
>
> I've started seeing similar on several servers, after upgrading to 3.15
> or 3.15.1. With 3.16-rc1 it was even crashing for me.
>
> I've rolled back to the latest 3.14.x, and it's still behaving fine.
>
> I've signalled it before on the list in the "btrfs filesystem hang with
> 3.15-rc3 when doing rsync" thread.

There is a known btrfs lockup bug that was introduced in the commit-window btrfs pull for 3.16, that was fixed by a pull I believe the day before 3.16-rc2. So 3.16-pre to rc2 is known-bad, tho it'll work for a few minutes and didn't do any permanent damage that I could see, here. But from 3.16-rc2 on, the 3.16-pre series has been working fine for me.

For 3.15, I didn't run the pre-releases as I had another project I was focusing on, but I experienced no problems with 3.15 itself. However, my use-case is multiple independent small btrfs on partitioned SSD, sub-100-GB per btrfs, so I'd be less likely to experience the blocked-task issues that others reported, mostly on TB+ size spinning rust.

And it /did/ seem to me that the frequency of blocked-task reports was higher for 3.15 than for previous kernel series, tho 3.15 worked fine for me on small btrfs on SSD, the relatively short time I ran it. Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not enough 3.16-rc2+ reports out there from folks experiencing issues with 3.15 blocked tasks to rightfully say.

What CAN be said is that the known 3.16-series commit-window btrfs lockup bug that DID affect me was fixed right before rc2, and I'm running rc2+ just fine, here.

-- 
Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master --
and if you use the program, he is your master.
  Richard Stallman
Re: [PATCH] btrfs-progs: add supported attr flags to btrfs(5)
On Thu, Jun 26, 2014 at 03:38:33PM -0500, Eric Sandeen wrote:
> +FILE ATTRIBUTES
> +---------------
> +The btrfs filesystem supports setting the following file
> +attributes using the `chattr`(1) utility:
> +append only (a), no atime updates (A), compressed (c), no copy on write (C),
> +no dump (d), synchronous directory updates (d), immutable (i),
> +synchronous updates (S), and no compression (X).

The formatting is not eye-pleasing. I've spotted a few mistakes:

* 'd' is listed twice; for sync directory updates it's 'D'
* and 'X' does not mean no compression and never has, although I'd like
  to see a chattr bit for that because we have the corresponding inode
  bit. I've checked your patches, the meaning of 'X' hasn't changed.

I took the opportunity and reformatted the options:

@@ -183,9 +183,24 @@ FILE ATTRIBUTES
 ---------------
 The btrfs filesystem supports setting the following file
 attributes using the `chattr`(1) utility:
-append only (a), no atime updates (A), compressed (c), no copy on write (C),
-no dump (d), synchronous directory updates (d), immutable (i),
-synchronous updates (S), and no compression (X).
+
+*a* -- append only
+
+*A* -- no atime updates
+
+*c* -- compressed
+
+*C* -- no copy on write
+
+*d* -- no dump
+
+*D* -- synchronous directory updates
+
+*i* -- immutable
+
+*S* -- synchronous updates

 For descriptions of these attribute flags, please refer to the
 `chattr`(1) man page.
---

This looks almost the same in the manpage and gives IMO a good overview. For the initial patch I'm ok with the descriptions; we can enhance it later with btrfs specifics.

Are you ok with the proposed changes? (I don't want to bother you with resending for simple changes.)
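For illustration, the attribute letters listed above are exercised with chattr/lsattr like this (the paths are hypothetical and assume a mounted btrfs filesystem):

```
touch /mnt/btrfs/file
chattr +c /mnt/btrfs/file    # request compression for this file
touch /mnt/btrfs/nocow
chattr +C /mnt/btrfs/nocow   # no copy on write; only effective on empty files
lsattr /mnt/btrfs/file /mnt/btrfs/nocow   # show the flags that are set
```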
Re: [PATCH] btrfs-progs: add supported attr flags to btrfs(5)
On 6/27/14, 8:42 AM, David Sterba wrote:
> The formatting is not eye-pleasing. I've spotted a few mistakes:
> * 'd' is listed twice, for sync directory updates it's 'D'

Crud, sorry about that.

> * and 'X' does not mean no compression and never has, although I'd like
>   to see a chattr bit for that because we have the corresponding inode
>   bit

Ok, then I'm not sure what it does mean. Supposedly these flags are supported via check_flags(), called by setflags(), which I was basing these on:

	if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
		      FS_NOATIME_FL | FS_NODUMP_FL | \
		      FS_SYNC_FL | FS_DIRSYNC_FL | \
		      FS_NOCOMP_FL | FS_COMPR_FL | FS_NOCOW_FL))

and the kernel header says that's:

#define FS_NOCOMP_FL	0x00000400 /* Don't compress */

chattr(1) says: "compression raw access (X)", and also:

	The 'X' attribute is used by the experimental compression patches
	to indicate that a raw contents of a compressed file can be
	accessed directly. It currently may not be set or reset using
	chattr(1), although it can be displayed by lsattr(1).

Hum, ok, so we are starting to go off the rails here, aren't we ;)

e2fsprogs has this flag translation:

	{ EXT2_NOCOMPR_FL, 'X', "Compression_Raw_Access" },

for:

#define EXT2_NOCOMPR_FL	0x00000400 /* Access raw compressed data */

and btrfs_ioctl_setflags claims to handle it:

	if (flags & FS_NOCOMP_FL) {
		ip->flags &= ~BTRFS_INODE_COMPRESS;
		ip->flags |= BTRFS_INODE_NOCOMPRESS;
	...

so hopefully you can understand my confusion? ;) The comment says:

 * The COMPRESS flag can only be changed by users, while the NOCOMPRESS
 * flag may be changed automatically if compression code won't make
 * things smaller.

(but doesn't say it may *only* be...)

But OTOH, chattr won't ever even *pass* X to the fs, will it. So I guess I'm lost. It looks like there's code to handle an incoming X, but I don't think chattr will send it. Do we ever get an outbound X for an opportunistically not-compressed file? If so, maybe that still needs to be specified.

Otherwise, yeah, the *format* changes look great, thanks. ;)

-Eric
Re: Blocked tasks on 3.15.1
On Fri, Jun 27, 2014 at 9:06 AM, Duncan 1i5t5.dun...@cox.net wrote:
> Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not
> enough 3.16-rc2+ reports out there from folks experiencing issues with
> 3.15 blocked tasks to rightfully say.

Any chance that it was backported to 3.15.2? I'd rather not move to mainline just for btrfs.

I got another block this morning and failed to capture a log before my terminals gave out. I switched back to 3.15.0 for the moment, and we'll see if that fares any better.

Rich
Re: [PATCH] btrfs-progs: add supported attr flags to btrfs(5)
On 6/27/14, 10:30 AM, David Sterba wrote:
> On Fri, Jun 27, 2014 at 09:56:10AM -0500, Eric Sandeen wrote:
>>> * and 'X' does not mean no compression and never has, although I'd
>>>   like to see a chattr bit for that because we have the corresponding
>>>   inode bit
>>
>> Ok, then I'm not sure what it does mean. Supposedly these flags are
>> supported via check_flags(), called by setflags(), which I was basing
>> these on:
>>
>> 	if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
>> 		      FS_NOATIME_FL | FS_NODUMP_FL | \
>> 		      FS_SYNC_FL | FS_DIRSYNC_FL | \
>> 		      FS_NOCOMP_FL | FS_COMPR_FL | FS_NOCOW_FL))
>>
>> and the kernel header says that's:
>>
>> #define FS_NOCOMP_FL	0x00000400 /* Don't compress */
>
> Passing this bit directly via ioctl works as expected, but to my
> knowledge there is no chattr letter allocated for it.

It's in the manpage, but as a read-only attribute, i.e. lsattr only.

>> chattr(1) says: "compression raw access (X)", and also that the 'X'
>> attribute currently may not be set or reset using chattr(1), although
>> it can be displayed by lsattr(1).
>>
>> Hum, ok, so we are starting to go off the rails here, aren't we ;)
>
> Yeah. And there's no support for accessing raw compressed data.

>> e2fsprogs has this flag translation:
>> 	{ EXT2_NOCOMPR_FL, 'X', "Compression_Raw_Access" },
>> for:
>> #define EXT2_NOCOMPR_FL	0x00000400 /* Access raw compressed data */
>> and btrfs_ioctl_setflags claims to handle it:
>> 	if (flags & FS_NOCOMP_FL) {
>> 		ip->flags &= ~BTRFS_INODE_COMPRESS;
>> 		ip->flags |= BTRFS_INODE_NOCOMPRESS;
>> 	...
>> so hopefully you can understand my confusion? ;)
>
> Oh I do now, but it's ext2 fault :)

Ok, but btrfs setflags tries to handle FS_NOCOMP_FL - how is that ever set?

>> The comment says:
>>  * The COMPRESS flag can only be changed by users, while the NOCOMPRESS
>>  * flag may be changed automatically if compression code won't make
>>  * things smaller.
>> (but doesn't say it may *only* be...)
>
> And that's IMO right (at least I expect it to work like that): the user
> may set or drop the NOCOMPRESS flag. The comment says that it may
> appear without user interaction.

>> But OTOH, chattr won't ever even *pass* X to the fs, will it. So I
>> guess I'm lost. It looks like there's code to handle an incoming X,
>> but I don't think chattr will send it. Do we ever get an outbound X
>> for an opportunistically not-compressed file? If so, maybe that still
>> needs to be specified.
>
> AFAICS 'X' is not listed among the standard chattr options and chattr.c
> in e2fsprogs has no support for that. There is lib/e2p/pf.c:
> 	{ EXT2_NOCOMPR_FL, 'X', "Compression_Raw_Access" },
> but this is used only locally by print_flags.

Right, it's a read-only flag for lsattr.

> I hope this answers your questions, 'X' has no meaning for btrfs now.

The only remaining question is: why does the btrfs setflags interface try to parse it, if nothing sends it? Or if something does send it, what? And where is this all documented? ;)

btrfs tries to handle a flag value which is identical to the 'X' flag value, which lsattr/chattr says is readonly...

-Eric
[Question] Btrfs on iSCSI device
Hi,

I set up 2 Linux servers to share the same device through iSCSI, then created a btrfs on the device. Then I saw the problem that the 2 Linux servers do not see a consistent file system image.

Details:
- Server 1 running kernel 2.6.32, server 2 running 3.2.1
- Both running btrfs v0.20-rc1
- Server 2 has device /dev/vdc, exposed as iSCSI target
- Server 1 mounts the device as /dev/sda
- Server 1: 'mount /dev/sda /mnt/btrfs'; server 2: 'mount /dev/vdc /mnt/btrfs'
- When server 1 does 'touch /mnt/btrfs/foo', server 2 doesn't see any file under /mnt/btrfs
- I created /mnt/btrfs/foo on server 2 as well; then I added some content from both server 1 and server 2 to /mnt/btrfs/foo
- After that, each server sees the content it added, but not the content from the other server
- Both servers 'umount /mnt/btrfs' and mount it again
- Then both servers see /mnt/btrfs/foo with the content added from server 2 (I guess it's because server 2 created the foo file later than server 1)

I did a similar test on ext4 and both servers see a consistent image of the file system. When server 1 creates a foo file, server 2 immediately sees it.

Is this how btrfs is supposed to work?

Thanks,
Zhe
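For reference, the reproduction above condenses to the following sketch (device paths from the message; note that the replies in this thread explain why this double mount must not be done):

```
# On server 1 (sees the shared iSCSI LUN as /dev/sda):
mount /dev/sda /mnt/btrfs
touch /mnt/btrfs/foo

# On server 2 (the same device, locally /dev/vdc):
mount /dev/vdc /mnt/btrfs
ls /mnt/btrfs     # foo does not appear: each side caches metadata independently
```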
Re: Blocked tasks on 3.15.1
On Jun 27, 2014, at 9:14 AM, Rich Freeman r-bt...@thefreemanclan.net wrote:
> On Fri, Jun 27, 2014 at 9:06 AM, Duncan 1i5t5.dun...@cox.net wrote:
>> Hopefully that problem's fixed on 3.16-rc2+, but as of yet there's not
>> enough 3.16-rc2+ reports out there from folks experiencing issues with
>> 3.15 blocked tasks to rightfully say.
>
> Any chance that it was backported to 3.15.2? I'd rather not move to
> mainline just for btrfs.

The backports don't happen that quickly. I'm uncertain about specifics, but I think many such fixes need to be demonstrated in mainline before they get backported to stable.

> I got another block this morning and failed to capture a log before my
> terminals gave out. I switched back to 3.15.0 for the moment, and we'll
> see if that fares any better.

Yeah, I'd start going backwards. The idea of going forwards is to hopefully get you unstuck or extract data where otherwise you can't; it's not really a recommendation for production usage.

It's also often useful if you can reproduce the block with a current rc kernel and issue sysrq+w and post that. Then do your regression with an older kernel.

Chris Murphy
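Capturing the blocked-task state Chris asks for is typically done via the sysrq interface when no keyboard is attached (requires root; a sketch):

```
# Make sure sysrq is enabled, then dump all blocked (uninterruptible) tasks:
echo 1 > /proc/sys/kernel/sysrq
echo w > /proc/sysrq-trigger
dmesg | tail -n 100        # the task dump lands in the kernel log
```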
Re: [Question] Btrfs on iSCSI device
Hi,

On 06/27/2014 05:44 PM, Zhe Zhang wrote:
> Hi, I setup 2 Linux servers to share the same device through iSCSI.
> Then I created a btrfs on the device. Then I saw the problem that the
> 2 Linux servers do not see a consistent file system image.
> [...]
> Is this how btrfs is supposed to work?

I don't think that it is possible to mount the _same device_ at the _same time_ on two different machines. And this doesn't depend on the filesystem. The fact that you see it working, I suspect, is coincidental.

When I tried this (same SCSI HD connected to two machines), I had to ensure that the two machines never accessed the HD at the same time.

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli (kreijackATinwind.it)
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Re: [PATCH] btrfs-progs: add supported attr flags to btrfs(5)
On 6/27/14, 11:10 AM, David Sterba wrote:
> On Fri, Jun 27, 2014 at 10:36:54AM -0500, Eric Sandeen wrote:
>> ... btrfs tries to handle a flag value which is identical to the 'X'
>> flag value, which lsattr/chattr says is readonly...
>
> I'm looking at it from the kernel side, ie. what's the meaning of the
> flag. The chattr tool still lives under the hood of e2fsprogs, but the
> ioctl interface is inherited by other filesystems (stating the
> obvious). e2fsprogs/chattr can decide to implement other meanings or
> new bits more or less freely (eg. there's the new 'N' flag for inlined
> files that I discovered just today while exploring the 'X' flag).

Yes, the interface originated w/ extN, but has clearly spread to other filesystems, and spread like a weed. ;) It's still the de facto interface, but looking through other filesystems, it's a bit of a disaster (filesystems specifying inheritance of flags they ignore, for example). There was a discussion at fsdevel about extending the interface or reworking it completely; I don't know if there's an outcome.

> From the btrfs side, we have the object properties that make a nice
> interface for accessing the file attributes in parallel with the chattr
> tool. The interface is currently underused so it's not possible to
> manipulate the flags yet.

or test the code, despite it being merged. \o/ oh well...

> I'd rather move the efforts to finalize this interface than adding
> single bits of the SETFLAGS ioctl and further extensions of the
> chattr/lsattr tools.

ok.

In any case, back to the original patch: your changes look fine. 'X' can't be set, so leave it out. Sorry about the 'd' vs 'D' - and I like the new formatting. Feel free to make those changes.

(Only nitpick: is 'X' ever reported by lsattr on btrfs? If so, it could/should still be included.)

((And a side note: I tried to change the text of the manpage to be "btrfs" not "btrfs-mount", but that somehow broke the build, and I didn't dig a lot further.))

-Eric
Re: [Question] Btrfs on iSCSI device
On 2014-06-27 12:34, Goffredo Baroncelli wrote:
> On 06/27/2014 05:44 PM, Zhe Zhang wrote:
>> Hi, I setup 2 Linux servers to share the same device through iSCSI.
>> Then I created a btrfs on the device. Then I saw the problem that the
>> 2 Linux servers do not see a consistent file system image.
>> [...]
>> Is this how btrfs is supposed to work?
>
> I don't think that it is possible to mount the _same device_ at the
> _same time_ on two different machines. And this doesn't depend on the
> filesystem. The fact that you see it working, I suspect, is
> coincidental.
>
> When I tried this (same SCSI HD connected to two machines), I had to
> ensure that the two machines never accessed the HD at the same time.

If you need shared storage like that, you need to use a real cluster filesystem like GFS2 or OCFS2; BTRFS isn't designed for any kind of concurrent access to shared storage from separate systems. The reason it appears to work when using iSCSI, and not with directly connected parallel SCSI or SAS, is that iSCSI doesn't provide low-level hardware access.
Re: Blocked tasks on 3.15.1
Chris Murphy posted on Fri, 27 Jun 2014 09:52:46 -0600 as excerpted:
> On Jun 27, 2014, at 9:14 AM, Rich Freeman r-bt...@thefreemanclan.net wrote:
>> On Fri, Jun 27, 2014 at 9:06 AM, Duncan 1i5t5.dun...@cox.net wrote:
>>> Hopefully that problem's fixed in 3.16-rc2+, but as of yet there aren't enough 3.16-rc2+ reports from folks experiencing issues with 3.15 blocked tasks to say for sure.
>> Any chance that it was backported to 3.15.2? I'd rather not move to mainline just for btrfs.
> The backports don't happen that quickly.

The lockup bug that affected early 3.16 was introduced in the commit-window pull for 3.16, so the fix for that shouldn't have needed to be backported (unless the problem commit ended up in stable too, which I doubt but don't know for sure). 3.15.0 didn't contain that bug, which affected me, but as I said, there did seem to be more blocked-task reports in 3.15, which didn't affect me. I didn't run 3.15.1, however, staying on 3.15.0 until after 3.16-rc2 fixed the earlier 3.16-pre series bug that had kept me off the 3.16 series until then. So anything that might have affected the 3.15 stable series after 3.15.0, I wouldn't know about.

If I'm not mistaken, the fix for the 3.16 series bug was:

ea4ebde02e08558b020c4b61bb9a4c0fcf63028e Btrfs: fix deadlocks with trylock on tree nodes

But I think the 3.16 commit-window changes introducing the bug weren't btrfs-specific but were instead at the generic VFS level. If that's the case, then it's possible the bug was there before 3.16's commit window and might have been triggering some of the 3.15 reports as well, with the 3.16 VFS change simply making it much worse. IOW, I don't know whether that 3.16 series fix will help 3.15 or not, but I don't believe it'll hurt, and it /might/ help.

-- Duncan - List replies preferred. No HTML msgs.
Every nonfree program has a lord, a master -- and if you use the program, he is your master. Richard Stallman
Re: [Question] Btrfs on iSCSI device
On Fri, Jun 27, 2014 at 1:15 PM, Austin S Hemmelgarn ahferro...@gmail.com wrote:
> On 2014-06-27 12:34, Goffredo Baroncelli wrote:
>> On 06/27/2014 05:44 PM, Zhe Zhang wrote:
>>> Hi, I set up 2 Linux servers to share the same device through iSCSI. Then I created a btrfs on the device. Then I saw the problem that the 2 Linux servers do not see a consistent file system image. Details: -- Server 1 running kernel 2.6.32, server 2 running 3.2.1 -- Both running btrfs v0.20-rc1 -- Server 2 has device /dev/vdc, exposed as iSCSI target -- Server 1 sees the device as /dev/sda -- Server 1: 'mount /dev/sda /mnt/btrfs'; server 2: 'mount /dev/vdc /mnt/btrfs' -- When server 1 runs 'touch /mnt/btrfs/foo', server 2 doesn't see any file under /mnt/btrfs -- I created /mnt/btrfs/foo on server 2 as well; then I added some content from both server 1 and server 2 to /mnt/btrfs/foo -- After that, each server sees the content it added, but not the content from the other server -- Both servers 'umount /mnt/btrfs' and mount it again -- Then both servers see /mnt/btrfs/foo with the content added from server 2 (I guess because server 2 created the foo file later than server 1). I did a similar test on ext4 and both servers see a consistent image of the file system: when server 1 creates a foo file, server 2 immediately sees it. Is this how btrfs is supposed to work? Thanks, Zhe
>> I don't think it is possible to mount the _same device_ at the _same time_ on two different machines, and this doesn't depend on the filesystem. The fact that you saw it partly working is, I suspect, coincidental. When I tried this (the same SCSI HD connected to two machines), I had to ensure that the two machines never accessed the HD at the same time.
> If you need shared storage like that, you need to use a real cluster filesystem like GFS2 or OCFS2; btrfs isn't designed for any kind of concurrent access to shared storage from separate systems. The reason it appears to work when using iSCSI and not with directly connected parallel SCSI or SAS is that iSCSI doesn't provide low-level hardware access.

I did more testing with ext4 and it supports what Goffredo and Austin said above. The error message is "cannot access xxx: Input/output error". It seems to me that both servers hold some file system data structures in memory and eventually conflict with each other (like writing inode info to the same blocks).
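As an illustration of what the replies suggest (let a single node own the block device and serve the data, instead of dual-mounting it), one conventional arrangement is to export the mounted btrfs over NFS. This sketch is not from the thread: the hostnames, network range, and export options are invented for the example.

```text
# /etc/exports on server 2 (the only node that ever mounts /dev/vdc):
/mnt/btrfs  192.168.0.0/24(rw,sync,no_subtree_check)

# /etc/fstab entry on server 1 (plain NFS client; it never touches the iSCSI LUN):
server2:/mnt/btrfs  /mnt/btrfs  nfs  defaults  0 0
```

With this layout only one kernel caches and writes the filesystem's metadata, which is exactly the consistency problem the dual-mount setup runs into.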
Re: Blocked tasks on 3.15.1
On Fri, Jun 27, 2014 at 11:52 AM, Chris Murphy li...@colorremedies.com wrote:
> On Jun 27, 2014, at 9:14 AM, Rich Freeman r-bt...@thefreemanclan.net wrote:
>> I got another block this morning and failed to capture a log before my terminals gave out. I switched back to 3.15.0 for the moment, and we'll see if that fares any better.
> Yeah, I'd start going backwards. The idea of going forwards is to hopefully get you unstuck or extract data where otherwise you can't; it's not really a recommendation for production usage. It's also often useful if you can reproduce the block with a current rc kernel, issue sysrq+w, and post that. Then do your regression with an older kernel.

So, obviously I'm getting my money's worth from the btrfs team, but neither is a great option, as neither involves me running a stable kernel. 3.15.0 contains CVE-2014-4014, although I'm running a version patched for that vulnerability. If I go back any further I'd probably have to backport it myself, and I only know about it because my distro patched that CVE on 3.15.0 before moving to 3.15.1. Running 3.16 doesn't bother me much from a btrfs standpoint, but it means I'm getting unstable updates on all the other modules as well; it's just more to deal with. I might give 3.15.2 a shot and see what happens, and I can always fall back to 3.15.0 again.

Rich
Cannot delete snapshot
Hello, =)

I got a problem with a simple backup bash script. It creates a snapshot and then backs it up. The user (Ubuntu 12.04, 64-bit) interrupted the script with CTRL+C shortly after it started. Then the machine was rebooted several times. Now these snapshots cannot be deleted anymore and new ones can't be taken.

This is what the script does before it starts the backup process:

mount /mnt/big -o remount -o rw   # /mnt/big is a partition for backups, which is read-only
mkdir /mnt/backup_root
mount /dev/sda1 /mnt/backup_root  # /dev/sda1 on /home type btrfs (rw,subvol=@home)
btrfs subvolume snapshot /mnt/backup_root/@ /mnt/backup_root/@snapshot_backup_root
btrfs subvolume snapshot /mnt/backup_root/@home /mnt/backup_root/@snapshot_backup_home
mkdir /mnt/backup
mount --bind /mnt/backup_root/@snapshot_backup_root /mnt/backup/
mount --bind /mnt/backup_root/@snapshot_backup_home/ /mnt/backup/home/
mount --bind /boot /mnt/backup/boot/

This is what it didn't do, as it was interrupted:

umount /mnt/backup/boot/ 1> /dev/null 2> /dev/null
umount /mnt/backup/home/ 1> /dev/null 2> /dev/null
umount /mnt/backup 1> /dev/null 2> /dev/null
rmdir /mnt/backup 1> /dev/null 2> /dev/null
btrfs subvolume delete /mnt/backup_root/@snapshot_backup_home 1> /dev/null 2> /dev/null
btrfs subvolume delete /mnt/backup_root/@snapshot_backup_root 1> /dev/null 2> /dev/null
umount /mnt/backup_root 1> /dev/null 2> /dev/null
rmdir /mnt/backup_root 1> /dev/null 2> /dev/null
mount /mnt/big -o remount -o ro 1> /dev/null 2> /dev/null

This is what the file system is supposed to look like:

# btrfs subvolume list '/'
ID 256 top level 5 path @
ID 257 top level 5 path @home

and this is what it looks like instead:

# btrfs subvolume list '/'
ID 256 top level 5 path @
ID 257 top level 5 path @home
ID 258 top level 5 path @snapshot_backup_root
ID 259 top level 5 path @snapshot_backup_home
ID 260 top level 5 path @snapshot_backup_root/@
ID 261 top level 5 path @snapshot_backup_home/@home

# btrfs subvolume list '/mnt'
ERROR: '/mnt' is not a subvolume
# btrfs subvolume list '/mnt/backup_root'
ERROR: error accessing '/mnt/backup_root'

When running the script again, it prints ERROR: cannot snapshot '/mnt/backup_root/@' and ERROR: cannot snapshot '/mnt/backup_root/@home', as they still seem to exist. But deleting the snapshots fails as well:

btrfs subvolume delete '/@snapshot_backup_root'
btrfs subvolume delete '/@snapshot_backup_home'
btrfs subvolume delete '/@snapshot_backup_root/@'
btrfs subvolume delete '/@snapshot_backup_home/@home'
btrfs subvolume delete '/@snapshot_backup_root'
btrfs subvolume delete '/@snapshot_backup_home'

didn't work (ERROR: error accessing '/@snapshot_backup_root'). Simply running the unmount/delete-snapshot part of the script didn't work either. How can I get rid of those snapshots (ID 258-261)? I tried:

# mkdir /tmp/t
# mount /dev/sda1 /tmp/t -o subvol=/
# ls /tmp/t
@ @home @snapshot_backup_home @snapshot_backup_root
# btrfs subvol delete /tmp/t/@snapshot_backup_home
Delete subvolume '/tmp/t/@snapshot_backup_home'
ERROR: cannot delete '/tmp/t/@snapshot_backup_home'

Needless to say, such an error message isn't really helpful. I'd appreciate any help. As I'm NOT SUBSCRIBED TO THE MAILING LIST, please CC to m...@gmx.net. Thank you very much! :-)

PS: I didn't append dmesg and such, as they likely won't be of any use given the very specific problem. If needed I can supply them.
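For what it's worth, the missing-cleanup failure mode here (CTRL+C lands before the umount/delete block ever runs) can be avoided by registering the cleanup with a trap, so it also runs on interrupt. This is a sketch, not the original script: the paths come from the report above, while the run helper and its DRY_RUN guard are additions for illustration (with DRY_RUN=1, the default here, it only prints what would be executed).

```shell
#!/bin/bash
# Sketch: run cleanup on EXIT/INT/TERM so an interrupted backup still
# unmounts the bind mounts and deletes the snapshots.
DRY_RUN=${DRY_RUN:-1}

run() {
    # In dry-run mode, print the command instead of executing it.
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@" > /dev/null 2>&1
    fi
}

cleanup() {
    run umount /mnt/backup/boot/
    run umount /mnt/backup/home/
    run umount /mnt/backup
    run rmdir /mnt/backup
    run btrfs subvolume delete /mnt/backup_root/@snapshot_backup_home
    run btrfs subvolume delete /mnt/backup_root/@snapshot_backup_root
    run umount /mnt/backup_root
    run rmdir /mnt/backup_root
    run mount /mnt/big -o remount -o ro
}
# The trap fires on normal exit, CTRL+C (INT), and TERM alike.
trap cleanup EXIT INT TERM

# ... snapshot + bind-mount + backup steps would go here ...
```

With DRY_RUN=0 it would execute the real commands; the point is that the teardown no longer depends on the script reaching its last lines.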
Also seeing full deadlocks with 3.15.1
My laptop deadlocked some more times (everything works until it needs to touch the filesystem, and then it's deadlocked). Unfortunately, I can trigger sysrq, but it doesn't get committed to disk, and netconsole eats half of it because it goes too fast for UDP, apparently. Now, I just captured this on my server with serial console:

11005 1-16:11:10 wait_current_trans.isra.15 /usr/bin/zma -m 3
14441 1-16:07:44 wait_current_trans.isra.15 /usr/bin/zma -m 1
17045 1-23:53:33 wait_current_trans.isra.15 /usr/bin/zma -m 9
22261 2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 6
22292 2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 8
19911 09:29:35 wait_current_trans.isra.15 rm -f -- /mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13 /mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13.gz
22848 1-05:18:35 wait_current_trans.isra.15 rm -f -- mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11 mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11.gz

Those are 2 different filesystems (one a single device-mapper disk, the other btrfs raid1), so I'm not sure which of the 2 caused the problem, but I'm perplexed as to why one would then hang the other, unless they both hit the same bug?

The sysrq-w output is here: http://marc.merlins.org/tmp/btrfs-hang.txt but here is one hung process:

zma D 0003 0 22292 1 0x20020084 880074733bb0 0082 8800c933f270 880074733fd8 8801853b4610 000141c0 8801aac60f00 880036caa9e8 880036caa800 8801db59f0c0 880074733bc0
Call Trace:
[8161d3c6] schedule+0x73/0x75
[8122a87b] wait_current_trans.isra.15+0x98/0xf4
[810847ed] ? finish_wait+0x65/0x65
[8122bd95] start_transaction+0x498/0x4fc
[8122be14] btrfs_start_transaction+0x1b/0x1d
[8123602a] btrfs_create+0x3c/0x1ce
[81298985] ? security_inode_permission+0x1c/0x23
[8115e93e] ? __inode_permission+0x79/0xa4
[8115fbfc] vfs_create+0x66/0x8c
[8116095e] do_last+0x5af/0xa23
[81161009] path_openat+0x237/0x4de
[81162408] do_filp_open+0x3a/0x7f
[8161faeb] ? _raw_spin_unlock+0x17/0x2a
[8116c3eb] ? __alloc_fd+0xea/0xf9
[8115499d] do_sys_open+0x70/0xff
[81194e20] compat_SyS_open+0x1b/0x1d
[8162842c] sysenter_dispatch+0x7/0x21

As per the other thread, I'm happy to test a patch against 3.15, but not hot about switching to a likely even less stable 3.16 since it's a real server with real data.

Thanks, Marc
-- A mouse is a device used to point at the xterm you want to type in - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/
nossd option ignored
Hello,

With kernel 3.14.5...

$ sudo umount /mnt/net/alpha/11
umount: /mnt/net/alpha/11: not mounted
$ sudo mount -o inode_cache,space_cache,compress=lzo,noatime,nossd,skip_balance /dev/nbd11 /mnt/net/alpha/11
$ sudo mount | grep nbd11
/dev/nbd11 on /mnt/net/alpha/11 type btrfs (rw,noatime,compress=lzo,ssd,space_cache,inode_cache,skip_balance)
$ dmesg | tail
...
[1353819.363462] BTRFS: device fsid 8cf8eff9-fd5a-4b6f-bb85-3f2df2f63c99 devid 1 transid 25041 /dev/nbd11
[1353819.364668] BTRFS info (device nbd11): enabling inode map caching
[1353819.364674] BTRFS info (device nbd11): disk space caching is enabled
[1353821.784617] BTRFS: detected SSD devices, enabling SSD mode

I'm certain the nossd option used to work (prevent SSD mode) with this exact same configuration on older kernels. Any idea why it doesn't now?

-- With respect, Roman
[PATCH] btrfs: fix nossd and ssd_spread mount option regression
The commit 0780253 (btrfs: Cleanup the btrfs_parse_options for remount.) broke ssd options quite badly; it stopped making ssd_spread imply ssd, and it made nossd unsettable. Put things back at least as well as they were before (though ssd mount option handling is still pretty odd: # mount -o nossd,ssd_spread works?).

Reported-by: Roman Mamedov r...@romanrm.net
Signed-off-by: Eric Sandeen sand...@redhat.com
---
I've tested this insofar as I was actually able to mount with nossd and see it reflected in /proc/mounts. If SSD_SPREAD is set, show_options() won't show you the ssd option, so that's not totally obvious. Still, this is what the code did before the regression.

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 4662d92..0e8edcc 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -522,9 +522,10 @@ int btrfs_parse_options(struct btrfs_root *root, char *options)
 		case Opt_ssd_spread:
 			btrfs_set_and_info(root, SSD_SPREAD,
 					   "use spread ssd allocation scheme");
+			btrfs_set_opt(info->mount_opt, SSD);
 			break;
 		case Opt_nossd:
-			btrfs_clear_and_info(root, NOSSD,
+			btrfs_set_and_info(root, NOSSD,
 					     "not using ssd allocation scheme");
 			btrfs_clear_opt(info->mount_opt, SSD);
 			break;
Re: Can't mount subvolume with ro option
On Jun 27, 2014, at 2:07 PM, Sébastien ROHAUT sebastien.roh...@free.fr wrote:
> Hi, In the wiki it's said we can mount subvolumes with different mount options. nosuid, nodev, rw and ro are listed as valid generic mount options.

This might require 3.15. I don't recall it working with early 3.14 kernels, but by 3.14.3 I'd moved on to testing 3.15.

[root@rawhide ~]# mount /dev/sda3 /mnt
[root@rawhide ~]# btrfs subvol create /mnt/test
Create subvolume '/mnt/test'
[root@rawhide ~]# umount /mnt
[root@rawhide ~]# mount -o ro,subvol=test /dev/sda3 /mnt
[root@rawhide ~]# mount | grep btrfs
/dev/sda3 on / type btrfs (rw,relatime,seclabel,space_cache,autodefrag)
/dev/sda3 on /home type btrfs (rw,relatime,seclabel,space_cache,autodefrag)
/dev/sda3 on /var type btrfs (rw,relatime,seclabel,space_cache,autodefrag)
/dev/sda3 on /boot type btrfs (rw,relatime,seclabel,space_cache,autodefrag)
/dev/sda3 on /mnt type btrfs (ro,relatime,seclabel,space_cache,autodefrag)
[root@rawhide ~]# cat /proc/self/mountinfo | grep btrfs
58 0 0:33 /root / rw,relatime shared:1 - btrfs /dev/sda3 rw,seclabel,space_cache,autodefrag
72 58 0:33 /home /home rw,relatime shared:29 - btrfs /dev/sda3 rw,seclabel,space_cache,autodefrag
74 58 0:33 /var /var rw,relatime shared:30 - btrfs /dev/sda3 rw,seclabel,space_cache,autodefrag
76 58 0:33 /boot /boot rw,relatime shared:31 - btrfs /dev/sda3 rw,seclabel,space_cache,autodefrag
84 58 0:33 /test /mnt ro,relatime shared:35 - btrfs /dev/sda3 rw,seclabel,space_cache,autodefrag

So on my end it seems like it's working correctly with 3.15.

Chris Murphy
Re: Can't mount subvolume with ro option
On Jun 27, 2014, at 4:08 PM, Chris Murphy li...@colorremedies.com wrote:
> On Jun 27, 2014, at 2:07 PM, Sébastien ROHAUT sebastien.roh...@free.fr wrote:
>> Hi, In the wiki it's said we can mount subvolumes with different mount options. nosuid, nodev, rw and ro are listed as valid generic mount options.
> This might require 3.15. I don't recall it working with early 3.14 kernels, but by 3.14.3 I'd moved on to testing 3.15.

[root@f20v ~]# mount /dev/sda3 /mnt
[root@f20v ~]# btrfs subvol create /mnt/test
Create subvolume '/mnt/test'
[root@f20v ~]# umount /mnt
[root@f20v ~]# mount -o ro,subvol=test /dev/sda3 /mnt
mount: /dev/sda3 is already mounted or /mnt busy
       /dev/sda3 is already mounted on /
       /dev/sda3 is already mounted on /home
       /dev/sda3 is already mounted on /var
       /dev/sda3 is already mounted on /boot
[root@f20v ~]# uname -r
3.14.6-200.fc20.x86_64

I don't know if this feature will be backported to stable kernels. If not, then probably the wiki should say it's a 3.15+ feature.

Chris Murphy
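The same per-subvolume mounts being tested interactively above can also be expressed in fstab. This is a sketch, not from the thread: the device and the "root" subvolume name follow the mountinfo output above, and the ro,nosuid,nodev options are the generic per-mount options the wiki lists (3.15+, per the test results).

```text
# /etc/fstab sketch: one btrfs device mounted twice with different
# generic options per subvolume ("test" is the subvolume created above)
/dev/sda3  /     btrfs  defaults,subvol=root         0 0
/dev/sda3  /mnt  btrfs  ro,nosuid,nodev,subvol=test  0 0
```

Filesystem-specific options (compress, space_cache, and so on) still apply to the whole filesystem; only the generic VFS options differ per mount point.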
Re: Also seeing full deadlocks with 3.15.1
On Fri, Jun 27, 2014 at 02:50:10PM -0700, ronnie sahlberg wrote:
>> If I don't hear anything by the end of today, I'll just delete the filesystem and start over.
> At some stage it would be nice to see not only fixes but also changes to fsck to make it able to repair these problems. Blowing it away and creating a new filesystem from scratch is sub-optimal.

I don't think you'll find disagreement from me or anyone here :) But I'd go one step further. The filesystem is not corrupted as far as I can tell; I'm happily copying data off it in ro,recovery mode (to prevent background btrfs code from trying to do stuff and trip over itself again).

The problem in my experience so far is that btrfs isn't stabilizing at all. Some bugs are fixed, other things are changed, and new ones are added. I've not had a single version of btrfs in the last 4 kernels that didn't deadlock and/or trip over itself (apparently from evolving or balancing/filling filesystems into states where it can't handle them properly anymore).

I really, really wish we had a kernel release with only stabilizations, where all recent deadlock and corruption problems (on newly created filesystems) were handled. Right now the general state is bad enough that you can't tell whether you've hit a new bug or an old bug that hasn't been fixed yet, and developers can't easily know whether newer kernels have introduced regressions, since the general state of performance and stability isn't good across all recent kernel versions.

Marc
-- A mouse is a device used to point at the xterm you want to type in - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
Re: Also seeing full deadlocks with 3.15.1
On 06/27/2014 11:50 AM, Marc MERLIN wrote:
> My laptop deadlocked some more times (everything works until it needs to touch the filesystem, and then it's deadlocked). Unfortunately, I can trigger sysrq, but it doesn't get committed to disk and netconsole eats half of it because it goes too fast for UDP apparently. Now, I just captured that on my server with serial console.
> 11005 1-16:11:10 wait_current_trans.isra.15 /usr/bin/zma -m 3
> 14441 1-16:07:44 wait_current_trans.isra.15 /usr/bin/zma -m 1
> 17045 1-23:53:33 wait_current_trans.isra.15 /usr/bin/zma -m 9
> 22261 2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 6
> 22292 2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 8
> 19911 09:29:35 wait_current_trans.isra.15 rm -f -- /mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13 /mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13.gz
> 22848 1-05:18:35 wait_current_trans.isra.15 rm -f -- mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11 mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11.gz
> Those are 2 different filesystems (one a single device-mapper disk, the other btrfs raid1), so I'm not sure which of the 2 caused the problem, but I'm perplexed as to why one would then hang the other, unless they both hit the same bug?
> The sysrq-w output is here: http://marc.merlins.org/tmp/btrfs-hang.txt but here is one hung process:
> zma D 0003 0 22292 1 0x20020084 880074733bb0 0082 8800c933f270 880074733fd8 8801853b4610 000141c0 8801aac60f00 880036caa9e8 880036caa800 8801db59f0c0 880074733bc0
> Call Trace: [8161d3c6] schedule+0x73/0x75 [8122a87b] wait_current_trans.isra.15+0x98/0xf4 [810847ed] ? finish_wait+0x65/0x65 [8122bd95] start_transaction+0x498/0x4fc [8122be14] btrfs_start_transaction+0x1b/0x1d [8123602a] btrfs_create+0x3c/0x1ce [81298985] ? security_inode_permission+0x1c/0x23 [8115e93e] ? __inode_permission+0x79/0xa4 [8115fbfc] vfs_create+0x66/0x8c [8116095e] do_last+0x5af/0xa23 [81161009] path_openat+0x237/0x4de [81162408] do_filp_open+0x3a/0x7f [8161faeb] ? _raw_spin_unlock+0x17/0x2a [8116c3eb] ? __alloc_fd+0xea/0xf9 [8115499d] do_sys_open+0x70/0xff [81194e20] compat_SyS_open+0x1b/0x1d [8162842c] sysenter_dispatch+0x7/0x21
> As per the other thread, I'm happy to test a patch against 3.15, but not hot about switching to a likely even less stable 3.16 since it's a real server with real data.

A few other people have complained about this; I've not been able to reproduce it, but I have a patch you can try. It will make it so the box doesn't deadlock anymore, but I still need the output: look for "timed out", that's when you need to dump the logs and send them to me. The patch is here: http://ur1.ca/hlj6d

Thanks, Josef
Re: [Question] Btrfs on iSCSI device
On Fri, 27 Jun 2014 18:34:34 Goffredo Baroncelli wrote:
> I don't think that it is possible to mount the _same device_ at the _same time_ on two different machines. And this doesn't depend on the filesystem.

If you use a clustered filesystem then you can safely mount it on multiple machines. If you use a non-clustered filesystem it can still mount and even appear to work for a while. It's surprising how many writes you can make to a dual-mounted filesystem that's not designed for such things before you get a totally broken filesystem.

On Fri, 27 Jun 2014 13:15:16 Austin S Hemmelgarn wrote:
> The reason it appears to work when using iSCSI and not with directly connected parallel SCSI or SAS is that iSCSI doesn't provide low level hardware access.

I've tried this with dual-attached FC and had no problems mounting. In what way is directly connected SCSI different from FC?

-- My Main Blog http://etbe.coker.com.au/ My Documents Blog http://doc.coker.com.au/
Re: Also seeing full deadlocks with 3.15.1
On Fri, Jun 27, 2014 at 03:36:08PM -0700, Josef Bacik wrote:
> On 06/27/2014 11:50 AM, Marc MERLIN wrote:
>> My laptop deadlocked some more times (everything works until it needs to touch the filesystem, and then it's deadlocked). Unfortunately, I can trigger sysrq, but it doesn't get committed to disk and netconsole eats half of it because it goes too fast for UDP apparently. Now, I just captured that on my server with serial console.
>> 11005 1-16:11:10 wait_current_trans.isra.15 /usr/bin/zma -m 3
>> 14441 1-16:07:44 wait_current_trans.isra.15 /usr/bin/zma -m 1
>> 17045 1-23:53:33 wait_current_trans.isra.15 /usr/bin/zma -m 9
>> 22261 2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 6
>> 22292 2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 8
>> 19911 09:29:35 wait_current_trans.isra.15 rm -f -- /mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13 /mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13.gz
>> 22848 1-05:18:35 wait_current_trans.isra.15 rm -f -- mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11 mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11.gz
>> Those are 2 different filesystems (one a single device-mapper disk, the other btrfs raid1), so I'm not sure which of the 2 caused the problem, but I'm perplexed as to why one would then hang the other, unless they both hit the same bug?
>> The sysrq-w output is here: http://marc.merlins.org/tmp/btrfs-hang.txt but here is one hung process:
>> zma D 0003 0 22292 1 0x20020084 880074733bb0 0082 8800c933f270 880074733fd8 8801853b4610 000141c0 8801aac60f00 880036caa9e8 880036caa800 8801db59f0c0 880074733bc0
>> Call Trace: [8161d3c6] schedule+0x73/0x75 [8122a87b] wait_current_trans.isra.15+0x98/0xf4 [810847ed] ? finish_wait+0x65/0x65 [8122bd95] start_transaction+0x498/0x4fc [8122be14] btrfs_start_transaction+0x1b/0x1d [8123602a] btrfs_create+0x3c/0x1ce [81298985] ? security_inode_permission+0x1c/0x23 [8115e93e] ? __inode_permission+0x79/0xa4 [8115fbfc] vfs_create+0x66/0x8c [8116095e] do_last+0x5af/0xa23 [81161009] path_openat+0x237/0x4de [81162408] do_filp_open+0x3a/0x7f [8161faeb] ? _raw_spin_unlock+0x17/0x2a [8116c3eb] ? __alloc_fd+0xea/0xf9 [8115499d] do_sys_open+0x70/0xff [81194e20] compat_SyS_open+0x1b/0x1d [8162842c] sysenter_dispatch+0x7/0x21
>> As per the other thread, I'm happy to test a patch against 3.15, but not hot about switching to a likely even less stable 3.16 since it's a real server with real data.
> A few other people have complained about this; I've not been able to reproduce it, but I have a patch you can try. It will make it so the box doesn't deadlock anymore, but I still need the output: look for "timed out", that's when you need to dump the logs and send them to me. The patch is here

Mmmh, I applied the patch, but now I'm getting tens of thousands of the lines below. The machine is so unresponsive (due to serial port speed limitation and the amount of console spamming) that I cannot even ssh into it. Example output below. I have to back that kernel out; it's unusable, and I'm not sure what output I can get you out of it.

[ 1313.747004] looking up page 46 on inode 8801ac3e9d68
[ 1313.747006] created a page, should be locked ? eac6d480
[ 1313.747006] looking up page 47 on inode 8801ac3e9d68
[ 1313.747008] created a page, should be locked ? eac6d4b8
[ 1313.747009] looking up page 48 on inode 8801ac3e9d68
[ 1313.747011] created a page, should be locked ? eac75ad0
[ 1313.747012] looking up page 49 on inode 8801ac3e9d68
[ 1313.747013] created a page, should be locked ? eac75b08
[ 1313.747014] looking up page 50 on inode 8801ac3e9d68
[ 1313.747016] created a page, should be locked ? eac5d420
[ 1313.747017] looking up page 51 on inode 8801ac3e9d68
[ 1313.747018] created a page, should be locked ? eac5d458
[ 1313.747019] looking up page 52 on inode 8801ac3e9d68
[ 1313.747021] created a page, should be locked ? eace4f00
[ 1313.747022] looking up page 53 on inode 8801ac3e9d68
[ 1313.747023] created a page, should be locked ? eace4f38
[ 1313.747024] looking up page 54 on inode 8801ac3e9d68
[ 1313.747026] created a page, should be locked ? eac989f0
[ 1313.747027] looking up page 55 on inode 8801ac3e9d68
[ 1313.747029] created a page, should be locked ? eac98a28
[ 1375.660075] dropping
[PATCH] Btrfs: make sure to use btrfs_header_owner when freeing tree block
Mark noticed that his qgroup accounting for snapshot deletion wasn't working properly on a particular file system. Turns out we pass the root->objectid of the root we are deleting to btrfs_free_extent, and use that root always when we call btrfs_free_tree_block. This isn't correct; the owner must match the btrfs_header_owner() of the eb. So to fix this we need to use that when we call btrfs_free_extent, and we also need to use btrfs_header_owner(eb) in btrfs_free_tree_block, as the root we pass in may not be the owner in the case of snapshot delete (though it is for all the normal cases, which is why it wasn't noticed before). With this patch on top of Mark's snapshot delete patch everything is working a-ok. Thanks,

Signed-off-by: Josef Bacik jba...@fb.com
---
 fs/btrfs/extent-tree.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 7671b15..7f9bb7c 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -6189,7 +6189,7 @@ void btrfs_free_tree_block(struct btrfs_trans_handle *trans,
 	if (root->root_key.objectid != BTRFS_TREE_LOG_OBJECTID) {
 		ret = btrfs_add_delayed_tree_ref(root->fs_info, trans,
 					buf->start, buf->len,
-					parent, root->root_key.objectid,
+					parent, btrfs_header_owner(eb),
 					btrfs_header_level(buf),
 					BTRFS_DROP_DELAYED_REF, NULL, 0);
 		BUG_ON(ret); /* -ENOMEM */
@@ -7925,7 +7925,8 @@ skip:
 	}
 	ret = btrfs_free_extent(trans, root, bytenr, blocksize, parent,
-				root->root_key.objectid, level - 1, 0, 0);
+				btrfs_header_owner(next), level - 1, 0,
+				0);
 	BUG_ON(ret); /* -ENOMEM */
 	}
 	btrfs_tree_unlock(next);
--
2.0.0
Re: Also seeing full deadlocks with 3.15.1
On 06/27/2014 04:59 PM, Marc MERLIN wrote:
> On Fri, Jun 27, 2014 at 03:36:08PM -0700, Josef Bacik wrote:
>> On 06/27/2014 11:50 AM, Marc MERLIN wrote:
>>> My laptop deadlocked some more times (everything works until it needs to touch the filesystem, and then it's deadlocked). Unfortunately, I can trigger sysrq, but it doesn't get committed to disk and netconsole eats half of it because it goes too fast for UDP apparently. Now, I just captured that on my server with serial console.
>>> 11005 1-16:11:10 wait_current_trans.isra.15 /usr/bin/zma -m 3
>>> 14441 1-16:07:44 wait_current_trans.isra.15 /usr/bin/zma -m 1
>>> 17045 1-23:53:33 wait_current_trans.isra.15 /usr/bin/zma -m 9
>>> 22261 2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 6
>>> 22292 2-00:40:36 wait_current_trans.isra.15 /usr/bin/zma -m 8
>>> 19911 09:29:35 wait_current_trans.isra.15 rm -f -- /mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13 /mnt/dshelf2/backup/0Notmachines/mysql//mysql.daily.sql.gz.13.gz
>>> 22848 1-05:18:35 wait_current_trans.isra.15 rm -f -- mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11 mnt/dshelf2/backup/0Notmachines/jen//backup.tar.bz.11.gz
>>> Those are 2 different filesystems (one a single device-mapper disk, the other btrfs raid1), so I'm not sure which of the 2 caused the problem, but I'm perplexed as to why one would then hang the other, unless they both hit the same bug?
>>> The sysrq-w output is here: http://marc.merlins.org/tmp/btrfs-hang.txt but here is one hung process:
>>> zma D 0003 0 22292 1 0x20020084 880074733bb0 0082 8800c933f270 880074733fd8 8801853b4610 000141c0 8801aac60f00 880036caa9e8 880036caa800 8801db59f0c0 880074733bc0
>>> Call Trace: [8161d3c6] schedule+0x73/0x75 [8122a87b] wait_current_trans.isra.15+0x98/0xf4 [810847ed] ? finish_wait+0x65/0x65 [8122bd95] start_transaction+0x498/0x4fc [8122be14] btrfs_start_transaction+0x1b/0x1d [8123602a] btrfs_create+0x3c/0x1ce [81298985] ? security_inode_permission+0x1c/0x23 [8115e93e] ? __inode_permission+0x79/0xa4 [8115fbfc] vfs_create+0x66/0x8c [8116095e] do_last+0x5af/0xa23 [81161009] path_openat+0x237/0x4de [81162408] do_filp_open+0x3a/0x7f [8161faeb] ? _raw_spin_unlock+0x17/0x2a [8116c3eb] ? __alloc_fd+0xea/0xf9 [8115499d] do_sys_open+0x70/0xff [81194e20] compat_SyS_open+0x1b/0x1d [8162842c] sysenter_dispatch+0x7/0x21
>>> As per the other thread, I'm happy to test a patch against 3.15, but not hot about switching to a likely even less stable 3.16 since it's a real server with real data.
>> A few other people have complained about this; I've not been able to reproduce it, but I have a patch you can try. It will make it so the box doesn't deadlock anymore, but I still need the output: look for "timed out", that's when you need to dump the logs and send them to me. The patch is here
> Mmmh, I applied the patch, but now I'm getting tens of thousands of the lines below. The machine is so unresponsive (due to serial port speed limitation and the amount of console spamming) that I cannot even ssh into it. Example output below. I have to back that kernel out; it's unusable, and I'm not sure what output I can get you out of it.

Oh yeah, I should have mentioned that: it's going to spit out a metric shittone of stuff. No worries, you had a lot more info in your sysrq+w; I'm hoping I can get this to reproduce next week.

Thanks, Josef
Re: [PATCH] Btrfs: make sure to use btrfs_header_owner when freeing tree block V2
On 06/27/2014 05:05 PM, Josef Bacik wrote:
> Mark noticed that his qgroup accounting for snapshot deletion wasn't working properly on a particular file system. It turns out we pass the root->objectid of the root we are deleting to btrfs_free_extent, and always use that root when we call btrfs_free_tree_block. This isn't correct: the owner must match the btrfs_header_owner() of the eb. So to fix this we need to use that when we call btrfs_free_extent, and we also need to use btrfs_header_owner(eb) in btrfs_free_tree_block, as the root we pass in may not be the owner in the case of snapshot delete (though it is for all the normal cases, which is why it wasn't noticed before). With this patch on top of Mark's snapshot delete patch everything is working a-ok. Thanks,
>
> Signed-off-by: Josef Bacik jba...@fb.com
> ---
> V1->V2: this one actually compiles.

Huh, I may be completely full of crap here; let's just ignore all post-5pm-Friday patches from me for now. Thanks,

Josef
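The owner mix-up can be sketched with a grossly simplified toy model (my own illustration, not btrfs code; reference counting and the real qgroup machinery are ignored). A block shared between the original subvolume and a snapshot carries the original root's id in its header, so accounting a free against the root being deleted, rather than against `btrfs_header_owner(eb)`, charges the wrong qgroup:

```python
# Toy model of the accounting bug: the names below are hypothetical
# stand-ins, not actual btrfs types or functions.

class ExtentBuffer:
    def __init__(self, header_owner):
        # analogous to what btrfs_header_owner(eb) would report
        self.header_owner = header_owner

def account_free(eb, deleting_root, qgroups, *, use_header_owner):
    """Charge the freed block to a qgroup.

    The fix described above corresponds to use_header_owner=True:
    charge eb's header owner, not the root being deleted.
    """
    owner = eb.header_owner if use_header_owner else deleting_root
    qgroups[owner] = qgroups.get(owner, 0) - 1

ORIGINAL, SNAPSHOT = 256, 257
shared_block = ExtentBuffer(header_owner=ORIGINAL)  # shared with the snapshot

buggy, fixed = {}, {}
account_free(shared_block, SNAPSHOT, buggy, use_header_owner=False)
account_free(shared_block, SNAPSHOT, fixed, use_header_owner=True)

assert buggy == {SNAPSHOT: -1}   # charged to the root being deleted (wrong)
assert fixed == {ORIGINAL: -1}   # charged to the block's real owner
```

For the normal (non-snapshot-delete) paths both branches pick the same root, which is why the bug stayed hidden.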
Re: Blocked tasks on 3.15.1
On Fri, 27 Jun 2014 05:20:41 PM Duncan wrote:

If I'm not mistaken, the fix for the 3.16 series bug was:

ea4ebde02e08558b020c4b61bb9a4c0fcf63028e Btrfs: fix deadlocks with trylock on tree nodes

That patch applies cleanly to 3.15.2, so if it is indeed the fix it should probably go to -stable for the next 3.15 release. Unfortunately my test system died a while ago (hardware problem) and I've not been able to resurrect it yet.

cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
RAID1 3+ drives
Can I get more protection by using more than 2 drives? I had an onboard RAID a few years back that would let me use RAID1 across up to 4 drives. Apologies if this has been covered already, I don't recall seeing anything saying yay or nay.
Re: RAID1 3+ drives
On Fri, 27 Jun 2014 20:30:32 Zack Coffey wrote:
> Can I get more protection by using more than 2 drives? I had an onboard RAID a few years back that would let me use RAID1 across up to 4 drives.

Currently the only RAID level that fully works in BTRFS is RAID-1 with data on 2 disks. If you have 4 disks in the array then each block will be on 2 of the disks.

The RAID-5/6 code mostly works, but the last report I read indicated that some situations for recovery and disk replacement didn't work - and presumably anyone who's afraid of multiple disks failing isn't going to want to trust the BTRFS RAID-6 code at the moment.

If you want to have 4 disks in a fully redundant configuration (i.e. you could lose 3 disks without losing any data) then the thing to do is to have 2 RAID-1 arrays with Linux software RAID and then run BTRFS RAID-1 on top of that.
--
My Main Blog       http://etbe.coker.com.au/
My Documents Blog  http://doc.coker.com.au/
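The "each block will be on 2 of the disks" behaviour can be sketched with a toy allocator (my own illustration, not btrfs code). Assuming, as btrfs does, that each new chunk is mirrored onto the two devices with the most free space, a 4-device raid1 filesystem spreads chunks around but never holds more than two copies of anything:

```python
# Toy model of btrfs raid1 chunk placement across 4 devices.
# The allocator below is a simplified stand-in for the real one.

def allocate_chunks(device_free, num_chunks, chunk_size=1):
    """Place each chunk on the two devices with the most free space.

    device_free maps device name -> free space; it is mutated.
    Returns one (dev_a, dev_b) mirror pair per chunk.
    """
    placements = []
    for _ in range(num_chunks):
        a, b = sorted(device_free, key=device_free.get, reverse=True)[:2]
        device_free[a] -= chunk_size
        device_free[b] -= chunk_size
        placements.append((a, b))
    return placements

devices = {"sdb": 100, "sdc": 100, "sdd": 100, "sde": 100}
layout = allocate_chunks(devices, 8)

# Every chunk lives on exactly 2 of the 4 devices...
assert all(len(set(pair)) == 2 for pair in layout)
# ...so losing the wrong 2 devices can destroy both copies of a chunk.
```

This is why 4-disk btrfs raid1 gives you more capacity, not more redundancy: the mirror count stays at two regardless of device count.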
Re: [Question] Btrfs on iSCSI device
On 06/27/2014 07:40 PM, Russell Coker wrote:
> On Fri, 27 Jun 2014 18:34:34 Goffredo Baroncelli wrote:
>> I don't think that it is possible to mount the _same device_ at the _same time_ on two different machines. And this doesn't depend on the filesystem.
>
> If you use a clustered filesystem then you can safely mount it on multiple machines. If you use a non-clustered filesystem it can still mount and even appear to work for a while. It's surprising how many writes you can make to a dual-mounted filesystem that's not designed for such things before you get a totally broken filesystem.
>
> On Fri, 27 Jun 2014 13:15:16 Austin S Hemmelgarn wrote:
>> The reason it appears to work when using iSCSI and not with directly connected parallel SCSI or SAS is that iSCSI doesn't provide low-level hardware access.
>
> I've tried this with dual-attached FC and had no problems mounting. In what way is directly connected SCSI different from FC?

FC is actually its own networking stack (you can even, in theory, run other protocols like IP and ATM on top of it), whereas parallel SCSI is just a multi-drop bus, and SAS is just a tree-structured bus with point-to-point communications emulated on top of it. In other words, parallel SCSI has topological constraints like RS-485, SAS has topology constraints like USB, and FC has topology constraints like Ethernet.

Secondarily, most filesystems on Linux will let you mount them multiple times on separate hosts (ext4 has features to prevent this, but they are expensive and therefore turned off by default; I think XFS might have similar features, but I'm not sure). BTRFS should in theory be more resilient than most because of its COW nature (as long as it's only a few commit cycles, you should still be able to recover most of the data just fine).
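The failure mode Russell describes can be sketched with a toy model (hypothetical, nothing filesystem-specific): two hosts that each cache metadata independently and flush it back without any coordination silently discard each other's updates, which is exactly the kind of divergence a cluster filesystem's locking exists to prevent:

```python
# Toy model of dual-mounting a non-cluster filesystem.
# "disk" is shared storage; each host keeps its own cached view.

disk = {"files": {"a"}}

# Both hosts mount the device and cache the metadata independently.
host1 = {"files": set(disk["files"])}
host2 = {"files": set(disk["files"])}

host1["files"].add("b")   # host 1 creates a file
host2["files"].add("c")   # host 2 creates another file

# Each host flushes its cached view; the later write clobbers the
# earlier one, because neither host knows the other exists.
disk["files"] = set(host1["files"])
disk["files"] = set(host2["files"])

assert "b" not in disk["files"]        # host 1's file silently vanished
assert disk["files"] == {"a", "c"}
```

With real on-disk trees the two stale views also disagree about allocation, so instead of merely losing files the hosts eventually overwrite each other's metadata and the filesystem breaks outright.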
Re: RAID1 3+ drives
Russell Coker posted on Sat, 28 Jun 2014 10:51:00 +1000 as excerpted:

> On Fri, 27 Jun 2014 20:30:32 Zack Coffey wrote:
>> Can I get more protection by using more than 2 drives? I had an onboard RAID a few years back that would let me use RAID1 across up to 4 drives.
>
> Currently the only RAID level that fully works in BTRFS is RAID-1 with data on 2 disks.

Not /quite/ correct. Raid0 works, but of course that isn't exactly RAID as it's not redundant. And raid10 works. But that's simply raid0 over raid1. So depending on whether you consider raid0 actually RAID or not, which in turn depends on how strict you are with the redundant part, there is or is not more than btrfs raid1 working.

> If you have 4 disks in the array then each block will be on 2 of the disks.

Correct. FWIW I'm told that the paper that laid out the original definition of RAID (which was linked on this list in a similar discussion some months ago) defined RAID-1 as paired redundancy, no matter the number of devices. Various implementations (including Linux' own mdraid soft-raid, and I believe dmraid as well) feature multi-way-mirroring aka N-way-mirroring such that N devices equals N-way mirroring, but that's an implementation extension and isn't actually necessary to claim RAID-1 support.

So look for N-way-mirroring when you go RAID shopping, and no, btrfs does not have it at this time, altho it is roadmapped for implementation after completion of the raid5/6 code. FWIW, N-way-mirroring is my #1 btrfs wish-list item too, not just for device redundancy, but to take full advantage of btrfs data integrity features, allowing scrub to repair a checksum-mismatch copy with the content of a checksum-validated copy if available. That's currently possible, but due to the pair-mirroring-only restriction there's only one additional copy, and if it happens to be bad as well, there's no possibility of a third copy to scrub from.
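The scrub argument can be illustrated with a toy sketch (my own, not btrfs code; btrfs actually uses crc32c per block, plain crc32 stands in here). A scrub can repair from any mirror whose checksum verifies, so its success depends entirely on at least one good copy surviving:

```python
# Toy model of checksum-based scrub across N mirrors.
import zlib

def scrub(copies, expected_crc):
    """Return the first copy whose checksum verifies, or None if
    every mirror is corrupt (nothing left to repair from)."""
    for data in copies:
        if zlib.crc32(data) == expected_crc:
            return data
    return None

good = b"important data"
crc = zlib.crc32(good)

# 2-way mirroring: if both copies go bad, there is nothing to repair from.
assert scrub([b"imp0rtant data", b"important dat4"], crc) is None

# 3-way mirroring: one surviving good copy is enough to fix the others.
assert scrub([b"imp0rtant data", good, b"important dat4"], crc) == good
```

Each extra mirror is another independent chance that a verified copy survives, which is the whole appeal of N-way over pair mirroring.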
As it happens my personal sweet-spot between cost/performance and reliability would be 3-way mirroring, but once they code beyond N=2, N should go unlimited, so N=3, N=4, N=50 if you have a way to hook them all up... should all be possible. But...

> RAID-5/6 code mostly works but the last report I read indicated that some situations for recovery and disk replacement didn't work - presumably anyone who's afraid of multiple disks failing isn't going to want to trust BTRFS RAID-6 code at the moment.

The raid5/6 code was on the list to be introduced "in the next kernel or two" something like two years ago, when I originally looked into it, and likely before that. Like many of the btrfs features, it actually took rather longer to cook than was in the original plan -- it's actually rather more complicated than anticipated, and additionally it has been put off a few times to work on fixing bugs in currently supported features. An incomplete raid56 implementation, covering normal runtime but not scrub or recovery, was introduced several kernels ago now, but it's still not complete.

So N-way-mirroring, which is supposed to build on several bits of the raid5/6 implementation and therefore is roadmapped for after it, continues to look about the same 3-5 kernels off, after raid5/6, as it did two years ago. Except, having seen the raid5/6 timing, and having looked back at btrfs feature history going back rather longer, even if raid5/6 were declared finished for kernel 3.17 (since 3.16 is past the commit window), I'd guess it'd probably take another five kernels (a year's worth) or so, at /least/, for N-way-mirroring to properly cook. So in actuality I'd be surprised to see any N-way-mirroring code at all before next spring, and would /not/ be surprised at all to see it take all of next year to fully cook to completion. Not that I'm complaining /too/ much.
We work with what we have, and btrfs as it is is quite beyond the features of most filesystems (just the data integrity and multi-device filesystem stuff at all is great to work with, besides the stuff like subvolumes and snapshotting that doesn't fit my use-case that well =:^), even if it /is/ all presently limited to two-way-mirroring! =:^\ But it will sure be nice when I /can/ count on that third copy to scrub two bad copies, if two copies /do/ happen to be bad.

> If you want to have 4 disks in a fully redundant configuration (IE you could lose 3 disks without losing any data) then the thing to do is to have 2 RAID-1 arrays with Linux software RAID and then run BTRFS RAID-1 on top of that.

The caveat with that is that at least mdraid1/dmraid1 has no verified data integrity, and while mdraid5/6 does have 1/2-way-parity calculation, it's only used in recovery, NOT cross-verified in ordinary use. So it's not a proper substitute, tho I guess some big-money hardware raids might do it. In fact, with md/dmraid and its reasonable possibility of silent corruption, since at that level any of the copies could be returned and