Re: discard synchronous on most SSDs?
On Sat, 15 Mar 2014 04:25:05 PM Chris Samuel wrote:

> I wonder if it would be possible to use that knowledge to extend the
> smartctl's --identify functionality to report this?

After reading the SATA 3.1 spec I believe that smartctl *can* indicate if a drive claims to support SATA 3.1 NCQ TRIM, thus:

$ sudo smartctl --identify /dev/sdb | fgrep 'Trim bit in DATA SET MANAGEMENT'
 169      0          1   Trim bit in DATA SET MANAGEMENT command supported
$

If that command returns nothing then it's not reported as supported (and I've tested that). You can get the same info with hdparm -I.

Of course, as Martin said, that doesn't necessarily mean the kernel is using that reported ability.

My puzzle now is that I have two SSD drives that report supporting NCQ TRIM (one confirmed via product info) but report only supporting SATA 3.0 not 3.1.

cheers,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
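The grep above can also be done on captured output in a script. The following is only a sketch: the function name is made up for illustration, and it assumes the row layout shown in the message (word, bit, value, description). It reports what the drive claims in IDENTIFY word 169 bit 0 and nothing more. As Martin points out elsewhere in this thread, queued TRIM support is reported in a log page, so a True result here should not be read as proof of NCQ TRIM.

```python
def trim_bit_reported(identify_output: str) -> bool:
    """Return True if the 'Trim bit in DATA SET MANAGEMENT' row has value 1.

    `identify_output` is text captured from `smartctl --identify`.
    Assumed row layout (taken from the message above):
        word  bit  value  description
    """
    for line in identify_output.splitlines():
        if "Trim bit in DATA SET MANAGEMENT" not in line:
            continue
        # Split into at most four fields so the description stays whole.
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[0] == "169" and fields[2] == "1":
            return True
    return False
```

An empty capture, or a row whose value column is 0, yields False, which matches the "returns nothing means not reported" behaviour described above.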
Re: discard synchronous on most SSDs?
On Fri, 14 Mar 2014 03:57:41 PM Martin K. Petersen wrote:

> The fact that the drive reports compliance with a certain version of
> SATA does not in any way imply that it implements all commands defined
> in that specification.

It looks like drives that do support it can be detected with the kernel helper function ata_fpdma_dsm_supported() defined in include/linux/libata.h.

I wonder if it would be possible to use that knowledge to extend smartctl's --identify functionality to report this?

Not all drives that implement it do so correctly, either: the kernel has a blacklist of drives that don't, and it currently lists just two:

	/* devices that don't properly handle queued TRIM commands */
	{ "Micron_M500*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
	{ "Crucial_CT???M500SSD*",	NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },

cheers,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: discard synchronous on most SSDs?
On Fri, Mar 14, 2014 at 08:46:09PM +, Holger Hoffstätte wrote:
> On Fri, 14 Mar 2014 15:57:41 -0400, Martin K. Petersen wrote:
>
> > So right now I'm afraid we don't have a good way for a user to determine
> > whether a device supports queued trims or not.
>
> Mount with discard, unpack kernel tree, sync, rm -rf tree.
> If it takes several seconds, you have sync discard, no?

Mmmh, interesting point.

legolas:/usr/src# time rm -rf linux-3.14-rc5
real    0m1.584s
user    0m0.008s
sys     0m1.524s

I remounted my FS with remount,nodiscard, and the time was the same.

> This changed somewhere around kernel 3.8.x; before that it used to be
> acceptably fast. Since then I only do batch trims, daily (server) or
> weekly (laptop).

I've never really timed this before. Is it supposed to be faster than 1.5s on a fast SSD?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: discard synchronous on most SSDs?
On Fri, 14 Mar 2014 06:33:24 PM Chris Samuel wrote:

> I *think* you want smartctl -i instead, and look for the field that says
> something like:
>
> ATA Version is: ATA8-ACS, ACS-2 T13/2015-D revision 3

Late night, cut and pasted the wrong line of output, mine says:

SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

Of course that's what the drive is reporting it supports. I'm not sure whether that's the result of what has been negotiated between the controller and drive or purely what the drive supports.

To get more information from smartctl you can use the --identify=wb option instead of -i, and that should give you a lot more detail about what the drive claims to (and not to) support.

On the version in Kubuntu 13.10 (6.1+svn3812-1) it only reports 3 things regarding TRIM for my drives.

chris@quad:/tmp$ sudo smartctl --identify=wb -d sat /dev/sdb | egrep -i 'trim| discard'
  69     14          1   Deterministic data after trim supported
  69      5          0   Trimmed LBA range(s) returning zeroed data supported
 169      0          1   Trim bit in DATA SET MANAGEMENT command supported

I'm currently doing a git clone of their SVN repo to see if there's any new functionality that will gather any more information.

cheers,
Chris
-- 
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: [PATCH] Btrfs-progs: scrub: don't call unlock if pthread_mutex_lock fails
Hi,

Forgot to mention the reason for the change. If accepted, this can be included in the commit message:

On Sat, Mar 15, 2014 at 01:49:45AM +0200, Rakesh Pandit wrote:
> If pthread_mutex_lock fails (rare but fix it anyway), don't call
> pthread_mutex_unlock on mutex.
>

Rationale being that if pthread_mutex_lock fails, pthread_mutex_unlock will always fail and overwrite the actual error value in err.

> Signed-off-by: Rakesh Pandit

regards,
btrfs: lock inversion between delayed_node->mutex and found->groups_sem
Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next kernel I've stumbled on the following:

[ 788.451695] =
[ 788.452455] [ INFO: possible irq lock inversion dependency detected ]
[ 788.453020] 3.14.0-rc6-next-20140313-sasha-00010-gb8c1db1-dirty #217 Tainted: GW
[ 788.453827] -
[ 788.454371] kswapd3/4199 just changed the state of lock:
[ 788.454902] (&delayed_node->mutex){+.+.-.}, at: __btrfs_release_delayed_node+0x4f/0x140 (fs/btrfs/delayed-inode.c:263)
[ 788.455890] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[ 788.456543] (&found->groups_sem){+.}

and interrupts could create inverse lock ordering between them.

[ 788.457491]
[ 788.457491] other info that might help us debug this:
[ 788.458115] Possible interrupt unsafe locking scenario:
[ 788.458115]
[ 788.458756]        CPU0                    CPU1
[ 788.459188]        ----                    ----
[ 788.459625]   lock(&found->groups_sem);
[ 788.460041]                                local_irq_disable();
[ 788.460041]                                lock(&delayed_node->mutex);
[ 788.460041]                                lock(&found->groups_sem);
[ 788.460041]   <Interrupt>
[ 788.460041]     lock(&delayed_node->mutex);
[ 788.460041]
[ 788.460041]  *** DEADLOCK ***
[ 788.460041]
[ 788.460041] 2 locks held by kswapd3/4199:
[ 788.460041]  #0: (shrinker_rwsem){..}, at: shrink_slab+0x3f/0x160 (mm/vmscan.c:360)
[ 788.460041]  #1: (&type->s_umount_key#108){.+.+..}, at: grab_super_passive+0x56/0x90 (fs/super.c:361)
[ 788.460041]
[ 788.460041] the shortest dependencies between 2nd lock and 1st lock:
[ 788.460041]  -> (&found->groups_sem){+.} ops: 46 {
[ 788.460041]     HARDIRQ-ON-W at:
[ 788.460041]       mark_irqflags+0xf0/0x170 (kernel/locking/lockdep.c:2800)
[ 788.460041]       __lock_acquire+0x2de/0x5a0 (kernel/locking/lockdep.c:3138)
[ 788.460041]       lock_acquire+0x182/0x1d0 (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 788.460041]       down_write+0x5c/0xc0 (arch/x86/include/asm/rwsem.h:130 kernel/locking/rwsem.c:50)
[ 788.460041]       __link_block_group+0x45/0x110 (fs/btrfs/extent-tree.c:8348)
[ 788.460041]       btrfs_read_block_groups+0x3ae/0x700 (fs/btrfs/extent-tree.c:8533)
[ 788.460041]       open_ctree+0x1abf/0x2210 (fs/btrfs/disk-io.c:2749)
[ 788.460041]       btrfs_fill_super+0x81/0x140 (fs/btrfs/super.c:958)
[ 788.460041]       btrfs_mount+0x26a/0x300 (fs/btrfs/super.c:1295)
[ 788.460041]       mount_fs+0x8d/0x1a0 (fs/super.c:1091)
[ 788.460041]       vfs_kern_mount+0x79/0x150 (fs/namespace.c:813)
[ 788.460041]       do_new_mount+0xcd/0x1c0 (fs/namespace.c:2068)
[ 788.460041]       do_mount+0x15d/0x210 (fs/namespace.c:2392)
[ 788.460041]       SyS_mount+0x9d/0xe0 (fs/namespace.c:2589 fs/namespace.c:2560)
[ 788.460041]       tracesys+0xdd/0xe2 (arch/x86/kernel/entry_64.S:749)
[ 788.460041]     HARDIRQ-ON-R at:
[ 788.460041]       mark_irqflags+0xbc/0x170 (kernel/locking/lockdep.c:2792)
[ 788.460041]       __lock_acquire+0x2de/0x5a0 (kernel/locking/lockdep.c:3138)
[ 788.460041]       lock_acquire+0x182/0x1d0 (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 788.460041]       down_read+0x4c/0xa0 (arch/x86/include/asm/rwsem.h:83 kernel/locking/rwsem.c:23)
[ 788.460041]       btrfs_calc_num_tolerated_disk_barrier_failures+0x2a7/0x3a0 (fs/btrfs/disk-io.c:3309)
[ 788.460041]       open_ctree+0x1af7/0x2210 (fs/btrfs/disk-io.c:2755)
[ 788.460041]       btrfs_fill_super+0x81/0x140 (fs/btrfs/super.c:958)
[ 788.460041]       btrfs_mount+0x26a/0x300 (fs/btrfs/super.c:1295)
[ 788.460041]       mount_fs+0x8d/0x1a0 (fs/super.c:1091)
[ 788.460041]       vfs_kern_mount+0x79/0x150 (fs/namespace.c:813)
[ 788.460041]       do_new_mount+0xcd/0x1c0 (fs/namespace.c:2068)
[ 788.460041]       do_mount+0x15d/0x210 (fs/namespace.c:2392)
[ 788.460041]       SyS_mount+0x9d/0xe0 (fs/namespace.c:2589 fs/namespace.c:2560)
[ 788.460041]       tracesys+0xdd/0xe2 (arch/x86/kernel/entry_64.S:749)
[ 788.460041]     SOFTIRQ-ON-W at:
[ 788.460041]       mark_irqflags+0x110/0x170 (kernel/locking/lockdep.c:2804)
[ 788.460041]       __lock_acquire+0x2de/0x5a0 (kernel/locking/lockdep.c:3138)
[ 788.460041]       lock_acquire+0x182/0x1d0 (arch/x86/include/asm/current.h:14 ke
[PATCH] Btrfs-progs: scrub: don't call unlock if pthread_mutex_lock fails
If pthread_mutex_lock fails (rare but fix it anyway), don't call pthread_mutex_unlock on mutex.

Signed-off-by: Rakesh Pandit
---
 cmds-scrub.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-scrub.c b/cmds-scrub.c
index 128537b..ca11fb5 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -776,7 +776,7 @@ static int scrub_write_progress(pthread_mutex_t *m, const char *fsid,
 	ret = pthread_mutex_lock(m);
 	if (ret) {
 		err = -ret;
-		goto out;
+		goto fail;
 	}
 
 	ret = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
@@ -808,6 +808,7 @@ out:
 	if (ret && !err)
 		err = -ret;
 
+fail:
 	ret = pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
 	if (ret && !err)
 		err = -ret;
-- 
1.8.5.3
Re: [PATCH] Btrfs: remove transaction from send
On Fri, Mar 14, 2014 at 02:51:22PM -0400, Josef Bacik wrote:
> On 03/13/2014 06:16 PM, Hugo Mills wrote:
> >On Thu, Mar 13, 2014 at 03:42:13PM -0400, Josef Bacik wrote:
> >>Let's try this again. We can deadlock the box if we send on a box and try to
> >>write onto the same fs with the app that is trying to listen to the send pipe.
> >>This is because the writer could get stuck waiting for a transaction commit
> >>which is being blocked by the send. So fix this by making sure looking at the
> >>commit roots is always going to be consistent. We do this by keeping track of
> >>which roots need to have their commit roots swapped during commit, and then
> >>taking the commit_root_sem and swapping them all at once. Then make sure we
> >>take a read lock on the commit_root_sem in cases where we search the commit root
> >>to make sure we're always looking at a consistent view of the commit roots.
> >>Previously we had problems with this because we would swap a fs tree commit root
> >>and then swap the extent tree commit root independently which would cause the
> >>backref walking code to screw up sometimes. With this patch we no longer
> >>deadlock and pass all the weird send/receive corner cases. Thanks,
> >
> >There's something still going on here. I managed to get about twice
> >as far through my test as I had before, but I again got an "unexpected
> >EOF in stream", with btrfs send returning 1. As before, I have this in
> >syslog:
> >
> >Mar 13 22:09:12 s_src@amelia kernel: BTRFS error (device sda2): did not find
> >backref in send_root. inode=1786631, offset=825257984, disk_byte=36504023040
> >found extent=36504023040\x0a
> >
>
> I just noticed that the offset you have there is freaking gigantic,
> like 700MB, which is way larger than what an extent should be. Here
> is a newer debug patch, just chuck the old one and put this instead
> and re-run
>
> http://paste.fedoraproject.org/85486/39482301

That last run, with the above patch, failed again, at approximately the same place. The only output in dmesg is:

[ 6488.168469] BTRFS error (device sda2): did not find backref in send_root. inode=1786631, offset=825257984, disk_byte=36504023040 found extent=36504023040, len=1294336

as before. Definitely no kernel WARN, no backtraces.

Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You're never alone with a rubber duck... ---
Re: [PATCH] Btrfs: fix deadlock with nested trans handles
On Wed, Mar 12, 2014 at 12:34 PM, Rich Freeman wrote:
> On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik wrote:
>> On 03/12/2014 08:56 AM, Rich Freeman wrote:
>>>
>>> After a number of reboots the system became stable, presumably
>>> whatever race condition btrfs was hitting followed a favorable
>>> path.
>>>
>>> I do have a 2GB btrfs-image pre-dating my application of this
>>> patch that was causing the issue last week.
>>>
>>
>> Uhm wow that's pretty epic. I will talk to chris and figure out how
>> we want to deal with that and send you a patch shortly. Thanks,
>
> A tiny bit more background.

And some more background. I had more reboots over the next two days at the same time each day, just after my crontab successfully completed. One of the last things it does is run the snapper cleanups, which delete a bunch of snapshots. During a reboot I checked and there were a bunch of deleted snapshots, which disappeared over the next 30-60 seconds before the panic, and then they would re-appear on the next reboot.

I disabled the snapper cron job and this morning had no issues at all. One day isn't much to establish a trend, but I suspect that this is the cause. Obviously getting rid of snapshots would be desirable at some point, but I can wait for a patch. Snapper would be deleting about 48 snapshots at the same time, since I create them hourly and the cleanup occurs daily on two different subvolumes on the same filesystem.

Rich
Re: discard synchronous on most SSDs?
On Mar 13, 2014, at 11:17 PM, Marc MERLIN wrote:

> On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
>>
>> On Mar 13, 2014, at 8:11 PM, Marc MERLIN wrote:
>>
>>> On Sun, Mar 09, 2014 at 11:33:50AM +, Hugo Mills wrote:
>>>> discard is, except on the very latest hardware, a synchronous command
>>>> (it's a limitation of the SATA standard), and therefore results in
>>>> very very poor performance.
>>>
>>> Interesting. How do I know if a given SSD will hang on discard?
>>> Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
>>
>> smartctl -a or -x will tell you what SATA revision is in place. The queued
>> trim support is in SATA Rev 3.1. I'm not certain if this requires only the
>> drive to support that revision level, or both controller and drive.
>
> I'm not sure I'm seeing this, which field is that?
>
> === START OF INFORMATION SECTION ===
> Device Model:     Samsung SSD 840 EVO 1TB
> Serial Number:    S1D9NEAD934600N
> LU WWN Device Id: 5 002538 85009a8ff
> Firmware Version: EXT0BB0Q
> User Capacity:    1,000,204,886,016 bytes [1.00 TB]
> Sector Size:      512 bytes logical/physical
> Device is:        Not in smartctl database [for details use: -P showall]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4c
> Local Time is:    Thu Mar 13 22:15:14 2014 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled

After ATA Version for me.

$ smartctl -a /dev/disk0
smartctl 6.1 2013-03-16 r3800 [x86_64-apple-darwin12.3.0] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     SAMSUNG SSD 830 Series
Serial Number:    S0Z4NEAC933856
LU WWN Device Id: 5 002538 043584d30
Firmware Version: CXM03B1Q
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 2
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Mar 14 15:37:07 2014 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The Samsung hardware by and large is fairly well behaved with discard in my experience. But it does really depend a lot on the workload. I'd notice occasional random freezes for a couple of seconds when I had it enabled in OS X (totally different animal from the kernel up), nothing severe. But it was annoying enough that I disabled it, and the problem went away.

Apple still doesn't enable trim by default on non-Apple SSDs, so the idea that "everyone else" is doing this isn't true. The Windows implementation is rather complex, and also isn't always used, contrary to what's been reported (on the everybody panic or get mad NOW type web sites).

If you want to be conservative about it, I'd say just manually run fstrim when the system is idle. Do that once every week or two. Cron job it if you want.

Chris Murphy
[PATCH] Btrfs: fix race when updating existing ref head
While we update an existing ref head's extent_op, we're not holding its spinlock, so while we're updating its extent_op contents (key, flags) we can have a task running __btrfs_run_delayed_refs() that holds the ref head's lock and sets its extent_op to NULL right after the task updating the ref head just checked its extent_op was not NULL.

Signed-off-by: Filipe David Borba Manana
---
 fs/btrfs/delayed-ref.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2502ba5..3129964 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -495,6 +495,7 @@ update_existing_head_ref(struct btrfs_delayed_ref_node *existing,
 	ref = btrfs_delayed_node_to_head(update);
 	BUG_ON(existing_ref->is_data != ref->is_data);
 
+	spin_lock(&existing_ref->lock);
 	if (ref->must_insert_reserved) {
 		/* if the extent was freed and then
 		 * reallocated before the delayed ref
@@ -536,7 +537,6 @@ update_existing_head_ref(struct btrfs_delayed_ref_node *existing,
 	 * only need the lock for this case cause we could be processing it
 	 * currently, for refs we just added we know we're a-ok.
 	 */
-	spin_lock(&existing_ref->lock);
 	existing->ref_mod += update->ref_mod;
 	spin_unlock(&existing_ref->lock);
 }
-- 
1.7.9.5
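The race described in this patch is a check-then-update on a field that another task can clear under the lock. As a rough userspace analogue only (Python threads and a toy `RefHead` class, not the kernel structures or the actual btrfs code; all names are hypothetical), the fix amounts to taking the lock before the NULL check rather than after it:

```python
import threading


class RefHead:
    """Toy stand-in for a delayed ref head; NOT the kernel structure."""

    def __init__(self):
        self.lock = threading.Lock()
        self.extent_op = {"flags": 0}  # the consumer may set this to None
        self.ref_mod = 0


def update_existing(head: RefHead, new_flags: int, ref_mod_delta: int) -> None:
    # Hold the lock across both the "is it still there?" check and the
    # update, so the consumer cannot clear extent_op in between.
    with head.lock:
        if head.extent_op is not None:
            head.extent_op["flags"] |= new_flags
        head.ref_mod += ref_mod_delta


def run_delayed(head: RefHead):
    # Consumer side: detach extent_op under the same lock.
    with head.lock:
        op, head.extent_op = head.extent_op, None
    return op
```

With the lock taken only around the `ref_mod` update, as in the pre-patch ordering, `update_existing` could observe a non-None `extent_op` and then write through it after `run_delayed` had already detached it.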
Re: discard synchronous on most SSDs?
On Fri, 14 Mar 2014 15:57:41 -0400, Martin K. Petersen wrote:

> So right now I'm afraid we don't have a good way for a user to determine
> whether a device supports queued trims or not.

Mount with discard, unpack kernel tree, sync, rm -rf tree.
If it takes several seconds, you have sync discard, no?

This changed somewhere around kernel 3.8.x; before that it used to be acceptably fast. Since then I only do batch trims, daily (server) or weekly (laptop).

-h
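The timing step of that test could be scripted roughly as follows. This is a sketch under assumptions: the function name is invented here, the tree is assumed to be already unpacked on a filesystem mounted with (or without) discard, and the mounting itself is left to the user. It only measures how long the delete takes; as Marc's numbers in this thread show, identical timings with and without discard are possible, so the result is a hint rather than a verdict.

```python
import os
import shutil
import time


def time_tree_delete(path: str) -> float:
    """Delete the directory tree at `path`; return the elapsed seconds."""
    # Flush pending writes first, mirroring the "sync" step of the recipe.
    # os.sync() is not available on every platform, so guard the call.
    if hasattr(os, "sync"):
        os.sync()
    start = time.perf_counter()
    shutil.rmtree(path)  # the "rm -rf tree" step
    return time.perf_counter() - start
```

Running it once on a discard mount and once after `remount,nodiscard` on a freshly unpacked tree gives two numbers to compare.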
[PATCH] Btrfs: abort the transaction when we don't find our extent ref
I'm not sure why we weren't aborting here in the first place, it is obviously a bad time from the fact that we print the leaf and yell loudly about it. Fix this up, otherwise we panic because our path could be pointing into oblivion. Thanks,

Signed-off-by: Josef Bacik
---
 fs/btrfs/extent-tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 696f0b6..0015b02 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5744,6 +5744,8 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				"unable to find ref byte nr %llu parent %llu root %llu owner %llu offset %llu",
 				bytenr, parent, root_objectid, owner_objectid,
 				owner_offset);
+			btrfs_abort_transaction(trans, extent_root, ret);
+			goto out;
 		} else {
 			btrfs_abort_transaction(trans, extent_root, ret);
 			goto out;
-- 
1.8.3.1
Re: discard synchronous on most SSDs?
> "Marc" == Marc MERLIN writes:

Marc,

Marc> So I have Sata 3.1, that's great news, it means I can keep using
Marc> discard without worrying about performance and hangs

The fact that the drive reports compliance with a certain version of SATA does not in any way imply that it implements all commands defined in that specification.

The location where queued TRIM support is reported is somewhat unusual. And last I looked hdparm -I had no infrastructure in place to report stuff contained in log pages.

The kernel does look in the right place to determine whether to issue the queued or the unqueued variant. But the information isn't exported to userland.

So right now I'm afraid we don't have a good way for a user to determine whether a device supports queued trims or not.

I guess we could consider adding an ATA-specific "I don't suck" flag in sysfs, adding the missing code to hdparm, or both...

-- 
Martin K. Petersen
Oracle Linux Engineering
Re: discard synchronous on most SSDs?
On Fri, Mar 14, 2014 at 12:07:54PM +, Duncan wrote:
> Marc MERLIN posted on Thu, 13 Mar 2014 22:17:50 -0700 as excerpted:
>
> > On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
> >>
> >> On Mar 13, 2014, at 8:11 PM, Marc MERLIN wrote:
> >>
> >> > On Sun, Mar 09, 2014 at 11:33:50AM +, Hugo Mills wrote:
> >> >> discard is, except on the very latest hardware, a synchronous
> >> >> command (it's a limitation of the SATA standard), and therefore
> >> >> results in very very poor performance.
> >> >
> >> > Interesting. How do I know if a given SSD will hang on discard?
> >> > Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
> >>
> >> smartctl -a or -x will tell you what SATA revision is in place. The
> >> queued trim support is in SATA Rev 3.1. I'm not certain if this
> >> requires only the drive to support that revision level, or both
> >> controller and drive.
> >
> > I'm not sure I'm seeing this, which field is that?
>
> > ATA Version is: 8
> > ATA Standard is: ATA-8-ACS revision 4c
>
> Your drive didn't report it, but here, I have SATA fields as well, in
> addition to the ATA fields:
>
> Here's the fields from my Corsair Neutron SSDs:
>
> ATA Version is: ATA8-ACS (minor revision not indicated)
> SATA Version is: SATA 2.5, 6.0 Gb/s
>
> Here's the fields from my Seagate 500-gig 2.5-inch spinning rust:
>
> ATA Version is: ATA8-ACS T13/1699-D revision 4
> SATA Version is: SATA 2.6, 3.0 Gb/s

Ok, my smartmontools was too old. I got a newer one and now have proper output:

Device Model:     Samsung SSD 840 EVO 1TB
Serial Number:    S1D9NEAD934600N
LU WWN Device Id: 5 002538 85009a8ff
Firmware Version: EXT0BB0Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Mar 14 10:49:39 2014 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

So I have SATA 3.1, that's great news, it means I can keep using discard without worrying about performance and hangs.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
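Pulling the reported SATA revision out of captured `smartctl -i` text is straightforward. A small sketch follows, with a hypothetical function name and the line format assumed from the output quoted in this thread. Note that, per Martin's reply in this thread, the reported revision does not by itself show that queued TRIM is implemented, so the value is informational only.

```python
import re

# Matches e.g. "SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)".
# Anchored at line start so "ATA Version is:" lines are not picked up.
_SATA_RE = re.compile(r"^SATA Version is:\s*SATA\s+(\d+)\.(\d+)", re.MULTILINE)


def reported_sata_version(smartctl_info: str):
    """Return (major, minor) from the 'SATA Version is:' line, or None."""
    match = _SATA_RE.search(smartctl_info)
    if match is None:
        return None  # older smartmontools may not print this field at all
    return int(match.group(1)), int(match.group(2))
```

Older smartmontools builds that print only the ATA fields, like the first output Marc posted, yield None.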
Re: [PATCH] Btrfs: remove transaction from send
On 03/13/2014 06:16 PM, Hugo Mills wrote:
> On Thu, Mar 13, 2014 at 03:42:13PM -0400, Josef Bacik wrote:
>> Let's try this again. We can deadlock the box if we send on a box and try to
>> write onto the same fs with the app that is trying to listen to the send pipe.
>> This is because the writer could get stuck waiting for a transaction commit
>> which is being blocked by the send. So fix this by making sure looking at the
>> commit roots is always going to be consistent. We do this by keeping track of
>> which roots need to have their commit roots swapped during commit, and then
>> taking the commit_root_sem and swapping them all at once. Then make sure we
>> take a read lock on the commit_root_sem in cases where we search the commit root
>> to make sure we're always looking at a consistent view of the commit roots.
>> Previously we had problems with this because we would swap a fs tree commit root
>> and then swap the extent tree commit root independently which would cause the
>> backref walking code to screw up sometimes. With this patch we no longer
>> deadlock and pass all the weird send/receive corner cases. Thanks,
>
> There's something still going on here. I managed to get about twice
> as far through my test as I had before, but I again got an "unexpected
> EOF in stream", with btrfs send returning 1. As before, I have this in
> syslog:
>
> Mar 13 22:09:12 s_src@amelia kernel: BTRFS error (device sda2): did not find
> backref in send_root. inode=1786631, offset=825257984, disk_byte=36504023040
> found extent=36504023040\x0a

I just noticed that the offset you have there is freaking gigantic, like 700MB, which is way larger than what an extent should be. Here is a newer debug patch, just chuck the old one and put this instead and re-run

http://paste.fedoraproject.org/85486/39482301

thanks,

Josef
Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20
On Fri, Mar 14, 2014 at 3:35 PM, Josef Bacik wrote:
> On 03/14/2014 11:34 AM, Sage Weil wrote:
>>
>> On Fri, 14 Mar 2014, Josef Bacik wrote:
>>>
>>> On 03/11/2014 07:44 PM, Sage Weil wrote:
>>>> Hey,
>>>>
>>>> Is this something you guys have seen before? This is from v3.13-rc2.
>>>>
>>>> kernel: [49432.696440] WARNING: CPU: 3 PID: 26411 at /srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20 [btrfs]()
>>>> kernel: [49432.710128] Modules linked in: arc4(F) md4(F) nls_utf8(F) cifs(F) ufs(F) qnx4(F) hfsplus(F) hfs(F) minix(F) ntfs(F) msdos(F) jfs(F) xfs(F) reiserfs(F) ext2(F) kvm_intel(F) kvm(F) ib_iser(F) rdma_cm(F) ib_cm(F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F) libiscsi_tcp(F) libiscsi(F) psmouse(F) ipmi_si(F) serio_raw(F) gpio_ich(F) joydev(F) dcdbas(F) i7core_edac(F) edac_core(F) ipmi_msghandler(F) mac_hid(F) acpi_power_meter(F) lpc_ich(F) tpm_tis(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) scsi_transport_iscsi(F) nfs(F) fscache(F) lockd(F) lp(F) sunrpc(F) parport(F) hid_generic(F) usbhid(F) hid(F) btrfs(F) raid6_pq(F) mptsas(F) ixgbe(F) mptscsih(F) dca(F) mptbase(F) ptp(F) pps_core(F) scsi_transport_sas(F) xor(F) mdio(F) bnx2(F) libcrc32c(F)
>>>> kernel: [49432.777445] CPU: 3 PID: 26411 Comm: ceph-osd Tainted: GF I 3.14.0-rc5-ceph-00016-gf31a96a #1
>>>> kernel: [49432.786704] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
>>>> kernel: [49432.794223] 1674 8800bf1cbac8 816e4840 88022726ef90
>>>> kernel: [49432.801700] 8800bf1cbb08 810524ac a8b07e50
>>>> kernel: [49432.809176] 880094e74120 b07c9000
>>>> kernel: [49432.816653] Call Trace:
>>>> kernel: [49432.819119] [] dump_stack+0x46/0x58
>>>> kernel: [49432.825384] [] warn_slowpath_common+0x8c/0xc0
>>>> kernel: [49432.831413] [] warn_slowpath_null+0x1a/0x20
>>>> kernel: [49432.837284] [] __btrfs_free_extent+0x9ce/0xa20 [btrfs]
>>>> kernel: [49432.844108] [] __btrfs_run_delayed_refs+0x428/0x11e0 [btrfs]
>>>> kernel: [49432.851465] [] ? block_rsv_release_bytes+0x108/0x190 [btrfs]
>>>> kernel: [49432.858823] [] btrfs_run_delayed_refs+0x76/0x2a0 [btrfs]
>>>> kernel: [49432.865869] [] __btrfs_end_transaction+0x26f/0x370 [btrfs]
>>>> kernel: [49432.873044] [] btrfs_end_transaction+0x10/0x20 [btrfs]
>>>> kernel: [49432.879872] [] btrfs_link+0x13e/0x1d0 [btrfs]
>>>> kernel: [49432.885903] [] vfs_link+0x1b1/0x270
>>>> kernel: [49432.891060] [] SyS_linkat+0x210/0x2d0
>>>> kernel: [49432.896394] [] SyS_link+0x1e/0x20
>>>> kernel: [49432.901380] [] system_call_fastpath+0x1a/0x1f
>>>>
>>>> The full dump is at
>>>>
>>>> http://tracker.ceph.com/issues/7688
>>>> http://tracker.ceph.com/attachments/download/1141/kern.log.gz
>>>
>>> Filipe's looking at this Sage, you said it happened on v3.13-rc2 but the
>>> kernel line says 3.14.0-rc5, have you had it happen in both places?
>>> Thanks,
>>
>>
>> Whoops, that's my mistake.. it's 3.14-rc5. The exact commit is in
>> git://github.com/ceph/ceph-client.git, if it matters; it's -rc5 + some
>> ceph patches.
>>
>
> Cool, not worried about what you guys are doing, just wondering if it may be
> related to me screwing around in delayed ref land recently or if you had
> seen it earlier too. Thanks,

I ran into this a couple of times months ago, definitely way before the recent changes in the ref merging code added in 3.14. I had balance running with concurrent snapshot creation and deletion at the time, but I have been unsuccessful so far in triggering it again.

>
> Josef

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."
Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20
On 03/14/2014 11:34 AM, Sage Weil wrote: On Fri, 14 Mar 2014, Josef Bacik wrote: On 03/11/2014 07:44 PM, Sage Weil wrote: Hey, Is this something you guys have seen before? This is from v3.13-rc2. kernel: [49432.696440] WARNING: CPU: 3 PID: 26411 at /srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20 [btrfs]() kernel: [49432.710128] Modules linked in: arc4(F) md4(F) nls_utf8(F) cifs(F) ufs(F) qnx4(F) hfsplus(F) hfs(F) minix(F) ntfs(F) msdos(F) jfs(F) xfs(F) reiserfs(F) ext2(F) kvm_intel(F) kvm(F) ib_iser(F) rdma_cm(F) ib_cm(F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F) libiscsi_tcp(F) libiscsi(F) psmouse(F) ipmi_si(F) serio_raw(F) gpio_ich(F) joydev(F) dcdbas(F) i7core_edac(F) edac_core(F) ipmi_msghandler(F) mac_hid(F) acpi_power_meter(F) lpc_ich(F) tpm_tis(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) scsi_transport_iscsi(F) nfs(F) fscache(F) lockd(F) lp(F) sunrpc(F) parport(F) hid_generic(F) usbhid(F) hid(F) btrfs(F) raid6_pq(F) mptsas(F) ixgbe(F) mptscsih(F) dca(F) mptbase(F) ptp(F) pps_core(F) scsi_transport_sas(F) xor(F) mdio(F) bnx2(F) libcrc32c(F) kernel: [49432.777445] CPU: 3 PID: 26411 Comm: ceph-osd Tainted: GF I 3.14.0-rc5-ceph-00016-gf31a96a #1 kernel: [49432.786704] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011 kernel: [49432.794223] 1674 8800bf1cbac8 816e4840 88022726ef90 kernel: [49432.801700] 8800bf1cbb08 810524ac a8b07e50 kernel: [49432.809176] 880094e74120 b07c9000 kernel: [49432.816653] Call Trace: kernel: [49432.819119] [] dump_stack+0x46/0x58 kernel: [49432.825384] [] warn_slowpath_common+0x8c/0xc0 kernel: [49432.831413] [] warn_slowpath_null+0x1a/0x20 kernel: [49432.837284] [] __btrfs_free_extent+0x9ce/0xa20 [btrfs] kernel: [49432.844108] [] __btrfs_run_delayed_refs+0x428/0x11e0 [btrfs] kernel: [49432.851465] [] ? 
block_rsv_release_bytes+0x108/0x190 [btrfs] kernel: [49432.858823] [] btrfs_run_delayed_refs+0x76/0x2a0 [btrfs] kernel: [49432.865869] [] __btrfs_end_transaction+0x26f/0x370 [btrfs] kernel: [49432.873044] [] btrfs_end_transaction+0x10/0x20 [btrfs] kernel: [49432.879872] [] btrfs_link+0x13e/0x1d0 [btrfs] kernel: [49432.885903] [] vfs_link+0x1b1/0x270 kernel: [49432.891060] [] SyS_linkat+0x210/0x2d0 kernel: [49432.896394] [] SyS_link+0x1e/0x20 kernel: [49432.901380] [] system_call_fastpath+0x1a/0x1f The full dump is at http://tracker.ceph.com/issues/7688 http://tracker.ceph.com/attachments/download/1141/kern.log.gz Filipe's looking at this. Sage, you said it happened on v3.13-rc2 but the kernel line says 3.14.0-rc5; have you had it happen in both places? Thanks, Whoops, that's my mistake; it's 3.14-rc5. The exact commit is in git://github.com/ceph/ceph-client.git, if it matters; it's -rc5 + some ceph patches. Cool, not worried about what you guys are doing, just wondering if it may be related to me screwing around in delayed ref land recently or if you had seen it earlier too. Thanks, Josef
Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20
On Fri, 14 Mar 2014, Josef Bacik wrote: > On 03/11/2014 07:44 PM, Sage Weil wrote: > > Hey, > > > > Is this something you guys have seen before? This is from v3.13-rc2. > > > > kernel: [49432.696440] WARNING: CPU: 3 PID: 26411 at > > /srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/extent-tree.c:5748 > > __btrfs_free_extent+0x9ce/0xa20 [btrfs]() > > kernel: [49432.710128] Modules linked in: arc4(F) md4(F) nls_utf8(F) > > cifs(F) ufs(F) qnx4(F) hfsplus(F) hfs(F) minix(F) ntfs(F) msdos(F) jfs(F) > > xfs(F) reiserfs(F) ext2(F) kvm_intel(F) kvm(F) ib_iser(F) rdma_cm(F) > > ib_cm(F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F) > > libiscsi_tcp(F) libiscsi(F) psmouse(F) ipmi_si(F) serio_raw(F) gpio_ich(F) > > joydev(F) dcdbas(F) i7core_edac(F) edac_core(F) ipmi_msghandler(F) > > mac_hid(F) acpi_power_meter(F) lpc_ich(F) tpm_tis(F) nfsd(F) nfs_acl(F) > > auth_rpcgss(F) scsi_transport_iscsi(F) nfs(F) fscache(F) lockd(F) lp(F) > > sunrpc(F) parport(F) hid_generic(F) usbhid(F) hid(F) btrfs(F) raid6_pq(F) > > mptsas(F) ixgbe(F) mptscsih(F) dca(F) mptbase(F) ptp(F) pps_core(F) > > scsi_transport_sas(F) xor(F) mdio(F) bnx2(F) libcrc32c(F) > > kernel: [49432.777445] CPU: 3 PID: 26411 Comm: ceph-osd Tainted: GF > > I 3.14.0-rc5-ceph-00016-gf31a96a #1 > > kernel: [49432.786704] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS > > 1.6.3 02/07/2011 > > kernel: [49432.794223] 1674 8800bf1cbac8 816e4840 > > 88022726ef90 > > kernel: [49432.801700] 8800bf1cbb08 810524ac > > a8b07e50 > > kernel: [49432.809176] 880094e74120 b07c9000 > > > > kernel: [49432.816653] Call Trace: > > kernel: [49432.819119] [] dump_stack+0x46/0x58 > > kernel: [49432.825384] [] warn_slowpath_common+0x8c/0xc0 > > kernel: [49432.831413] [] warn_slowpath_null+0x1a/0x20 > > kernel: [49432.837284] [] > > __btrfs_free_extent+0x9ce/0xa20 [btrfs] > > kernel: [49432.844108] [] > > __btrfs_run_delayed_refs+0x428/0x11e0 [btrfs] > > kernel: [49432.851465] [] ? 
> > block_rsv_release_bytes+0x108/0x190 [btrfs] > > kernel: [49432.858823] [] > > btrfs_run_delayed_refs+0x76/0x2a0 [btrfs] > > kernel: [49432.865869] [] > > __btrfs_end_transaction+0x26f/0x370 [btrfs] > > kernel: [49432.873044] [] > > btrfs_end_transaction+0x10/0x20 [btrfs] > > kernel: [49432.879872] [] btrfs_link+0x13e/0x1d0 [btrfs] > > kernel: [49432.885903] [] vfs_link+0x1b1/0x270 > > kernel: [49432.891060] [] SyS_linkat+0x210/0x2d0 > > kernel: [49432.896394] [] SyS_link+0x1e/0x20 > > kernel: [49432.901380] [] system_call_fastpath+0x1a/0x1f > > > > The full dump is at > > > > > > https://urldefense.proofpoint.com/v1/url?u=http://tracker.ceph.com/issues/7688&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=5Q0Wl4GGvXb3sw11Xy%2FYQnZbcMlzHHsbegI1uoQnEbE%3D%0A&s=f85b1094d776c10386c681a8a7b31e49f0621bf51829b6e7153095f2335a01c0 > > > > https://urldefense.proofpoint.com/v1/url?u=http://tracker.ceph.com/attachments/download/1141/kern.log.gz&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=5Q0Wl4GGvXb3sw11Xy%2FYQnZbcMlzHHsbegI1uoQnEbE%3D%0A&s=dff103270aba751a919e566182a4d9482041f972ac72cb12435814ae75cacf14 > > > > Filipe's looking at this Sage, you said it happend on v3.13-rc2 but the > kernel line says 3.14.0-rc5, have you had it happen in both places? Thanks, Whoops, that's my mistake.. it's 3.14-rc5. The exact commit is it git://github.com/ceph/ceph-client.git, if it matters; it's -rc5 + some ceph patches. Thanks! sage -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20
On 03/11/2014 07:44 PM, Sage Weil wrote: > Hey, > > Is this something you guys have seen before? This is from v3.13-rc2. > > kernel: [49432.696440] WARNING: CPU: 3 PID: 26411 at > /srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/extent-tree.c:5748 > __btrfs_free_extent+0x9ce/0xa20 [btrfs]() > kernel: [49432.710128] Modules linked in: arc4(F) md4(F) nls_utf8(F) cifs(F) > ufs(F) qnx4(F) hfsplus(F) hfs(F) minix(F) ntfs(F) msdos(F) jfs(F) xfs(F) > reiserfs(F) ext2(F) kvm_intel(F) kvm(F) ib_iser(F) rdma_cm(F) ib_cm(F) > iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F) > libiscsi_tcp(F) libiscsi(F) psmouse(F) ipmi_si(F) serio_raw(F) gpio_ich(F) > joydev(F) dcdbas(F) i7core_edac(F) edac_core(F) ipmi_msghandler(F) mac_hid(F) > acpi_power_meter(F) lpc_ich(F) tpm_tis(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) > scsi_transport_iscsi(F) nfs(F) fscache(F) lockd(F) lp(F) sunrpc(F) parport(F) > hid_generic(F) usbhid(F) hid(F) btrfs(F) raid6_pq(F) mptsas(F) ixgbe(F) > mptscsih(F) dca(F) mptbase(F) ptp(F) pps_core(F) scsi_transport_sas(F) xor(F) > mdio(F) bnx2(F) libcrc32c(F) > kernel: [49432.777445] CPU: 3 PID: 26411 Comm: ceph-osd Tainted: GF I > 3.14.0-rc5-ceph-00016-gf31a96a #1 > kernel: [49432.786704] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS > 1.6.3 02/07/2011 > kernel: [49432.794223] 1674 8800bf1cbac8 816e4840 > 88022726ef90 > kernel: [49432.801700] 8800bf1cbb08 810524ac > a8b07e50 > kernel: [49432.809176] 880094e74120 b07c9000 > > kernel: [49432.816653] Call Trace: > kernel: [49432.819119] [] dump_stack+0x46/0x58 > kernel: [49432.825384] [] warn_slowpath_common+0x8c/0xc0 > kernel: [49432.831413] [] warn_slowpath_null+0x1a/0x20 > kernel: [49432.837284] [] __btrfs_free_extent+0x9ce/0xa20 > [btrfs] > kernel: [49432.844108] [] > __btrfs_run_delayed_refs+0x428/0x11e0 [btrfs] > kernel: [49432.851465] [] ? 
> block_rsv_release_bytes+0x108/0x190 [btrfs] > kernel: [49432.858823] [] > btrfs_run_delayed_refs+0x76/0x2a0 [btrfs] > kernel: [49432.865869] [] > __btrfs_end_transaction+0x26f/0x370 [btrfs] > kernel: [49432.873044] [] btrfs_end_transaction+0x10/0x20 > [btrfs] > kernel: [49432.879872] [] btrfs_link+0x13e/0x1d0 [btrfs] > kernel: [49432.885903] [] vfs_link+0x1b1/0x270 > kernel: [49432.891060] [] SyS_linkat+0x210/0x2d0 > kernel: [49432.896394] [] SyS_link+0x1e/0x20 > kernel: [49432.901380] [] system_call_fastpath+0x1a/0x1f > > The full dump is at > > > https://urldefense.proofpoint.com/v1/url?u=http://tracker.ceph.com/issues/7688&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=5Q0Wl4GGvXb3sw11Xy%2FYQnZbcMlzHHsbegI1uoQnEbE%3D%0A&s=f85b1094d776c10386c681a8a7b31e49f0621bf51829b6e7153095f2335a01c0 > > https://urldefense.proofpoint.com/v1/url?u=http://tracker.ceph.com/attachments/download/1141/kern.log.gz&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=5Q0Wl4GGvXb3sw11Xy%2FYQnZbcMlzHHsbegI1uoQnEbE%3D%0A&s=dff103270aba751a919e566182a4d9482041f972ac72cb12435814ae75cacf14 > Filipe's looking at this Sage, you said it happend on v3.13-rc2 but the kernel line says 3.14.0-rc5, have you had it happen in both places? Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Incremental backup for a raid1
On 2014-03-14 09:46, George Mitchell wrote:
> Actually, an interesting concept would be to have the initial two drive
> RAID 1 mirrored by 2 additional drives in 4-way configuration on a
> second machine at a remote location on a private high speed network with
> both machines up 24/7. In that case, if such a configuration would
> work, either machine could be obliterated and the data would survive
> fully intact in full duplex mode. It would just need to be remounted
> from the backup system and away it goes. Just thinking of interesting
> possibilities with n-way mirroring. Oh how I would love to have n-way
> mirroring to play with!

That can already be done, albeit slightly differently, by stacking btrfs RAID 1 on top of a pair of DRBD devices. Of course, this doesn't provide quite the same degree of safety as your suggestion, but it does work (and DRBD makes the remote copy write-mostly for the local system automatically).
Re: [PATCH] Btrfs: remove transaction from send
On 03/14/2014 09:13 AM, Wang Shilong wrote: >> Lets try this again. We can deadlock the box if we send on a box and try to >> write onto the same fs with the app that is trying to listen to the send >> pipe. >> This is because the writer could get stuck waiting for a transaction commit >> which is being blocked by the send. So fix this by making sure looking at >> the >> commit roots is always going to be consistent. We do this by keeping track >> of >> which roots need to have their commit roots swapped during commit, and then >> taking the commit_root_sem and swapping them all at once. Then make sure we >> take a read lock on the commit_root_sem in cases where we search the commit >> root >> to make sure we're always looking at a consistent view of the commit roots. >> Previously we had problems with this because we would swap a fs tree commit >> root >> and then swap the extent tree commit root independently which would cause the >> backref walking code to screw up sometimes. With this patch we no longer >> deadlock and pass all the weird send/receive corner cases. Thanks, > > Now btrfs send are alway searching commit root! Your codes only seems to > protect backref codes, > it reduce transaction blocked but make it not safe as we have discussed > before. > > I was trying to remember why we didn't like this solution before but I couldn't come up with anything. Apparently I haven't completely fixed the problem yet so stay tuned for what I do next ;). Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 3.14.0-rc3: btrfs send/receive blocks btrfs IO on other devices (near deadlocks)
On 03/12/2014 11:18 AM, Marc MERLIN wrote: > I have a file server with 4 cpu cores and 5 btrfs devices: Label: > btrfs_boot uuid: e4c1daa8-9c39-4a59-b0a9-86297d397f3b Total > devices 1 FS bytes used 48.92GiB devid1 size 79.93GiB used > 73.04GiB path /dev/mapper/cryptroot > > Label: varlocalspace uuid: 9f46dbe2-1344-44c3-b0fb-af2888c34f18 > Total devices 1 FS bytes used 1.10TiB devid1 size 1.63TiB used > 1.50TiB path /dev/mapper/cryptraid0 > > Label: btrfs_pool1 uuid: 6358304a-2234-4243-b02d-4944c9af47d7 > Total devices 1 FS bytes used 7.16TiB devid1 size 14.55TiB used > 7.50TiB path /dev/mapper/dshelf1 > > Label: btrfs_pool2 uuid: cb9df6d3-a528-4afc-9a45-4fed5ec358d6 > Total devices 1 FS bytes used 3.34TiB devid1 size 7.28TiB used > 3.42TiB path /dev/mapper/dshelf2 > > Label: bigbackup uuid: 024ba4d0-dacb-438d-9f1b-eeb34083fe49 Total > devices 5 FS bytes used 6.02TiB devid1 size 1.82TiB used > 1.43TiB path /dev/dm-9 devid2 size 1.82TiB used 1.43TiB path > /dev/dm-6 devid3 size 1.82TiB used 1.43TiB path /dev/dm-5 devid > 4 size 1.82TiB used 1.43TiB path /dev/dm-7 devid5 size 1.82TiB > used 1.43TiB path /dev/dm-8 > > > I have a very long running btrfs send/receive from btrfs_pool1 to > bigbackup (long running meaning that it's been slowly copying over > 5 days) > > The problem is that this is blocking IO to btrfs_pool2 which is > using totally different drives. By blocking IO I mean that IO to > pool2 kind of works sometimes, and hangs for very long times at > other times. > > It looks as if one rsync to btrfs_pool2 or one piece of IO hangs on > a shared lock and once that happens, all IO to btrfs_pool2 stops > for a long time. It does recover eventually without reboot, but the > wait times are ridiculous (it could be 1H or more). > > As I write this, I have a killall -9 rsync that waited for over > 10mn before these processes would finally die: 23555 07:36 > wait_current_trans.isra.15 rsync -av -SH --delete (...) 
23556 > 07:36 exit [rsync] 25387 > 2-04:41:22 wait_current_trans.isra.15 rsync --password-file > (...) 27481 31:26 wait_current_trans.isra.15 rsync > --password-file (...) 2926804:41:34 wait_current_trans.isra.15 > rsync --password-file (...) 2934304:41:31 exit > [rsync] 2949204:41:27 wait_current_trans.isra.15 > rsync --password-file (...) > > 1455907:14:49 wait_current_trans.isra.15 cp -i -al current > 20140312-feisty > > This is all stuck in btrfs kernel code. If someeone wants sysrq-w, > there it is. > https://urldefense.proofpoint.com/v1/url?u=http://marc.merlins.org/tmp/btrfs_full.txt&k=ZVNjlDMF0FElm4dQtryO4A%3D%3D%0A&r=cKCbChRKsMpTX8ybrSkonQ%3D%3D%0A&m=NfFB494sWgA3qCQbFaAQO2FapIJ6kuZcyS6PlP%2FXkCg%3D%0A&s=573f0b2deecc8980550a7645c9627b918659e0ab067590577c8ead4a59498bc1 > > A quick summary: SysRq : Show Blocked State task > PC stack pid father btrfs-cleaner D 8802126b0840 0 > 3332 2 0x 8800c5dc9d00 0046 > 8800c5dc9fd8 8800c69f6310 000141c0 8800c69f6310 > 88017574c170 880211e671e8 880211e67000 > 8801e5936e20 8800c5dc9d10 Call Trace: [] > schedule+0x73/0x75 [] > wait_current_trans.isra.15+0x98/0xf4 [] ? > finish_wait+0x65/0x65 [] > start_transaction+0x48e/0x4f2 [] ? > __btrfs_end_transaction+0x2a1/0x2c6 [] > btrfs_start_transaction+0x1b/0x1d [] > btrfs_drop_snapshot+0x443/0x610 [] ? > _raw_spin_unlock+0x17/0x2a [] ? > finish_task_switch+0x51/0xdb [] ? > __schedule+0x537/0x5de [] > btrfs_clean_one_deleted_snapshot+0x103/0x10f [] > cleaner_kthread+0x103/0x136 [] ? > btrfs_alloc_root+0x26/0x26 [] kthread+0xae/0xb6 > [] ? __kthread_parkme+0x61/0x61 > [] ret_from_fork+0x7c/0xb0 [] ? > __kthread_parkme+0x61/0x61 btrfs-transacti D 88021387eb00 0 > 2 0x 8800c5dcb890 0046 > 8800c5dcbfd8 88021387e5d0 000141c0 88021387e5d0 > 88021f2141c0 88021387e5d0 8800c5dcb930 810fe574 > 0002 8800c5dcb8a0 Call Trace: [] > ? 
wait_on_page_read+0x3c/0x3c [] > schedule+0x73/0x75 [] io_schedule+0x60/0x7a > [] sleep_on_page+0xe/0x12 [] > __wait_on_bit+0x48/0x7a [] > wait_on_page_bit+0x7a/0x7c [] ? > autoremove_wake_function+0x34/0x34 [] > read_extent_buffer_pages+0x1bf/0x204 [] ? > free_root_pointers+0x5b/0x5b [] > btree_read_extent_buffer_pages.constprop.45+0x66/0x100 > [] read_tree_block+0x2f/0x47 [] > read_block_for_search.isra.26+0x24a/0x287 [] > btrfs_search_slot+0x4f4/0x6bb [] > lookup_inline_extent_backref+0xda/0x3fb [] > __btrfs_free_extent+0xf4/0x712 [] > __btrfs_run_delayed_refs+0x939/0xbdf [] > btrfs_run_delayed_refs+0x81/0x18f [] > btrfs_commit_transaction+0x3a9/0x849 [] ? > finish_wait+0x65/0x65 [] > transaction_kthread+0xf8/0x1ab [] ? > btrfs_cleanup_transaction+0x43f/0x43f [] > kthread+0xae/0xb6 [] ? > __kthread_parkme+0x61/0x61 [] > ret_from_fork+0x7c/0xb0 [] ? > __kthread_parkme+0x61/0x
Re: [PATCH 1/2] btrfs: Cleanup the btrfs_workqueue related function type
On Thu, Mar 06, 2014 at 04:19:50AM +, quwen...@cn.fujitsu.com wrote:
> @@ -23,11 +23,13 @@
> struct btrfs_workqueue;
> /* Internal use only */
> struct __btrfs_workqueue;
> +struct btrfs_work;
> +typedef void (*btrfs_func_t)(struct btrfs_work *arg);

I don't see what's wrong with the non-typedef type; CodingStyle discourages using typedefs in general (Chapter 5). The name btrfs_func_t is too generic. If you really need a typedef here, please change it to something closer to the workqueues, e.g. btrfs_work_func_t.
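For readers comparing the two spellings, here is a minimal sketch. Everything prefixed `demo_` is invented for illustration; only the name `btrfs_work_func_t` comes from the suggestion above, and it is declared here against the demo struct rather than the real `struct btrfs_work`.

```c
#include <stddef.h>

struct demo_work;

/* Variant 1: no typedef, the function-pointer type is spelled out in place. */
struct demo_work {
    void (*func)(struct demo_work *work);
    int done;
};

/* Variant 2: a typedef whose name says which subsystem it belongs to. */
typedef void (*btrfs_work_func_t)(struct demo_work *work);

/* Example callback: just records that the work item ran. */
void demo_mark_done(struct demo_work *work)
{
    work->done = 1;
}

/* Both spellings name the same type, so they can be mixed freely. */
void demo_init(struct demo_work *work, btrfs_work_func_t func)
{
    work->func = func;
    work->done = 0;
}

/* Run the work item if a callback was installed. */
void demo_run(struct demo_work *work)
{
    if (work->func != NULL)
        work->func(work);
}
```

The two variants are interchangeable to the compiler, so the choice is purely about readability and namespace hygiene, which is the concern raised in the review.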
Re: Incremental backup for a raid1
George Mitchell posted on Fri, 14 Mar 2014 06:46:19 -0700 as excerpted:
> Actually, an interesting concept would be to have the initial two drive
> RAID 1 mirrored by 2 additional drives in 4-way configuration on a
> second machine at a remote location on a private high speed network with
> both machines up 24/7. In that case, if such a configuration would
> work, either machine could be obliterated and the data would survive
> fully intact in full duplex mode. It would just need to be remounted
> from the backup system and away it goes. Just thinking of interesting
> possibilities with n-way mirroring. Oh how I would love to have n-way
> mirroring to play with!

In terms of raid1, mdraid already supports such a concept with its "write mostly" component device designation. A component device designated "write mostly" is never read from unless it becomes the only device available, so it's perfect for such an "over-the-net real-time-online-backup" solution.

The other half of the solution is the various block-device-over-network drivers such as BLK_DEV_NBD (see Documentation/blockdev/nbd.txt) for the client side, the server side of which is in userspace. That lets you have what appears to be a block device routed over the inet to that remote location.

Of course mdraid is lacking btrfs' data integrity features, etc, with its raid1 implementation entirely lacking any data integrity or real-time cross-checking at all, but unlike btrfs' N-way-mirroring it gets points for actually being available right now, so as they say, YMMV.

Of course the other notable issue with your idea is that while it DOES address the real-time remote redundancy issue, that doesn't (by itself) deal with fat-fingering or similar issues where real-time actually means the same problem's duplicated to the backup as well. But btrfs snapshots address the fat-fingering issue and can be done on the partially-remote filesystem solution as well, and local or remote-local solutions (like periodic btrfs send to a separate local filesystem at both ends) can deal with the filesystem damage possibilities.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
Re: [PATCH] Btrfs-progs: fsck: disable --init-extent-tree option when using snapshots
On 03/14/2014 09:36 AM, Wang Shilong wrote:
> Hi Josef,
>
> Just ping this again.
>
> Did you have any good ideas to rebuild extent tree if broken
> filesystem is filled with snapshots.?
>
> I was working on this recently, i was blocked that i can not verify
> if an extent is *FULL BACKREF* mode or not. As a *FULL BACKREF*
> extent's refs can be 1 and more than 1..
>
> I am willing to test codes or have a try if you could give me some
> advice etc.
>

Full backrefs aren't too hard. Basically all you have to do is walk down the fs tree and keep track of btrfs_header_owner(eb) for everything we walk into. If btrfs_header_owner(eb) == root->objectid for the tree we are walking down then we need a ye olde normal backref for this block. If btrfs_header_owner(eb) != root->objectid we _may_ need a full backref, it depends on who owns the parent block. The following may be incomplete, I'm kind of sick.

1) We walk down the original tree, every eb we encounter has btrfs_header_owner(eb) == root->objectid. We add normal references (BTRFS_TREE_BLOCK_REF_KEY) for this root. World peace is achieved.

2) We walk down the snapshotted tree. Say we didn't change anything at all, it was just a clean snapshot and then boom. So the btrfs_header_owner(root->node) == root->objectid, so normal backref. We walk down to the next level, where btrfs_header_owner(eb) != root->objectid, but the level above did, so we add normal refs for all of these blocks. We go down the next level, now our btrfs_header_owner(parent) != root->objectid and btrfs_header_owner(eb) != root->objectid. This is where we need to now go back and see if btrfs_header_owner(eb) currently has a ref on eb. If it does we are done, move on to the next block in this same level, we don't have to go further down.

3) Harder case, we snapshotted and then changed things in the original root. Do the same thing as in step 2, but now we get down to btrfs_header_owner(eb) != root->objectid && btrfs_header_owner(parent) != root->objectid. We look up the references we have for eb and notice that btrfs_header_owner(eb) no longer refers to eb. So now we must set FULL_BACKREF on this extent reference and add a SHARED_BLOCK_REF_KEY for this eb using the parent->start as the offset. And we need to keep walking down and doing the same thing until we either hit level 0 or btrfs_header_owner(eb) has a ref on the block.

4) Not really a whole special case, just something to keep in mind, if btrfs_header_owner(parent) == root->objectid but btrfs_header_owner(eb) != root->objectid that means we have a normal TREE_BLOCK_REF on eb, it's only when the parent doesn't match our current root that it's a problem.

Does that make sense? Thanks,

Josef
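The four cases above boil down to a per-block decision, which can be sketched as a small pure function. This is only an illustration of the reasoning in the message, not btrfs-progs code: `demo_block`, `demo_ref_action` and `demo_classify` are hypothetical names, `owner` models what btrfs_header_owner(eb) would return, and `owner_has_ref` models the answer to "does the owner root currently have a ref on eb?". Treating a block with no parent (the root node) as a normal ref is an assumption made here for completeness.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What the walker should do for one tree block (hypothetical names). */
enum demo_ref_action {
    DEMO_ADD_TREE_BLOCK_REF,   /* normal backref keyed by the root */
    DEMO_ADD_SHARED_BLOCK_REF, /* set FULL_BACKREF, key on parent->start */
    DEMO_ALREADY_REFERENCED    /* owner still holds a ref: stop descending */
};

/* Minimal model of a tree block as seen by the walker. */
struct demo_block {
    uint64_t owner;     /* models btrfs_header_owner(eb) */
    bool owner_has_ref; /* does the owner root still reference this block? */
};

enum demo_ref_action demo_classify(uint64_t root_objectid,
                                   const struct demo_block *parent,
                                   const struct demo_block *eb)
{
    /* Case 1: the block belongs to the tree being walked. */
    if (eb->owner == root_objectid)
        return DEMO_ADD_TREE_BLOCK_REF;

    /* Case 4: foreign block, but the parent belongs to this root.
     * A missing parent (root node) is treated the same way by assumption. */
    if (parent == NULL || parent->owner == root_objectid)
        return DEMO_ADD_TREE_BLOCK_REF;

    /* Case 2: neither matches, but the owner still references the block. */
    if (eb->owner_has_ref)
        return DEMO_ALREADY_REFERENCED;

    /* Case 3: neither matches and the owner dropped its reference. */
    return DEMO_ADD_SHARED_BLOCK_REF;
}
```

A caller would keep descending after `DEMO_ADD_SHARED_BLOCK_REF` until it reaches level 0 or gets `DEMO_ALREADY_REFERENCED`, as described in case 3.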
Re: Incremental backup for a raid1
Actually, an interesting concept would be to have the initial two drive RAID 1 mirrored by 2 additional drives in 4-way configuration on a second machine at a remote location on a private high speed network with both machines up 24/7. In that case, if such a configuration would work, either machine could be obliterated and the data would survive fully intact in full duplex mode. It would just need to be remounted from the backup system and away it goes. Just thinking of interesting possibilities with n-way mirroring. Oh how I would love to have n-way mirroring to play with! On 03/14/2014 04:24 AM, Duncan wrote: Michael Schuerig posted on Fri, 14 Mar 2014 09:56:20 +0100 as excerpted: [Duncan posted...] 3) Disconnect the backup device(s). (Don't btrfs device delete, this would remove the copy. Just disconnect.) Hmm... Looking back at what I wrote... Presumably either have the filesystem unmounted for the disconnect (and ideally, the system off, tho with modern drives in theory that's not an issue, but still good if it can be done), or at least remounted read-only. I had guessed that was implicit, but making it explicit is probably best all around, just in case. At least I can rest better with it, having made that explicit. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Btrfs-progs: fsck: disable --init-extent-tree option when using snapshots
Hi Josef,

Just pinging this again.

Do you have any good ideas for rebuilding the extent tree when a broken filesystem is filled with snapshots?

I was working on this recently, but I got blocked because I cannot verify whether an extent is in *FULL BACKREF* mode or not, since a *FULL BACKREF* extent's refs can be 1 or more than 1.

I am willing to test code or have a try myself if you could give me some advice.

-Wang

> On 03/10/2014 11:50 PM, Josef Bacik wrote:
>> On 03/10/2014 08:12 AM, Shilong Wang wrote:
>>> Hi Josef,
>>>
>>> As i haven't thought any better ideas to rebuild extent tree which
>>> contains extent that owns 'FULL BACKREF' flag.
>>>
>>> Considering an extent's refs can be equal or more than 1 if this
>>> extent has *FULL BACKREF* flag, so we could not make sure an
>>> extent's flag by only searching fs/file tree any more.
>>>
>>> So until now, i just disable this option if snapshots exists,
>>> please correct me if i miss something here. Or you have any better
>>> ideas to solve this problem.~_~
>>
>> I thought the fsck stuff rebuilds full backref refs properly, does it
>> not? If it doesn't we need to fix that, however I'm fine with
>> disabling the option if snapshots exist for the time being. Thanks,
> If there are no snapshots, --init-extent-tree can works as expected.
> I just have not thought a better idea to rebuild extent tree if we do have
> snapshots which means we may have an extent with *FULL BACKREF*
> flag.
>
> Thanks,
> Wang
>>
>> Josef
Re: [PATCH] Btrfs: remove transaction from send
> Lets try this again. We can deadlock the box if we send on a box and try to > write onto the same fs with the app that is trying to listen to the send pipe. > This is because the writer could get stuck waiting for a transaction commit > which is being blocked by the send. So fix this by making sure looking at the > commit roots is always going to be consistent. We do this by keeping track of > which roots need to have their commit roots swapped during commit, and then > taking the commit_root_sem and swapping them all at once. Then make sure we > take a read lock on the commit_root_sem in cases where we search the commit > root > to make sure we're always looking at a consistent view of the commit roots. > Previously we had problems with this because we would swap a fs tree commit > root > and then swap the extent tree commit root independently which would cause the > backref walking code to screw up sometimes. With this patch we no longer > deadlock and pass all the weird send/receive corner cases. Thanks, Now btrfs send are alway searching commit root! Your codes only seems to protect backref codes, it reduce transaction blocked but make it not safe as we have discussed before. 
-Wang > > Reportedy-by: Hugo Mills > Signed-off-by: Josef Bacik > --- > fs/btrfs/backref.c | 33 +++ > fs/btrfs/ctree.c | 88 -- > fs/btrfs/ctree.h | 3 +- > fs/btrfs/disk-io.c | 3 +- > fs/btrfs/extent-tree.c | 20 ++-- > fs/btrfs/inode-map.c | 14 > fs/btrfs/send.c| 57 ++-- > fs/btrfs/transaction.c | 45 -- > fs/btrfs/transaction.h | 1 + > 9 files changed, 77 insertions(+), 187 deletions(-) > > diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c > index 860f4f2..0be0e94 100644 > --- a/fs/btrfs/backref.c > +++ b/fs/btrfs/backref.c > @@ -329,7 +329,10 @@ static int __resolve_indirect_ref(struct btrfs_fs_info > *fs_info, > goto out; > } > > - root_level = btrfs_old_root_level(root, time_seq); > + if (path->search_commit_root) > + root_level = btrfs_header_level(root->commit_root); > + else > + root_level = btrfs_old_root_level(root, time_seq); > > if (root_level + 1 == level) { > srcu_read_unlock(&fs_info->subvol_srcu, index); > @@ -1092,9 +1095,9 @@ static int btrfs_find_all_leafs(struct > btrfs_trans_handle *trans, > * > * returns 0 on success, < 0 on error. 
> */ > -int btrfs_find_all_roots(struct btrfs_trans_handle *trans, > - struct btrfs_fs_info *fs_info, u64 bytenr, > - u64 time_seq, struct ulist **roots) > +static int __btrfs_find_all_roots(struct btrfs_trans_handle *trans, > + struct btrfs_fs_info *fs_info, u64 bytenr, > + u64 time_seq, struct ulist **roots) > { > struct ulist *tmp; > struct ulist_node *node = NULL; > @@ -1130,6 +1133,20 @@ int btrfs_find_all_roots(struct btrfs_trans_handle > *trans, > return 0; > } > > +int btrfs_find_all_roots(struct btrfs_trans_handle *trans, > + struct btrfs_fs_info *fs_info, u64 bytenr, > + u64 time_seq, struct ulist **roots) > +{ > + int ret; > + > + if (!trans) > + down_read(&fs_info->commit_root_sem); > + ret = __btrfs_find_all_roots(trans, fs_info, bytenr, time_seq, roots); > + if (!trans) > + up_read(&fs_info->commit_root_sem); > + return ret; > +} > + > /* > * this makes the path point to (inum INODE_ITEM ioff) > */ > @@ -1509,6 +1526,8 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info, > if (IS_ERR(trans)) > return PTR_ERR(trans); > btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem); > + } else { > + down_read(&fs_info->commit_root_sem); > } > > ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid, > @@ -1519,8 +1538,8 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info, > > ULIST_ITER_INIT(&ref_uiter); > while (!ret && (ref_node = ulist_next(refs, &ref_uiter))) { > - ret = btrfs_find_all_roots(trans, fs_info, ref_node->val, > -tree_mod_seq_elem.seq, &roots); > + ret = __btrfs_find_all_roots(trans, fs_info, ref_node->val, > + tree_mod_seq_elem.seq, &roots); > if (ret) > break; > ULIST_ITER_INIT(&root_uiter); > @@ -1542,6 +1561,8 @@ out: > if (!search_commit_root) { > btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem); > btrfs_end_transaction(trans, fs_info->extent_root); > + } else { > + up_read(&fs_info->commit_root_sem); > } > > return ret; > diff --git a/fs/btrfs/ctree
Re: [PATCH] Btrfs: fix joining same transaction handle more than twice
On 03/13/2014 10:05 PM, Josef Bacik wrote:
> On 03/13/2014 01:19 AM, Wang Shilong wrote:
>> We hit something like the following function call flows:
>>
>> |->run_delalloc_range()
>> |->btrfs_join_transaction()
>> |->cow_file_range()
>> |->btrfs_join_transaction()
>> |->find_free_extent()
>> |->btrfs_join_transaction()
>>
>> Trace information can be seen as:
>>
>> [ 7411.127040] [ cut here ]
>> [ 7411.127060] WARNING: CPU: 0 PID: 11557 at fs/btrfs/transaction.c:383 start_transaction+0x561/0x580 [btrfs]()
>> [ 7411.127079] CPU: 0 PID: 11557 Comm: kworker/u8:9 Tainted: G O 3.13.0+ #4
>> [ 7411.127080] Hardware name: LENOVO QiTianM4350/ , BIOS F1KT52AUS 05/24/2013
>> [ 7411.127085] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-5)
>> [ 7411.127092] Call Trace:
>> [ 7411.127097] [] dump_stack+0x45/0x56
>> [ 7411.127101] [] warn_slowpath_common+0x7d/0xa0
>> [ 7411.127102] [] warn_slowpath_null+0x1a/0x20
>> [ 7411.127109] [] start_transaction+0x561/0x580 [btrfs]
>> [ 7411.127115] [] btrfs_join_transaction+0x17/0x20 [btrfs]
>> [ 7411.127120] [] find_free_extent+0xa21/0xb50 [btrfs]
>> [ 7411.127126] [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
>> [ 7411.127131] [] btrfs_alloc_free_block+0xee/0x440 [btrfs]
>> [ 7411.127137] [] ? btree_set_page_dirty+0xe/0x10 [btrfs]
>> [ 7411.127142] [] __btrfs_cow_block+0x121/0x530 [btrfs]
>> [ 7411.127146] [] btrfs_cow_block+0x11f/0x1c0 [btrfs]
>> [ 7411.127151] [] btrfs_search_slot+0x1d4/0x9c0 [btrfs]
>> [ 7411.127157] [] btrfs_lookup_file_extent+0x37/0x40 [btrfs]
>> [ 7411.127163] [] __btrfs_drop_extents+0x16c/0xd90 [btrfs]
>> [ 7411.127169] [] ? start_transaction+0x93/0x580 [btrfs]
>> [ 7411.127171] [] ? kmem_cache_alloc+0x132/0x140
>> [ 7411.127176] [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
>> [ 7411.127182] [] cow_file_range_inline+0x181/0x2e0 [btrfs]
>> [ 7411.127187] [] cow_file_range+0x2ed/0x440 [btrfs]
>> [ 7411.127194] [] ? free_extent_buffer+0x4f/0xb0 [btrfs]
>> [ 7411.127200] [] run_delalloc_nocow+0x38f/0xa60 [btrfs]
>> [ 7411.127207] [] ? test_range_bit+0x30/0x180 [btrfs]
>> [ 7411.127212] [] run_delalloc_range+0x2e8/0x350 [btrfs]
>> [ 7411.127219] [] ? find_lock_delalloc_range+0x1a9/0x1e0 [btrfs]
>> [ 7411.127222] [] ? blk_queue_bio+0x2c1/0x330
>> [ 7411.127228] [] __extent_writepage+0x2f4/0x760 [btrfs]
>>
>> Here we fix it by not joining the transaction again if we already
>> hold a transaction handle when allocating a chunk in
>> find_free_extent().
>
> So I just put that warning there to see if we were ever embedding 3
> joins at a time, not because it was an actual problem. I'd say just
> kill the warning. Thanks,
>
> Josef

We need to keep @orig_rsv and restore it when ending the transaction, so we'd better not embed more than 2 joins for now.

Thanks,
Wang
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
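Wang's point about the saved reservation can be illustrated with a toy model. This is only a sketch, assuming (as the reply above implies) that a handle has a single slot for the saved reservation and that a nested join overwrites it. The class, the function names and the string "reservations" are hypothetical stand-ins, not the kernel's code.

```python
class TransHandle:
    """Toy stand-in for a transaction handle (not the kernel structure)."""

    def __init__(self):
        self.use_count = 0
        self.block_rsv = None   # reservation currently in use
        self.orig_rsv = None    # the single slot for a saved reservation
        self.warnings = 0       # how often the "more than 2 joins" check fired


def join(handle, rsv):
    """Join the running transaction, reusing `handle` when already joined."""
    handle.use_count += 1
    if handle.use_count > 1:
        if handle.use_count > 2:
            handle.warnings += 1            # models the warning in the trace
        handle.orig_rsv = handle.block_rsv  # overwrites any earlier save
    handle.block_rsv = rsv
    return handle


def end(handle):
    """End one join; restore the saved reservation unless it was the last."""
    handle.use_count -= 1
    if handle.use_count:
        handle.block_rsv = handle.orig_rsv


def reservations_seen(depth):
    """Nest `depth` joins, then unwind; return block_rsv after each end()."""
    handle = TransHandle()
    for level in range(depth):
        join(handle, "rsv%d" % level)
    seen = []
    for _ in range(depth):
        end(handle)
        seen.append(handle.block_rsv)
    return seen, handle.warnings


print(reservations_seen(2))
print(reservations_seen(3))
```

In this model, two nested joins hand the outer caller its own reservation back. With three, the outermost caller is left holding the middle caller's reservation, because the one saved slot was overwritten by the third join.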
Re: discard synchronous on most SSDs?
Marc MERLIN posted on Thu, 13 Mar 2014 22:17:50 -0700 as excerpted:

> On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
>>
>> On Mar 13, 2014, at 8:11 PM, Marc MERLIN wrote:
>>
>> > On Sun, Mar 09, 2014 at 11:33:50AM +, Hugo Mills wrote:
>> >> discard is, except on the very latest hardware, a synchronous
>> >> command (it's a limitation of the SATA standard), and therefore
>> >> results in very very poor performance.
>> >
>> > Interesting. How do I know if a given SSD will hang on discard?
>> > Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
>>
>> smartctl -a or -x will tell you what SATA revision is in place. The
>> queued trim support is in SATA Rev 3.1. I'm not certain if this
>> requires only the drive to support that revision level, or both
>> controller and drive.
>
> I'm not sure I'm seeing this, which field is that?
> ATA Version is: 8
> ATA Standard is: ATA-8-ACS revision 4c

Your drive didn't report it, but here I have SATA fields in addition to the ATA fields.

Here are the fields from my Corsair Neutron SSDs:

ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.5, 6.0 Gb/s

Here are the fields from my Seagate 500-gig 2.5-inch spinning rust:

ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 3.0 Gb/s

(More about that below.)

Smartctl version here is 6.2 2013-07-26 r3841, according to the output. (I'm running Gentoo/~amd64 FWIW, so it's a local build.) You snipped that bit of your output so I can't compare.

But it may also depend on whether smartctl auto-detected and used the ATA or the SCSI (or something else) command set and how your devices are actually connected, plus BIOS settings, etc. See the manpage documentation for the -d TYPE (--device=TYPE) option and the ATA/SCSI/SAT discussion rather further down the manpage for more.

Here I have direct SATA connections with the BIOS set to AHCI mode, and am thus using the kernel's AHCI drivers. AHCI is the most common SATA chipset standard these days, which increases portability given my monolithic kernel build. smartctl's -d test reports an original guess of scsi, changed to sat after detection. Of course, connection via a USB bridge or the like complicates things considerably.

Meanwhile, SATA 2.5, 6 Gb/s on the SSDs, SATA 2.6, 3 Gb/s on the spinning rust? WTF? The SSDs have SATA 2.5 but 6 Gb/s, while the spinning rust has a later 2.6 but only 3 Gb/s (tho of course on a mechanical drive the bus speed won't be the bottleneck)? Now I'm confused.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
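For comparing drives, the two fields quoted above can be pulled out of saved smartctl output with a few lines of Python. This is a minimal sketch, assuming the "ATA Version is:" / "SATA Version is:" layout shown in this message; the function names are made up for illustration. The reported revision only says what the drive claims, not whether queued TRIM is implemented or used.

```python
import re


def parse_version_fields(smartctl_output):
    """Collect the 'ATA Version' and 'SATA Version' fields from smartctl text."""
    fields = {}
    for line in smartctl_output.splitlines():
        match = re.match(r"\s*(S?ATA Version) is:\s*(.*\S)", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def sata_revision(fields):
    """Return the reported SATA revision as (major, minor), or None if absent."""
    match = re.search(r"SATA (\d+)\.(\d+)", fields.get("SATA Version", ""))
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


# Sample text copied from the Corsair Neutron output quoted above.
sample = """\
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.5, 6.0 Gb/s
"""

fields = parse_version_fields(sample)
print(fields)
print(sata_revision(fields))
```

Output like Marc's, which has no SATA field at all, makes sata_revision() return None rather than guess.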
Re: Incremental backup for a raid1
Michael Schuerig posted on Fri, 14 Mar 2014 09:56:20 +0100 as excerpted:

[Duncan posted...]
>> 3) Disconnect the backup device(s). (Don't btrfs device delete, this
>> would remove the copy. Just disconnect.)

Hmm... Looking back at what I wrote...

Presumably either have the filesystem unmounted for the disconnect (and ideally, the system off, tho with modern drives in theory that's not an issue, but still good if it can be done), or at least remounted read-only.

I had guessed that was implicit, but making it explicit is probably best all around, just in case. At least I can rest better with it, having made that explicit.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: How to view transaction log chronologically, human-readable?
[...]
Theoretically, there should be someone on this mailing list capable of answering this question, no? Please feel invited to share your insights ;)

#Regards

On 01/03/14 02:21, Marcel Partap wrote:
> Dear Btrfs devs,
> I have a 1TB btrfs volume that has been mounted read-only for two
> years because I deleted a bunch of files and didn't want to give up
> on them.
> Now with the latest btrfs-find-root and btrfs restore --dry-run -t in
> a loop, I generated the full list of files contained in the last
> several hundred root trees. However, diffing these, I find the
> current one stays the same until 94 root trees back, and the ones
> before that contain earlier changes. Maybe by my own fault that
> is..whatever.
>
> Is there a way to just view the transaction history in a
> human-readable way?
>
> #Regards
>
Re: Incremental backup for a raid1
On Friday 14 March 2014 06:42:27 Duncan wrote:
> N-way-mirroring is actually my most hotly anticipated feature for a
> different reason[2], but for you it would work like this:
>
> 1) Set up the 3-way (or 4-way if preferred) mirroring and balance to
> ensure copies of all data on all devices.
>
> 2) Optionally scrub to ensure the integrity of all copies.
>
> 3) Disconnect the backup device(s). (Don't btrfs device delete, this
> would remove the copy. Just disconnect.)
>
> 4) Store the backups.
>
> 5) Periodically get them out and reconnect.
>
> 6) Rebalance to update. (Since the devices remain members of the
> mirror, simply outdated, the balance should only update, not rewrite
> the entire thing.)
>
> 7) Optionally scrub to verify.
>
> 8) Repeat steps 3-7 as necessary.

Judging from your description, N-way mirroring is (going to be) exactly what I was hoping for.

Michael

--
Michael Schuerig
mailto:mich...@schuerig.de
http://www.schuerig.de/michael/
Re: [PATCH] Btrfs: remove transaction from send
On Thu, Mar 13, 2014 at 10:16:28PM +, Hugo Mills wrote:
> On Thu, Mar 13, 2014 at 03:42:13PM -0400, Josef Bacik wrote:
> > Let's try this again. We can deadlock the box if we send on a box
> > and try to write onto the same fs with the app that is trying to
> > listen to the send pipe. This is because the writer could get stuck
> > waiting for a transaction commit which is being blocked by the
> > send. So fix this by making sure looking at the commit roots is
> > always going to be consistent. We do this by keeping track of which
> > roots need to have their commit roots swapped during commit, and
> > then taking the commit_root_sem and swapping them all at once. Then
> > make sure we take a read lock on the commit_root_sem in cases where
> > we search the commit root to make sure we're always looking at a
> > consistent view of the commit roots. Previously we had problems
> > with this because we would swap a fs tree commit root and then swap
> > the extent tree commit root independently, which would cause the
> > backref walking code to screw up sometimes. With this patch we no
> > longer deadlock and pass all the weird send/receive corner cases.
> > Thanks,
>
>    There's something still going on here. I managed to get about twice
> as far through my test as I had before, but I again got an "unexpected
> EOF in stream", with btrfs send returning 1. As before, I have this in
> syslog:
>
> Mar 13 22:09:12 s_src@amelia kernel: BTRFS error (device sda2): did not find backref in send_root. inode=1786631, offset=825257984, disk_byte=36504023040 found extent=36504023040\x0a
>
>    So, on the evidence of one data point (I'll have another one when I
> wake up tomorrow morning), this has made the problem harder to trigger
> but it's still possible.

Data point two has arrived, and it's gone boom at about the same point. The first failed at:

2014-03-13 22:09:11,749 INFO Read 7247356514 bytes total

and the second at:

2014-03-14 03:53:46,990 INFO Read 7247357071 bytes total

at approximately 1h45 into the process.

The boot and home subvols have been OK, and have been backing up happily all this time, but both are smaller than the (~10 GiB) root subvol. I can add a load of data to /home and see if the problem happens with a larger send size, or if it's just the process writing to a subvol that has the snapshot being sent that causes it.

The interesting thing here is that the error seems to be fairly reliably in the same place (more or less). Before this patch, I was seeing lockups (or EOF, with the earlier version of this patch) at approximately 3.6-3.8 GB. Now it looks like it's going to be 7.2 GB. At least it's not locking up any more, just dying noisily (which is marginally preferable).

Hugo.

--
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- Hail and greetings. We are a flat-pack invasion force from ---
Planet Ikea. We come in pieces.
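To put a number on "about the same point", the two byte counts from the log lines above can be compared directly. This is just arithmetic on the figures Hugo reported; the variable names are illustrative.

```python
first_failure = 7247356514    # bytes read before the first EOF
second_failure = 7247357071   # bytes read before the second EOF

# Distance between the two failure points, in bytes.
gap = second_failure - first_failure
print(gap)  # 557

# The same offset in decimal gigabytes and in binary gibibytes.
print(round(first_failure / 1e9, 2), "GB")
print(round(first_failure / 2**30, 2), "GiB")
```

The two runs stopped within a few hundred bytes of each other in a stream of over seven billion bytes, which points to something deterministic in the data being sent rather than a timing accident.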
Re: discard synchronous on most SSDs?
Hi Marc,

On Thu, 13 Mar 2014 10:17:50 PM Marc MERLIN wrote:
> I'm not sure I'm seeing this, which field is that?

I *think* you want smartctl -i instead, and look for the field that says something like:

ATA Version is: ATA8-ACS, ACS-2 T13/2015-D revision 3

So if my understanding is correct, that says it's just rev. 3.0, so TRIM for this drive is synchronous.

Good luck!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
Re: discard synchronous on most SSDs?
On Thu, 13 Mar 2014 09:39:02 PM Chris Murphy wrote:
> smartctl -a or -x will tell you what SATA revision is in place. The queued
> trim support is in SATA Rev 3.1. I'm not certain if this requires only the
> drive to support that revision level, or both controller and drive.

Both, I'd say, as I believe it's the controller that has to issue it to the drive, and the drive needs to understand it.

cheers,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC