date:20140314

Re: discard synchronous on most SSDs?

2014-03-14 Thread Chris Samuel

On Sat, 15 Mar 2014 04:25:05 PM Chris Samuel wrote:

> I wonder if it would be possible to use that knowledge to extend the 
> smartctl's --identify functionality to report this?

After reading the SATA 3.1 spec I believe that smartctl *can* indicate if a 
drive claims to support SATA 3.1 NCQ TRIM, thus:

$ sudo smartctl --identify /dev/sdb | fgrep 'Trim bit in DATA SET MANAGEMENT'
 169  0  1   Trim bit in DATA SET MANAGEMENT command supported 
$

If that command returns nothing then it's not reported as supported (and I've 
tested that).  You can get the same info with hdparm -I.

Of course, as Martin said, that doesn't necessarily mean the kernel is using 
that reported ability.

My puzzle now is that I have two SSD drives that report supporting NCQ TRIM 
(one confirmed via product info) but report only supporting SATA 3.0 not 3.1.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


Re: discard synchronous on most SSDs?

2014-03-14 Thread Chris Samuel

On Fri, 14 Mar 2014 03:57:41 PM Martin K. Petersen wrote:

> The fact that the drive reports compliance with a certain version of
> SATA does not in any way imply that it implements all commands defined
> in that specification.

It looks like drives that do support it can be detected with the kernel helper 
function ata_fpdma_dsm_supported() defined in include/linux/libata.h.

I wonder if it would be possible to use that knowledge to extend the 
smartctl's --identify functionality to report this?

Not even all drives that implement it do so correctly; the kernel has a 
blacklist of drives that don't, which currently lists just two:

   /* devices that don't properly handle queued TRIM commands */
   { "Micron_M500*",		NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
   { "Crucial_CT???M500SSD*",	NULL,	ATA_HORKAGE_NO_NCQ_TRIM, },
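
To illustrate how entries like the two above select drives, here is a minimal userspace sketch. It is not the kernel's code: POSIX fnmatch() stands in for whatever matcher libata uses, and the function names are hypothetical. The final decision function only models the idea from this thread that queued TRIM should be used when the drive reports it and the model is not blacklisted.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Model patterns copied from the two blacklist entries quoted above. */
static const char *const no_ncq_trim_models[] = {
	"Micron_M500*",
	"Crucial_CT???M500SSD*",
};

/*
 * Returns true if the model string matches any blacklist pattern.
 * fnmatch() is an assumption here; it stands in for the kernel's matcher.
 */
static bool model_blacklisted_for_ncq_trim(const char *model)
{
	size_t i;
	size_t n = sizeof(no_ncq_trim_models) / sizeof(no_ncq_trim_models[0]);

	for (i = 0; i < n; i++) {
		if (fnmatch(no_ncq_trim_models[i], model, 0) == 0)
			return true;
	}
	return false;
}

/* Hypothetical decision: use queued TRIM only if reported and not blacklisted. */
static bool should_use_queued_trim(bool drive_reports_queued_trim,
				   const char *model)
{
	return drive_reports_queued_trim &&
	       !model_blacklisted_for_ncq_trim(model);
}
```

In the patterns, "*" matches any run of characters and each "?" matches exactly one, so the second entry covers every capacity variant of the Crucial M500.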

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


Re: discard synchronous on most SSDs?

2014-03-14 Thread Marc MERLIN

On Fri, Mar 14, 2014 at 08:46:09PM +, Holger Hoffstätte wrote:
> On Fri, 14 Mar 2014 15:57:41 -0400, Martin K. Petersen wrote:
> 
> > So right now I'm afraid we don't have a good way for a user to determine
> > whether a device supports queued trims or not.
> 
> Mount with discard, unpack kernel tree, sync, rm -rf tree.
> If it takes several seconds, you have sync discard, no?

Mmmh, interesting point.

legolas:/usr/src# time rm -rf linux-3.14-rc5
real	0m1.584s
user	0m0.008s
sys	0m1.524s

I remounted my FS with remount,nodiscard, and the time was the same.

> This changed somewhere around kernel 3.8.x; before that it used to be 
> acceptably fast. Since then I only do batch trims, daily (server) or 
> weekly (laptop).

I've never really timed this before. Is it supposed to be faster than 1.5s on
a fast SSD?

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: discard synchronous on most SSDs?

2014-03-14 Thread Chris Samuel

On Fri, 14 Mar 2014 06:33:24 PM Chris Samuel wrote:

> I *think* you want smartctl -i instead, and look for the field that says 
> something like:
> 
> ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3

Late night, cut and pasted the wrong line of output, mine says:

SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

Of course that's what the drive is reporting it supports, I'm not sure whether 
that's the result of what has been negotiated between the controller and drive 
or purely what the drive supports.

To get more information from smartctl you can use the --identify=wb option 
instead of -i and that should give you a lot more detail about what the 
drive claims to (and not to) support.   On the version in Kubuntu 13.10 
(6.1+svn3812-1) it only reports 3 things regarding TRIM for my drives.

chris@quad:/tmp$ sudo smartctl --identify=wb -d sat /dev/sdb | egrep -i 'trim|discard'
  69 14  1   Deterministic data after trim supported
  69  5  0   Trimmed LBA range(s) returning zeroed data supported
 169  0  1   Trim bit in DATA SET MANAGEMENT command supported
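
The three lines above are (word, bit, value) triples from the drive's IDENTIFY DEVICE data. As a rough sketch of what is being decoded, assuming the 256 16-bit words are already in a buffer in host byte order (fetching them is out of scope, and the names below are hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Word numbers copied from the smartctl --identify output above. */
enum {
	ID_WORD_TRIM_BEHAVIOUR = 69,	/* bit 14: deterministic data after trim,
					 * bit 5: trimmed LBAs return zeroed data */
	ID_WORD_DSM = 169,		/* bit 0: Trim bit in DATA SET MANAGEMENT */
};

/* Test one bit of one IDENTIFY DEVICE word; id holds all 256 words. */
static bool id_bit(const uint16_t id[256], unsigned int word, unsigned int bit)
{
	return (id[word] >> bit) & 1u;
}

/* True if the drive claims TRIM support in DATA SET MANAGEMENT. */
static bool id_trim_supported(const uint16_t id[256])
{
	return id_bit(id, ID_WORD_DSM, 0);
}
```

For the drive above, id_bit(id, 69, 14) and id_trim_supported(id) would be true and id_bit(id, 69, 5) false. Note this only covers the bits smartctl printed; as said elsewhere in this thread, queued TRIM support is reported in a different place.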

I'm currently doing a git clone of their SVN repo to see if there's any new 
functionality that will gather any more information.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


Re: [PATCH] Btrfs-progs: scrub: don't call unlock if pthread_mutex_lock fails

2014-03-14 Thread Rakesh Pandit

Hi,

Forgot to mention the reason for change. If accepted this can be
included in commit message:

On Sat, Mar 15, 2014 at 01:49:45AM +0200, Rakesh Pandit wrote:
> If pthread_mutex_lock fails (rare but fix it anyway), don't call
> pthread_mutex_unlock on mutex.
>

The rationale is that if pthread_mutex_lock fails, pthread_mutex_unlock
will always fail and overwrite the actual error value in err.

> Signed-off-by: Rakesh Pandit 

regards,

btrfs: lock inversion between delayed_node->mutex and found->groups_sem

2014-03-14 Thread Sasha Levin


Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel I've stumbled on the following:

[  788.451695] =
[  788.452455] [ INFO: possible irq lock inversion dependency detected ]
[  788.453020] 3.14.0-rc6-next-20140313-sasha-00010-gb8c1db1-dirty #217 
Tainted: GW
[  788.453827] -
[  788.454371] kswapd3/4199 just changed the state of lock:
[  788.454902]  (&delayed_node->mutex){+.+.-.}, at: 
__btrfs_release_delayed_node+0x4f/0x140 (fs/btrfs/delayed-inode.c:263)
[  788.455890] but this lock took another, RECLAIM_FS-unsafe lock in the past:
[  788.456543]  (&found->groups_sem){+.}

and interrupts could create inverse lock ordering between them.

[  788.457491]
[  788.457491] other info that might help us debug this:
[  788.458115]  Possible interrupt unsafe locking scenario:
[  788.458115]
[  788.458756]        CPU0                    CPU1
[  788.459188]        ----                    ----
[  788.459625]   lock(&found->groups_sem);
[  788.460041]                                local_irq_disable();
[  788.460041]                                lock(&delayed_node->mutex);
[  788.460041]                                lock(&found->groups_sem);
[  788.460041]   <Interrupt>
[  788.460041]     lock(&delayed_node->mutex);
[  788.460041]
[  788.460041]  *** DEADLOCK ***
[  788.460041]
[  788.460041] 2 locks held by kswapd3/4199:
[  788.460041]  #0:  (shrinker_rwsem){..}, at: shrink_slab+0x3f/0x160 
(mm/vmscan.c:360)
[  788.460041]  #1:  (&type->s_umount_key#108){.+.+..}, at: 
grab_super_passive+0x56/0x90 (fs/super.c:361)
[  788.460041]
[  788.460041] the shortest dependencies between 2nd lock and 1st lock:
[  788.460041]  -> (&found->groups_sem){+.} ops: 46 {
[  788.460041] HARDIRQ-ON-W at:
[  788.460041]   mark_irqflags+0xf0/0x170 
(kernel/locking/lockdep.c:2800)
[  788.460041]   __lock_acquire+0x2de/0x5a0 
(kernel/locking/lockdep.c:3138)
[  788.460041]   lock_acquire+0x182/0x1d0 
(arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[  788.460041]   down_write+0x5c/0xc0 
(arch/x86/include/asm/rwsem.h:130 kernel/locking/rwsem.c:50)
[  788.460041]   __link_block_group+0x45/0x110 
(fs/btrfs/extent-tree.c:8348)
[  788.460041]   btrfs_read_block_groups+0x3ae/0x700 
(fs/btrfs/extent-tree.c:8533)
[  788.460041]   open_ctree+0x1abf/0x2210 
(fs/btrfs/disk-io.c:2749)
[  788.460041]   btrfs_fill_super+0x81/0x140 
(fs/btrfs/super.c:958)
[  788.460041]   btrfs_mount+0x26a/0x300 
(fs/btrfs/super.c:1295)
[  788.460041]   mount_fs+0x8d/0x1a0 (fs/super.c:1091)
[  788.460041]   vfs_kern_mount+0x79/0x150 
(fs/namespace.c:813)
[  788.460041]   do_new_mount+0xcd/0x1c0 
(fs/namespace.c:2068)
[  788.460041]   do_mount+0x15d/0x210 
(fs/namespace.c:2392)
[  788.460041]   SyS_mount+0x9d/0xe0 (fs/namespace.c:2589 
fs/namespace.c:2560)
[  788.460041]   tracesys+0xdd/0xe2 
(arch/x86/kernel/entry_64.S:749)
[  788.460041] HARDIRQ-ON-R at:
[  788.460041]   mark_irqflags+0xbc/0x170 
(kernel/locking/lockdep.c:2792)
[  788.460041]   __lock_acquire+0x2de/0x5a0 
(kernel/locking/lockdep.c:3138)
[  788.460041]   lock_acquire+0x182/0x1d0 
(arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[  788.460041]   down_read+0x4c/0xa0 
(arch/x86/include/asm/rwsem.h:83 kernel/locking/rwsem.c:23)
[  788.460041]   
btrfs_calc_num_tolerated_disk_barrier_failures+0x2a7/0x3a0 
(fs/btrfs/disk-io.c:3309)
[  788.460041]   open_ctree+0x1af7/0x2210 
(fs/btrfs/disk-io.c:2755)
[  788.460041]   btrfs_fill_super+0x81/0x140 
(fs/btrfs/super.c:958)
[  788.460041]   btrfs_mount+0x26a/0x300 
(fs/btrfs/super.c:1295)
[  788.460041]   mount_fs+0x8d/0x1a0 (fs/super.c:1091)
[  788.460041]   vfs_kern_mount+0x79/0x150 
(fs/namespace.c:813)
[  788.460041]   do_new_mount+0xcd/0x1c0 
(fs/namespace.c:2068)
[  788.460041]   do_mount+0x15d/0x210 (fs/namespace.c:2392)
[  788.460041]   SyS_mount+0x9d/0xe0 (fs/namespace.c:2589 
fs/namespace.c:2560)
[  788.460041]   tracesys+0xdd/0xe2 
(arch/x86/kernel/entry_64.S:749)
[  788.460041] SOFTIRQ-ON-W at:
[  788.460041]   mark_irqflags+0x110/0x170 
(kernel/locking/lockdep.c:2804)
[  788.460041]   __lock_acquire+0x2de/0x5a0 
(kernel/locking/lockdep.c:3138)
[  788.460041]   lock_acquire+0x182/0x1d0 
(arch/x86/include/asm/current.h:14 ke

[PATCH] Btrfs-progs: scrub: don't call unlock if pthread_mutex_lock fails

2014-03-14 Thread Rakesh Pandit

If pthread_mutex_lock fails (rare but fix it anyway), don't call
pthread_mutex_unlock on mutex.

Signed-off-by: Rakesh Pandit 
---
 cmds-scrub.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/cmds-scrub.c b/cmds-scrub.c
index 128537b..ca11fb5 100644
--- a/cmds-scrub.c
+++ b/cmds-scrub.c
@@ -776,7 +776,7 @@ static int scrub_write_progress(pthread_mutex_t *m, const char *fsid,
 	ret = pthread_mutex_lock(m);
 	if (ret) {
 		err = -ret;
-		goto out;
+		goto fail;
 	}
 
 	ret = pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
@@ -808,6 +808,7 @@ out:
 	if (ret && !err)
 		err = -ret;
 
+fail:
 	ret = pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
 	if (ret && !err)
 		err = -ret;
-- 
1.8.5.3


Re: [PATCH] Btrfs: remove transaction from send

2014-03-14 Thread Hugo Mills

On Fri, Mar 14, 2014 at 02:51:22PM -0400, Josef Bacik wrote:
> On 03/13/2014 06:16 PM, Hugo Mills wrote:
> >On Thu, Mar 13, 2014 at 03:42:13PM -0400, Josef Bacik wrote:
> >>Lets try this again.  We can deadlock the box if we send on a box and try to
> >>write onto the same fs with the app that is trying to listen to the send pipe.
> >>This is because the writer could get stuck waiting for a transaction commit
> >>which is being blocked by the send.  So fix this by making sure looking at the
> >>commit roots is always going to be consistent.  We do this by keeping track of
> >>which roots need to have their commit roots swapped during commit, and then
> >>taking the commit_root_sem and swapping them all at once.  Then make sure we
> >>take a read lock on the commit_root_sem in cases where we search the commit root
> >>to make sure we're always looking at a consistent view of the commit roots.
> >>Previously we had problems with this because we would swap a fs tree commit root
> >>and then swap the extent tree commit root independently which would cause the
> >>backref walking code to screw up sometimes.  With this patch we no longer
> >>deadlock and pass all the weird send/receive corner cases.  Thanks,
> >
> >There's something still going on here. I managed to get about twice
> >as far through my test as I had before, but I again got an "unexpected
> >EOF in stream", with btrfs send returning 1. As before, I have this in
> >syslog:
> >
> >Mar 13 22:09:12 s_src@amelia kernel: BTRFS error (device sda2): did not find 
> >backref in send_root. inode=1786631, offset=825257984, disk_byte=36504023040 
> >found extent=36504023040\x0a
> >
> 
> I just noticed that the offset you have there is freaking gigantic,
> like 700mb, which is way larger than what an extent should be.  Here
> is a newer debug patch, just chuck the old one and put this instead
> and re-run
> 
> http://paste.fedoraproject.org/85486/39482301

   That last run, with the above patch, failed again, at approximately
the same place again. The only output in dmesg is:

[ 6488.168469] BTRFS error (device sda2): did not find backref in send_root. 
inode=1786631, offset=825257984, disk_byte=36504023040 found 
extent=36504023040, len=1294336

as before. Definitely no kernel WARN, no backtraces.
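
For scale, the numbers in that message can be converted with a couple of helpers. This is only arithmetic on the logged values; the helper names are hypothetical and the 4 KiB block size is an assumption, not something stated in this thread.

```c
#include <stdint.h>

/* Whole mebibytes in a byte count (rounds down). */
static uint64_t bytes_to_mib(uint64_t bytes)
{
	return bytes >> 20;
}

/* Number of blocks of block_size bytes needed to cover len bytes. */
static uint64_t bytes_to_blocks(uint64_t len, uint64_t block_size)
{
	return (len + block_size - 1) / block_size;
}
```

With these, offset=825257984 comes out at 787 MiB, and len=1294336 is 316 blocks of 4096 bytes (1264 KiB).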

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
--- You're never alone with a rubber duck... --- 



Re: [PATCH] Btrfs: fix deadlock with nested trans handles

2014-03-14 Thread Rich Freeman

On Wed, Mar 12, 2014 at 12:34 PM, Rich Freeman
 wrote:
> On Wed, Mar 12, 2014 at 11:24 AM, Josef Bacik  wrote:
>> On 03/12/2014 08:56 AM, Rich Freeman wrote:
>>>
>>>  After a number of reboots the system became stable, presumably
>>> whatever race condition btrfs was hitting followed a favorable
>>> path.
>>>
>>> I do have a 2GB btrfs-image pre-dating my application of this
>>> patch that was causing the issue last week.
>>>
>>
>> Uhm wow that's pretty epic.  I will talk to chris and figure out how
>> we want to deal with that and send you a patch shortly.  Thanks,
>
> A tiny bit more background.

And some more background.  I had more reboots over the next two days
at the same time each day, just after my crontab successfully
completed.  One of the last things it does is run the snapper cleanups
which delete a bunch of snapshots.  During a reboot I checked and
there were a bunch of deleted snapshots, which disappeared over the
next 30-60 seconds before the panic, and then they would re-appear on
the next reboot.

I disabled the snapper cron job and this morning had no issues at all.
 One day isn't much to establish a trend, but I suspect that this is
the cause.  Obviously getting rid of snapshots would be desirable at
some point, but I can wait for a patch.  Snapper would be deleting
about 48 snapshots at the same time, since I create them hourly and
the cleanup occurs daily on two different subvolumes on the same
filesystem.

Rich

Re: discard synchronous on most SSDs?

2014-03-14 Thread Chris Murphy

On Mar 13, 2014, at 11:17 PM, Marc MERLIN  wrote:

> On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
>> 
>> On Mar 13, 2014, at 8:11 PM, Marc MERLIN  wrote:
>> 
>>> On Sun, Mar 09, 2014 at 11:33:50AM +, Hugo Mills wrote:
>>>> discard is, except on the very latest hardware, a synchronous command
>>>> (it's a limitation of the SATA standard), and therefore results in
>>>> very very poor performance.
>>> 
>>> Interesting. How do I know if a given SSD will hang on discard?
>>> Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
>> 
>> smartctl -a or -x will tell you what SATA revision is in place. The queued 
>> trim support is in SATA Rev 3.1. I'm not certain if this requires only the 
>> drive to support that revision level, or both controller and drive.
> 
> I'm not sure I'm seeing this, which field is that?
> 
> === START OF INFORMATION SECTION ===
> Device Model:     Samsung SSD 840 EVO 1TB
> Serial Number:    S1D9NEAD934600N
> LU WWN Device Id: 5 002538 85009a8ff
> Firmware Version: EXT0BB0Q
> User Capacity:    1,000,204,886,016 bytes [1.00 TB]
> Sector Size:      512 bytes logical/physical
> Device is:        Not in smartctl database [for details use: -P showall]
> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4c
> Local Time is:    Thu Mar 13 22:15:14 2014 PDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled

After ATA Version for me.

$ smartctl -a /dev/disk0
smartctl 6.1 2013-03-16 r3800 [x86_64-apple-darwin12.3.0] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     SAMSUNG SSD 830 Series
Serial Number:    S0Z4NEAC933856
LU WWN Device Id: 5 002538 043584d30
Firmware Version: CXM03B1Q
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 T13/2015-D revision 2
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Mar 14 15:37:07 2014 MDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

The Samsung hardware by and large is fairly well behaved with discard in my 
experience. But it does really depend a lot on the workload. I'd notice 
occasional random freezes for a couple of seconds when I had it enabled in OS X 
(totally different animal from the kernel up), nothing severe. But it was 
annoying enough that I disabled it, and the problem went away. Apple still doesn't enable 
trim by default on non-Apple SSDs, so the idea that "everyone else" is 
doing this isn't true. The Windows implementation is rather complex, and also 
isn't always used contrary to what's been reported (on the everybody panic or 
get mad NOW type web sites).

If you want to be conservative about it, I'd say just manually run fstrim when 
the system is idle. Do that once every week or two. Cron job it if you want.

Chris Murphy

[PATCH] Btrfs: fix race when updating existing ref head

2014-03-14 Thread Filipe David Borba Manana

While we update an existing ref head's extent_op, we're not holding
its spinlock, so while we're updating its extent_op contents (key,
flags) we can have a task running __btrfs_run_delayed_refs() that
holds the ref head's lock and sets its extent_op to NULL right after
the task updating the ref head just checked its extent_op was not NULL.

Signed-off-by: Filipe David Borba Manana 
---
 fs/btrfs/delayed-ref.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 2502ba5..3129964 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -495,6 +495,7 @@ update_existing_head_ref(struct btrfs_delayed_ref_node *existing,
 	ref = btrfs_delayed_node_to_head(update);
 	BUG_ON(existing_ref->is_data != ref->is_data);
 
+	spin_lock(&existing_ref->lock);
 	if (ref->must_insert_reserved) {
 		/* if the extent was freed and then
 		 * reallocated before the delayed ref
@@ -536,7 +537,6 @@ update_existing_head_ref(struct btrfs_delayed_ref_node *existing,
 	 * only need the lock for this case cause we could be processing it
 	 * currently, for refs we just added we know we're a-ok.
 	 */
-	spin_lock(&existing_ref->lock);
 	existing->ref_mod += update->ref_mod;
 	spin_unlock(&existing_ref->lock);
 }
-- 
1.7.9.5
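
The shape of the fix can be sketched in userspace. This is only an illustration under loud assumptions: a pthread mutex stands in for the ref head's spinlock, and the struct and function names are hypothetical rather than btrfs's. The point is that the NULL check on extent_op and the update of its contents happen under the same lock that a concurrent task would hold when setting extent_op to NULL.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for an extent op. */
struct op {
	unsigned long flags;
};

/* Hypothetical stand-in for a delayed ref head. */
struct head {
	pthread_mutex_t lock;	/* stands in for the ref head's spinlock */
	struct op *extent_op;	/* another task may set this to NULL under lock */
	int ref_mod;
};

/* Merge an update into an existing head with check and write under one lock. */
static void merge_update(struct head *existing, unsigned long new_flags,
			 int ref_mod)
{
	pthread_mutex_lock(&existing->lock);
	if (existing->extent_op)
		existing->extent_op->flags |= new_flags;
	existing->ref_mod += ref_mod;
	pthread_mutex_unlock(&existing->lock);
}
```

If the lock were taken only around the ref_mod update, another thread could clear extent_op between the check and the write, which is the window the commit message describes.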


Re: discard synchronous on most SSDs?

2014-03-14 Thread Holger Hoffstätte

On Fri, 14 Mar 2014 15:57:41 -0400, Martin K. Petersen wrote:

> So right now I'm afraid we don't have a good way for a user to determine
> whether a device supports queued trims or not.

Mount with discard, unpack kernel tree, sync, rm -rf tree.
If it takes several seconds, you have sync discard, no?

This changed somewhere around kernel 3.8.x; before that it used to be 
acceptably fast. Since then I only do batch trims, daily (server) or 
weekly (laptop).

-h


[PATCH] Btrfs: abort the transaction when we don't find our extent ref

2014-03-14 Thread Josef Bacik

I'm not sure why we weren't aborting here in the first place, it is obviously a
bad time from the fact that we print the leaf and yell loudly about it.  Fix
this up, otherwise we panic because our path could be pointing into oblivion.
Thanks,

Signed-off-by: Josef Bacik 
---
 fs/btrfs/extent-tree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 696f0b6..0015b02 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5744,6 +5744,8 @@ static int __btrfs_free_extent(struct btrfs_trans_handle *trans,
 				"unable to find ref byte nr %llu parent %llu root %llu  owner %llu offset %llu",
 				bytenr, parent, root_objectid, owner_objectid,
 				owner_offset);
+			btrfs_abort_transaction(trans, extent_root, ret);
+			goto out;
 		} else {
 			btrfs_abort_transaction(trans, extent_root, ret);
 			goto out;
-- 
1.8.3.1


Re: discard synchronous on most SSDs?

2014-03-14 Thread Martin K. Petersen

> "Marc" == Marc MERLIN  writes:

Marc,

Marc> So I have Sata 3.1, that's great news, it means I can keep using
Marc> discard without worrying about performance and hangs

The fact that the drive reports compliance with a certain version of
SATA does not in any way imply that it implements all commands defined
in that specification.

The location where queued TRIM support is reported is somewhat unusual.
And last I looked hdparm -I had no infrastructure in place to report
stuff contained in log pages.

The kernel does look in the right place to determine whether to issue the
queued or unqueued variant or not. But the information isn't exported to
userland.

So right now I'm afraid we don't have a good way for a user to determine
whether a device supports queued trims or not.

I guess we could consider either adding an ATA-specific "I don't suck"
flag in sysfs, adding the missing code to hdparm, or both...

-- 
Martin K. Petersen  Oracle Linux Engineering

Re: discard synchronous on most SSDs?

2014-03-14 Thread Marc MERLIN

On Fri, Mar 14, 2014 at 12:07:54PM +, Duncan wrote:
> Marc MERLIN posted on Thu, 13 Mar 2014 22:17:50 -0700 as excerpted:
> 
> > On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
> >> 
> >> On Mar 13, 2014, at 8:11 PM, Marc MERLIN  wrote:
> >> 
> >> > On Sun, Mar 09, 2014 at 11:33:50AM +, Hugo Mills wrote:
> >> >> discard is, except on the very latest hardware, a synchronous
> >> >> command (it's a limitation of the SATA standard), and therefore
> >> >> results in very very poor performance.
> >> > 
> >> > Interesting. How do I know if a given SSD will hang on discard?
> >> > Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
> >> 
> >> smartctl -a or -x will tell you what SATA revision is in place. The
> >> queued trim support is in SATA Rev 3.1. I'm not certain if this
> >> requires only the drive to support that revision level, or both
> >> controller and drive.
> > 
> > I'm not sure I'm seeing this, which field is that?
> 
> > ATA Version is:   8
> > ATA Standard is:  ATA-8-ACS revision 4c
> 
> Your drive didn't report it, but here, I have SATA fields as well, in 
> addition to the ATA fields:
> 
> Here's the fields from my Corsair Neutron SSDs:
> 
> ATA Version is:   ATA8-ACS (minor revision not indicated)
> SATA Version is:  SATA 2.5, 6.0 Gb/s
> 
> Here's the fields from my Seagate 500-gig 2.5-inch spinning rust:
> 
> ATA Version is:   ATA8-ACS T13/1699-D revision 4
> SATA Version is:  SATA 2.6, 3.0 Gb/s

Ok, my smartmontools was too old. I got a newer one and now have proper
output:
Device Model:     Samsung SSD 840 EVO 1TB
Serial Number:    S1D9NEAD934600N
LU WWN Device Id: 5 002538 85009a8ff
Firmware Version: EXT0BB0Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Mar 14 10:49:39 2014 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

So I have SATA 3.1, that's great news, it means I can keep using discard
without worrying about performance and hangs.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems 
   what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  

Re: [PATCH] Btrfs: remove transaction from send

2014-03-14 Thread Josef Bacik


On 03/13/2014 06:16 PM, Hugo Mills wrote:

On Thu, Mar 13, 2014 at 03:42:13PM -0400, Josef Bacik wrote:

Lets try this again.  We can deadlock the box if we send on a box and try to
write onto the same fs with the app that is trying to listen to the send pipe.
This is because the writer could get stuck waiting for a transaction commit
which is being blocked by the send.  So fix this by making sure looking at the
commit roots is always going to be consistent.  We do this by keeping track of
which roots need to have their commit roots swapped during commit, and then
taking the commit_root_sem and swapping them all at once.  Then make sure we
take a read lock on the commit_root_sem in cases where we search the commit root
to make sure we're always looking at a consistent view of the commit roots.
Previously we had problems with this because we would swap a fs tree commit root
and then swap the extent tree commit root independently which would cause the
backref walking code to screw up sometimes.  With this patch we no longer
deadlock and pass all the weird send/receive corner cases.  Thanks,


There's something still going on here. I managed to get about twice
as far through my test as I had before, but I again got an "unexpected
EOF in stream", with btrfs send returning 1. As before, I have this in
syslog:

Mar 13 22:09:12 s_src@amelia kernel: BTRFS error (device sda2): did not find 
backref in send_root. inode=1786631, offset=825257984, disk_byte=36504023040 
found extent=36504023040\x0a



I just noticed that the offset you have there is freaking gigantic, like 
700mb, which is way larger than what an extent should be.  Here is a 
newer debug patch, just chuck the old one and put this instead and re-run


http://paste.fedoraproject.org/85486/39482301

thanks,

Josef

Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20

2014-03-14 Thread Filipe David Manana

On Fri, Mar 14, 2014 at 3:35 PM, Josef Bacik  wrote:
> On 03/14/2014 11:34 AM, Sage Weil wrote:
>>
>> On Fri, 14 Mar 2014, Josef Bacik wrote:
>>>
>>> On 03/11/2014 07:44 PM, Sage Weil wrote:

>>>> Hey,
>>>>
>>>> Is this something you guys have seen before?  This is from v3.13-rc2.
>>>>
>>>> kernel: [49432.696440] WARNING: CPU: 3 PID: 26411 at
>>>> /srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/extent-tree.c:5748
>>>> __btrfs_free_extent+0x9ce/0xa20 [btrfs]()
>>>> kernel: [49432.710128] Modules linked in: arc4(F) md4(F) nls_utf8(F)
>>>> cifs(F) ufs(F) qnx4(F) hfsplus(F) hfs(F) minix(F) ntfs(F) msdos(F) jfs(F)
>>>> xfs(F) reiserfs(F) ext2(F) kvm_intel(F) kvm(F) ib_iser(F) rdma_cm(F)
>>>> ib_cm(F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F)
>>>> libiscsi_tcp(F) libiscsi(F) psmouse(F) ipmi_si(F) serio_raw(F) gpio_ich(F)
>>>> joydev(F) dcdbas(F) i7core_edac(F) edac_core(F) ipmi_msghandler(F)
>>>> mac_hid(F) acpi_power_meter(F) lpc_ich(F) tpm_tis(F) nfsd(F) nfs_acl(F)
>>>> auth_rpcgss(F) scsi_transport_iscsi(F) nfs(F) fscache(F) lockd(F) lp(F)
>>>> sunrpc(F) parport(F) hid_generic(F) usbhid(F) hid(F) btrfs(F) raid6_pq(F)
>>>> mptsas(F) ixgbe(F) mptscsih(F) dca(F) mptbase(F) ptp(F) pps_core(F)
>>>> scsi_transport_sas(F) xor(F) mdio(F) bnx2(F) libcrc32c(F)
>>>> kernel: [49432.777445] CPU: 3 PID: 26411 Comm: ceph-osd Tainted: GF
>>>> I  3.14.0-rc5-ceph-00016-gf31a96a #1
>>>> kernel: [49432.786704] Hardware name: Dell Inc. PowerEdge R410/01V648,
>>>> BIOS 1.6.3 02/07/2011
>>>> kernel: [49432.794223]  1674 8800bf1cbac8
>>>> 816e4840 88022726ef90
>>>> kernel: [49432.801700]   8800bf1cbb08
>>>> 810524ac a8b07e50
>>>> kernel: [49432.809176]  880094e74120 
>>>> b07c9000 
>>>> kernel: [49432.816653] Call Trace:
>>>> kernel: [49432.819119]  [] dump_stack+0x46/0x58
>>>> kernel: [49432.825384]  []
>>>> warn_slowpath_common+0x8c/0xc0
>>>> kernel: [49432.831413]  []
>>>> warn_slowpath_null+0x1a/0x20
>>>> kernel: [49432.837284]  []
>>>> __btrfs_free_extent+0x9ce/0xa20 [btrfs]
>>>> kernel: [49432.844108]  []
>>>> __btrfs_run_delayed_refs+0x428/0x11e0 [btrfs]
>>>> kernel: [49432.851465]  [] ?
>>>> block_rsv_release_bytes+0x108/0x190 [btrfs]
>>>> kernel: [49432.858823]  []
>>>> btrfs_run_delayed_refs+0x76/0x2a0 [btrfs]
>>>> kernel: [49432.865869]  []
>>>> __btrfs_end_transaction+0x26f/0x370 [btrfs]
>>>> kernel: [49432.873044]  []
>>>> btrfs_end_transaction+0x10/0x20 [btrfs]
>>>> kernel: [49432.879872]  [] btrfs_link+0x13e/0x1d0
>>>> [btrfs]
>>>> kernel: [49432.885903]  [] vfs_link+0x1b1/0x270
>>>> kernel: [49432.891060]  [] SyS_linkat+0x210/0x2d0
>>>> kernel: [49432.896394]  [] SyS_link+0x1e/0x20
>>>> kernel: [49432.901380]  []
>>>> system_call_fastpath+0x1a/0x1f
>>>>
>>>> The full dump is at
>>>>
>>>> http://tracker.ceph.com/issues/7688
>>>>
>>>> http://tracker.ceph.com/attachments/download/1141/kern.log.gz

>>>
>>> Filipe's looking at this Sage, you said it happened on v3.13-rc2 but the
>>> kernel line says 3.14.0-rc5, have you had it happen in both places?
>>> Thanks,
>>
>>
>> Whoops, that's my mistake.. it's 3.14-rc5.  The exact commit is in
>> git://github.com/ceph/ceph-client.git, if it matters; it's -rc5 + some
>> ceph patches.
>>
>
> Cool, not worried about what you guys are doing, just wondering if it may be
> related to me screwing around in delayed ref land recently or if you had
> seen it earlier too.  Thanks,

I ran into this a couple of times months ago, definitely way before the
recent changes in the ref merging code added in 3.14. I had balance
running with concurrent snapshot creation and deletion at the time,
but I have been unsuccessful so far in triggering it again.

>
> Josef
>
>

-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20

2014-03-14 Thread Josef Bacik


On 03/14/2014 11:34 AM, Sage Weil wrote:
> On Fri, 14 Mar 2014, Josef Bacik wrote:
>> On 03/11/2014 07:44 PM, Sage Weil wrote:
>>> Hey,
>>>
>>> Is this something you guys have seen before?  This is from v3.13-rc2.
>>>
>>> kernel: [49432.696440] WARNING: CPU: 3 PID: 26411 at
>>> /srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/extent-tree.c:5748
>>> __btrfs_free_extent+0x9ce/0xa20 [btrfs]()
>>> kernel: [49432.710128] Modules linked in: arc4(F) md4(F) nls_utf8(F) cifs(F)
>>> ufs(F) qnx4(F) hfsplus(F) hfs(F) minix(F) ntfs(F) msdos(F) jfs(F) xfs(F)
>>> reiserfs(F) ext2(F) kvm_intel(F) kvm(F) ib_iser(F) rdma_cm(F) ib_cm(F) iw_cm(F)
>>> ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F) libiscsi_tcp(F)
>>> libiscsi(F) psmouse(F) ipmi_si(F) serio_raw(F) gpio_ich(F) joydev(F) dcdbas(F)
>>> i7core_edac(F) edac_core(F) ipmi_msghandler(F) mac_hid(F) acpi_power_meter(F)
>>> lpc_ich(F) tpm_tis(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) scsi_transport_iscsi(F)
>>> nfs(F) fscache(F) lockd(F) lp(F) sunrpc(F) parport(F) hid_generic(F) usbhid(F)
>>> hid(F) btrfs(F) raid6_pq(F) mptsas(F) ixgbe(F) mptscsih(F) dca(F) mptbase(F)
>>> ptp(F) pps_core(F) scsi_transport_sas(F) xor(F) mdio(F) bnx2(F) libcrc32c(F)
>>> kernel: [49432.777445] CPU: 3 PID: 26411 Comm: ceph-osd Tainted: GF I
>>> 3.14.0-rc5-ceph-00016-gf31a96a #1
>>> kernel: [49432.786704] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS
>>> 1.6.3 02/07/2011
>>> kernel: [49432.794223]  1674 8800bf1cbac8 816e4840
>>> 88022726ef90
>>> kernel: [49432.801700]   8800bf1cbb08 810524ac
>>> a8b07e50
>>> kernel: [49432.809176]  880094e74120  b07c9000
>>>
>>> kernel: [49432.816653] Call Trace:
>>> kernel: [49432.819119]  [] dump_stack+0x46/0x58
>>> kernel: [49432.825384]  [] warn_slowpath_common+0x8c/0xc0
>>> kernel: [49432.831413]  [] warn_slowpath_null+0x1a/0x20
>>> kernel: [49432.837284]  [] __btrfs_free_extent+0x9ce/0xa20
>>> [btrfs]
>>> kernel: [49432.844108]  []
>>> __btrfs_run_delayed_refs+0x428/0x11e0 [btrfs]
>>> kernel: [49432.851465]  [] ?
>>> block_rsv_release_bytes+0x108/0x190 [btrfs]
>>> kernel: [49432.858823]  [] btrfs_run_delayed_refs+0x76/0x2a0
>>> [btrfs]
>>> kernel: [49432.865869]  []
>>> __btrfs_end_transaction+0x26f/0x370 [btrfs]
>>> kernel: [49432.873044]  [] btrfs_end_transaction+0x10/0x20
>>> [btrfs]
>>> kernel: [49432.879872]  [] btrfs_link+0x13e/0x1d0 [btrfs]
>>> kernel: [49432.885903]  [] vfs_link+0x1b1/0x270
>>> kernel: [49432.891060]  [] SyS_linkat+0x210/0x2d0
>>> kernel: [49432.896394]  [] SyS_link+0x1e/0x20
>>> kernel: [49432.901380]  [] system_call_fastpath+0x1a/0x1f
>>>
>>> The full dump is at
>>>
>>> http://tracker.ceph.com/issues/7688
>>>
>>> http://tracker.ceph.com/attachments/download/1141/kern.log.gz
>>>
>>
>> Filipe's looking at this Sage, you said it happened on v3.13-rc2 but the
>> kernel line says 3.14.0-rc5, have you had it happen in both places?  Thanks,
>
> Whoops, that's my mistake.. it's 3.14-rc5.  The exact commit is in
> git://github.com/ceph/ceph-client.git, if it matters; it's -rc5 + some
> ceph patches.
>

Cool, not worried about what you guys are doing, just wondering if it 
may be related to me screwing around in delayed ref land recently or if 
you had seen it earlier too.  Thanks,


Josef


Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20

2014-03-14 Thread Sage Weil

On Fri, 14 Mar 2014, Josef Bacik wrote:
> On 03/11/2014 07:44 PM, Sage Weil wrote:
> > Hey,
> > 
> > Is this something you guys have seen before?  This is from v3.13-rc2.
> > 
> > kernel: [49432.696440] WARNING: CPU: 3 PID: 26411 at 
> > /srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/extent-tree.c:5748 
> > __btrfs_free_extent+0x9ce/0xa20 [btrfs]()
> > kernel: [49432.710128] Modules linked in: arc4(F) md4(F) nls_utf8(F) 
> > cifs(F) ufs(F) qnx4(F) hfsplus(F) hfs(F) minix(F) ntfs(F) msdos(F) jfs(F) 
> > xfs(F) reiserfs(F) ext2(F) kvm_intel(F) kvm(F) ib_iser(F) rdma_cm(F) 
> > ib_cm(F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F) 
> > libiscsi_tcp(F) libiscsi(F) psmouse(F) ipmi_si(F) serio_raw(F) gpio_ich(F) 
> > joydev(F) dcdbas(F) i7core_edac(F) edac_core(F) ipmi_msghandler(F) 
> > mac_hid(F) acpi_power_meter(F) lpc_ich(F) tpm_tis(F) nfsd(F) nfs_acl(F) 
> > auth_rpcgss(F) scsi_transport_iscsi(F) nfs(F) fscache(F) lockd(F) lp(F) 
> > sunrpc(F) parport(F) hid_generic(F) usbhid(F) hid(F) btrfs(F) raid6_pq(F) 
> > mptsas(F) ixgbe(F) mptscsih(F) dca(F) mptbase(F) ptp(F) pps_core(F) 
> > scsi_transport_sas(F) xor(F) mdio(F) bnx2(F) libcrc32c(F)
> > kernel: [49432.777445] CPU: 3 PID: 26411 Comm: ceph-osd Tainted: GF 
> > I  3.14.0-rc5-ceph-00016-gf31a96a #1
> > kernel: [49432.786704] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 
> > 1.6.3 02/07/2011
> > kernel: [49432.794223]  1674 8800bf1cbac8 816e4840 
> > 88022726ef90
> > kernel: [49432.801700]   8800bf1cbb08 810524ac 
> > a8b07e50
> > kernel: [49432.809176]  880094e74120  b07c9000 
> > 
> > kernel: [49432.816653] Call Trace:
> > kernel: [49432.819119]  [] dump_stack+0x46/0x58
> > kernel: [49432.825384]  [] warn_slowpath_common+0x8c/0xc0
> > kernel: [49432.831413]  [] warn_slowpath_null+0x1a/0x20
> > kernel: [49432.837284]  [] 
> > __btrfs_free_extent+0x9ce/0xa20 [btrfs]
> > kernel: [49432.844108]  [] 
> > __btrfs_run_delayed_refs+0x428/0x11e0 [btrfs]
> > kernel: [49432.851465]  [] ? 
> > block_rsv_release_bytes+0x108/0x190 [btrfs]
> > kernel: [49432.858823]  [] 
> > btrfs_run_delayed_refs+0x76/0x2a0 [btrfs]
> > kernel: [49432.865869]  [] 
> > __btrfs_end_transaction+0x26f/0x370 [btrfs]
> > kernel: [49432.873044]  [] 
> > btrfs_end_transaction+0x10/0x20 [btrfs]
> > kernel: [49432.879872]  [] btrfs_link+0x13e/0x1d0 [btrfs]
> > kernel: [49432.885903]  [] vfs_link+0x1b1/0x270
> > kernel: [49432.891060]  [] SyS_linkat+0x210/0x2d0
> > kernel: [49432.896394]  [] SyS_link+0x1e/0x20
> > kernel: [49432.901380]  [] system_call_fastpath+0x1a/0x1f
> > 
> > The full dump is at
> > 
> > 
> > http://tracker.ceph.com/issues/7688
> > 
> > http://tracker.ceph.com/attachments/download/1141/kern.log.gz
> > 
> 
> Filipe's looking at this Sage, you said it happened on v3.13-rc2 but the
> kernel line says 3.14.0-rc5, have you had it happen in both places?  Thanks,

Whoops, that's my mistake.. it's 3.14-rc5.  The exact commit is in
git://github.com/ceph/ceph-client.git, if it matters; it's -rc5 + some 
ceph patches.

Thanks!
sage

Re: warn at fs/btrfs/extent-tree.c:5748 __btrfs_free_extent+0x9ce/0xa20

2014-03-14 Thread Josef Bacik

On 03/11/2014 07:44 PM, Sage Weil wrote:
> Hey,
> 
> Is this something you guys have seen before?  This is from v3.13-rc2.
> 
> kernel: [49432.696440] WARNING: CPU: 3 PID: 26411 at 
> /srv/autobuild-ceph/gitbuilder.git/build/fs/btrfs/extent-tree.c:5748 
> __btrfs_free_extent+0x9ce/0xa20 [btrfs]()
> kernel: [49432.710128] Modules linked in: arc4(F) md4(F) nls_utf8(F) cifs(F) 
> ufs(F) qnx4(F) hfsplus(F) hfs(F) minix(F) ntfs(F) msdos(F) jfs(F) xfs(F) 
> reiserfs(F) ext2(F) kvm_intel(F) kvm(F) ib_iser(F) rdma_cm(F) ib_cm(F) 
> iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F) 
> libiscsi_tcp(F) libiscsi(F) psmouse(F) ipmi_si(F) serio_raw(F) gpio_ich(F) 
> joydev(F) dcdbas(F) i7core_edac(F) edac_core(F) ipmi_msghandler(F) mac_hid(F) 
> acpi_power_meter(F) lpc_ich(F) tpm_tis(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) 
> scsi_transport_iscsi(F) nfs(F) fscache(F) lockd(F) lp(F) sunrpc(F) parport(F) 
> hid_generic(F) usbhid(F) hid(F) btrfs(F) raid6_pq(F) mptsas(F) ixgbe(F) 
> mptscsih(F) dca(F) mptbase(F) ptp(F) pps_core(F) scsi_transport_sas(F) xor(F) 
> mdio(F) bnx2(F) libcrc32c(F)
> kernel: [49432.777445] CPU: 3 PID: 26411 Comm: ceph-osd Tainted: GF I 
>  3.14.0-rc5-ceph-00016-gf31a96a #1
> kernel: [49432.786704] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 
> 1.6.3 02/07/2011
> kernel: [49432.794223]  1674 8800bf1cbac8 816e4840 
> 88022726ef90
> kernel: [49432.801700]   8800bf1cbb08 810524ac 
> a8b07e50
> kernel: [49432.809176]  880094e74120  b07c9000 
> 
> kernel: [49432.816653] Call Trace:
> kernel: [49432.819119]  [] dump_stack+0x46/0x58
> kernel: [49432.825384]  [] warn_slowpath_common+0x8c/0xc0
> kernel: [49432.831413]  [] warn_slowpath_null+0x1a/0x20
> kernel: [49432.837284]  [] __btrfs_free_extent+0x9ce/0xa20 
> [btrfs]
> kernel: [49432.844108]  [] 
> __btrfs_run_delayed_refs+0x428/0x11e0 [btrfs]
> kernel: [49432.851465]  [] ? 
> block_rsv_release_bytes+0x108/0x190 [btrfs]
> kernel: [49432.858823]  [] 
> btrfs_run_delayed_refs+0x76/0x2a0 [btrfs]
> kernel: [49432.865869]  [] 
> __btrfs_end_transaction+0x26f/0x370 [btrfs]
> kernel: [49432.873044]  [] btrfs_end_transaction+0x10/0x20 
> [btrfs]
> kernel: [49432.879872]  [] btrfs_link+0x13e/0x1d0 [btrfs]
> kernel: [49432.885903]  [] vfs_link+0x1b1/0x270
> kernel: [49432.891060]  [] SyS_linkat+0x210/0x2d0
> kernel: [49432.896394]  [] SyS_link+0x1e/0x20
> kernel: [49432.901380]  [] system_call_fastpath+0x1a/0x1f
> 
> The full dump is at
> 
>   
> http://tracker.ceph.com/issues/7688
>   
> http://tracker.ceph.com/attachments/download/1141/kern.log.gz
> 

Filipe's looking at this Sage, you said it happened on v3.13-rc2 but the
kernel line says 3.14.0-rc5, have you had it happen in both places?  Thanks,

Josef


Re: Incremental backup for a raid1

2014-03-14 Thread Austin S Hemmelgarn

On 2014-03-14 09:46, George Mitchell wrote:
> Actually, an interesting concept would be to have the initial two drive
> RAID 1 mirrored by 2 additional drives in 4-way configuration on a
> second machine at a remote location on a private high speed network with
> both machines up 24/7.  In that case, if such a configuration would
> work, either machine could be obliterated and the data would survive
> fully intact in full duplex mode.  It would just need to be remounted
> from the backup system and away it goes.  Just thinking of interesting
> possibilities with n-way mirroring.  Oh how I would love to have n-way
> mirroring to play with!
That can already be done, albeit slightly differently by stacking btrfs
RAID 1 on top of a pair of DRBD devices.  Of course, this doesn't
provide quite the same degree of safety as your suggestion, but it does
work (and DRBD makes the remote copy write-mostly for the local system
automatically).

Re: [PATCH] Btrfs: remove transaction from send

2014-03-14 Thread Josef Bacik

On 03/14/2014 09:13 AM, Wang Shilong wrote:
>> Let's try this again.  We can deadlock the box if we send on a box and try to
>> write onto the same fs with the app that is trying to listen to the send pipe.
>> This is because the writer could get stuck waiting for a transaction commit
>> which is being blocked by the send.  So fix this by making sure looking at the
>> commit roots is always going to be consistent.  We do this by keeping track of
>> which roots need to have their commit roots swapped during commit, and then
>> taking the commit_root_sem and swapping them all at once.  Then make sure we
>> take a read lock on the commit_root_sem in cases where we search the commit
>> root to make sure we're always looking at a consistent view of the commit
>> roots.  Previously we had problems with this because we would swap a fs tree
>> commit root and then swap the extent tree commit root independently which
>> would cause the backref walking code to screw up sometimes.  With this patch
>> we no longer deadlock and pass all the weird send/receive corner cases.
>> Thanks,
> 
> Now btrfs send always searches the commit root! Your code only seems to
> protect the backref code; it reduces how long transactions are blocked,
> but it is not safe, as we have discussed before.
> 
>

I was trying to remember why we didn't like this solution before but I
couldn't come up with anything.  Apparently I haven't completely fixed
the problem yet so stay tuned for what I do next ;).  Thanks,

Josef


Re: 3.14.0-rc3: btrfs send/receive blocks btrfs IO on other devices (near deadlocks)

2014-03-14 Thread Josef Bacik

On 03/12/2014 11:18 AM, Marc MERLIN wrote:
> I have a file server with 4 cpu cores and 5 btrfs devices:
> Label: btrfs_boot  uuid: e4c1daa8-9c39-4a59-b0a9-86297d397f3b
>         Total devices 1 FS bytes used 48.92GiB
>         devid 1 size 79.93GiB used 73.04GiB path /dev/mapper/cryptroot
>
> Label: varlocalspace  uuid: 9f46dbe2-1344-44c3-b0fb-af2888c34f18
>         Total devices 1 FS bytes used 1.10TiB
>         devid 1 size 1.63TiB used 1.50TiB path /dev/mapper/cryptraid0
>
> Label: btrfs_pool1  uuid: 6358304a-2234-4243-b02d-4944c9af47d7
>         Total devices 1 FS bytes used 7.16TiB
>         devid 1 size 14.55TiB used 7.50TiB path /dev/mapper/dshelf1
>
> Label: btrfs_pool2  uuid: cb9df6d3-a528-4afc-9a45-4fed5ec358d6
>         Total devices 1 FS bytes used 3.34TiB
>         devid 1 size 7.28TiB used 3.42TiB path /dev/mapper/dshelf2
>
> Label: bigbackup  uuid: 024ba4d0-dacb-438d-9f1b-eeb34083fe49
>         Total devices 5 FS bytes used 6.02TiB
>         devid 1 size 1.82TiB used 1.43TiB path /dev/dm-9
>         devid 2 size 1.82TiB used 1.43TiB path /dev/dm-6
>         devid 3 size 1.82TiB used 1.43TiB path /dev/dm-5
>         devid 4 size 1.82TiB used 1.43TiB path /dev/dm-7
>         devid 5 size 1.82TiB used 1.43TiB path /dev/dm-8
> 
> 
> I have a very long running btrfs send/receive from btrfs_pool1 to
> bigbackup (long running meaning that it's been slowly copying over
> 5 days)
> 
> The problem is that this is blocking IO to btrfs_pool2 which is
> using totally different drives. By blocking IO I mean that IO to
> pool2 kind of works sometimes, and hangs for very long times at
> other times.
> 
> It looks as if one rsync to btrfs_pool2 or one piece of IO hangs on
> a shared lock and once that happens, all IO to btrfs_pool2 stops
> for a long time. It does recover eventually without reboot, but the
> wait times are ridiculous (it could be 1H or more).
> 
> As I write this, I have a killall -9 rsync that waited for over
> 10mn before these processes would finally die:
> 23555       07:36 wait_current_trans.isra.15 rsync -av -SH --delete (...)
> 23556       07:36 exit                       [rsync]
> 25387  2-04:41:22 wait_current_trans.isra.15 rsync --password-file  (...)
> 27481       31:26 wait_current_trans.isra.15 rsync --password-file  (...)
> 29268    04:41:34 wait_current_trans.isra.15 rsync --password-file  (...)
> 29343    04:41:31 exit                       [rsync]
> 29492    04:41:27 wait_current_trans.isra.15 rsync --password-file  (...)
>
> 14559    07:14:49 wait_current_trans.isra.15 cp -i -al current 20140312-feisty
> 
> This is all stuck in btrfs kernel code. If someone wants sysrq-w,
> there it is.
> http://marc.merlins.org/tmp/btrfs_full.txt
>
> A quick summary:
> SysRq : Show Blocked State
>   task                        PC stack   pid father
> btrfs-cleaner   D 8802126b0840 0 3332  2 0x
>  8800c5dc9d00 0046 8800c5dc9fd8 8800c69f6310
>  000141c0 8800c69f6310 88017574c170 880211e671e8
>   880211e67000 8801e5936e20 8800c5dc9d10
> Call Trace:
>  [] schedule+0x73/0x75
>  [] wait_current_trans.isra.15+0x98/0xf4
>  [] ? finish_wait+0x65/0x65
>  [] start_transaction+0x48e/0x4f2
>  [] ? __btrfs_end_transaction+0x2a1/0x2c6
>  [] btrfs_start_transaction+0x1b/0x1d
>  [] btrfs_drop_snapshot+0x443/0x610
>  [] ? _raw_spin_unlock+0x17/0x2a
>  [] ? finish_task_switch+0x51/0xdb
>  [] ? __schedule+0x537/0x5de
>  [] btrfs_clean_one_deleted_snapshot+0x103/0x10f
>  [] cleaner_kthread+0x103/0x136
>  [] ? btrfs_alloc_root+0x26/0x26
>  [] kthread+0xae/0xb6
>  [] ? __kthread_parkme+0x61/0x61
>  [] ret_from_fork+0x7c/0xb0
>  [] ? __kthread_parkme+0x61/0x61
> btrfs-transacti D 88021387eb00 0   2 0x
>  8800c5dcb890 0046 8800c5dcbfd8 88021387e5d0
>  000141c0 88021387e5d0 88021f2141c0 88021387e5d0
>  8800c5dcb930 810fe574 0002 8800c5dcb8a0
> Call Trace:
>  [] ? wait_on_page_read+0x3c/0x3c
>  [] schedule+0x73/0x75
>  [] io_schedule+0x60/0x7a
>  [] sleep_on_page+0xe/0x12
>  [] __wait_on_bit+0x48/0x7a
>  [] wait_on_page_bit+0x7a/0x7c
>  [] ? autoremove_wake_function+0x34/0x34
>  [] read_extent_buffer_pages+0x1bf/0x204
>  [] ? free_root_pointers+0x5b/0x5b
>  [] btree_read_extent_buffer_pages.constprop.45+0x66/0x100
>  [] read_tree_block+0x2f/0x47
>  [] read_block_for_search.isra.26+0x24a/0x287
>  [] btrfs_search_slot+0x4f4/0x6bb
>  [] lookup_inline_extent_backref+0xda/0x3fb
>  [] __btrfs_free_extent+0xf4/0x712
>  [] __btrfs_run_delayed_refs+0x939/0xbdf
>  [] btrfs_run_delayed_refs+0x81/0x18f
>  [] btrfs_commit_transaction+0x3a9/0x849
>  [] ? finish_wait+0x65/0x65
>  [] transaction_kthread+0xf8/0x1ab
>  [] ? btrfs_cleanup_transaction+0x43f/0x43f
>  [] kthread+0xae/0xb6
>  [] ? __kthread_parkme+0x61/0x61
>  [] ret_from_fork+0x7c/0xb0
>  [] ? __kthread_parkme+0x61/0x

Re: [PATCH 1/2] btrfs: Cleanup the btrfs_workqueue related function type

2014-03-14 Thread David Sterba

On Thu, Mar 06, 2014 at 04:19:50AM +, quwen...@cn.fujitsu.com wrote:
> @@ -23,11 +23,13 @@
>  struct btrfs_workqueue;
>  /* Internal use only */
>  struct __btrfs_workqueue;
> +struct btrfs_work;
> +typedef void (*btrfs_func_t)(struct btrfs_work *arg);

I don't see what's wrong with the non-typedef type; CodingStyle
discourages the use of typedefs in general (Chapter 5).

The name btrfs_func_t is too generic. If you really need to use a typedef
here, please change it to something closer to the workqueues, e.g.
btrfs_work_func_t.

Re: Incremental backup for a raid1

2014-03-14 Thread Duncan

George Mitchell posted on Fri, 14 Mar 2014 06:46:19 -0700 as excerpted:

> Actually, an interesting concept would be to have the initial two drive
> RAID 1 mirrored by 2 additional drives in 4-way configuration on a
> second machine at a remote location on a private high speed network with
> both machines up 24/7.  In that case, if such a configuration would
> work, either machine could be obliterated and the data would survive
> fully intact in full duplex mode.  It would just need to be remounted
> from the backup system and away it goes.  Just thinking of interesting
> possibilities with n-way mirroring.  Oh how I would love to have n-way
> mirroring to play with!

In terms of raid1, mdraid already supports such a concept with its "write 
mostly" component device designation.  A component device designated 
"write mostly" is never read from unless it becomes the only device 
available, so it's perfect for such an "over-the-net real-time-online-
backup" solution.

The other half of the solution are the various block-device-over-network 
drivers such as BLK_DEV_NBD (see Documentation/blockdev/nbd.txt) for the 
client side, the server-side of which is in userspace.  That lets you 
have what appears to be a block-device routed over the inet to that 
remote location.

Of course mdraid is lacking btrfs' data integrity features, etc, with its 
raid1 implementation entirely lacking any data integrity or real-time 
cross-checking at all, but unlike btrfs' N-way-mirroring it gets points 
for actually being available right now, so as they say, YMMV.

Of course the other notable issue with your idea is that while it DOES 
address the real-time remote redundancy issue, that doesn't (by itself) 
deal with fat-fingering or similar issues where real-time actually means 
the same problem's duplicated to the backup as well.

But btrfs snapshots address the fat-fingering issue and can be done on 
the partially-remote filesystem solution as well, and local or remote-
local solutions (like periodic btrfs send to a separate local filesystem 
at both ends) can deal with the filesystem damage possibilities.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: [PATCH] Btrfs-progs: fsck: disable --init-extent-tree option when using snapshots

2014-03-14 Thread Josef Bacik

On 03/14/2014 09:36 AM, Wang Shilong wrote:
> Hi Josef,
> 
> Just ping this again.
> 
> Do you have any good ideas for rebuilding the extent tree when the
> broken filesystem is full of snapshots?
> 
> I was working on this recently, but I am blocked because I cannot
> verify whether an extent is in *FULL BACKREF* mode or not, since a
> *FULL BACKREF* extent's refs can be 1 or more than 1.
> 
> I am willing to test code or have a try myself if you could give me
> some advice.
> 

Full backrefs aren't too hard.  Basically all you have to do is walk
down the fs tree and keep track of btrfs_header_owner(eb) for
everything we walk into.  If btrfs_header_owner(eb) == root->objectid
for the tree we are walking down then we need a ye olde normal backref
for this block.  If btrfs_header_owner(eb) != root->objectid we _may_
need a full backref, it depends on who owns the parent block.  The
following may be incomplete, I'm kind of sick.

1) We walk down the original tree, every eb we encounter has
btrfs_header_owner(eb) == root->objectid.  We add normal references
(BTRFS_TREE_BLOCK_REF_KEY) for this root.  World peace
is achieved.

2) We walk down the snapshotted tree.  Say we didn't change anything
at all, it was just a clean snapshot and then boom.  So the
btrfs_header_owner(root->node) == root->objectid, so normal backref.
We walk down to the next level, where btrfs_header_owner(eb) !=
root->objectid, but the level above did, so we add normal refs for all
of these blocks.  We go down the next level, now our
btrfs_header_owner(parent) != root->objectid and
btrfs_header_owner(eb) != root->objectid.  This is where we need to
now go back and see if btrfs_header_owner(eb) currently has a ref on
eb.  If it does we are done, move on to the next block in this same
level, we don't have to go further down.

3) Harder case, we snapshotted and then changed things in the original
root.  Do the same thing as in step 2, but now we get down to
btrfs_header_owner(eb) != root->objectid && btrfs_header_owner(parent)
!= root->objectid.  We lookup the references we have for eb and notice
that btrfs_header_owner(eb) no longer refers to eb.  So now we must
set FULL_BACKREF on this extent reference and add a
SHARED_BLOCK_REF_KEY for this eb using the parent->start as the
offset.  And we need to keep walking down and doing the same thing
until we either hit level 0 or btrfs_header_owner(eb) has a ref on the
block.

4) Not really a whole special case, just something to keep in mind, if
btrfs_header_owner(parent) == root->objectid but
btrfs_header_owner(eb) != root->objectid that means we have a normal
TREE_BLOCK_REF on eb, it's only when the parent doesn't match our
current root that it's a problem.
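
The four cases above can be sketched as a small decision function. This is only a toy model in Python, not btrfs-progs code: the Block class, the classify() and rebuild_refs() helpers and the owner_has_ref callback are all hypothetical names made up for illustration, and the callback stands in for "does btrfs_header_owner(eb) currently have a ref on eb". It also assumes the root node of the tree being walked always gets a normal ref.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Toy stand-in for an extent buffer (eb); not the kernel structure."""
    start: int                      # plays the role of eb->start
    owner: int                      # plays the role of btrfs_header_owner(eb)
    children: List["Block"] = field(default_factory=list)


def classify(eb, parent, root_objectid, owner_has_ref):
    """Decide which kind of backref eb needs while walking root_objectid."""
    if eb.owner == root_objectid:
        return "TREE_BLOCK_REF"         # cases 1 and 2: we own the block
    if parent is None or parent.owner == root_objectid:
        return "TREE_BLOCK_REF"         # case 4: parent is ours, normal ref
    if owner_has_ref(eb):
        return "STOP"                   # case 2: owner still refs eb, done
    return "SHARED_BLOCK_REF"           # case 3: needs FULL_BACKREF


def rebuild_refs(root_node, root_objectid, owner_has_ref):
    """Walk one tree and return (start, kind, root-or-parent-start) tuples."""
    refs = []
    stack = [(root_node, None)]
    while stack:
        eb, parent = stack.pop()
        kind = classify(eb, parent, root_objectid, owner_has_ref)
        if kind == "STOP":
            continue                    # do not descend below this block
        if kind == "SHARED_BLOCK_REF":
            refs.append((eb.start, kind, parent.start))
        else:
            refs.append((eb.start, kind, root_objectid))
        for child in reversed(eb.children):
            stack.append((child, eb))
    return refs
```

With a snapshot (objectid 257) whose root node points at blocks still owned by the original root 256, the walk stops at the first block whose parent and owner are both foreign if the owner still references it, and otherwise records a shared ref keyed on the parent's start.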

Does that make sense?  Thanks,

Josef

Re: Incremental backup for a raid1

2014-03-14 Thread George Mitchell

Actually, an interesting concept would be to have the initial two drive 
RAID 1 mirrored by 2 additional drives in 4-way configuration on a 
second machine at a remote location on a private high speed network with 
both machines up 24/7.  In that case, if such a configuration would 
work, either machine could be obliterated and the data would survive 
fully intact in full duplex mode.  It would just need to be remounted 
from the backup system and away it goes.  Just thinking of interesting 
possibilities with n-way mirroring.  Oh how I would love to have n-way 
mirroring to play with!




On 03/14/2014 04:24 AM, Duncan wrote:
> Michael Schuerig posted on Fri, 14 Mar 2014 09:56:20 +0100 as excerpted:
>
>> [Duncan posted...]
>>
>> 3) Disconnect the backup device(s).  (Don't btrfs device delete, this
>> would remove the copy.  Just disconnect.)
>
> Hmm...  Looking back at what I wrote...
>
> Presumably either have the filesystem unmounted for the disconnect (and
> ideally, the system off, tho with modern drives in theory that's not an
> issue, but still good if it can be done), or at least remounted read-only.
>
> I had guessed that was implicit, but making it explicit is probably best
> all around, just in case.  At least I can rest better with it, having
> made that explicit.




Re: [PATCH] Btrfs-progs: fsck: disable --init-extent-tree option when using snapshots

2014-03-14 Thread Wang Shilong

Hi Josef,

Just ping this again.

Do you have any good ideas for rebuilding the extent tree when the broken
filesystem is full of snapshots?

I was working on this recently, but I am blocked because I cannot verify
whether an extent is in *FULL BACKREF* mode or not, since a *FULL BACKREF*
extent's refs can be 1 or more than 1.

I am willing to test code or have a try myself if you could give me some advice.

-Wang

> On 03/10/2014 11:50 PM, Josef Bacik wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>> 
>> On 03/10/2014 08:12 AM, Shilong Wang wrote:
>>> Hi Josef,
>>> 
>>> As i haven't thought any better ideas to rebuild extent tree which
>>> contains extent that owns 'FULL BACKREF' flag.
>>> 
>>> Considering an extent's refs can be equal or more than 1 if this
>>> extent has *FULL BACKREF* flag, so we could not make sure an
>>> extent's flag by only searching fs/file tree any more.
>>> 
>>> So until now, i just disable this option if snapshots exists,
>>> please correct me if i miss something here. Or you have any better
>>> ideas to solve this problem.~_~
>>> 
>>> 
>> I thought the fsck stuff rebuilds full backref refs properly, does it
>> not?  If it doesn't we need to fix that, however I'm fine with
>> disabling the option if snapshots exist for the time being.  Thanks,
> If there are no snapshots, --init-extent-tree can works as expected.
> I just have not thought a better idea to rebuild extent tree if we do have
> snapshots which means we may have an extent with *FULL BACKREF*
> flag.
> 
> Thanks,
> Wang
>> 
>> Josef
>> -BEGIN PGP SIGNATURE-
>> Version: GnuPG v1
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>> 
>> iQIcBAEBAgAGBQJTHd9NAAoJEANb+wAKly3BYCYP/0iTaaa7w0SnfXtgjoVyX+nT
>> +e0Pa46zeKzpTujotCDb9E/2PBesCAvA4Psog3rkfsqJ2nXN9cERN4E6/JG4nAHh
>> Hv4KPo+w+tCkC4U2wSoDivYrVk9G5SH25ewkgW6iheSYNIlm+PLbOQz9DzGjCFDp
>> 51J9tG5E010siOyhlLCyGj8ZTj+gXuoQVWKCS8dOpCLMrbYYjMDXa562hqWaLoS/
>> t3eSfP7Tnnpl43NiMZI4fWrzmlFa5lba5iJmG59FeyiseRH4Zrhee4St1L1xDL5A
>> /6f3tJJT7DJjRRJFv0nJAOvOPyFaK8bMaYmOQJg6VrhcyPKM3BxBVEab3HrmQ7jt
>> LCMWobpIcM7e5BugmbTGGsFymhv05SQgvYGzpzRVXdsSzqubuqTcXwloNU5RyyFF
>> sXT9IiW9wAibHe7mDN7V6nfo1bVfHsjvSVi1rqz4/zFOWyh8oqxfEhxUJIWhfFsn
>> j0WJevvqKnjBJujyyuQpL13tzh69qei0AHOEme3R46BSRMnyuacy/WOeyo4VXPcn
>> 0GIeWbngAIWF/quhoQGkvofRMlPgftiDge8uz9pbm3IEKeiP9dQ/HvKsIHMKjnKW
>> 3dEBvMV/CSUQNek4VjO1ALefTRZQvJVL8Wxdij4W+djJw/uVX7fOhuqdkqyfM3FY
>> CKSB3HUSUtDCammsvgQA
>> =OT98
>> -END PGP SIGNATURE-
>> 
> 


Re: [PATCH] Btrfs: remove transaction from send

2014-03-14 Thread Wang Shilong

> Let's try this again.  We can deadlock the box if we send on a box and try to
> write onto the same fs with the app that is trying to listen to the send pipe.
> This is because the writer could get stuck waiting for a transaction commit
> which is being blocked by the send.  So fix this by making sure looking at the
> commit roots is always going to be consistent.  We do this by keeping track of
> which roots need to have their commit roots swapped during commit, and then
> taking the commit_root_sem and swapping them all at once.  Then make sure we
> take a read lock on the commit_root_sem in cases where we search the commit
> root to make sure we're always looking at a consistent view of the commit
> roots.  Previously we had problems with this because we would swap a fs tree
> commit root and then swap the extent tree commit root independently which
> would cause the backref walking code to screw up sometimes.  With this patch
> we no longer deadlock and pass all the weird send/receive corner cases.
> Thanks,
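
The locking idea in the commit message can be illustrated with a rough user-space model. This is a sketch under stated assumptions, not the patch itself: Root, FsInfo, mark_dirty, switch_commit_roots and read_commit_roots are hypothetical names for this example, and a plain threading.Lock stands in for the kernel's commit_root_sem read/write semaphore because Python's standard library has no reader/writer lock.

```python
import threading


class Root:
    """Toy tree root: 'node' is the live root, 'commit_root' the committed one."""
    def __init__(self, name, node):
        self.name = name
        self.node = node
        self.commit_root = node


class FsInfo:
    """Toy fs_info holding the lock and the list of roots awaiting a swap."""
    def __init__(self):
        # Stand-in for commit_root_sem; a real rwsem would let readers share it.
        self.commit_root_sem = threading.Lock()
        self.switch_commits = []

    def mark_dirty(self, root, new_node):
        # Remember that this root needs its commit root swapped at commit time.
        root.node = new_node
        if root not in self.switch_commits:
            self.switch_commits.append(root)

    def switch_commit_roots(self):
        # Swap every pending commit root in one critical section, so readers
        # never see a new fs tree paired with an old extent tree.
        with self.commit_root_sem:
            for root in self.switch_commits:
                root.commit_root = root.node
            self.switch_commits.clear()

    def read_commit_roots(self, roots):
        # Readers (send, backref walking) take the lock to get a consistent view.
        with self.commit_root_sem:
            return [root.commit_root for root in roots]
```

A reader that asks for the fs tree and extent tree commit roots sees either both old or both new values, never a mix, which is the property the commit message is after.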

Now btrfs send always searches the commit root! Your code only seems to
protect the backref code; it reduces how long transactions are blocked,
but it is not safe, as we have discussed before.

-Wang
> 
> Reported-by: Hugo Mills 
> Signed-off-by: Josef Bacik 
> ---
> fs/btrfs/backref.c | 33 +++
> fs/btrfs/ctree.c   | 88 --
> fs/btrfs/ctree.h   |  3 +-
> fs/btrfs/disk-io.c |  3 +-
> fs/btrfs/extent-tree.c | 20 ++--
> fs/btrfs/inode-map.c   | 14 
> fs/btrfs/send.c| 57 ++--
> fs/btrfs/transaction.c | 45 --
> fs/btrfs/transaction.h |  1 +
> 9 files changed, 77 insertions(+), 187 deletions(-)
> 
> diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
> index 860f4f2..0be0e94 100644
> --- a/fs/btrfs/backref.c
> +++ b/fs/btrfs/backref.c
> @@ -329,7 +329,10 @@ static int __resolve_indirect_ref(struct btrfs_fs_info *fs_info,
>   goto out;
>   }
> 
> - root_level = btrfs_old_root_level(root, time_seq);
> + if (path->search_commit_root)
> + root_level = btrfs_header_level(root->commit_root);
> + else
> + root_level = btrfs_old_root_level(root, time_seq);
> 
>   if (root_level + 1 == level) {
>   srcu_read_unlock(&fs_info->subvol_srcu, index);
> @@ -1092,9 +1095,9 @@ static int btrfs_find_all_leafs(struct btrfs_trans_handle *trans,
>  *
>  * returns 0 on success, < 0 on error.
>  */
> -int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
> - struct btrfs_fs_info *fs_info, u64 bytenr,
> - u64 time_seq, struct ulist **roots)
> +static int __btrfs_find_all_roots(struct btrfs_trans_handle *trans,
> +   struct btrfs_fs_info *fs_info, u64 bytenr,
> +   u64 time_seq, struct ulist **roots)
> {
>   struct ulist *tmp;
>   struct ulist_node *node = NULL;
> @@ -1130,6 +1133,20 @@ int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
>   return 0;
> }
> 
> +int btrfs_find_all_roots(struct btrfs_trans_handle *trans,
> +  struct btrfs_fs_info *fs_info, u64 bytenr,
> +  u64 time_seq, struct ulist **roots)
> +{
> + int ret;
> +
> + if (!trans)
> + down_read(&fs_info->commit_root_sem);
> + ret = __btrfs_find_all_roots(trans, fs_info, bytenr, time_seq, roots);
> + if (!trans)
> + up_read(&fs_info->commit_root_sem);
> + return ret;
> +}
> +
> /*
>  * this makes the path point to (inum INODE_ITEM ioff)
>  */
> @@ -1509,6 +1526,8 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info,
>   if (IS_ERR(trans))
>   return PTR_ERR(trans);
>   btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem);
> + } else {
> + down_read(&fs_info->commit_root_sem);
>   }
> 
>   ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid,
> @@ -1519,8 +1538,8 @@ int iterate_extent_inodes(struct btrfs_fs_info *fs_info,
> 
>   ULIST_ITER_INIT(&ref_uiter);
>   while (!ret && (ref_node = ulist_next(refs, &ref_uiter))) {
> - ret = btrfs_find_all_roots(trans, fs_info, ref_node->val,
> -tree_mod_seq_elem.seq, &roots);
> + ret = __btrfs_find_all_roots(trans, fs_info, ref_node->val,
> +  tree_mod_seq_elem.seq, &roots);
>   if (ret)
>   break;
>   ULIST_ITER_INIT(&root_uiter);
> @@ -1542,6 +1561,8 @@ out:
>   if (!search_commit_root) {
>   btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem);
>   btrfs_end_transaction(trans, fs_info->extent_root);
> + } else {
> + up_read(&fs_info->commit_root_sem);
>   }
> 
>   return ret;
> diff --git a/fs/btrfs/ctree

Re: [PATCH] Btrfs: fix joining same transaction handle more than twice

2014-03-14 Thread Wang Shilong


On 03/13/2014 10:05 PM, Josef Bacik wrote:

On 03/13/2014 01:19 AM, Wang Shilong wrote:

We hit something like the following function call flows:

|->run_delalloc_range()
  |->btrfs_join_transaction()
|->cow_file_range()
  |->btrfs_join_transaction()
|->find_free_extent()
  |->btrfs_join_transaction()

Trace information can be seen as:

[ 7411.127040] ------------[ cut here ]------------
[ 7411.127060] WARNING: CPU: 0 PID: 11557 at fs/btrfs/transaction.c:383 start_transaction+0x561/0x580 [btrfs]()
[ 7411.127079] CPU: 0 PID: 11557 Comm: kworker/u8:9 Tainted: G   O 3.13.0+ #4
[ 7411.127080] Hardware name: LENOVO QiTianM4350/ , BIOS F1KT52AUS 05/24/2013
[ 7411.127085] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-5)
[ 7411.127092] Call Trace:
[ 7411.127097]  [] dump_stack+0x45/0x56
[ 7411.127101]  [] warn_slowpath_common+0x7d/0xa0
[ 7411.127102]  [] warn_slowpath_null+0x1a/0x20
[ 7411.127109]  [] start_transaction+0x561/0x580 [btrfs]
[ 7411.127115]  [] btrfs_join_transaction+0x17/0x20 [btrfs]
[ 7411.127120]  [] find_free_extent+0xa21/0xb50 [btrfs]
[ 7411.127126]  [] btrfs_reserve_extent+0xa8/0x1a0 [btrfs]
[ 7411.127131]  [] btrfs_alloc_free_block+0xee/0x440 [btrfs]
[ 7411.127137]  [] ? btree_set_page_dirty+0xe/0x10 [btrfs]
[ 7411.127142]  [] __btrfs_cow_block+0x121/0x530 [btrfs]
[ 7411.127146]  [] btrfs_cow_block+0x11f/0x1c0 [btrfs]
[ 7411.127151]  [] btrfs_search_slot+0x1d4/0x9c0 [btrfs]
[ 7411.127157]  [] btrfs_lookup_file_extent+0x37/0x40 [btrfs]
[ 7411.127163]  [] __btrfs_drop_extents+0x16c/0xd90 [btrfs]
[ 7411.127169]  [] ? start_transaction+0x93/0x580 [btrfs]
[ 7411.127171]  [] ? kmem_cache_alloc+0x132/0x140
[ 7411.127176]  [] ? btrfs_alloc_path+0x1a/0x20 [btrfs]
[ 7411.127182]  [] cow_file_range_inline+0x181/0x2e0 [btrfs]
[ 7411.127187]  [] cow_file_range+0x2ed/0x440 [btrfs]
[ 7411.127194]  [] ? free_extent_buffer+0x4f/0xb0 [btrfs]
[ 7411.127200]  [] run_delalloc_nocow+0x38f/0xa60 [btrfs]
[ 7411.127207]  [] ? test_range_bit+0x30/0x180 [btrfs]
[ 7411.127212]  [] run_delalloc_range+0x2e8/0x350 [btrfs]
[ 7411.127219]  [] ? find_lock_delalloc_range+0x1a9/0x1e0 [btrfs]
[ 7411.127222]  [] ? blk_queue_bio+0x2c1/0x330
[ 7411.127228]  [] __extent_writepage+0x2f4/0x760 [btrfs]

Here we fix it by not joining the transaction again if we already hold
a transaction handle when allocating a chunk in find_free_extent().



So I just put that warning there to see if we were ever embedding 3
joins at a time, not because it was an actual problem, I'd say just kill
the warning.  Thanks,

We need to keep @orig_rsv and restore it when ending the transaction, so
we'd better not embed more than 2 joins for now.
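
As a rough illustration of the point about keeping and restoring the saved reservation, here is a toy Python model. It assumes the handle has only one slot for the saved reservation, which is how I read the remark above; the class, field, and reservation names are illustrative and this is not the kernel implementation.

```python
class ToyHandle:
    """Toy stand-in for a transaction handle; field names are illustrative."""

    def __init__(self, block_rsv):
        self.block_rsv = block_rsv  # reservation currently in use
        self.orig_rsv = None        # the single slot used to save it
        self.use_count = 1

    def join(self):
        # A nested join saves the current reservation in the one slot
        # and clears it for the inner user.
        self.use_count += 1
        self.orig_rsv = self.block_rsv
        self.block_rsv = None

    def end(self):
        # Ending a nested join restores whatever the slot holds.
        self.use_count -= 1
        if self.use_count > 0:
            self.block_rsv = self.orig_rsv


def nest(depth):
    """Start a handle, join it (depth - 1) more times, then unwind the joins."""
    handle = ToyHandle(block_rsv="delalloc_rsv")
    for _ in range(depth - 1):
        handle.join()
    for _ in range(depth - 1):
        handle.end()
    return handle.block_rsv


print(nest(2))  # two levels: the outer reservation comes back
print(nest(3))  # three levels: the slot was overwritten, reservation is lost
```

In this model the second nested join overwrites the only saved copy, so after unwinding the outermost user no longer sees its reservation. That is the argument for fixing the third join rather than just dropping the warning.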

Thanks,
Wang


Josef


Re: discard synchronous on most SSDs?

2014-03-14 Thread Duncan

Marc MERLIN posted on Thu, 13 Mar 2014 22:17:50 -0700 as excerpted:

> On Thu, Mar 13, 2014 at 09:39:02PM -0600, Chris Murphy wrote:
>> 
>> On Mar 13, 2014, at 8:11 PM, Marc MERLIN  wrote:
>> 
>> > On Sun, Mar 09, 2014 at 11:33:50AM +0000, Hugo Mills wrote:
>> >> discard is, except on the very latest hardware, a synchronous
>> >> command (it's a limitation of the SATA standard), and therefore
>> >> results in very very poor performance.
>> > 
>> > Interesting. How do I know if a given SSD will hang on discard?
>> > Is a Samsung EVO 840 1TB SSD latest hardware enough, or not? :)
>> 
>> smartctl -a or -x will tell you what SATA revision is in place. The
>> queued trim support is in SATA Rev 3.1. I'm not certain if this
>> requires only the drive to support that revision level, or both
>> controller and drive.
> 
> I'm not sure I'm seeing this, which field is that?

> ATA Version is:   8
> ATA Standard is:  ATA-8-ACS revision 4c

Your drive didn't report it, but here, I have SATA fields as well, in 
addition to the ATA fields:

Here's the fields from my Corsair Neutron SSDs:

ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 6.0 Gb/s

Here's the fields from my Seagate 500-gig 2.5-inch spinning rust:

ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s

(More about that below.)
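
If you want to compare several drives, the two version lines can be pulled out with a few lines of Python. This is only a sketch: the sample text is copied from the outputs quoted in this message, the function name is made up, and a real run would feed it the text printed by smartctl -i for each device.

```python
import re

# Sample text copied from the two drives quoted above; a real run would use
# the output of `smartctl -i /dev/sdX` instead.
SAMPLE_SSD = """\
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 6.0 Gb/s
"""
SAMPLE_HDD = """\
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
"""


def version_fields(info_text):
    """Return the 'ATA Version' and 'SATA Version' values, or None if absent."""
    fields = {"ATA Version": None, "SATA Version": None}
    for line in info_text.splitlines():
        match = re.match(r"^(S?ATA Version) is:\s*(.*\S)\s*$", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


for label, text in (("SSD", SAMPLE_SSD), ("HDD", SAMPLE_HDD)):
    print(label, version_fields(text))
```

A drive whose output has no "SATA Version is:" line, like the one quoted at the top, simply comes back with None for that field, which at least makes the difference between "not reported" and "old revision" explicit.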

Smartctl version here is 6.2 2013-07-26 r3841, according to the output. 
(I'm running gentoo/~amd64 FWIW so it's a local-build). You snipped that 
bit of your output so I can't compare.

But it may also depend on whether smartctl auto-detected and used the ATA 
or the SCSI (or something else) command set and how your devices are 
actually connected, plus BIOS settings, etc.  See the manpage 
documentation for the -d TYPE (--device=TYPE) option and the ATA/SCSI/SAT 
discussion rather further down the manpage for more.

Here I have direct SATA connections with the BIOS set to AHCI mode and am 
thus using the kernel's AHCI drivers, since that's the most common SATA 
chipset standard these days, thus increasing portability given my 
monolithic kernel build.

smartctl's -d test reports an original guess of scsi, changed to sat 
after detection.

Of course connection via USB bridge or the like complicates things 
considerably.

Meanwhile, SATA 2.5, 6 Gb/s on the SSDs, SATA 2.6, 3 Gb/s on the spinning 
rust?  WTF?  The SSDs have SATA 2.5 but 6 Gb/s while the spinning rust 
has a later 2.6 but only 3 Gb/s (tho of course on a mechanical drive the 
bus speed won't be the bottleneck)?  Now I'm confused.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: Incremental backup for a raid1

2014-03-14 Thread Duncan

Michael Schuerig posted on Fri, 14 Mar 2014 09:56:20 +0100 as excerpted:

[Duncan posted...]

>> 3) Disconnect the backup device(s).  (Don't btrfs device delete, this
>> would remove the copy.  Just disconnect.)

Hmm...  Looking back at what I wrote...

Presumably either have the filesystem unmounted for the disconnect (and 
ideally, the system off, tho with modern drives in theory that's not an 
issue, but still good if it can be done), or at least remounted read-only.

I had guessed that was implicit, but making it explicit is probably best 
all around, just in case.  At least I can rest better with it, having 
made that explicit.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


Re: How to view transaction log chronologically, human-readable?

2014-03-14 Thread Marcel Partap

[...]
Theoretically, there should be someone on this mailing list capable of
answering this question, no?
Please feel invited to share your insights ;)
#Regards



On 01/03/14 02:21, Marcel Partap wrote:
> Dear BTRFS devs,
> I have had a 1TB btrfs volume mounted read-only for two years because I
> deleted a bunch of files and didn't want to give up on them.
> Now with the latest btrfs-find-root and btrfs restore --dry-run -t in a
> loop, I generated the full list of files contained in the last several
> hundred root trees. However, diffing these, I find the current one is
> the same until 94 root trees back, and the ones before that contain earlier
> changes. Maybe that is my own fault... whatever.
> 
> Is there a way to just view the transaction history in a human-readable way?
> 
> #Regards
> 

Re: Incremental backup for a raid1

2014-03-14 Thread Michael Schuerig

On Friday 14 March 2014 06:42:27 Duncan wrote:
> N-way-mirroring is actually my most hotly anticipated feature for a 
> different reason[2], but for you it would work like this:
> 
> 1) Set up the 3-way (or 4-way if preferred) mirroring and balance to 
> ensure copies of all data on all devices.
> 
> 2) Optionally scrub to ensure the integrity of all copies.
> 
> 3) Disconnect the backup device(s).  (Don't btrfs device delete, this 
> would remove the copy.  Just disconnect.)
> 
> 4) Store the backups.
> 
> 5) Periodically get them out and reconnect.
> 
> 6) Rebalance to update.  (Since the devices remain members of the
> mirror,  simply outdated, the balance should only update, not rewrite
> the entire thing.)
> 
> 7) Optionally scrub to verify.
> 
> 8) Repeat steps 3-7 as necessary.

Judging from your description, N-way mirroring is (going to be) exactly 
what I was hoping for.

Michael

-- 
Michael Schuerig
mailto:mich...@schuerig.de
http://www.schuerig.de/michael/


Re: [PATCH] Btrfs: remove transaction from send

2014-03-14 Thread Hugo Mills

On Thu, Mar 13, 2014 at 10:16:28PM +0000, Hugo Mills wrote:
> On Thu, Mar 13, 2014 at 03:42:13PM -0400, Josef Bacik wrote:
> > Lets try this again.  We can deadlock the box if we send on a box and try to
> > write onto the same fs with the app that is trying to listen to the send pipe.
> > This is because the writer could get stuck waiting for a transaction commit
> > which is being blocked by the send.  So fix this by making sure looking at the
> > commit roots is always going to be consistent.  We do this by keeping track of
> > which roots need to have their commit roots swapped during commit, and then
> > taking the commit_root_sem and swapping them all at once.  Then make sure we
> > take a read lock on the commit_root_sem in cases where we search the commit root
> > to make sure we're always looking at a consistent view of the commit roots.
> > Previously we had problems with this because we would swap a fs tree commit root
> > and then swap the extent tree commit root independently which would cause the
> > backref walking code to screw up sometimes.  With this patch we no longer
> > deadlock and pass all the weird send/receive corner cases.  Thanks,
> 
>There's something still going on here. I managed to get about twice
> as far through my test as I had before, but I again got an "unexpected
> EOF in stream", with btrfs send returning 1. As before, I have this in
> syslog:
> 
> Mar 13 22:09:12 s_src@amelia kernel: BTRFS error (device sda2): did not find 
> backref in send_root. inode=1786631, offset=825257984, disk_byte=36504023040 
> found extent=36504023040\x0a
> 
>So, on the evidence of one data point (I'll have another one when I
> wake up tomorrow morning), this has made the problem harder to trigger
> but it's still possible.

   Data point two has arrived, and it's gone boom at about the same
point. The first failed at:
2014-03-13 22:09:11,749INFO Read 7247356514 bytes total
and the second at:
2014-03-14 03:53:46,990INFO Read 7247357071 bytes total
at approximately 1h45 into the process. The boot and home subvols have
been OK, and have been backing up happily all this time, but both are
smaller than the (~10 GiB) root subvol.

   I can add a load of data to /home and see if the problem happens
with a larger send size, or if it's just the process writing to a
subvol that has the snapshot being sent that causes it.

   The interesting thing here is that the error seems to be fairly
reliably in the same place (more or less). Before this patch, I was
seeing lockups (or EOF, with the earlier version of this patch) at
approximately 3.6-3.8 GB. Now it looks like it's going to be 7.2 GB.
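
For the record, the arithmetic on the two failure points can be checked with a short Python snippet. The two strings are the log lines quoted above; the helper name is made up, and nothing here comes from the backup tool itself.

```python
import re

# The two "Read ... bytes total" log lines quoted above.
LOG_LINES = [
    "2014-03-13 22:09:11,749INFO Read 7247356514 bytes total",
    "2014-03-14 03:53:46,990INFO Read 7247357071 bytes total",
]


def bytes_read(line):
    """Extract the byte count from one log line."""
    return int(re.search(r"Read (\d+) bytes total", line).group(1))


first, second = (bytes_read(line) for line in LOG_LINES)
gap_bytes = second - first        # distance between the two failure points
first_gb = first / 10**9          # decimal gigabytes
first_gib = first / 2**30         # binary gibibytes

print(gap_bytes, round(first_gb, 2), round(first_gib, 2))
```

The two runs stopped within a few hundred bytes of each other in the stream, which is what makes the failure look tied to a particular point in the data rather than to timing.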

   At least it's not locking up any more, just dying noisily (which is
marginally preferable).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 65E74AC0 from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
   --- Hail and greetings.  We are a flat-pack invasion force from ---   
 Planet Ikea. We come in pieces. 

Re: discard synchronous on most SSDs?

2014-03-14 Thread Chris Samuel

Hi Marc,

On Thu, 13 Mar 2014 10:17:50 PM Marc MERLIN wrote:

> I'm not sure I'm seeing this, which field is that?

I *think* you want smartctl -i instead, and look for the field that says 
something like:

ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3

So if my understanding is correct that says it's just rev. 3.0 so TRIM for 
this is synchronous.

Good luck!
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC



Re: discard synchronous on most SSDs?

2014-03-14 Thread Chris Samuel

On Thu, 13 Mar 2014 09:39:02 PM Chris Murphy wrote:

> smartctl -a or -x will tell you what SATA revision is in place. The queued
> trim support is in SATA Rev 3.1. I'm not certain if this requires only the
> drive to support that revision level, or both controller and drive.

Both I'd say as I believe it's the controller that has to issue it to the 
drive, and the drive needs to understand it.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
