Re: trying to balance, filesystem keeps going read-only.

2015-11-02 Thread Austin S Hemmelgarn
On 2015-11-01 09:33, Ken Long wrote:
> I get a similar read-only status when I try to remove the drive from the 
> array..
> 
> Too bad the utility's function can not be slowed down.. to avoid
> triggering this error... ?
> 
Actually, there are a couple of ways you could do this.  The most reliable
way to do it (and arguably the only correct way) is to use the blkio cgroup
to put bandwidth or IOPS limits on the process.  For authoritative info about
how to do this, check Documentation/cgroups/blkio-controller.txt in the Linux
source tree.
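
As a rough sketch of what that looks like with the cgroup v1 blkio
controller: a write bandwidth cap is set by writing "MAJOR:MINOR
bytes_per_second" to blkio.throttle.write_bps_device in the group's
directory, and a process joins the group when its PID is written to
cgroup.procs.  The mount point /sys/fs/cgroup/blkio, the group name, and the
helper names below are assumptions for illustration, and it needs root.  As
far as I know, v1 throttling mainly catches direct and synchronous I/O, so
how much it actually slows a balance may vary.

```python
import os

# Assumed cgroup v1 mount point; check where blkio is mounted on your system.
BLKIO_ROOT = "/sys/fs/cgroup/blkio"


def throttle_entry(major, minor, bytes_per_sec):
    """Format one rule line for blkio.throttle.{read,write}_bps_device."""
    return f"{major}:{minor} {bytes_per_sec}"


def device_numbers(dev_path):
    """Return (major, minor) for a block device node such as /dev/sdg."""
    st = os.stat(dev_path)
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def limit_writes(group, dev_path, bytes_per_sec, pid, root=BLKIO_ROOT):
    """Create the group, cap write bandwidth on one device, move pid into it."""
    group_dir = os.path.join(root, group)
    os.makedirs(group_dir, exist_ok=True)
    major, minor = device_numbers(dev_path)
    rule_file = os.path.join(group_dir, "blkio.throttle.write_bps_device")
    with open(rule_file, "w") as f:
        f.write(throttle_entry(major, minor, bytes_per_sec))
    with open(os.path.join(group_dir, "cgroup.procs"), "w") as f:
        f.write(str(pid))
```

For example, limit_writes("slowbalance", "/dev/sdg", 10 * 1024 * 1024,
os.getpid()) would cap the calling process at roughly 10 MiB/s of writes to
that disk before it starts the balance; "slowbalance" is a made-up name.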

If the issue really is the device not responding soon enough, you may also
try increasing the device timeout the kernel uses.  A udev rule like the
following will increase the timeout for all ATA/SCSI/USB devices to 150
seconds (2.5 minutes, which is reasonable for most non-enterprise devices).
The rule only mentions SCSI, but all ATA and USB devices get routed through
the SCSI subsystem anyway unless you're using really old and deprecated
drivers:

DRIVER=="sd", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", ATTR{timeout}="150"
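
To check or change the value for a single disk without udev, the same
attribute is exposed at /sys/block/<dev>/device/timeout.  A minimal sketch,
with hypothetical helper names; a value written this way needs root and does
not survive a reboot or a re-plug of the device, which is what the udev rule
takes care of:

```python
from pathlib import Path


def timeout_path(dev, sys_root="/sys"):
    """Path of the SCSI command timeout attribute for a disk such as 'sdg'."""
    return Path(sys_root) / "block" / dev / "device" / "timeout"


def get_timeout(dev, sys_root="/sys"):
    """Read the current timeout in seconds."""
    return int(timeout_path(dev, sys_root).read_text().strip())


def set_timeout(dev, seconds, sys_root="/sys"):
    """Write a new timeout in seconds (requires root on a real system)."""
    timeout_path(dev, sys_root).write_text(f"{seconds}\n")
```

The sys_root parameter is only there so the helpers can be pointed at a
scratch directory when trying them out.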





Re: trying to balance, filesystem keeps going read-only.

2015-11-01 Thread Hugo Mills
On Sun, Nov 01, 2015 at 06:24:53AM -0500, Ken Long wrote:
> I have a file system of four 5TB drives. Well, one drive is 8TB with a
> 5TB partition.. the rest are 5TB drives.  I created the initial btrfs
> file system on one drive. rsync'd data to it. added another drive.
> rsync'd data. added a third drive, rsync'd data. Added a fourth drive,
> trying to balance. The file system gets an error and I have to reboot
> to get the file system out of read only.
> 
> I don't think it is a hardware issue.. but it could be...  or it could be
> some kind of bug in btrfs?

   Looks very much like a hardware error to me. This stuff:

> [64947.160961] ata10.00: exception Emask 0x0 SAct 0x7fff SErr 0x0 action 0x6 frozen
> [64947.160966] ata10.00: failed command: WRITE FPDMA QUEUED
> [64947.160970] ata10.00: cmd 61/c0:00:38:8a:1d/0f:00:0c:00:00/40 tag 0 ncq 2064384 out
>          res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)

is coming from the ATA layer, a couple of layers below btrfs, and
would definitely indicate some kind of issue with the hardware.
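
When the log is long, a small filter helps to pull out just the ATA-layer
failures and see which port they come from.  This is only a sketch: the
regular expression is modelled on the lines quoted here, and the function
name is made up.

```python
import re

# Matches lines like:
# [64947.160966] ata10.00: failed command: WRITE FPDMA QUEUED
ATA_FAILED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<port>ata\d+(?:\.\d+)?): "
    r"failed command: (?P<cmd>.+)"
)


def failed_ata_commands(dmesg_text):
    """Return (timestamp, port, command) for each 'failed command' line."""
    hits = []
    for line in dmesg_text.splitlines():
        m = ATA_FAILED.search(line)
        if m:
            hits.append(
                (float(m.group("ts")), m.group("port"), m.group("cmd").strip())
            )
    return hits
```

Feeding it the output of dmesg and counting the results per port shows
quickly whether the failures all come from one link.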

> [66025.199406] ata10: softreset failed (1st FIS failed)
> [66025.199417] ata10: hard resetting link
> [66030.407703] ata10: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> [66030.407713] ata10.00: link online but device misclassified
> [66030.407746] ata10: EH complete
> [66030.408360] sd 9:0:0:0: [sdg] tag#16 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.408363] sd 9:0:0:0: [sdg] tag#16 CDB: Write(16) 8a 00 00 00 00 00 09 a4 bf 80 00 00 49 80 00 00
> [66030.408365] blk_update_request: I/O error, dev sdg, sector 161791872
> [66030.408369] BTRFS: bdev /dev/sdg errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
> [66030.408439] BTRFS: bdev /dev/sdg errs: wr 2, rd 0, flush 0, corrupt 0, gen 0
> [66030.408537] BTRFS: bdev /dev/sdg errs: wr 3, rd 0, flush 0, corrupt 0, gen 0
> [66030.408643] BTRFS: bdev /dev/sdg errs: wr 4, rd 0, flush 0, corrupt 0, gen 0
> [66030.408768] BTRFS: bdev /dev/sdg errs: wr 5, rd 0, flush 0, corrupt 0, gen 0
> [66030.408880] BTRFS: bdev /dev/sdg errs: wr 6, rd 0, flush 0, corrupt 0, gen 0
> [66030.408985] BTRFS: bdev /dev/sdg errs: wr 7, rd 0, flush 0, corrupt 0, gen 0
> [66030.409082] BTRFS: bdev /dev/sdg errs: wr 8, rd 0, flush 0, corrupt 0, gen 0
> [66030.409180] BTRFS: bdev /dev/sdg errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
> [66030.409284] BTRFS: bdev /dev/sdg errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
> [66030.409847] sd 9:0:0:0: [sdg] tag#17 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.409850] sd 9:0:0:0: [sdg] tag#17 CDB: Write(16) 8a 00 00 00 00 00 09 a5 09 00 00 00 44 40 00 00
> [66030.409851] blk_update_request: I/O error, dev sdg, sector 161810688
> [66030.411235] sd 9:0:0:0: [sdg] tag#18 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.411238] sd 9:0:0:0: [sdg] tag#18 CDB: Write(16) 8a 00 00 00 00 00 09 a5 4d 40 00 00 49 80 00 00
> [66030.411239] blk_update_request: I/O error, dev sdg, sector 161828160
> [66030.412695] sd 9:0:0:0: [sdg] tag#19 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.412697] sd 9:0:0:0: [sdg] tag#19 CDB: Write(16) 8a 00 00 00 00 00 09 a5 96 c0 00 00 49 80 00 00
> [66030.412699] blk_update_request: I/O error, dev sdg, sector 161846976
> [66030.414113] sd 9:0:0:0: [sdg] tag#20 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.414115] sd 9:0:0:0: [sdg] tag#20 CDB: Write(16) 8a 00 00 00 00 00 09 a5 e0 40 00 00 1f 80 00 00
> [66030.414117] blk_update_request: I/O error, dev sdg, sector 161865792
> [66030.414755] sd 9:0:0:0: [sdg] tag#21 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.414758] sd 9:0:0:0: [sdg] tag#21 CDB: Write(16) 8a 00 00 00 00 00 09 a5 ff c0 00 00 15 00 00 00
> [66030.414759] blk_update_request: I/O error, dev sdg, sector 161873856
> [66030.415205] sd 9:0:0:0: [sdg] tag#22 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.415207] sd 9:0:0:0: [sdg] tag#22 CDB: Write(16) 8a 00 00 00 00 00 09 a6 14 c0 00 00 44 40 00 00
> [66030.415208] blk_update_request: I/O error, dev sdg, sector 161879232
> [66030.416562] sd 9:0:0:0: [sdg] tag#23 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.416564] sd 9:0:0:0: [sdg] tag#23 CDB: Write(16) 8a 00 00 00 00 00 09 a6 59 00 00 00 44 40 00 00
> [66030.416572] blk_update_request: I/O error, dev sdg, sector 161896704
> [66030.417922] sd 9:0:0:0: [sdg] tag#24 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.417924] sd 9:0:0:0: [sdg] tag#24 CDB: Write(16) 8a 00 00 00 00 00 09 a6 9d 40 00 00 49 80 00 00
> [66030.417926] blk_update_request: I/O error, dev sdg, sector 161914176
> [66030.419365] sd 9:0:0:0: [sdg] tag#25 FAILED Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
> [66030.419368] sd 9:0:0:0: [sdg] tag#25 CDB: Write(16) 8a 00 00 00 00 00 09 a6 e6 c0 00 00 49 80 00 00

   Here, we've got 

Re: trying to balance, filesystem keeps going read-only.

2015-11-01 Thread Ken Long
I get a similar read-only status when I try to remove the drive from the array..

Too bad the utility's function can not be slowed down.. to avoid
triggering this error... ?

I had some success putting data *onto* the drive by croning sync every
two seconds in a different terminal.
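
Since cron itself only schedules at one-minute granularity, "every two
seconds" in practice means a loop.  A minimal sketch of such a loop, with a
made-up function name; the rounds, do_sync and sleep parameters exist only
so it can be stopped and tried out safely:

```python
import os
import time


def sync_periodically(interval=2.0, rounds=None, do_sync=os.sync,
                      sleep=time.sleep):
    """Call sync every `interval` seconds; run forever if rounds is None."""
    done = 0
    while rounds is None or done < rounds:
        do_sync()          # flush dirty pages so the drive never sees a huge burst
        sleep(interval)
        done += 1
    return done
```

This only keeps the write bursts small; it does not fix whatever makes the
drive time out.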

Doesn't seem to be fixed yet..
https://bugzilla.kernel.org/show_bug.cgi?id=93581


On Sun, Nov 1, 2015 at 9:17 AM, Roman Mamedov  wrote:
> On Sun, 1 Nov 2015 09:07:08 -0500
> Ken Long  wrote:
>
>> Yes, the one drive is that Seagate 8TB drive..
>>
>> Smart tools doesn't show anything outrageous or obvious in hardware.
>>
>> Is there any other info I can provide to isolate, troubleshoot further?
>>
>> I'm not sure how to correlate the dmesg message to a specific drive,
>> SATA cable etc..
>
> See this discussion: http://www.spinics.net/lists/linux-btrfs/msg48054.html
>
> My guess is these drives need to do a lot of housekeeping internally,
> especially during heavy write load or random writes, and do not reply to the
> host machine in time, which translates into those "frozen [...] failed
> command: WRITE FPDMA QUEUED" failures.
>
> I did not follow the issue closely enough to know if there's a solution yet, or
> even if this is specific to Btrfs or to GNU/Linux in general. Maybe your best
> bet would be to avoid using that drive in your Btrfs array altogether for the
> time being.
>
> --
> With respect,
> Roman
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: trying to balance, filesystem keeps going read-only.

2015-11-01 Thread Roman Mamedov
On Sun, 1 Nov 2015 06:24:53 -0500
Ken Long  wrote:

> Well, one drive is 8TB with a 5TB partition.

Is this by any chance a Seagate "SMR" drive? From what I remember seeing on
the list, those do not work well with Btrfs currently, with symptoms very
similar to what you're seeing.

-- 
With respect,
Roman




Re: trying to balance, filesystem keeps going read-only.

2015-11-01 Thread Ken Long
Yes, the one drive is that Seagate 8TB drive..

Smart tools doesn't show anything outrageous or obvious in hardware.

Is there any other info I can provide to isolate, troubleshoot further?

I'm not sure how to correlate the dmesg message to a specific drive,
SATA cable etc..



On Sun, Nov 1, 2015 at 8:48 AM, Roman Mamedov  wrote:
> On Sun, 1 Nov 2015 06:24:53 -0500
> Ken Long  wrote:
>
>> Well, one drive is 8TB with a 5TB partition.
>
> Is this by any chance a Seagate "SMR" drive? From what I remember seeing on
> the list, those do not work well with Btrfs currently, with symptoms very
> similar to what you're seeing.
>
> --
> With respect,
> Roman


Re: trying to balance, filesystem keeps going read-only.

2015-11-01 Thread Roman Mamedov
On Sun, 1 Nov 2015 09:07:08 -0500
Ken Long  wrote:

> Yes, the one drive is that Seagate 8TB drive..
> 
> Smart tools doesn't show anything outrageous or obvious in hardware.
> 
> Is there any other info I can provide to isolate, troubleshoot further?
> 
> I'm not sure how to correlate the dmesg message to a specific drive,
> SATA cable etc..

See this discussion: http://www.spinics.net/lists/linux-btrfs/msg48054.html

My guess is these drives need to do a lot of housekeeping internally,
especially during heavy write load or random writes, and do not reply to the
host machine in time, which translates into those "frozen [...] failed
command: WRITE FPDMA QUEUED" failures.

I did not follow the issue closely enough to know if there's a solution yet, or
even if this is specific to Btrfs or to GNU/Linux in general. Maybe your best
bet would be to avoid using that drive in your Btrfs array altogether for the
time being.
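
On correlating the dmesg messages with a specific drive: on reasonably
recent kernels the resolved sysfs path of a disk normally contains the ATA
port name, so /sys/block/sdg can be mapped back to the "ataN" seen in the
log.  A sketch, assuming that path layout; the helper names and the example
path in the note below are illustrative, not taken from the affected
machine:

```python
import os
import re


def ata_port_from_sysfs_path(resolved_path):
    """Extract 'ataN' from a resolved /sys/block/<dev> path, or None."""
    m = re.search(r"/(ata\d+)/", resolved_path)
    return m.group(1) if m else None


def ata_port_for(dev, sys_root="/sys"):
    """Resolve /sys/block/<dev> and report which ATA port it hangs off."""
    resolved = os.path.realpath(os.path.join(sys_root, "block", dev))
    return ata_port_from_sysfs_path(resolved)
```

A path such as
/sys/devices/pci0000:00/0000:00:1f.2/ata10/host9/target9:0:0/9:0:0:0/block/sdg
would give "ata10".  From there, the drive's serial number (for instance
from smartctl -i /dev/sdg) identifies the physical disk and its cable.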

-- 
With respect,
Roman

