Re: [zfs-discuss] Spare drive inherited cksum errors?

2012-05-29 Thread Stephan Budach

Hi Richard,

On 29.05.12 06:54, Richard Elling wrote:


On May 28, 2012, at 9:21 PM, Stephan Budach wrote:


Hi all,

just to wrap this issue up: as FMA didn't report any error other than 
the one which led to the degradation of the one mirror, I detached 
the original drive from the zpool, which flagged the mirror vdev as 
ONLINE (although there was still a cksum error count of 23 on the 
spare drive).
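
For reference, the sequence described above boils down to roughly the
following zpool commands (the disk names here are placeholders, not the
actual devices in this pool):

  # zpool detach obelixData <original-disk>              (drop the degraded drive; the spare keeps the mirror going)
  # zpool attach obelixData <good-disk> <original-disk>  (re-attach it to the healthy side and let it resilver)
  # zpool detach obelixData <spare-disk>                 (release the hot spare once the resilver is clean)
  # zpool add obelixData spare <spare-disk>              (return it to the pool as a spare)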


You showed the result of the FMA diagnosis, but not the error reports.
One feature of the error reports on modern Solaris is that the 
expected and reported

bit images are described, showing the nature and extent of the corruption.

Are you referring to these errors:

root@solaris11c:~# fmdump -e -u f0601f5f-cb8b-67bc-bd63-e71948ea8428
TIME CLASS
Mai 27 10:24:23.3654 ereport.fs.zfs.checksum
Mai 27 10:24:23.3652 ereport.fs.zfs.checksum
Mai 27 10:24:23.3650 ereport.fs.zfs.checksum
Mai 27 10:24:23.3648 ereport.fs.zfs.checksum
Mai 27 10:24:23.3646 ereport.fs.zfs.checksum
Mai 27 10:24:23.2696 ereport.fs.zfs.checksum
Mai 27 10:24:23.2694 ereport.fs.zfs.checksum
Mai 27 10:24:23.2692 ereport.fs.zfs.checksum
Mai 27 10:24:23.2690 ereport.fs.zfs.checksum
Mai 27 10:24:23.2688 ereport.fs.zfs.checksum
Mai 27 10:24:23.2686 ereport.fs.zfs.checksum

And to pick one in detail:

root@solaris11c:~# fmdump -eV -u f0601f5f-cb8b-67bc-bd63-e71948ea8428
TIME   CLASS
Mai 27 2012 10:24:23.365451280 ereport.fs.zfs.checksum
nvlist version: 0
class = ereport.fs.zfs.checksum
ena = 0xdfb23b0bc971
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x855ebc6738ef6dd6
vdev = 0x52e3ca377dbdbec9
(end detector)

pool = obelixData
pool_guid = 0x855ebc6738ef6dd6
pool_context = 0
pool_failmode = wait
vdev_guid = 0x52e3ca377dbdbec9
vdev_type = disk
vdev_path = /dev/dsk/c9t211378AC02F4d9s0
vdev_devid = id1,sd@n2047001378ac02f4/a
parent_guid = 0x695bf14bdabd6714
parent_type = mirror
zio_err = 50
zio_offset = 0x2d8b974600
zio_size = 0x2
zio_objset = 0x81ea9
zio_object = 0x5594
zio_level = 0
zio_blkid = 0x3c
cksum_expected = 0x12869460bd5d 0x49e4661395e6973 
0xc974c2622ce7a035 0x81fe9ef14082a245
cksum_actual = 0x1bba2b185478 0x707883eac587dd3 
0x54de998365cc6a8d 0x6822e5f4add45237

cksum_algorithm = fletcher4
bad_ranges = 0x0 0x2
bad_ranges_min_gap = 0x8
bad_range_sets = 0x357a5
bad_range_clears = 0x3935b
bad_set_histogram = 0x8f3 0xdd4 0x52c 0x13d0 0xd76 0xea0 0xec1 
0x100f 0x8f0 0xdc7 0x51e 0x13e7 0xd6b 0xe87 0xf30 0xf9c 0x8cd 0xddc 
0x51a 0x1458 0xd93 0xf0a 0xf04 0x102d 0x8b4 0xdea 0x51a 0x141d 0xdd3 
0xefc 0xf18 0x1003 0x8bc 0xde9 0x52f 0x13a4 0xdd9 0xf07 0xea2 0x100d 
0x8c1 0xdf4 0x4e6 0x1368 0xdce 0xed9 0xf27 0x1002 0x8bf 0xdf4 0x4fe 
0x1396 0xd7d 0xee0 0xf2b 0xfcc 0x8d8 0xdd7 0x4fc 0x13b8 0xd8e 0xe8b 
0xedb 0x100e
bad_cleared_histogram = 0x0 0x46 0x211a 0xc77 0x124f 0x1146 
0x113b 0x1020 0x0 0x35 0x20df 0xc9f 0x12dc 0x110c 0x10fc 0x1018 0x0 0x37 
0x2103 0xcbb 0x12a9 0x113d 0x1100 0xf8d 0x0 0x35 0x210d 0xc6e 0x121a 
0x1171 0x108f 0x1020 0x0 0x46 0x20ec 0xc3f 0x12ba 0x10ce 0x1172 0x1009 
0x0 0x47 0x20a4 0xc5e 0x129f 0x1102 0x112e 0x1031 0x0 0x4a 0x20d1 0xc64 
0x126b 0x1159 0x111c 0x1074 0x0 0x3a 0x20ed 0xc5b 0x1245 0x1160 0x111c 0xfc0

__ttl = 0x1
__tod = 0x4fc1e4b7 0x15c85810

They were all from the same vdev_path and ranged through these block IDs:

zio_blkid = 0x3c
zio_blkid = 0x3e
zio_blkid = 0x40
zio_blkid = 0x3a
zio_blkid = 0x3d
zio_blkid = 0xf
zio_blkid = 0xc
zio_blkid = 0x10
zio_blkid = 0x12
zio_blkid = 0x14
zio_blkid = 0x11
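
For reference, these fields can be pulled out of the verbose ereport dump 
with a simple filter against the same UUID, e.g.:

  # fmdump -eV -u f0601f5f-cb8b-67bc-bd63-e71948ea8428 | egrep 'vdev_path|zio_blkid'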


I really was a bit surprised by the cksum errors on the spare drive, 
especially since no errors had been logged for the spare drive while it 
was resilvering.


We'll see what the scrub will tell us.
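
For anyone following along, kicking the scrub off and checking on it is 
just the usual:

  # zpool scrub obelixData
  # zpool status -v obelixData     (shows scrub progress and any files with errors)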

Thanks,
budy


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread John Martin

On 05/28/12 08:48, Nathan Kroenert wrote:


Looking to get some larger drives for one of my boxes. It runs
exclusively ZFS and has been using Seagate 2TB units up until now (which
are 512 byte sector).

Anyone offer up suggestions of either 3 or preferably 4TB drives that
actually work well with ZFS out of the box? (And not perform like
rubbish)...

I'm using Oracle Solaris 11, and would prefer not to have to use a
hacked up zpool to create something with ashift=12.


Are you replacing a failed drive or creating a new pool?

I had a drive in a mirrored pool recently fail.  Both
drives were 1TB Seagate ST310005N1A1AS-RK with 512 byte sectors.
All the 1TB Seagate boxed drives I could find with the same
part number on the box (with factory seals in place)
were really ST1000DM003-9YN1 with 512e/4096p.  Just being
cautious, I ended up migrating the pools over to a pair
of the new drives.  The pools were created with ashift=12
automatically:

  $ zdb -C | grep ashift
  ashift: 12
  ashift: 12
  ashift: 12

Resilvering the three pools concurrently went fairly quickly:

  $ zpool status
scan: resilvered 223G in 2h14m with 0 errors on Tue May 22 21:02:32 
2012
scan: resilvered 145G in 4h13m with 0 errors on Tue May 22 23:02:38 
2012
scan: resilvered 153G in 3h44m with 0 errors on Tue May 22 22:30:51 
2012


What performance problem do you expect?


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread bofh
On Tue, May 29, 2012 at 6:54 AM, John Martin  wrote:
>  $ zdb -C | grep ashift
>              ashift: 12
>              ashift: 12
>              ashift: 12
>

That's interesting.  I just created a raidz3 pool out of 7x3TB drives.
My drives were:
ST3000DM001-9YN1
Hitachi HDS72303
Hitachi HDS72303
ST3000DM001-9YN1
Hitachi HDS5C303
Hitachi HDS5C303
ST33000651AS

ashift: 9 - is that standard?  I did nothing but plug them in and run
zpool create.  They seem to run pretty fast; I can get up to 400 MB/s
writes from /dev/zero... :)
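
For comparison, a rough sketch of that kind of sequential-write test, 
assuming the pool is mounted at /tank (path and size are placeholders; 
with compression enabled, zeros compress away and the number is meaningless):

  # dd if=/dev/zero of=/tank/ddtest bs=1024k count=8192    (writes ~8 GB)
  # rm /tank/ddtest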



-- 
http://www.glumbert.com/media/shift
http://www.youtube.com/watch?v=tGvHNNOLnCk
"This officer's men seem to follow him merely out of idle curiosity."
-- Sandhurst officer cadet evaluation.
"Securing an environment of Windows platforms from abuse - external or
internal - is akin to trying to install sprinklers in a fireworks
factory where smoking on the job is permitted."  -- Gene Spafford
learn french:  http://www.youtube.com/watch?v=30v_g83VHK4


Re: [zfs-discuss] Spare drive inherited cksum errors?

2012-05-29 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Stephan Budach
> 
> Now, I will run a scrub once more to verify the zpool.

If you have a drive (or two drives) with bad sectors, they will only be
detected when the bad sectors actually get used.  Given that your pool is
less than 100% full, you might still have bad hardware going undetected
even if you pass your scrub.

You might consider creating a big file (dd if=/dev/zero of=bigfile.junk
bs=1024k) and then, when you're out of disk space, scrubbing again.  (Obviously,
you would be unable to make new writes to the pool as long as it's filled...)

And since certain types of checksum errors will only occur when you *change*
the bits on disk, repeat the same test: rm bigfile.junk ; dd
if=/dev/urandom of=bigfile.junk bs=1024k, and then scrub again.
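
Put together, the suggested procedure looks roughly like this, with /tank
standing in for the pool's mountpoint and tank for the pool name:

  # dd if=/dev/zero of=/tank/bigfile.junk bs=1024k     (runs until the pool is full)
  # zpool scrub tank
  # zpool status -v tank
  # rm /tank/bigfile.junk
  # dd if=/dev/urandom of=/tank/bigfile.junk bs=1024k  (rewrite the same space with changing bits)
  # zpool scrub tank
  # zpool status -v tank
  # rm /tank/bigfile.junk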



Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread Nathan Kroenert

 Hi John,

Actually, last time I tried the whole AF (4k) thing, its performance 
was worse than woeful.


But admittedly, that was a little while ago.

The drives were the seagate green barracuda IIRC, and performance for 
just about everything was 20MB/s per spindle or worse, when it should 
have been closer to 100MB/s when streaming. Things were worse still when 
doing random...


I'm actually looking to put in something larger than the 3*2TB drives 
(triple mirror for read perf) this pool has in it - preferably 3 * 4TB 
drives. (I don't want to put in more spindles - just replace the current 
ones...)


I might just have to bite the bullet and try something with current SW. :).

Nathan.


On 05/29/12 08:54 PM, John Martin wrote:

On 05/28/12 08:48, Nathan Kroenert wrote:


Looking to get some larger drives for one of my boxes. It runs
exclusively ZFS and has been using Seagate 2TB units up until now (which
are 512 byte sector).

Anyone offer up suggestions of either 3 or preferably 4TB drives that
actually work well with ZFS out of the box? (And not perform like
rubbish)...

I'm using Oracle Solaris 11, and would prefer not to have to use a
hacked up zpool to create something with ashift=12.


Are you replacing a failed drive or creating a new pool?

I had a drive in a mirrored pool recently fail.  Both
drives were 1TB Seagate ST310005N1A1AS-RK with 512 byte sectors.
All the 1TB Seagate boxed drives I could find with the same
part number on the box (with factory seals in place)
were really ST1000DM003-9YN1 with 512e/4096p.  Just being
cautious, I ended up migrating the pools over to a pair
of the new drives.  The pools were created with ashift=12
automatically:

  $ zdb -C | grep ashift
  ashift: 12
  ashift: 12
  ashift: 12

Resilvering the three pools concurrently went fairly quickly:

  $ zpool status
scan: resilvered 223G in 2h14m with 0 errors on Tue May 22 
21:02:32 2012
scan: resilvered 145G in 4h13m with 0 errors on Tue May 22 
23:02:38 2012
scan: resilvered 153G in 3h44m with 0 errors on Tue May 22 
22:30:51 2012


What performance problem do you expect?




Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread John Martin

On 05/29/12 08:35, Nathan Kroenert wrote:

Hi John,

Actually, last time I tried the whole AF (4k) thing, its performance
was worse than woeful.

But admittedly, that was a little while ago.

The drives were the seagate green barracuda IIRC, and performance for
just about everything was 20MB/s per spindle or worse, when it should
have been closer to 100MB/s when streaming. Things were worse still when
doing random...

I'm actually looking to put in something larger than the 3*2TB drives
(triple mirror for read perf) this pool has in it - preferably 3 * 4TB
drives. (I don't want to put in more spindles - just replace the current
ones...)

I might just have to bite the bullet and try something with current SW. :).



Raw read from one of the mirrors:

#  timex dd if=/dev/rdsk/c0t2d0s2 of=/dev/null bs=1024000 count=1
1+0 records in
1+0 records out

real  49.26
user   0.01
sys    0.27


filebench filemicro_seqread reports an impossibly high number (4GB/s)
so the ARC is likely handling all reads.

The label on the boxes I bought say:

  1TB 32MB INTERNAL KIT 7200
  ST310005N1A1AS-RK
  S/N: ...
  PN:9BX1A8-573

The drives in the box were really
ST1000DM003-9YN162 with 64MB of cache.
I have multiple pools on each disk so the
cache should be disabled.  The drive reports
512-byte logical sectors and 4096-byte physical sectors.


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread John Martin

On 05/29/12 07:26, bofh wrote:


ashift:9  is that standard?


Depends on what the drive reports as physical sector size.
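
If smartmontools is installed (it is not part of the stock install), the 
drive's identity output shows what it claims; the device path here is just 
a placeholder:

  # smartctl -i /dev/rdsk/c0t2d0s0     (recent smartctl versions list both logical and physical sector sizes)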




Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread Casper . Dik


>The drives were the seagate green barracuda IIRC, and performance for 
>just about everything was 20MB/s per spindle or worse, when it should 
>have been closer to 100MB/s when streaming. Things were worse still when 
>doing random...

It is possible that your partitions weren't aligned at 4K, and that will 
give serious issues with those drives.  (Solaris now tries to make sure that 
all partitions are on 4K boundaries, or makes sure that the zpool dev_t is 
aligned to 4K.)
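
One quick check on an SMI-labelled disk is to look at the slice's starting
sector and make sure it is divisible by 8 (8 x 512 bytes = 4 KB); the device
path is a placeholder:

  # prtvtoc /dev/rdsk/c0t2d0s0     (the "First Sector" column should be a multiple of 8)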

Casper



Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread Jim Klimov

2012-05-29 16:35, Nathan Kroenert wrote:

Hi John,

Actually, last time I tried the whole AF (4k) thing, its performance
was worse than woeful.

But admittedly, that was a little while ago.

The drives were the seagate green barracuda IIRC, and performance for
just about everything was 20MB/s per spindle or worse, when it should
have been closer to 100MB/s when streaming. Things were worse still when
doing random...


On one hand, it is possible that being green, the drives aren't very
capable of fast IO - they had different design goals and tradeoffs.

But actually I was going to ask whether you paid attention to partitioning.
At what offset did your ZFS pool data start? Was that offset divisible
by 4KB (i.e. 256 512-byte sectors, as is the default now, vs. 34 sectors
with the older default)?

If the drive had 4KB native sectors but the logical FS blocks were
not aligned with them, then every write IO would involve a RMW of
many sectors (perhaps the disk's caching might alleviate this for
streaming writes, though).

Also note that ZFS IO is often random even for reads, since metadata
and file data often have to be read from different, dispersed locations.
Again, OS caching helps statistically, when you have much RAM dedicated
to caching. Hmmm... did you use dedup in those tests? That is another
source of performance degradation on smaller machines (under tens of
GBs of RAM).
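
Checking whether dedup was in play is cheap, for example (the pool name
"tank" is a placeholder):

  # zpool list -o name,dedupratio tank     (1.00x means nothing was deduplicated)
  # zfs get -r dedup tank                  (shows the dedup property per dataset)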

HTH,
//Jim


Re: [zfs-discuss] Spare drive inherited cksum errors?

2012-05-29 Thread Cindy Swearingen

Hi--

You don't say what release this is, but I think the checksum error
accumulation on the spare was a zpool status formatting bug that I have
seen myself. It is fixed in a later Solaris release.

Thanks,

Cindy

On 05/28/12 22:21, Stephan Budach wrote:

Hi all,

just to wrap this issue up: as FMA didn't report any other error than
the one which led to the degradation of the one mirror, I detached the
original drive from the zpool which flagged the mirror vdev as ONLINE
(although there was still a cksum error count of 23 on the spare drive).

Afterwards I attached the formerly degraded drive again to the good
drive in that mirror and let the resilver finish, which didn't show any
errors at all. Finally I detached the former spare drive and re-added it
as a spare drive again.

Now, I will run a scrub once more to verify the zpool.

Cheers,
budy


Re: [zfs-discuss] Spare drive inherited cksum errors?

2012-05-29 Thread Richard Elling
On May 29, 2012, at 8:12 AM, Cindy Swearingen wrote:

> Hi--
> 
> You don't say what release this is, but I think the checksum error
> accumulation on the spare was a zpool status formatting bug that I have
> seen myself. It is fixed in a later Solaris release.
> 

Once again, Cindy beats me to it :-)

Verify that the ereports are logged against the original device and not the
spare. If there are no ereports for the spare, then Cindy gets the prize :-)
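
For example, using the same fmdump UUID shown earlier in the thread, a
quick tally of ereports per device path would be something like:

  # fmdump -eV -u f0601f5f-cb8b-67bc-bd63-e71948ea8428 | grep vdev_path | sort | uniq -c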
 -- richard

> Thanks,
> 
> Cindy
> 
> On 05/28/12 22:21, Stephan Budach wrote:
>> Hi all,
>> 
>> just to wrap this issue up: as FMA didn't report any other error than
>> the one which led to the degradation of the one mirror, I detached the
>> original drive from the zpool which flagged the mirror vdev as ONLINE
>> (although there was still a cksum error count of 23 on the spare drive).
>> 
>> Afterwards I attached the formerly degraded drive again to the good
>> drive in that mirror and let the resilver finish, which didn't show any
>> errors at all. Finally I detached the former spare drive and re-added it
>> as a spare drive again.
>> 
>> Now, I will run a scrub once more to verify the zpool.
>> 
>> Cheers,
>> budy

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422









Re: [zfs-discuss] Spare drive inherited cksum errors?

2012-05-29 Thread Stephan Budach

On 29.05.12 18:59, Richard Elling wrote:

On May 29, 2012, at 8:12 AM, Cindy Swearingen wrote:


Hi--

You don't say what release this is, but I think the checksum error
accumulation on the spare was a zpool status formatting bug that I have
seen myself. It is fixed in a later Solaris release.



Once again, Cindy beats me to it :-)

Verify that the ereports are logged against the original device and 
not the
spare. If there are no ereports for the spare, then Cindy gets the 
prize :-)

 -- richard


Yeah, I verified that the error reports were only logged against the 
original device, as I stated earlier, so Cindy wins! :)
If only I knew how to get the actual S11 release level of my box. 
Neither uname -a nor cat /etc/release gives me a clue, since they 
display the same data when run on hosts that are on different updates.


Thanks,
budy


Thanks,

Cindy

On 05/28/12 22:21, Stephan Budach wrote:

Hi all,

just to wrap this issue up: as FMA didn't report any other error than
the one which led to the degradation of the one mirror, I detached the
original drive from the zpool which flagged the mirror vdev as ONLINE
(although there was still a cksum error count of 23 on the spare drive).

Afterwards I attached the formerly degraded drive again to the good
drive in that mirror and let the resilver finish, which didn't show any
errors at all. Finally I detached the former spare drive and re-added it
as a spare drive again.

Now, I will run a scrub once more to verify the zpool.

Cheers,
budy


--
ZFS Performance and Training
richard.ell...@richardelling.com 
+1-760-896-4422












Re: [zfs-discuss] Remedies for suboptimal mmap performance on zfs

2012-05-29 Thread Iwan Aucamp

On 05/29/2012 03:29 AM, Daniel Carosone wrote:
For the mmap case: does the ARC keep a separate copy, or does the vm 
system map the same page into the process's address space? If a 
separate copy is made, that seems like a potential source of many 
kinds of problems - if it's the same page then the whole premise is 
essentially moot and there's no "double caching". 


As far as I understand, in the mmap case the page cache is distinct 
from the ARC (i.e. the simplified flow for reading from disk with mmap 
is DSK -> ARC -> page cache), and only the page cache gets mapped into 
the process's address space, which is what results in the double caching.


I have two other general questions regarding the page cache with ZFS + Solaris:
 - Does anything other than mmap still use the page cache?
 - Is there a parameter similar to /proc/sys/vm/swappiness that can 
control how long unused pages in the page cache stay in physical RAM 
when there is no shortage of it? And if not, how long will unused pages 
stay in the page cache when there is no shortage of physical RAM?


Re: [zfs-discuss] Spare drive inherited cksum errors?

2012-05-29 Thread Peter Jeremy
On 2012-May-29 22:04:39 +1000, Edward Ned Harvey wrote:
>If you have a drive (or two drives) with bad sectors, they will only be
>detected as long as the bad sectors get used.  Given that your pool is less
>than 100% full, it means you might still have bad hardware going undetected,
>if you pass your scrub.

One way around this is to 'dd' each drive to /dev/null (or do a "long"
test using smartmontools).  This ensures that the drive thinks all
sectors are readable.
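
For example, with placeholder device names (s2 being the whole-disk slice
on an SMI-labelled drive):

  # dd if=/dev/rdsk/c0t2d0s2 of=/dev/null bs=1024k     (forces every sector to be read)
  # smartctl -t long /dev/rdsk/c0t2d0s2                (or schedule the drive's own long self-test)
  # smartctl -l selftest /dev/rdsk/c0t2d0s2            (check the result once it finishes)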

>You might consider creating a big file (dd if=/dev/zero of=bigfile.junk
>bs=1024k) and then when you're out of disk space, scrub again.  (Obviously,
>you would be unable to make new writes to pool as long as it's filled...)

I'm not sure how ZFS handles "no large free blocks", so you might need
to repeat this more than once to fill the disk.

This could leave your drive seriously fragmented.  If you do try this,
I'd recommend creating a snapshot first and then rolling back to it,
rather than just deleting the junk file.  Also, this (obviously) won't
work at all on a filesystem with compression enabled.
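
In other words, something along these lines, with tank/data standing in
for the real dataset:

  # zfs snapshot tank/data@prefill
  # dd if=/dev/urandom of=/tank/data/bigfile.junk bs=1024k    (fill the free space)
  # zpool scrub tank
  # zfs rollback tank/data@prefill     (discards the junk file, along with anything else written since)
  # zfs destroy tank/data@prefill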

-- 
Peter Jeremy




Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread Richard Elling
On May 29, 2012, at 6:10 AM, Jim Klimov wrote:
> Also note that ZFS IO often is random even for reads, since you
> have to read metadata and file data often from different dispersed
> locations.

This is true for almost all other file systems, too. For example, in UFS, 
metadata is stored in fixed locations on the disk as defined when the
filesystem is created.
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422









Re: [zfs-discuss] Spare drive inherited cksum errors?

2012-05-29 Thread John D Groenveld
In message <4fc509e8.8080...@jvm.de>, Stephan Budach writes:
>If only I knew how to get the actual S11 release level of my box. 
>Neither uname -a nor cat /etc/release gives me a clue, since they 
>display the same data when run on hosts that are on different updates.

$ pkg info entire

John
groenv...@acm.org


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread nathan

On 29/05/2012 11:10 PM, Jim Klimov wrote:

2012-05-29 16:35, Nathan Kroenert wrote:

Hi John,

Actually, last time I tried the whole AF (4k) thing, its performance
was worse than woeful.

But admittedly, that was a little while ago.

The drives were the seagate green barracuda IIRC, and performance for
just about everything was 20MB/s per spindle or worse, when it should
have been closer to 100MB/s when streaming. Things were worse still when
doing random...


On one hand, it is possible that being green, the drives aren't very
capable of fast IO - they had different design goals and tradeoffs.


Indeed! I just wasn't expecting it to be so profound.

But actually I was going to ask if you paid attention to partitioning?
At what offsets did your ZFS pool data start? Was that offset divisible
by 4KB (i.e. 256 512byte sectors as is default now vs 34 sectors of
the older default)?

It was. Actually I tried it in a variety of ways, including auto EFI 
partition (zpool create with the whole disk), using SMI label, and 
trying a variety of tricks with offsets. Again, it was a while ago - 
before the time of the SD RMW fix...


If the drive had 4kb native sectors but the logical FS blocks were
not aligned with that, then every write IO would involve RMW of
many sectors (perhaps disk's caching might alleviate this for
streaming writes though).

Yep - that's what it *felt* like, and I didn't seem to be able to change 
that at the time.


Also note that ZFS IO often is random even for reads, since you
have to read metadata and file data often from different dispersed
locations. Again, OS caching helps statistically, when you have
much RAM dedicated to caching. Hmmm... did you use dedup in those
tests?- that is another source of performance degradation on smaller
machines (under tens of GBs of RAM).


At the time, I had 1TB of data, and 1TB of space... I'd expect that most 
of the data would have been written 'closeish' to sequential on disk, 
though I'll confess I only spent a short time looking at the 'physical' 
read/write locations being sent down through the stack. (where the drive 
writes them - well.. That's different. ;)


I have been contacted off list by a few folks that have indicated 
success with current drives and current Solaris bits. I'm thinking that 
it might be time to take another run at it.


I'll let the list know the results. ;)

Cheers

Nathan.



[zfs-discuss] Zombie damaged zpool won't die

2012-05-29 Thread Dave Pooser
In the beginning, I created a mirror named DumpFiles on FreeBSD. Later, I
decided to move those drives to a new Solaris 11 server-- but rather than
import the old pool I'd create a new pool. And I liked the DumpFiles name,
so I stuck with it.

Oops.

Now whenever I run zpool import, it shows a faulted zpool that I can't
import and can't delete:
root@backbone:/home/dpooser# zpool import
  pool: DumpFiles
id: 16375225052759912554
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

DumpFiles                FAULTED  corrupted data
  mirror-0   ONLINE
c8t5000C5001B03A749d0p0  ONLINE
c9t5000C5001B062211d0p0  ONLINE

I deleted the new DumpFiles pool; no change. The -f flag doesn't help with
the import, and I've deleted the zpool.cache and rebooted without any
luck. Any suggestions appreciated-- there is no data on those drives that
I'm worried about, but I'd like to get rid of that error.

-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com

