Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-09 Thread Gary Mills
On Mon, Mar 08, 2010 at 03:18:34PM -0500, Miles Nordin wrote:
  gm == Gary Mills mi...@cc.umanitoba.ca writes:
 
 gm destroys the oldest snapshots and creates new ones, both
 gm recursively.
 
 I'd be curious if you try taking the same snapshots non-recursively
 instead, does the pause go away?  

I'm still collecting statistics, but that is one of the things I'd
like to try.

 Because recursive snapshots are special: they're supposed to
 atomically synchronize the cut-point across all the filesystems
 involved, AIUI.  I don't see that recursive destroys should be
 anything special though.
 
 gm Is it destroying old snapshots or creating new ones that
 gm causes this dead time?
 
 sortof seems like you should tell us this, not the other way
 around. :)  Seriously though, isn't that easy to test?  And I'm curious
 myself too.

Yes, that's another thing I'd like to try.  I'll just put a `sleep'
in the script between the two actions to see if the dead time moves
later in the day.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-09 Thread Gary Mills
On Mon, Mar 08, 2010 at 01:23:10PM -0800, Bill Sommerfeld wrote:
 On 03/08/10 12:43, Tomas Ögren wrote:
 So we tried adding 2x 4GB USB sticks (Kingston Data
 Traveller Mini Slim) as metadata L2ARC and that seems to have pushed the
 snapshot times down to about 30 seconds.
 
 Out of curiosity, how much physical memory does this system have?

Mine has 64 GB of memory with the ARC limited to 32 GB.  The Cyrus
IMAP processes, thousands of them, use memory mapping extensively.
I don't know if this design affects the snapshot recycle behavior.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] rpool devaliases

2010-03-09 Thread Tony MacDoodle
Can I create a devalias to boot the other mirror similar to UFS?


Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)

2010-03-09 Thread Ross Walker
On Mar 8, 2010, at 11:46 PM, ольга крыжановская olga.kryzh 
anov...@gmail.com wrote:



tmpfs lacks features like quota and NFSv4 ACL support. May not be the
best choice if such features are required.


True, but if the OP is looking for those features, they are unlikely to be
looking for an in-memory file system.


This would be more for something like temp databases in a RDBMS or a  
cache of some sort.


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to verify ecc for ram is active and enabled?

2010-03-09 Thread R.G. Keen
Yay! Something where I can contribute! I am a hardware
guy trying to live in a software world, but I think I know
how this one works.

 The reason is that the vendor (ACER) of the mainboard
 says it is not supported, and I can not get into the
 bios any more, but osol boots fine and sees 8GB.
 Crucial says it's not supported because Acer says
 it's not supported...  This is an MCP78S based
 motherboard (apparently equivalent Asus and Gigabyte
 boards are _supported_ platforms for this
  memory)...
The chipset may support ECC memory, and may report to the
OS and drivers that no errors have occurred, and the memory
chips may check ECC and generate the ECC error signal to the
chipset, but if the motherboard does not have a copper trace
between the pin on the memory socket that connects to the
ECC error pin on the memory DIMM and the pin on the chipset
that receives the error signal, the chipset will never hear
the memory complain about ECC errors, whether they happen or
not. The phone line is cut.

If the motherboard maker doesn't assure you it's connected
by telling you that explicitly, or worse yet says it's not
supported, chances are it's not supported. Support
for a memory DIMM does not necessarily mean that 
the ECC works, only that the regular memory works.

I did not buy a Gigabyte board for the home server I'm 
laboriously (for a hardware guy in a software land) getting
running, because although Gigabyte says they support
the ECC memory DIMMs, they do not have any BIOS 
means for enabling/disabling the ECC in BIOS, and that
tells me that they *tolerate* ECC DIMMs rather than
*using* the ECC functions. ASUS, for the same chipset
in my case, has a BIOS setting for enable/disable ECC
reporting, so they have at least considered it.

I have the same issue coming up, because even if ASUS
lets you turn reporting on and off, that's NOT a guarantee
that the copper trace is there and all connected. 

I read in this forum a method for inducing ECC errors 
involving holding a tungsten incandescent bulb near
the DIMMs to induce errors. It's worth a search. I will
be doing that test when I get to the point where I have
the thing running well enough for the test to be meaningful.

R.G.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover rpool

2010-03-09 Thread D. Pinnock
I redirected the console to the serial port and managed to capture the panic 
information below:

SunOS Release 5.11 Version snv_111b 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.

panic[cpu0]/thread=ff0007c39c60: mutex_enter: bad mutex, lp=e8 
owner=f000ec62f000ec60 thread=ff0007c39c60

ff0007c38ca0 unix:mutex_panic+73 ()
ff0007c38d00 unix:mutex_vector_enter+446 ()
ff0007c38d50 genunix:kmem_slab_alloc+31 ()
ff0007c38db0 genunix:kmem_cache_alloc+130 ()
ff0007c38dd0 zfs:zio_buf_alloc+2c ()
ff0007c38e10 zfs:arc_get_data_buf+173 ()
ff0007c38e60 zfs:arc_buf_alloc+a2 ()
ff0007c38f00 zfs:arc_read_nolock+137 ()
ff0007c38fa0 zfs:arc_read+75 ()
ff0007c390d0 zfs:scrub_visitbp+161 ()
ff0007c391e0 zfs:scrub_visitbp+27c ()
ff0007c392f0 zfs:scrub_visitbp+21d ()
ff0007c39400 zfs:scrub_visitbp+21d ()
ff0007c39510 zfs:scrub_visitbp+21d ()
ff0007c39620 zfs:scrub_visitbp+21d ()
ff0007c39730 zfs:scrub_visitbp+21d ()
ff0007c39840 zfs:scrub_visitbp+432 ()
ff0007c39890 zfs:scrub_visit_rootbp+4f ()
ff0007c398f0 zfs:scrub_visitds+7e ()
ff0007c39aa0 zfs:dsl_pool_scrub_sync+126 ()
ff0007c39b10 zfs:dsl_pool_sync+192 ()
ff0007c39ba0 zfs:spa_sync+32a ()
ff0007c39c40 zfs:txg_sync_thread+265 ()
ff0007c39c50 unix:thread_start+8 ()

skipping system dump - no dump device configured
rebooting...


Can anyone tell me what is going wrong?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can you manually trigger spares?

2010-03-09 Thread Mark J Musante

On Mon, 8 Mar 2010, Tim Cook wrote:


Is there a way to manually trigger a hot spare to kick in?


Yes - just use 'zpool replace fserv 12589257915302950264 c3t6d0'.  That's 
all the fma service does anyway.


If you ever get your drive to come back online, the fma service should 
recognize that and resilver it, switching the spare back to AVAIL.
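
A sketch of the by-hand sequence, using the names from the example above
(illustrative only):

   zpool status fserv                                # note the GUID of the failed disk
   zpool replace fserv 12589257915302950264 c3t6d0   # attach the spare in its place
   # once the original disk is healthy and resilvered, detaching the spare
   # returns it to AVAIL -- the same thing fmd would otherwise do:
   zpool detach fserv c3t6d0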



Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] rpool devaliases

2010-03-09 Thread Robert Milkowski

On 09/03/2010 13:18, Tony MacDoodle wrote:

Can I create a devalias to boot the other mirror similar to UFS?



yes
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to verify ecc for ram is active and enabled?

2010-03-09 Thread Richard PALO
I'm curious to know whether the following output :

bash-4.0# echo memscrub_scans_done/U | mdb -k
memscrub_scans_done:
memscrub_scans_done:1985

means that Solaris considers ECC memory to be effectively installed (the fact 
that it is non-zero)?


I have installed unbuffered ECC memory (2x4GB crucial kit CT2KIT51272AA667).

The reason is that the vendor (ACER) of the mainboard says it is not supported, 
and I can not get into the bios any more, but osol boots fine and sees 8GB.  
Crucial says it's not supported because Acer says it's not supported...  This 
is an MCP78S based motherboard (apparently equivalent Asus and Gigabyte boards 
are _supported_ platforms for this memory)...

The following output from smbios:

0 78   SMB_TYPE_BIOS (BIOS information)

  Vendor: Phoenix Technologies, LTD
  Version String: R01-B0
  Release Date: 03/31/2009
  Address Segment: 0xe000
  ROM Size: 524288 bytes
  Image Size: 131072 bytes
  Characteristics: 0x7fcb9e90
SMB_BIOSFL_ISA (ISA is supported)
SMB_BIOSFL_PCI (PCI is supported)
SMB_BIOSFL_PLUGNPLAY (Plug and Play is supported)
SMB_BIOSFL_APM (APM is supported)
SMB_BIOSFL_FLASH (BIOS is Flash Upgradeable)
SMB_BIOSFL_SHADOW (BIOS shadowing is allowed)
SMB_BIOSFL_CDBOOT (Boot from CD is supported)
SMB_BIOSFL_SELBOOT (Selectable Boot supported)
SMB_BIOSFL_ROMSOCK (BIOS ROM is socketed)
SMB_BIOSFL_EDD (EDD Spec is supported)
SMB_BIOSFL_525_360K (int 0x13 5.25 360K floppy)
SMB_BIOSFL_525_12M (int 0x13 5.25 1.2M floppy)
SMB_BIOSFL_35_720K (int 0x13 3.5 720K floppy)
SMB_BIOSFL_35_288M (int 0x13 3.5 2.88M floppy)
SMB_BIOSFL_I5_PRINT (int 0x5 print screen svcs)
SMB_BIOSFL_I9_KBD (int 0x9 8042 keyboard svcs)
SMB_BIOSFL_I14_SER (int 0x14 serial svcs)
SMB_BIOSFL_I17_PRINTER (int 0x17 printer svcs)
SMB_BIOSFL_I10_CGA (int 0x10 CGA svcs)
  Characteristics Extension Byte 1: 0x33
SMB_BIOSXB1_ACPI (ACPI is supported)
SMB_BIOSXB1_USBL (USB legacy is supported)
SMB_BIOSXB1_LS120 (LS-120 boot is supported)
SMB_BIOSXB1_ATZIP (ATAPI ZIP drive boot is supported)
  Characteristics Extension Byte 2: 0x5
SMB_BIOSXB2_BBOOT (BIOS Boot Specification supported)
SMB_BIOSXB2_ETCDIST (Enable Targeted Content Distrib.)
  Version Number: 0.0
  Embedded Ctlr Firmware Version Number: 0.0

ID    SIZE TYPE
1 78   SMB_TYPE_SYSTEM (system information)

  Manufacturer: Acer
  Product: Aspire X3200
  Version: R01-A3
  Serial Number: 9E3PM75C7P839053093003

  UUID: ----
  Wake-Up Event: 0x6 (power switch)
  SKU Number:  
  Family:  

ID    SIZE TYPE
2 62   SMB_TYPE_BASEBOARD (base board)

  Manufacturer: Acer
  Product: WMCP78M
  Version: 
  Serial Number: 00
  Asset Tag:  
  Location Tag:  

  Chassis: 48
  Flags: 0x1
SMB_BBFL_MOTHERBOARD (board is a motherboard)
  Board Type: 0xa (motherboard)

ID    SIZE TYPE
3 76   SMB_TYPE_CHASSIS (system enclosure or chassis)

  Manufacturer: Acer
  Version: 
  Serial Number: 00
  Asset Tag: 00

  OEM Data: 0x0
  Lock Present: N
  Chassis Type: 0x3 (desktop)
  Boot-Up State: 0x2 (unknown)
  Power Supply State: 0x2 (unknown)
  Thermal State: 0x2 (unknown)
  Chassis Height: 0u
  Power Cords: 0
  Element Records: 0

ID    SIZE TYPE
4 101  SMB_TYPE_PROCESSOR (processor)

  Manufacturer: AMD
  Version: AMD Phenom(tm) 9550 Quad-Core Processor
  Serial Number:  
  Asset Tag:  
  Location Tag: Socket AM2 
  Part Number:  

  Family: 1 (other)
  CPUID: 0x178bfbff00100f23
  Type: 3 (central processor)
  Socket Upgrade: 4 (ZIF socket)
  Socket Status: Populated
  Processor Status: 1 (enabled)
  Supported Voltages: 1.2V
  External Clock Speed: Unknown
  Maximum Speed: 2200MHz
  Current Speed: 2200MHz
  L1 Cache: 8
  L2 Cache: 9
  L3 Cache: None

ID    SIZE TYPE
8 33   SMB_TYPE_CACHE (processor cache)

  Location Tag: Internal Cache

  Level: 1
  Maximum Installed Size: 131072 bytes
  Installed Size: 131072 bytes
  Speed: Unknown
  Supported SRAM Types: 0x20
SMB_CAT_SYNC (synchronous)
  Current SRAM Type: 0x20 (synchronous)
  Error Correction Type: 2 (unknown)
  Logical Cache Type: 2 (unknown)
  Associativity: 2 (unknown)
  Mode: 1 (write-back)
  Location: 0 (internal)
  Flags: 0x1
SMB_CAF_ENABLED (enabled at boot time)

ID    SIZE TYPE
9 33   SMB_TYPE_CACHE (processor cache)

  Location Tag: External Cache

  Level: 2
  Maximum Installed Size: 524288 bytes
  Installed Size: 524288 bytes
  Speed: Unknown
  Supported SRAM Types: 0x20
SMB_CAT_SYNC (synchronous)
  Current SRAM Type: 0x20 (synchronous)
  Error Correction Type: 2 (unknown)
  Logical Cache Type: 2 (unknown)
  Associativity: 2 (unknown)
  Mode: 1 (write-back)
  Location: 0 (internal)
  Flags: 0x1
SMB_CAF_ENABLED (enabled at boot time)


Re: [zfs-discuss] Should ZFS write data out when disk are idle

2010-03-09 Thread Damon Atkins
I am talking about having a write queue that points to ready-to-write full
stripes.

Ready-to-write full stripes would be:
* The last byte of the full stripe has been updated.
* The file has been closed for writing. (Exception to the above rule.)

I believe there is now a scheduler for ZFS to handle read and write conflicts.

For example, on a large multi-gigabyte NVRAM array, the only big considerations
are how big the Fibre Channel pipe is and the limit on outstanding I/Os.

But on SATA off the motherboard, how much RAM cache each disk has is a
consideration, as well as the speed of the SATA connection and the number of
outstanding I/Os.

When it comes time to do a txg, some of the record blocks (most of the full
128k ones) will have been written out already. If we have only written out full
record blocks then there has been no performance loss.

Eventually a txg is going to happen and eventually these full writes will need
to happen, but if we can choose a less busy time for them, all the better.

e.g. on a raidz with 5 disks, if I have 4x128k worth of data to write, let's
write it.
   On a mirror, if I have 128k worth to write, let's write it (record size
128k). Or let it be a tunable for the zpool, as some arrays (RAID5) like to
have larger chunks of data.

Why wait for the txg if the disks are not being pressured for reads, rather
than taking a pause every 30 seconds?

Bob wrote : (I may not have explained it well enough)
It is not true that there is no cost though. Since ZFS uses COW,
this approach requires that new blocks be allocated and written at a
much higher rate. There is also an opportunity cost in that if a
read comes in while these continuous writes are occurring, the read
will be delayed.

At some stage a write needs to happen. **Full** writes have a very small COW
cost compared with small writes. As I said above, I am talking about a write of
4x128k on a 5-disk raidz before the write would happen early.

There are many applications which continually write/overwrite file
content, or which update a file at a slow pace. For example, log
files are typically updated at a slow rate. Updating a block requires
reading it first (if it is not already cached in the ARC), which can
be quite expensive. By waiting a bit longer, there is a much better
chance that the whole block is overwritten, so zfs can discard the
existing block on disk without bothering to re-read it.

Apps which update at a slow pace will not trigger the above early write until
they have written at least a record size worth of data; an application which
takes more than 30 secs to write 128k (recordsize) will never trigger the early
write on a mirrored disk or even a raidz setup.

What this will catch is the big writer of files greater than 128k (recordsize)
on mirrored disks, and of files larger than 4x128k on 5-disk RaidZ sets.

So commands like dd if=x of=y bs=512k will not cause issues (pauses/delays)
when the txg times out.

PS: I have already set zfs:zfs_write_limit_override, and I would not recommend
that anyone set it very low to get the above effect.
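
For reference, the tunable lives in /etc/system; the value below is only an
illustration, not a recommendation:

  * cap the amount of dirty data accepted per txg (here ~512MB)
  set zfs:zfs_write_limit_override=0x20000000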

It's just an idea on how to prevent the delay effect, it may not be practical?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover rpool

2010-03-09 Thread D. Pinnock
When I boot from a snv133 live cd and attempt to import the rpool it panics 
with this output:

Sun Microsystems Inc.   SunOS 5.11  snv_133 February 2010

j...@opensolaris:~$ pfexec su
Mar  9 03:11:37 opensolaris su: 'su root' succeeded for jack on /dev/console
j...@opensolaris:~# zpool import -f -o ro -o failmode=continue -R /mnt rpool

panic[cpu1]/thread=ff00086e0c60: BAD TRAP: type=e (#pf Page fault) 
rp=ff00086dfe60 addr=278 occurred in module unix due to a NULL pointer 
dereference

sched: #pf Page fault
Bad kernel fault at addr=0x278
pid=0, pc=0xfb862b6b, sp=0xff00086dff58, eflags=0x10246
cr0: 8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de
cr2: 278cr3: c80cr8: c

rdi:  278 rsi:4 rdx: ff00086e0c60
rcx:0  r8:   40  r9:21d9a
rax:0 rbx:0 rbp: ff00086dffb0
r10:   7f6fc8 r11:   6e r12:0
r13:  278 r14:4 r15: ff01cfe27e08
fsb:0 gsb: ff01ccfa5080  ds:   4b
 es:   4b  fs:0  gs:  1c3
trp:e err:2 rip: fb862b6b
 cs:   30 rfl:10246 rsp: ff00086dff58
 ss:   38

ff00086dfd40 unix:die+dd ()
ff00086dfe50 unix:trap+177e ()
ff00086dfe60 unix:cmntrap+e6 ()
ff00086dffb0 unix:mutex_enter+b ()
ff00086dffd0 zfs:zio_buf_alloc+2c ()
ff00086e0010 zfs:arc_get_data_buf+173 ()
ff00086e0060 zfs:arc_buf_alloc+a2 ()
ff00086e0100 zfs:arc_read_nolock+12f ()
ff00086e01a0 zfs:arc_read+75 ()
ff00086e0230 zfs:scrub_prefetch+b9 ()
ff00086e02f0 zfs:scrub_visitbp+5f1 ()
ff00086e03b0 zfs:scrub_visitbp+6e3 ()
ff00086e0470 zfs:scrub_visitbp+6e3 ()
ff00086e0530 zfs:scrub_visitbp+6e3 ()
ff00086e05f0 zfs:scrub_visitbp+6e3 ()
ff00086e06b0 zfs:scrub_visitbp+6e3 ()
ff00086e0750 zfs:scrub_visitdnode+84 ()
ff00086e0810 zfs:scrub_visitbp+1a6 ()
ff00086e0860 zfs:scrub_visit_rootbp+4f ()
ff00086e08c0 zfs:scrub_visitds+7e ()
ff00086e0a80 zfs:dsl_pool_scrub_sync+163 ()
ff00086e0af0 zfs:dsl_pool_sync+25b ()
ff00086e0ba0 zfs:spa_sync+36f ()
ff00086e0c40 zfs:txg_sync_thread+24a ()
ff00086e0c50 unix:thread_start+8 ()
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] new video: George Wilson on ZFS Dedup

2010-03-09 Thread Deirdre Straughan
Brand new video! George Wilson on ZFS Dedup - Oracle Solaris Video 
http://bit.ly/b5MMpn


--

best regards,
Deirdré Straughan

Solaris Technical Content

blog: Un Posto al Sole http://blogs.sun.com/deirdre/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] what to do when errors occur during scrub

2010-03-09 Thread Harry Putnam
[I hope this isn't a repost double whammy.  I posted this message
under `Message-ID: 87fx4ai5sp@newsguy.com' over 15 hrs ago but
it never appeared on my nntp server (gmane) far as I can see]

I'm a little at a loss here as to what to do about these two errors
that turned up during a scrub.

The discs involved are a matched pair in mirror mode.

zpool status -v z3 (wrapped for mail):
----   ---=---   -   
scrub: scrub completed after 1h48m with 2 errors on Mon Mar 8
10:26:49 2010 config:

  NAMESTATE READ WRITE CKSUM
  z3  ONLINE   0 0 2
mirror-0  ONLINE   0 0 4
  c5d0ONLINE   0 0 4
  c6d0ONLINE   0 0 4

   errors: Permanent errors have been detected in the following files:
   [NOTE: Edited to ease reading -ed -hp]
z3/proje...@zfs-auto-snap:monthly-2009-08-30-09:26:/Training/\
[... huge path snipped ...]/2_Database.mov

/t/bk-test-DiskDamage-021710_005252/rsnap/misc/hourly.4/\
[... huge path snipped ...]/es.utf-8.sug

----   ---=---   -   

Those are just two on-disk files.

Can it be as simple as just deleting them?

Or is something more technical required?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)

2010-03-09 Thread Matt Cowger
Ross is correct - advanced OS features are not required here - just the ability 
to store a file - don’t even need unix style permissions

-Original Message-
From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Ross Walker
Sent: Tuesday, March 09, 2010 6:23 AM
To: ольга крыжановская
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk 
(70% drop)

On Mar 8, 2010, at 11:46 PM, ольга крыжановская olga.kryzh 
anov...@gmail.com wrote:

 tmpfs lacks features like quota and NFSv4 ACL support. May not be the
 best choice if such features are required.

True, but if the OP is looking for those features they are more then  
unlikely looking for an in-memory file system.

This would be more for something like temp databases in a RDBMS or a  
cache of some sort.

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover rpool

2010-03-09 Thread D. Pinnock
Found a site that recommended setting the following /etc/system entries

set zfs:zfs_recover=1
set aok=1

and running this command

zdb -e -bcsvL rpool

but I get the following error:

Traversing all blocks to verify checksums ...
out of memory -- generating core dump
Abort

The laptop has 4GB of memory, and I did not see memory utilization pass 400MB.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] and another video: ZFS Dynamic LUN Expansion

2010-03-09 Thread Deirdre Straughan
And another brand-new video: ZFS Dynamic LUN Expansion - Oracle Solaris 
Video http://bit.ly/cwwCZl

--

best regards,
Deirdré Straughan

Solaris Technical Content

blog: Un Posto al Sole http://blogs.sun.com/deirdre/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover rpool

2010-03-09 Thread Cindy Swearingen

Hi D,

Is this a 32-bit system?

We were looking at your panic messages and they seem to indicate a
problem with memory and not necessarily a problem with the pool or
the disk. Your previous zpool status output also indicates that the
disk is okay.

Maybe someone with similar recent memory problems can advise.

Thanks,

Cindy

On 03/09/10 09:15, D. Pinnock wrote:

When I boot from a snv133 live cd and attempt to import the rpool it panics 
with this output:

Sun Microsystems Inc.   SunOS 5.11  snv_133 February 2010

j...@opensolaris:~$ pfexec su
Mar  9 03:11:37 opensolaris su: 'su root' succeeded for jack on /dev/console
j...@opensolaris:~# zpool import -f -o ro -o failmode=continue -R /mnt rpool

panic[cpu1]/thread=ff00086e0c60: BAD TRAP: type=e (#pf Page fault) 
rp=ff00086dfe60 addr=278 occurred in module unix due to a NULL pointer 
dereference

sched: #pf Page fault
Bad kernel fault at addr=0x278
pid=0, pc=0xfb862b6b, sp=0xff00086dff58, eflags=0x10246
cr0: 8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de
cr2: 278cr3: c80cr8: c

rdi:  278 rsi:4 rdx: ff00086e0c60
rcx:0  r8:   40  r9:21d9a
rax:0 rbx:0 rbp: ff00086dffb0
r10:   7f6fc8 r11:   6e r12:0
r13:  278 r14:4 r15: ff01cfe27e08
fsb:0 gsb: ff01ccfa5080  ds:   4b
 es:   4b  fs:0  gs:  1c3
trp:e err:2 rip: fb862b6b
 cs:   30 rfl:10246 rsp: ff00086dff58
 ss:   38

ff00086dfd40 unix:die+dd ()
ff00086dfe50 unix:trap+177e ()
ff00086dfe60 unix:cmntrap+e6 ()
ff00086dffb0 unix:mutex_enter+b ()
ff00086dffd0 zfs:zio_buf_alloc+2c ()
ff00086e0010 zfs:arc_get_data_buf+173 ()
ff00086e0060 zfs:arc_buf_alloc+a2 ()
ff00086e0100 zfs:arc_read_nolock+12f ()
ff00086e01a0 zfs:arc_read+75 ()
ff00086e0230 zfs:scrub_prefetch+b9 ()
ff00086e02f0 zfs:scrub_visitbp+5f1 ()
ff00086e03b0 zfs:scrub_visitbp+6e3 ()
ff00086e0470 zfs:scrub_visitbp+6e3 ()
ff00086e0530 zfs:scrub_visitbp+6e3 ()
ff00086e05f0 zfs:scrub_visitbp+6e3 ()
ff00086e06b0 zfs:scrub_visitbp+6e3 ()
ff00086e0750 zfs:scrub_visitdnode+84 ()
ff00086e0810 zfs:scrub_visitbp+1a6 ()
ff00086e0860 zfs:scrub_visit_rootbp+4f ()
ff00086e08c0 zfs:scrub_visitds+7e ()
ff00086e0a80 zfs:dsl_pool_scrub_sync+163 ()
ff00086e0af0 zfs:dsl_pool_sync+25b ()
ff00086e0ba0 zfs:spa_sync+36f ()
ff00086e0c40 zfs:txg_sync_thread+24a ()
ff00086e0c50 unix:thread_start+8 ()

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using zfs-auto-snapshot for automatic backups

2010-03-09 Thread Brandon High
On Mon, Mar 8, 2010 at 1:47 PM, Tim Foster tim.fos...@sun.com wrote:
 Looking at the errors, it looks like SMF isn't exporting the values for
 action_authorization or value_authorization in the SMF manifest it
 produces, resulting the service not being allowed to set values in
 svccfg when it runs as 'zfssnap'.

After playing around a bit, I found the right way to verify this:

$ svcprop -p general zfs/auto-snapshot:rpool-backup
general/action_authorization astring solaris.smf.manage.zfs-auto-snapshot
general/value_authorization astring solaris.smf.manage.zfs-auto-snapshot
general/enabled boolean true
general/entity_stability astring Unstable

This is exactly the output that I'm getting for zfs/auto-snapshot:daily

I'm still seeing errors from svccfg before and after the zfs send.
This isn't affecting any of the default instances, since they don't
use backup-save-cmd.

$ svcprop -p start/user zfs/auto-snapshot:rpool-backup
zfssnap
$ svcprop -p stop/user zfs/auto-snapshot:rpool-backup
zfssnap

/etc/user_attr contains:
zfssnap::::type=role;auths=solaris.smf.manage.zfs-auto-snapshot;profiles=ZFS File System Management

The instance runs as zfssnap and the user *should* be able to change
values in smf, right?
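
One sanity check I can do with the standard RBAC tools (nothing
zfs-auto-snapshot specific) is to confirm what the role actually ends up with:

$ auths zfssnap      # should include solaris.smf.manage.zfs-auto-snapshot
$ profiles zfssnap   # should include "ZFS File System Management"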

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)

2010-03-09 Thread Richard Elling
On Mar 9, 2010, at 9:40 AM, Matt Cowger wrote:
 Ross is correct - advanced OS features are not required here - just the 
 ability to store a file - don’t even need unix style permissions

KISS.  Just use tmpfs, though you might also consider limiting its size.
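E.g. (mount point and size are placeholders):

  mount -F tmpfs -o size=80g swap /ramcache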
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover rpool

2010-03-09 Thread Tim Haley

On 03/ 9/10 10:53 AM, Cindy Swearingen wrote:

Hi D,

Is this a 32-bit system?

We were looking at your panic messages and they seem to indicate a
problem with memory and not necessarily a problem with the pool or
the disk. Your previous zpool status output also indicates that the
disk is okay.

To perhaps clarify, you're panicking trying to grab a mutex, which hints that
something has stomped on the memory containing that mutex.  The reason for the
32-bit question is that sometimes a deep stack can overrun on a 32-bit box.
That's probably not what happened here, but we ask anyway.

-tim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] about zfs exported on nfs

2010-03-09 Thread Harry Putnam
[First, a brief apology.  I inadvertently posted this message to the
`general' group when it should have been to the `zfs' group.

In that last few days I seem to be all thumbs when posting.. and have
created several bumbling posts to opensolaris lists.
]

summary:

  A zfs fs set with smb and nfs on, and set chmod g-s (set-gid) with
  a local user's uid:gid, is being mounted by a remote linux host (and
  windows hosts, but not discussing that here).

  The remote user is the same as the local user in both numeric UID
  and numeric GID
  
  The zfs nfs/cifs share is mounted like this on a linux client:
  mount -t nfs -o users,exec,dev,suid

  Any files/directories created by the linux user end up with
  nobody:nobody uid:gid and any attempt to change that from the client
  host fails, even if done as root.

Details:

I'm not sure when this trouble started... it's been a while, long
enough to have existed over a couple of builds (b129, b133). But it was
not always a problem.

I jumped from 129 to 133 so don't know about builds in between.

I have a zfs_fs .. /projects on zpool z3

this is a hierarchy that is fairly deep but only the top level is zfs.
(Aside:  That is something I intend to change soon)

That is, the whole thing, of course, is zfs, but the lower levels have
been created by whatever remote host was working there.

z3/projects has these two settings:
  z3/projects  sharenfs   on
  z3/projects  sharesmb   name=projects

So both cifs and nfs are turned on, making the zfs host both a cifs and an
nfs server.

Also when  z3/projects was created, it was set:
  chmod g-s (set gid) right away.

The remote linux user in this discussion has the same numeric UID and
GID as the local zfs user who is owner of /projects
 
Later, and more than once by now, I've run this command from the zfs
host:
  /bin/chmod -R A=everyone@:full_set:fd:allow /projects

to get read/write to work when working from windows hosts.

The filesystem is primarily accessed as an nfs-mounted filesystem on a
linux (gentoo linux) host, but is also used over cifs by a couple of
windows hosts.

On the linux client host, `/projects' gets mounted like this:
  mount -t nfs -o users,exec,dev,suid

That has been the case both before having the problem and now.

The trouble I see is that all files get created with: 
   nobody:nobody

as UID:GID, even though /projects is set as normal USER:GROUP of a user
on the zfs/nfs server.

From the remote (we only deal with the linux remote here) any attempt
to change uid:gid fails, even if done by root on the remote.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)

2010-03-09 Thread Matt Cowger
That's a very good point - in this particular case, there is no option to
change the blocksize for the application.


On 3/9/10 10:42 AM, Roch Bourbonnais roch.bourbonn...@sun.com wrote:

 
 I think this is highlighting that there is an extra CPU requirement to
 manage small blocks in ZFS.
 The table would probably turn over if you go to 16K zfs records and
 16K reads/writes from the application.
 
 Next step for you is to figure out how many read/write IOPS you
 expect to take in the real workloads and whether or not the filesystem
 portion will represent a significant drain on CPU resources.
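 
 Something like the following would show it, assuming the pool from the
 original mail is still called 'ram' (note recordsize only applies to files
 written after it is set):
 
 zfs set recordsize=16k ram
 iozone -e -i 0 -i 1 -i 2 -n 5120 -O -q 16k -r 16k -s 5g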
 
 -r
 
 
 On 8 Mar 10, at 17:57, Matt Cowger wrote:
 
 Hi Everyone,
 
 It looks like I've got something weird going with zfs performance on
 a ramdisk... ZFS is performing not even a 3rd of what UFS is doing.
 
 Short version:
 
 Create 80+ GB ramdisk (ramdiskadm), system has 96GB, so we aren't
 swapping
 Create zpool on it (zpool create ram...)
 Change zfs options to turn off checksumming (don't want it or need
 it), atime, compression, 4K block size (this is the application's
 native blocksize) etc.
 Run a simple iozone benchmark (seq. write, seq. read, rndm write,
 rndm read).
 
 Same deal for UFS, replacing the ZFS stuff with newfs stuff and
 mounting the UFS forcedirectio (no point in using a buffer cache
 memory for something that's already in memory)
 
 Measure IOPs performance using iozone:
 
 iozone  -e -i 0 -i 1 -i 2 -n 5120 -O -q 4k -r 4k -s 5g
 
 With the ZFS filesystem I get around:
 ZFS:  (seq write) 42360   (seq read) 31010   (random read) 20953   (random write) 32525
 Not SOO bad, but here's UFS:
 UFS:  (seq write) 42853   (seq read) 100761  (random read) 100471  (random write) 101141
 
 For all tests besides the seq write, UFS utterly destroys ZFS.
 
 I'm curious if anyone has any clever ideas on why this huge
 disparity in performance exists.  At the end of the day, my
 application will run on either filesystem, it just surprises me how
 much worse ZFS performs in this (admittedly edge case) scenario.
 
 --M
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] backup zpool to tape

2010-03-09 Thread Gregory Durham
Hello all,
I need to backup some zpools to tape. I currently have two servers,
for the purpose of this conversation we will call them server1 and
server2 respectively. Server1, has several zpools which are replicated
to a single zpool on server2 through a zfs send/recv script. This part
works perfectly. I now need to get this backed up to tape. My
original plan was to have a disk set up which would hold a file-based
zpool, and then do zfs send/recv to this pool. My problem however is I
run into this bug:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6929751
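
For reference, the file-based plan was essentially the following (names and
sizes here are made up):

mkfile 500g /backup/tapepool.img
zpool create tapepool /backup/tapepool.img
zfs send -R backuppool@weekly | zfs recv -d tapepool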

In my case, when I reboot the server I cannot get the pool to come
back up. It shows UNAVAIL; I have tried exporting before reboot and
reimporting it, without success, and I don't like this in case a power
issue of some sort happens. My other option was to mount the file
using lofiadm, however I cannot get it to mount on boot, so the same
thing happens. Does anyone have any experience with backing up zpools
to tape? Any ideas would be greatly appreciated.

Thanks,
Greg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Weird drive configuration, how to improve the situation

2010-03-09 Thread Thomas W
Okay... I found the solution to my problem.

And it has nothing to do with my hard drives... It was the Realtek NIC drivers. 
I read about problems and added a new driver (I got that from the forum 
thread). And now I have about 30MB/s read and 25MB/s write performance. That's 
enough (for the beginning).

Thanks for all your input and support. 

Thomas
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] what to do when errors occur during scrub

2010-03-09 Thread Cindy Swearingen

Hi Harry,

Reviewing other postings where permanent errors were found on redundant
ZFS configs, one was resolved by re-running the zpool scrub and one
resolved itself because the files with the permanent errors were most
likely temporary files.

One of the files with permanent errors below is in a snapshot and the other
looks like another backup.

I would recommend the top section of this troubleshooting wiki to
determine if hardware issues are causing these permanent errors:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

If it turns out that some hardware problem, power failure, or other
event caused these errors and if rerunning the scrub doesn't remove
these files, then I would remove them manually (if you have copies of
the data somewhere else).

Thanks,

Cindy


On 03/09/10 10:08, Harry Putnam wrote:

[I hope this isn't a repost double whammy.  I posted this message
under `Message-ID: 87fx4ai5sp@newsguy.com' over 15 hrs ago but
it never appeared on my nntp server (gmane) far as I can see]

I'm a little at a loss here as to what to do about these two errors
that turned up during a scrub.

The discs involved are a matched pair in mirror mode.

zpool status -v z3 (wrapped for mail):
----   ---=---   -   
scrub: scrub completed after 1h48m with 2 errors on Mon Mar 8

10:26:49 2010 config:

  NAMESTATE READ WRITE CKSUM
  z3  ONLINE   0 0 2
mirror-0  ONLINE   0 0 4
  c5d0ONLINE   0 0 4
  c6d0ONLINE   0 0 4

   errors: Permanent errors have been detected in the following files:
   [NOTE: Edited to ease reading -ed -hp]
z3/proje...@zfs-auto-snap:monthly-2009-08-30-09:26:/Training/\
[... huge path snipped ...]/2_Database.mov

/t/bk-test-DiskDamage-021710_005252/rsnap/misc/hourly.4/\
[... huge path snipped ...]/es.utf-8.sug

----   ---=---   -   


Those are just two on disk files.

Can it be as simple as just deleting them?

Or is something more technical required.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup zpool to tape

2010-03-09 Thread Miles Nordin
 gd == Gregory Durham gregory.dur...@gmail.com writes:

gd it to mount on boot

I do not understand why you have a different at-boot-mounting problem
with and without lofiadm: either way it's your script doing the
importing explicitly, right?  so just add lofiadm to your script.  I
guess you were exporting pools explicitly at shutdown because you
didn't trust solaris to unmount the two levels of zfs in the right
order?

Anyway I would guess it doesn't matter because my ``back up file
zpools to tape'' suggestion seems to be bogus bad advice.  The other
bug referenced in the one you quoted, 6915127, seems a lot more
disruptive and says there are weird corruption problems with using
file vdev's directly, and then there are deadlock problems with
lofiadm from the two layers of zfs that haven't been ironed out yet.
I guess file-based zpools do not work, and we're back to having no
good plan that I can see to back up zpools to tape that preserves
dedup, snapshots/clones, NFSv4 acl's, u.s.w.  I assumed they did work
because it looked like regression tests people were quoting and many
examples depended upon them, but now it seems they don't, which
explains some problems I had last month extracting an s10brand image
from a .VDI. :( (iirc i got the image out using lofiadm and just
assumed I was confused, banging away at things until they work and
then forgetting about them.  not good on me.)

There is only zfs send which is made with replication in mind (

 * it'll intentionally destroy the entire stream and any incremental
   descendents if there's a single bit-flip, which is a good feature
   to make sure the replication is retried if the copy's not faithful
   but a bad feature for tape.  If ZFS rallies against other
   filesystems for their fragile lack of metadata copies and
   checksums, why should the tape format be so oddly fragile that tape
   archives become massive gamma gremlin detectors?

 * and it has no scrub-like method analagous to 'tar t' or 'cpio -it'
   because it's assumed you'll always recv it in a situation where
   you've the opportunity to re-send, while a tape is something you
   might like to validate after transporting it or every few years.
   If pools need scrubing why don't tapes?

 * and no partial-restore feature because it assumes if you don't have
   enough space on the destination for the entire dataset you'll use
   rsync or cpio or some other tree-granularity tool instead of the
   replication toolkit.  a tool which does not fully exist (sparse
   files, 4GB files, NFSv4 ACL's), but that's a separate problem.

).

how about zpools on zvol's.  Does that avoid the deadlock/corruption
bugs with file vdevs?  It's not a workaround for the cases in the bug
because they wanted to use NFS to replace iSCSI, but for backups,
zvols might be okay, if they work?  It's certainly possible to write
them onto a tape (dd was originally meant for such things).
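
something like this, I mean (names and sizes invented, untested):

  zfs create -V 500g tank/tapevol
  zpool create tapepool /dev/zvol/dsk/tank/tapevol
  zfs send -R sourcepool@backup | zfs recv -d tapepool
  zpool export tapepool
  dd if=/dev/zvol/rdsk/tank/tapevol of=/dev/rmt/0n bs=1024k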


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recover rpool

2010-03-09 Thread D. Pinnock
My Laptop is a 64bit system

Dell Latitude D630
Intel Core2 Duo Processor T7100 
4GB RAM
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] what to do when errors occur during scrub

2010-03-09 Thread Harry Putnam
Cindy Swearingen cindy.swearin...@sun.com writes:

 Hi Harry,

 Reviewing other postings where permanent errors where found on
 redundant ZFS configs, one was resolved by re-running the zpool scrub
 and one
 resolved itself because the files with the permanent errors were most
 likely temporary files.

What search strings did you use to find those?... I always seem to use
search strings that miss what I'm after; it's helpful to see how
others conduct searches.

 One of the files with permanent errors below is a snapshot and the other
 looks another backup.

 I would recommend the top section of this troubleshooting wiki to
 determine if hardware issues are causing these permanent errors:

 http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

A lot of that seems horribly complex for what is apparently (and this
may turn out to be wishful thinking) a pretty minor problem.  But it
does say that repeated scrubs will most likely remove all traces of
corruption (assuming it's not caused by hardware).  However I see no
evidence that the `scrub' command is doing anything at all (more on
that below).

I decided to take the line of least Resistance and simply deleted the
file.

As you guessed, they were backups and luckily for me, redundant. 

So following a scrub... I see errors that look more technical.

But first, the info given by `zpool status' appears either to be
referencing an earlier scrub or to be seriously wrong in what it reports.

  root # zpool status -vx z3
    pool: z3
   state: ONLINE
  status: One or more devices has experienced an error resulting in data
          corruption.  Applications may be affected.
  action: Restore the file in question if possible.  Otherwise restore
          the entire pool from backup.
     see: http://www.sun.com/msg/ZFS-8000-8A
   scrub: scrub completed after 1h48m with 2 errors on Mon Mar  8 10:26:49 2010
  config:

----   ---=---   -   

I just ran a scrub moments ago, but `status' is still reporting one 
from earlier in the day.  It says 1HR and 48 minutes but that is
completely wrong too.

----   ---=---   -   

  NAMESTATE READ WRITE CKSUM
  z3  ONLINE   0 0 2
mirror-0  ONLINE   0 0 4
  c5d0ONLINE   0 0 4
  c6d0ONLINE   0 0 4
  
  errors: Permanent errors have been detected in the following files:
  
  0x42:0x552d
  z3/t:0xe1d99f
----   ---=---   -   

The `status' report, even though it seems to have bogus information
about the scrub, does show different output for the errors.

Are those hex addresses of devices or what?  There is nothing at all
on z3/t  

Also - it appears `zpool scrub -s z3' doesn't really do anything.

The status report above is taken immediately after a scrub command.

The `scrub -s' command just returns the prompt... no output and
apparently no scrub either.

Does the failure to scrub indicate it cannot be scrubbed?  Does a
status report that shows the pool online and not degraded really mean
anything, or is that just as spurious as the scrub info there?

Sorry if I seem like a lazy dog, but I don't really see a section in
the troubleshooting guide (from viewing the outline of sections) that
appears to deal directly with scrubbing.

Apparently I'm supposed to read and digest the whole thing so as to
know what to do... but I quickly get completely lost in the
discussion.

They say to use fmdump for a list of defective hardware... but I don't
see anything that appears to indicate a problem unless the two entries
from March 5th mean something that is not apparent.

fmdump

(I removed the exact times from the lines so this wouldn't wrap)

  [...]  

  Mar 05 ... 9ea9e105-72b1-69bd-e1e6-88322cd8b847 ZFS-8000-GH
  Mar 05 ... a37779fb-8018-ec8c-bd72-ec32f4b40ff6 ZFS-8000-GH
  Mar 08 ... a37779fb-8018-ec8c-bd72-ec32f4b40ff6 FMD-8000-4M Repaired
  Mar 08 ... a37779fb-8018-ec8c-bd72-ec32f4b40ff6 FMD-8000-6U Resolved
  Mar 08 ... 9ea9e105-72b1-69bd-e1e6-88322cd8b847 FMD-8000-4M Repaired
  Mar 08 ... 9ea9e105-72b1-69bd-e1e6-88322cd8b847 FMD-8000-6U Resolved




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)

2010-03-09 Thread Ross Walker
On Mar 9, 2010, at 1:42 PM, Roch Bourbonnais  
roch.bourbonn...@sun.com wrote:




I think this is highlighting that there is an extra CPU requirement to
manage small blocks in ZFS.
The table would probably turn over if you go to 16K zfs records and
16K reads/writes from the application.

Next step for you is to figure out how many read/write IOPS you
expect to take in the real workloads and whether or not the
filesystem portion will represent a significant drain on CPU resources.


I think it highlights more the problem of ARC vs ramdisk, or  
specifically ZFS on ramdisk while ARC is fighting with ramdisk for  
memory.


It is a wonder it didn't deadlock.

If I were to put a ZFS file system on a ramdisk, I would limit the  
size of the ramdisk and ARC so both, plus the kernel fit nicely in  
memory with room to spare for user apps.
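
E.g. something like this in /etc/system (the value is only an example):

* cap the ARC at 4GB so the ramdisk, kernel and user apps all fit
set zfs:zfs_arc_max=0x100000000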


-Ross

 
___

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)

2010-03-09 Thread ольга крыжановская
Could you retest it with mmap() used?

Olga

2010/3/9 Matt Cowger mcow...@salesforce.com:
 It can, but doesn't in the command line shown below.

 M



 On Mar 8, 2010, at 6:04 PM, ольга крыжановская olga.kryzh
 anov...@gmail.com wrote:

 Does iozone use mmap() for IO?

 Olga

 On Tue, Mar 9, 2010 at 2:57 AM, Matt Cowger mcow...@salesforce.com
 wrote:
 Hi Everyone,



 It looks like I've got something weird going with zfs performance
 on a
  ramdisk... ZFS is performing not even a 3rd of what UFS is doing.



 Short version:



 Create 80+ GB ramdisk (ramdiskadm), system has 96GB, so we aren't
 swapping

 Create zpool on it (zpool create ram)

 Change zfs options to turn off checksumming (don't want it or need
  it),
 atime, compression, 4K block size (this is the applications native
 blocksize) etc.

 Run a simple iozone benchmark (seq. write, seq. read, rndm write,
 rndm
 read).



 Same deal for UFS, replacing the ZFS stuff with newfs stuff and
 mounting the
 UFS forcedirectio (no point in using a buffer cache memory for
 something
 that's already in memory)



 Measure IOPs performance using iozone:



 iozone  -e -i 0 -i 1 -i 2 -n 5120 -O -q 4k -r 4k -s 5g



 With the ZFS filesystem I get around:

 ZFS:  (seq write) 42360   (seq read) 31010   (random read) 20953   (random write) 32525

 Not SOO bad, but here's UFS:

 UFS:  (seq write) 42853   (seq read) 100761  (random read) 100471  (random write) 101141



 For all tests besides the seq write, UFS utterly destroys ZFS.



  I'm curious if anyone has any clever ideas on why this huge disparity in
 performance exists.  At the end of the day, my application will run
 on
 either filesystem, it just surprises me how much worse ZFS performs
 in this
 (admittedly edge case) scenario.



 --M

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss





 --
  ,   __   ,
 { \/`o;-Olga Kryzhanovska   -;o`\/ }
 .'-/`-/ olga.kryzhanov...@gmail.com   \-`\-'.
 `'-..-| / Solaris/BSD//C/C++ programmer   \ |-..-'`
  /\/\ /\/\
  `--`  `--`




-- 
  ,   __   ,
 { \/`o;-Olga Kryzhanovska   -;o`\/ }
.'-/`-/ olga.kryzhanov...@gmail.com   \-`\-'.
 `'-..-| / Solaris/BSD//C/C++ programmer   \ |-..-'`
  /\/\ /\/\
  `--`  `--`
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup zpool to tape

2010-03-09 Thread Greg
Thank you for such a thorough look into my issue. As you said, I guess I am 
down to trying to backup to a zvol and then backing that up to tape. Has anyone 
tried this solution? I would be very interested to find out. Anyone else with 
any other solutions?

Thanks!
Greg
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk (70% drop)

2010-03-09 Thread Matt Cowger
This is a good point, and something that I tried.  I limited the ARC to 1GB and
4GB (both well within the memory footprint of the system even with the
ramdisk)... equally poor results... this doesn't feel like the ARC fighting with
locked memory pages.

--M

-Original Message-
From: Ross Walker [mailto:rswwal...@gmail.com] 
Sent: Tuesday, March 09, 2010 3:53 PM
To: Roch Bourbonnais
Cc: Matt Cowger; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] terrible ZFS performance compared to UFS on ramdisk 
(70% drop)

On Mar 9, 2010, at 1:42 PM, Roch Bourbonnais  
roch.bourbonn...@sun.com wrote:


 I think this is highlighting that there is an extra CPU requirement to
 manage small blocks in ZFS.
 The table would probably turn over if you go to 16K zfs records and
 16K reads/writes from the application.

 Next step for you is to figure out how many read/write IOPS you
 expect to take in the real workloads and whether or not the
 filesystem portion will represent a significant drain on CPU resources.

I think it highlights more the problem of ARC vs ramdisk, or  
specifically ZFS on ramdisk while ARC is fighting with ramdisk for  
memory.

It is a wonder it didn't deadlock.

If I were to put a ZFS file system on a ramdisk, I would limit the  
size of the ramdisk and ARC so both, plus the kernel fit nicely in  
memory with room to spare for user apps.

-Ross

  
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Should ZFS write data out when disk are idle

2010-03-09 Thread Damon Atkins
Sorry, a full stripe on a RaidZ is the recordsize, i.e. if the record size is
128k on a RaidZ and it's made up of 5 disks, then 128k is spread across 4 disks
with the calculated parity on the 5th disk, which means the writes are 32k to
each disk.

For a RaidZ, when data is written to a disk, are the individual 32k writes to
the same disk joined together and written out as a single I/O to the disk?
e.g. 128k for file a, 128k for file b, 128k for file c.   When written out,
does zfs do 32k+32k+32k i/o to each disk, or will it do one 96k i/o if the
space is available sequentially?

Cheers
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] sharenfs option rw,root=host1 don't take effect

2010-03-09 Thread mingli
Hi All,
I created a ZFS filesystem test and shared it with zfs set
sharenfs=root=host1 test, and I checked the sharenfs option and it has already
updated to root=host1:
bash-3.00# zfs get sharenfs test
-
NAME  PROPERTY  VALUESOURCE
test  sharenfs  rw,root=host  local
-
and the NFS share command shows it already shared as rw,root=host1 as well:
-
bash-3.00# share
-   /test   sec=sys,rw,root=host1 
-
But at host1, after I mounted this filesystem and tried to do some write
operation on it, it still returns permission denied:
 
-
bash-3.00# touch ll
touch: cannot create ll: Permission denied
-

Thanks for any reply.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] what to do when errors occur during scrub

2010-03-09 Thread David Dyer-Bennet

On 3/9/2010 4:57 PM, Harry Putnam wrote:

Also - it appears `zpool scrub -s z3' doesn't really do anything.
The status report above is taken immediately after a scrub command.

The `scub -s' command just returns the prompt... no output and
apparently no scrub either.
   


The -s switch is documented to STOP a scrub, though I've never used it.
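
In other words (using the pool name from your output):

   zpool scrub z3        # starts a new scrub
   zpool status -v z3    # shows scrub progress / completion
   zpool scrub -s z3     # stops a scrub that is in progress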

--

David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect

2010-03-09 Thread Dennis Clarke

 Hi All,
 I had create a ZFS filesystem test and shared it with zfs set
 sharenfs=root=host1 test, and I checked the sharenfs option and it
 already update to root=host1:

Try using a backslash to escape those special chars, like so:


zfs set sharenfs=nosub\,nosuid\,rw\=hostname1\:hostname2\,root\=hostname2 \
    zpoolname/zfsname/pathname
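
Quoting the whole value also works and may be easier to read (hostnames here
are placeholders):

zfs set sharenfs='rw=host1:host2,root=host1' zpoolname/zfsname/pathname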

Dennis


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [osol-discuss] Moving Storage to opensolaris+zfs. What about backup?

2010-03-09 Thread rwalists
On Mar 8, 2010, at 7:55 AM, Erik Trimble wrote:

 Assume your machine has died the True Death, and you are starting with new 
 disks (and, at least a similar hardware setup).
 
 I'm going to assume that you named the original snapshot 
 'rpool/ROOT/whate...@today'
 
 (1)   Boot off the OpenSolaris LiveCD
 
 
...
 
 (10)  Activate the restored BE:
   # beadm activate New
 
 
 You should now be all set.   Note:  I have not /explicitly/ tried the above - 
 I should go do that now to see what happens.  :-)

If anyone is going to implement this, much the same procedure is documented at 
Simon Breden's blog:

http://breden.org.uk/2009/08/29/home-fileserver-mirrored-ssd-zfs-root-boot/

which walks through the commands for executing the backup and the restore.

--Ware
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Should ZFS write data out when disk are idle

2010-03-09 Thread Richard Elling
On Mar 9, 2010, at 6:13 PM, Damon Atkins wrote:

 Sorry, Full Stripe on a RaidZ is the recordsize ie if the record size is 128k 
 on a RaidZ and its made up of 5 disks, then 128k is spread across 4 disks 
 with the calc parity on the 5 disk, which means the writes are 32k to each 
 disk.

Nominally.

 For a RaidZ, when data is written to a disk, are individual 32k join together 
 to the same disk and written out as a single I/O to the disk?

I/Os can be coalesced, but there is no restriction as to what can be coalesced.
In other words, subsequent writes can also be coalesced if they are contiguous.

 e.g. 128k for file a, 128k for file b, 128k for file c.   When written out 
 does zfs do
 32k+32k+32k i/o to each disk, or will it do one 96k i/o if the space is 
 available sequentially?

I'm not sure how one could write one 96KB physical I/O to three different disks?
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 16-18, 2010)




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sharenfs option rw,root=host1 don't take effect

2010-03-09 Thread mingli
And I updated the sharenfs option to rw,ro...@100.198.100.0/24; it works
fine, and the NFS client can write without error.

Thanks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to verify ecc for ram is active and enabled?

2010-03-09 Thread Richard PALO
Hi, thanks for the reply... I guess I'm with you so far, but my question is
targeted at understanding the real-world implication of the kernel software
memory scrubber.

That is, in looking through the code a bit I notice that if hardware ECC is 
active, the software scrubber is disabled.  It is also disabled in the absence of
ECC memory (or unmatched ECC memory).

In my particular case:
bash-4.0# echo memscrub_scans_done/U | mdb -k
memscrub_scans_done:
memscrub_scans_done: 1985

It appears not to be disabled.  

My question, put differently, is: if it _is_ enabled, does it indeed do
something useful in the sense of error detection?

That is, if it is enabled but *cannot* determine anything related to ECC, _why_
is it running in the first place? If ECC is crippled, then the software
scrubber gives a false impression of doing something useful and is perhaps a bug.

On the other hand, if it *can* determine ECC (not crippled), then can we 
conclude that it is effective [enough] to be able to run as a small and 
reasonably reliable server? That is, correct correctable errors and be able to 
log memory errors for eventual action...
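
For what it's worth, the other checks I plan to run (assuming the BIOS
populates the SMBIOS memory-array record at all) are the declared ECC
capability and whether any memory error reports ever show up in FMA:

bash-4.0# smbios -t SMB_TYPE_MEMARRAY   # the "ECC:" field shows what is claimed
bash-4.0# fmdump -e                     # any memory ereports logged so far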

cheers
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] (FreeBSD) ZFS RAID: Disk fails while replacing another disk

2010-03-09 Thread Victor Latushkin

Christian Hessmann wrote:

Victor,


Btw, they affect some files referenced by snapshots as
'zpool status -v' suggests:

  tank/DVD:0x9cd tank/d...@2010025100:/Memento.m4v
  tank/d...@2010025100:/Payback.m4v
  tank/d...@2010025100:/TheManWhoWasntThere.m4v

In case of OpenSolaris it is not that difficult to work around this bug
without getting rid of files (snapshots referencing them) with errors,
but in I'm not sure how to do the same on FreeBSD.
But you always have option of destroying snapshot indicated above (and may
be more).


I'm still reluctant to reboot the machine, so what I did now was, as you
suggested, destroy these snapshots (after deleting the files from the
current filesystem, of course).
I'm not so sure the result is good, though:

===
[r...@camelot /tank/DVD]# zpool status -v tank
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver completed after 10h42m with 136 errors on Tue Mar  2
07:55:05 2010
config:

NAME   STATE READ WRITE CKSUM
tank   DEGRADED   137 0 0
  raidz1   ONLINE   0 0 0
ad17p2 ONLINE   0 0 0
ad18p2 ONLINE   0 0 0
ad20p2 ONLINE   0 0 0
  raidz1   DEGRADED   326 0 0
replacing  DEGRADED 0 0 0
  ad16p2   OFFLINE  2  241K 6
  ad4p2ONLINE   0 0 0  839G resilvered
ad14p2 ONLINE   0 0 0  5.33G resilvered
ad15p2 ONLINE 418 0 0  5.33G resilvered

errors: Permanent errors have been detected in the following files:

tank/DVD:0x9cd
0x2064:0x25a4
0x20ae:0x503
0x20ae:0x9cd
===

Any further information available on these hex messages?


This tells us that ZFS can no longer map object numbers from the errlog into
meaningful names, and this is expected, as you have destroyed them.


Now you need to rerun a scrub.

regards,
victor

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss