[zfs-discuss] ZFS - SWAP and lucreate..

2009-06-29 Thread Patrick Bittner
Good morning everybody

I was migrating my UFS root filesystem to a ZFS one, but was a little upset to 
find that it became bigger (which was to be expected because of the swap and 
dump size).

Now I am wondering whether it is possible to set the swap and dump size when 
using the lucreate command (I want to try it again, but with less space). 
Unfortunately I did not find any advice in the man pages.


Maybe you can help me?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import issue

2009-06-29 Thread Moutacim LACHHAB

Hi,

you have to upgrade your pool:

The pool is formatted using an older on-disk version.

# zpool upgrade -v

Then it should work fine.
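
As a sketch (the pool name is taken from the output below), and noting that 
'zpool upgrade -v' only lists the supported versions while naming the pool 
performs the actual upgrade:

# zpool upgrade -v          (list the pool versions this system supports)
# zpool upgrade emcpool1    (upgrade the named pool, once it has been imported)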

Kind regards,
Moutacim


Ketan schrieb:
I'm having the following issue: I import the zpool and it shows the pool imported correctly, but after a few seconds when I issue the command zpool list it does not show any pool, and when I try to import it again it says a device is missing from the pool. What could be the reason for this? And yes, this all started after I upgraded PowerPath.



abc # zpool import
  pool: emcpool1
id: 5596268873059055768
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
config:

emcpool1  ONLINE
  emcpower0c  ONLINE
abc # zpool list
no pools available
  



--


Moutacim LACHHAB
Service Engineer
Software Technical Services Center Sun Microsystems Inc.
Email moutacim.lach...@sun.com mailto:moutacim.lach...@sun.com
+33(0)134030594 x31457
For knowledge and support: http://sunsolve.sun.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import issue

2009-06-29 Thread Victor Latushkin

On 29.06.09 11:41, Ketan wrote:

I'm having following issue .. i import the zpool and it shows pool imported 
correctly


'zpool import' on its own only shows what pools are available to import. In 
order to actually import the pool you need to do


zpool import emcpool1


but after few seconds when i issue command zpool list .. it does not show any 
pool


This is expected, as you have not done the import yet.


and when again i try to import it says  device is missing in pool ..
what could be the reason for this .. and yes this all started after i
upgraded the powerpath


I suspect PowerPath may be playing tricks on you, but I cannot comment 
any further because there's no more data to comment on...


This is not related to pool on-disk version though.

victor


abc # zpool import
  pool: emcpool1
id: 5596268873059055768
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
config:

emcpool1  ONLINE
  emcpower0c  ONLINE
abc # zpool list
no pools available

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import issue

2009-06-29 Thread Ketan
That didn't help. I tried:


r...@essapl020-u006 # zpool import
  pool: emcpool1
id: 5596268873059055768
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

emcpool1  UNAVAIL  insufficient replicas
  emcpower0c  UNAVAIL  cannot open
r...@essapl020-u006 # zpool upgrade -v
This system is currently running ZFS pool version 10.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
For more information on a particular version, including supported releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.
r...@essapl020-u006 # zpool upgrade emcpool1
This system is currently running ZFS pool version 10.

cannot open 'emcpool1': no such pool
r...@essapl020-u006 # zpool upgrade
This system is currently running ZFS pool version 10.

All pools are formatted using this version.
r...@essapl020-u006 #
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import issue

2009-06-29 Thread Ketan
And I just found out that one of the disks in the pool is showing missing labels:

r...@essapl020-u006 # zdb -l /dev/dsk/emcpower0c

LABEL 0

version=4
name='emcpool1'
state=0
txg=6973090
pool_guid=5596268873059055768
hostid=2228473662
hostname='essapl020-u006'
top_guid=3858675847091731383
guid=3858675847091731383
vdev_tree
type='disk'
id=0
guid=3858675847091731383
path='/dev/dsk/emcpower0c'
phys_path='/pseudo/e...@0:c,blk'
whole_disk=0
metaslab_array=14
metaslab_shift=32
ashift=9
asize=477788372992
is_log=0

LABEL 1

version=4
name='emcpool1'
state=0
txg=6973090
pool_guid=5596268873059055768
hostid=2228473662
hostname='essapl020-u006'
top_guid=3858675847091731383
guid=3858675847091731383
vdev_tree
type='disk'
id=0
guid=3858675847091731383
path='/dev/dsk/emcpower0c'
phys_path='/pseudo/e...@0:c,blk'
whole_disk=0
metaslab_array=14
metaslab_shift=32
ashift=9
asize=477788372992
is_log=0

LABEL 2

failed to read label 2

LABEL 3

failed to read label 3

Is there any way to recover it?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Jose Luis Barquín Guerola
Hello.
I have a question about how ZFS works with dynamic striping.

Well, start with the following situation:
  - 4 disks of 100MB in a ZFS stripe.
  - We have used 75% of the stripe, so we have 100MB free. (easy)

Now we add a new 100MB disk to the pool. So we have 200MB free, but only 
100MB of it will have the speed of 4 disks and the remaining 100MB will have 
the speed of 1 disk.

The questions are:
   - Does ZFS do any kind of reorganization of the data in the stripe that 
changes this situation into 200MB free with the speed of 5 disks?
   - If the answer is yes, how is it done? In the background?

Thanks for your time (and sorry for my English).

JLBG
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS disk label missing

2009-06-29 Thread Vibhor Neb
I upgraded PowerPath on my system and exported the ZFS pool. After the upgrade 
I was able to import the pool, but after a reboot I'm no longer able to import 
the pool and it fails with this error:

 zpool import emcpool1
cannot import 'emcpool1': invalid vdev configuration


Digging a little bit into it, I found the following:


Jun 29 2009 01:54:14.928954823 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
class = ereport.fs.zfs.vdev.open_failed
ena = 0xe503fd84f321
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x4da9f0e3c642b898
vdev = 0x358cc53d21483bb7
(end detector)

pool = emcpool1
pool_guid = 0x4da9f0e3c642b898
pool_context = 1
pool_failmode = wait
vdev_guid = 0x358cc53d21483bb7
vdev_type = disk
vdev_path = /dev/dsk/emcpower0c
parent_guid = 0x4da9f0e3c642b898
parent_type = root
prev_state = 0x1
__ttl = 0x1
__tod = 0x4a486516 0x375eb9c7



and running zdb on this vdev I get:


r...@essapl020-u006 # zdb -l /dev/dsk/emcpower0c

LABEL 0

version=4
name='emcpool1'
state=0
txg=6973090
pool_guid=5596268873059055768
hostid=2228473662
hostname='essapl020-u006'
top_guid=3858675847091731383
guid=3858675847091731383
vdev_tree
type='disk'
id=0
guid=3858675847091731383
path='/dev/dsk/emcpower0c'
phys_path='/pseudo/e...@0:c,blk'
whole_disk=0
metaslab_array=14
metaslab_shift=32
ashift=9
asize=477788372992
is_log=0

LABEL 1

version=4
name='emcpool1'
state=0
txg=6973090
pool_guid=5596268873059055768
hostid=2228473662
hostname='essapl020-u006'
top_guid=3858675847091731383
guid=3858675847091731383
vdev_tree
type='disk'
id=0
guid=3858675847091731383
path='/dev/dsk/emcpower0c'
phys_path='/pseudo/e...@0:c,blk'
whole_disk=0
metaslab_array=14
metaslab_shift=32
ashift=9
asize=477788372992
is_log=0

LABEL 2

failed to read label 2

LABEL 3

failed to read label 3
r...@essapl020-u006 #


Is there any way I can recover it without losing the data on it?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Erik Trimble

Jose Luis Barquín Guerola wrote:

Hello.
I have a question about how ZFS works with Dinamic Stripe.

Well, start with the next situation:
  - 4 Disk of 100MB in stripe format under ZFS.
  - We use the stripe in a 75%, so we have free 100MB. (easy)

Well, we add a new disk of 100MB in the pool. So we have 200MB free but only 
100MB will have the speed of 4 disk and, the rest 100MB will have the speed of 
1 disk.

The questions are:
   - Have ZFS any kind of reorganization of the data in the stripe that change 
this situation and become in 200MB free with the speed of 5 disks?
   - If the answer is yes, how is it does? in the background?

Thanks for your time (and sorry for my english).

JLBG
  


When you add more vdevs to the zpool, NEW data is written to the new 
stripe width.   That is, when data was written to the original pool, it 
was written across 4 drives. It now will be written across 5 drives.  
Existing data WILL NOT be changed.


So, for a zpool 75% full, you will NOT get to immediately use the first 
75% of the new vdevs added.


Thus, in your case, you started with a 400MB zpool (with 300MB of data). 
You added another 100MB vdev, resulting in a 500MB zpool.   300MB is 
written across 4 drives, and will have the appropriate speed.  75% of 
the new vdev isn't immediately usable (as it corresponds to the 75% 
in-use on the other 4 vdevs), so you effectively only have added 25MB of 
immediately usable space.  Thus, you have:


300MB across 4 vdevs
125MB across 5 vdevs
75MB wasted space on 1 vdev

To correct this - that is, to recover the 75MB of wasted space and to 
move the 300MB from spanning 4 vdevs to spanning 5 vdevs -  you need to 
re-write the entire existing data space. Right now, there is no 
background or other automatic method to do this.  'cp -rp' or 'rsync' is 
a good idea.  


We really should have something like 'zpool scrub' do this automatically.
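
As a rough sketch of that rewrite approach (the dataset and path names here 
are hypothetical, and snapshots/clones complicate matters, so treat this as an 
outline only):

# zfs create tank/data-new
# rsync -aH /tank/data/ /tank/data-new/     (the copy is written across all 5 vdevs)
# zfs destroy tank/data
# zfs rename tank/data-new tank/data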

--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS - SWAP and lucreate..

2009-06-29 Thread Bob Benites
 # swap -d /dev/zvol/dsk/rpool/swap
 # zfs set volsize=8G rpool/swap
 # swap -a /dev/zvol/dsk/rpool/swap

I'm still a bit fuzzy about how swap/dump work with ZFS. If I have a pool:

  pool: pool1
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
pool1   ONLINE   0 0 0
  c3t0d0s1  ONLINE   0 0 0

which occupies nearly the entire disk, should
I allocate both swap and dump in the same
pool?

Right now I have a swap on a second disk:

# swap -l 
swapfile dev  swaplo blocks   free
/dev/dsk/c3t1d0s0   32,8  16 8395184 8395184

Would I do something like:

# zfs create -V 4G pool1/swap
# swap -a  /dev/zvol/dsk/pool1/swap
# swap -d /dev/dsk/c3t1d0s0

What about a dump, any recommendations?
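
Would the dump side be something along these lines (just a sketch, with a 
guessed size)?

# zfs create -V 2g pool1/dump
# dumpadm -d /dev/zvol/dsk/pool1/dump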

Thanks...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS disk label missing

2009-06-29 Thread Richard Elling

This can occur if the location of the end of the partition has changed.
This could be due to the partition actually shrinking, or to more than
one partition referencing the same starting block but different ending
locations.  Check your partition configuration in format and debug with
zdb -l.  Once you can see all 4 labels, you should be able to import.
-- richard
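
A sketch of those checks, using the device path from the zdb output below 
(whether prtvtoc understands the emcpower pseudo-device may depend on the 
PowerPath setup):

# prtvtoc /dev/rdsk/emcpower0c      (look for overlapping or shrunken slices)
# format                            (verify the partition table interactively)
# zdb -l /dev/dsk/emcpower0c        (repeat until all 4 labels are readable)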

Vibhor Neb wrote:
I upgraded powepath on my system and exported the zfs pool after 
upgradation i was able to import the pool but after reboot i 'm not 
able to import the pool and it fails with error 


 zpool import emcpool1
cannot import 'emcpool1': invalid vdev configuration


and digging a lil bit into it i found following things 



Jun 29 2009 01:54:14.928954823 ereport.fs.zfs.vdev.open_failed
nvlist version: 0
class = ereport.fs.zfs.vdev.open_failed
ena = 0xe503fd84f321
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x4da9f0e3c642b898
vdev = 0x358cc53d21483bb7
(end detector)

pool = emcpool1
pool_guid = 0x4da9f0e3c642b898
pool_context = 1
pool_failmode = wait
vdev_guid = 0x358cc53d21483bb7
vdev_type = disk
vdev_path = /dev/dsk/emcpower0c
parent_guid = 0x4da9f0e3c642b898
parent_type = root
prev_state = 0x1
__ttl = 0x1
__tod = 0x4a486516 0x375eb9c7



and zdb for this vdev i get 



r...@essapl020-u006 # zdb -l /dev/dsk/emcpower0c

LABEL 0

version=4
name='emcpool1'
state=0
txg=6973090
pool_guid=5596268873059055768
hostid=2228473662
hostname='essapl020-u006'
top_guid=3858675847091731383
guid=3858675847091731383
vdev_tree
type='disk'
id=0
guid=3858675847091731383
path='/dev/dsk/emcpower0c'
phys_path='/pseudo/e...@0:c,blk'
whole_disk=0
metaslab_array=14
metaslab_shift=32
ashift=9
asize=477788372992
is_log=0

LABEL 1

version=4
name='emcpool1'
state=0
txg=6973090
pool_guid=5596268873059055768
hostid=2228473662
hostname='essapl020-u006'
top_guid=3858675847091731383
guid=3858675847091731383
vdev_tree
type='disk'
id=0
guid=3858675847091731383
path='/dev/dsk/emcpower0c'
phys_path='/pseudo/e...@0:c,blk'
whole_disk=0
metaslab_array=14
metaslab_shift=32
ashift=9
asize=477788372992
is_log=0

LABEL 2

failed to read label 2

LABEL 3

failed to read label 3
r...@essapl020-u006 #


si there any way i can recover it without loosing my data on it ? 




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Richard Elling



Erik Trimble wrote:

Jose Luis Barquín Guerola wrote:

Hello.
I have a question about how ZFS works with Dinamic Stripe.

Well, start with the next situation:
  - 4 Disk of 100MB in stripe format under ZFS.
  - We use the stripe in a 75%, so we have free 100MB. (easy)

Well, we add a new disk of 100MB in the pool. So we have 200MB free 
but only 100MB will have the speed of 4 disk and, the rest 100MB will 
have the speed of 1 disk.


The questions are:
   - Have ZFS any kind of reorganization of the data in the stripe 
that change this situation and become in 200MB free with the speed of 
5 disks?

   - If the answer is yes, how is it does? in the background?


Yes, new writes are biased towards the more-empty vdev.



Thanks for your time (and sorry for my english).

JLBG
  


When you add more vdevs to the zpool, NEW data is written to the new 
stripe width.   That is, when data was written to the original pool, 
it was written across 4 drives. It now will be written across 5 
drives.  Existing data WILL NOT be changed.


So, for a zpool 75% full, you will NOT get to immediately use the 
first 75% of the new vdevs added.


Thus, in your case, you started with a 400MB zpool (with 300MB of 
data). You added another 100MB vdev, resulting in a 500MB zpool.   
300MB is written across 4 drives, and will have the appropriate 
speed.  75% of the new vdev isn't immediately usable (as it 
corresponds to the 75% in-use on the other 4 vdevs), so you 
effectively only have added 25MB of immediately usable space.  Thus, 
you have:


300MB across 4 vdevs
125MB across 5 vdevs
75MB wasted space on 1 vdev

To correct this - that is, to recover the 75MB of wasted space and 
to move the 300MB from spanning 4 vdevs to spanning 5 vdevs -  you 
need to re-write the entire existing data space. Right now, there is 
no background or other automatic method to do this.  'cp -rp' or 
'rsync' is a good idea. 
We really should have something like 'zpool scrub' do this automatically.




No.  Dynamic striping is not RAID-0, which is what you are describing.
In a dynamic stripe, the data written is not divided up amongst the current
devices in the stripe.  Rather, data is chunked and written to the vdevs.
When about 500 kBytes has been written to a vdev, the next chunk is
written to another vdev.  The choice of which vdev to go to next is based,
in part, on the amount of free space available on the vdev.  So you get
your cake (stochastic spreading of data across vdevs) and you get to
eat it (use all available space), too.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best controller card for 2 to 4 SATA drives ?

2009-06-29 Thread Patrick O'Sullivan
I've had success with the SIIG SC-SAE012-S2. PCIe and no problems  
booting off of it in 2008.11.


On Jun 27, 2009, at 3:02 PM, Simon Breden no-re...@opensolaris.org  
wrote:



Hi,

Does anyone know of a reliable 2 or 4 port SATA card with a solid  
driver, that plugs into a PCIe slot, so that I can benefit from the  
high read speeds available from adding a couple of SSDs to form my  
ZFS root/boot pool?

(Each SSD is capable of reading at around 150-200 MBytes/sec)

After initially thinking I would move my existing 6-drive RAID-Z2  
array to a new 8-port SATA controller, I finally decided to leave  
the drives connected to the motherboard SATA ports, and instead to  
get an additional smaller SATA card to allow me to connect 2 boot  
drives to form a mirror.


For anyone considering a controller card to support 8 SATA drives,  
see this thread which has got some great comments from people  
experienced with using these larger cards. No doubt I will refer to  
it again when I build another storage system one day :)

See: http://www.opensolaris.org/jive/thread.jspa?threadID=106210

Thanks,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import: Cannot mount,

2009-06-29 Thread Carsten Aulbert
Hi

A small addendum: it seems that all the sub-filesystems below /atlashome/BACKUP
are already mounted by the time /atlashome/BACKUP itself is mounted:

# zfs get all atlashome/BACKUP|head -15
NAME  PROPERTY   VALUE  SOURCE
atlashome/BACKUP  type   filesystem -
atlashome/BACKUP  creation   Thu Oct  9 16:30 2008  -
atlashome/BACKUP  used   9.95T  -
atlashome/BACKUP  available  1.78T  -
atlashome/BACKUP  referenced 172K   -
atlashome/BACKUP  compressratio  1.47x  -
atlashome/BACKUP  mountedno -
atlashome/BACKUP  quota  none   default
atlashome/BACKUP  reservationnone   default
atlashome/BACKUP  recordsize 32K    inherited from atlashome
atlashome/BACKUP  mountpoint /atlashome/BACKUP  default
atlashome/BACKUP  sharenfs   on     inherited from atlashome
atlashome/BACKUP  checksum   on default
atlashome/BACKUP  compressionon local

while
# ls -l /atlashome/BACKUP | wc -l
  33


Is there any way to force zpool import to re-order that? I could delete
all stuff under BACKUP, however given the size I don't really want to.

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Jose Luis Barquín Guerola
Thank you Relling and et151817 for your answers.

So, just to end the post:

Relling, suppose the following situation:
   one zpool in a dynamic stripe with two disks, one of 100MB and the second 
of 200MB.

If the spread is a stochastic spreading of data across vdevs, you will have 
twice the probability of a chunk landing on the second disk rather than on 
the first, right?

Thanks for your time (and sorry for my English).

JLBG
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import: Cannot mount,

2009-06-29 Thread Mark J Musante

On Mon, 29 Jun 2009, Carsten Aulbert wrote:

Is there any way to force zpool import to re-order that? I could delete 
all stuff under BACKUP, however given the size I don't really want to.


Do a zpool export first, and then check to see what's in /atlashome.  My 
bet is that the BACKUP directory is still there.  If so, do an rmdir on 
/atlashome/BACKUP and then try the import again.
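
In other words, roughly:

# zpool export atlashome
# ls /atlashome                (expect to find a leftover BACKUP directory)
# rmdir /atlashome/BACKUP
# zpool import atlashome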



Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best controller card for 2 to 4 SATA drives ?

2009-06-29 Thread Eric D. Mudama

On Mon, Jun 29 at 11:43, Patrick O'Sullivan wrote:
I've had success with the SIIG SC-SAE012-S2. PCIe and no problems  
booting off of it in 2008.11.


I think there's a 4-port version of the 1068e-based chips from LSI,
and I believe this is it:

http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/lsisas3041er/index.html

but I don't have one on-hand to confirm.  The product brief states that
"the LSISAS3041E-R leverages the LSISAS1064E controller ASIC's
advanced Fusion-MPT architecture", so it appears to be the right
board.

Froogle lists it for about $150 from various websites.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow ls or slow zfs

2009-06-29 Thread NightBird
 On Fri, 26 Jun 2009, Richard Elling wrote:
 
  All the tools I have used show no IO problems. I
 think the problem is 
  memory but I am unsure on how to troubleshoot it.
 
  Look for latency, not bandwidth.  iostat will show
 latency at the
  device level.
 
 Unfortunately, the effect may not be all that obvious
 since the disks 
 will only be driven as hard as the slowest disk and
 so the slowest 
 disk may not seem much slower.
 
 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us,
 http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,
http://www.GraphicsMagick.org/
 
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discu
 ss

I checked the output of iostat. svc_t is between 5 and 50, depending on when 
data is flushed to the disk (CIFS write pattern). %b is between 10 and 50.
%w is always 0.
Example:
device     r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd27      31.5  127.0  935.9  616.7  0.0 11.9   75.2   0  66
sd28       5.0    0.0  320.0    0.0  0.0  0.1   18.0   0   9

This tells me the disks are busy, but I do not know what they are doing. Are 
they spending time seeking, writing or reading?

I also reviewed some ARC stats. Here is the output.
ARC Efficency:
 Cache Access Total: 199758875
 Cache Hit Ratio:  74%   148652045  [Defined State for 
buffer]
 Cache Miss Ratio: 25%   51106830   [Undefined State for 
Buffer]
 REAL Hit Ratio:   73%   146091795  [MRU/MFU Hits Only]

 Data Demand   Efficiency:94%
 Data Prefetch Efficiency:15%

CACHE HITS BY CACHE LIST:
  Anon:   --%Counter Rolled.
  Most Recently Used: 22%33843327 (mru) [ 
Return Customer ]
  Most Frequently Used:   75%112248468 (mfu)[ 
Frequent Customer ]
  Most Recently Used Ghost:3%4833189 (mru_ghost)[ 
Return Customer Evicted, Now Back ]
  Most Frequently Used Ghost: 22%33831706 (mfu_ghost)   [ 
Frequent Customer Evicted, Now Back ]


It seems to me that with mfu_ghost at 22%, I may need a bigger ARC.
Is the ARC also designed to work with large memory footprints (128GB, for 
example, or higher)? Will it be as efficient?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow ls or slow zfs

2009-06-29 Thread Richard Elling

NightBird wrote:

On Fri, 26 Jun 2009, Richard Elling wrote:



All the tools I have used show no IO problems. I

think the problem is 


memory but I am unsure on how to troubleshoot it.


Look for latency, not bandwidth.  iostat will show
  

latency at the


device level.
  

Unfortunately, the effect may not be all that obvious
since the disks 
will only be driven as hard as the slowest disk and
so the slowest 
disk may not seem much slower.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us,
http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,
   http://www.GraphicsMagick.org/

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discu
ss



I checked the output of iostat. svc_t is between 5 and 50, depending on when 
data is flushed to the disk (CIFS write pattern). %b is between 10 and 50.
%w is always 0.
Example:
device     r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd27 31.5  127.0  935.9  616.7  0.0 11.9   75.2   0  66
  


This is a slow disk.  Put your efforts here.


sd28       5.0    0.0  320.0    0.0  0.0  0.1   18.0   0   9

This tells me disks are busy but I do not know what they are doing? are they 
spending time seeking, writting or reading?

I also review some ARC stats. Here is the output.
ARC Efficency:
 Cache Access Total: 199758875
 Cache Hit Ratio:  74%   148652045  [Defined State for 
buffer]
 Cache Miss Ratio: 25%   51106830   [Undefined State for 
Buffer]
 REAL Hit Ratio:   73%   146091795  [MRU/MFU Hits Only]

 Data Demand   Efficiency:94%
 Data Prefetch Efficiency:15%

CACHE HITS BY CACHE LIST:
  Anon:   --%Counter Rolled.
  


That is interesting... but only from a developer standpoint.


  Most Recently Used: 22%33843327 (mru) [ 
Return Customer ]
  Most Frequently Used:   75%112248468 (mfu)[ 
Frequent Customer ]
  Most Recently Used Ghost:3%4833189 (mru_ghost)[ 
Return Customer Evicted, Now Back ]
  Most Frequently Used Ghost: 22%33831706 (mfu_ghost)   [ 
Frequent Customer Evicted, Now Back ]


It seems to me that mfu_ghost being at 22%, I may need a bigger ARC.
Is ARC also designed to work with large memory foot prints (128GB for example 
or higher)? Will it be as efficient?
  

Caching isn't your problem, though adding memory may hide the
real problem for a while.  You need faster disk.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow ls or slow zfs

2009-06-29 Thread Bob Friesenhahn

On Mon, 29 Jun 2009, NightBird wrote:


I checked the output of iostat. svc_t is between 5 and 50, depending on when 
data is flushed to the disk (CIFS write pattern). %b is between 10 and 50.
%w is always 0.
Example:
device     r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd27      31.5  127.0  935.9  616.7  0.0 11.9   75.2   0  66
sd28       5.0    0.0  320.0    0.0  0.0  0.1   18.0   0   9

This tells me disks are busy but I do not know what they are doing? 
are they spending time seeking, writting or reading?


It looks like your sd27 is being pounded with write iops.  It is close 
to its limit.


Can you post complete iostat output?  Since you have so many disks, 
(which may not always be involved in the same stripe) you may need to 
have iostat average over a long period of time such as 30 or 60 
seconds in order to see a less responsive disk.  Disks could be less 
responsive for many reasons, including vibrations in their operating 
environment.
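
For example, averaging over 60-second intervals (standard Solaris iostat 
flags; adjust the interval and count to taste):

# iostat -xn 60 5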


Also see Jeff Bonwick's diskqual.sh as described at 
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg15384.html 
which is good at helping to find pokey disks.  A slightly modified 
version is included below.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

#!/bin/ksh

# Date: Mon, 14 Apr 2008 15:49:41 -0700
# From: Jeff Bonwick jeff.bonw...@sun.com
# To: Henrik Hjort hj...@dhs.nu
# Cc: zfs-discuss@opensolaris.org
# Subject: Re: [zfs-discuss] Performance of one single 'cp'
# 
# No, that is definitely not expected.
# 
# One thing that can hose you is having a single disk that performs

# really badly.  I've seen disks as slow as 5 MB/sec due to vibration,
# bad sectors, etc.  To see if you have such a disk, try my diskqual.sh
# script (below).  On my desktop system, which has 8 drives, I get:
# 
# # ./diskqual.sh

# c1t0d0 65 MB/sec
# c1t1d0 63 MB/sec
# c2t0d0 59 MB/sec
# c2t1d0 63 MB/sec
# c3t0d0 60 MB/sec
# c3t1d0 57 MB/sec
# c4t0d0 61 MB/sec
# c4t1d0 61 MB/sec
# 
# The diskqual test is non-destructive (it only does reads), but to

# get valid numbers you should run it on an otherwise idle system.

disks=`format </dev/null | grep ' c.t' | nawk '{print $2}'`

getspeed1()
{
ptime dd if=/dev/rdsk/${1}s0 of=/dev/null bs=64k count=1024 2>&1 |
    nawk '$1 == "real" { printf("%.0f\n", 67.108864 / $2) }'
}

getspeed()
{
# Best out of 6
for iter in 1 2 3 4 5 6
do
getspeed1 $1
done | sort -n | tail -2 | head -1
}

for disk in $disks
do
echo $disk `getspeed $disk` MB/sec
done

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Richard Elling

Jose Luis Barquín Guerola wrote:

Thank you Relling and et151817 for your answers.

So just to end the post:

Relling supouse the next situation:
   One zpool in Dinamic Stripe with two disk, one of 100MB and the second 
with 200MB

if the spread is stochastic spreading of data across vdevs you will have the 
double of possibilities of   save one chunk in the second disk than in the first, right?
  


The simple answer is yes.

The more complex answer is that copies will try to be spread across
different vdevs.  Metadata, by default, uses copies=2, so you could
expect the metadata to be more evenly spread across the disks.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow ls or slow zfs

2009-06-29 Thread John
 On Mon, 29 Jun 2009, NightBird wrote:
 
  I checked the output of iostat. svc_t is between 5
 and 50, depending on when data is flushed to the disk
 (CIFS write pattern). %b is between 10 and 50.
  %w is always 0.
  Example:
 device     r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
 sd27      31.5  127.0  935.9  616.7  0.0 11.9   75.2   0  66
 sd28       5.0    0.0  320.0    0.0  0.0  0.1   18.0   0   9
  This tells me disks are busy but I do not know what
 they are doing? 
  are they spending time seeking, writting or
 reading?
 
 It looks like your sd27 is being pounded with write
 iops.  It is close 
 to its limit.
 
 Can you post complete iostat output?  Since you have
 so many disks, 
 (which may not always be involved in the same stripe)
 you may need to 
 have iostat average over a long period of time such
 as 30 or 60 
 seconds in order to see a less responsive disk.
  Disks could be less 
 esponsive for many reasons, including vibrations in
 their operating 
 environment.
 
 Also see Jeff Bonwick's diskqual.sh as described at
 
 http://www.mail-archive.com/zfs-disc...@opensolaris.or
 g/msg15384.html 
 which is good at helping to find pokey disks.  A
 slightly modified 
 version is included below.
 
 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us,
 http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,
http://www.GraphicsMagick.org/

I will run the script when the server is idle as recommended and report back.
Here is the full iostat output (30sec). c9t40d0 seems to have a consistently 
higher svc_t time.

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.02.62.1   11.0  0.0  0.00.06.5   0   1 c8t0d0
0.12.60.4   11.0  0.0  0.00.05.1   0   0 c8t1d0
6.06.2  380.1  147.8  0.0  0.30.0   23.6   0  13 c9t8d0
6.26.5  390.8  147.9  0.0  0.30.0   21.9   0  13 c9t9d0
6.36.2  386.1  147.6  0.0  0.30.0   26.7   0  12 c9t10d0
6.76.2  413.5  147.8  0.0  0.30.0   21.9   0  14 c9t11d0
6.15.7  371.1  147.6  0.0  0.30.0   21.2   0  11 c9t12d0
6.75.9  407.3  147.6  0.0  0.30.0   21.4   0  13 c9t13d0
5.76.3  347.6  147.7  0.0  0.30.0   22.4   0  12 c9t14d0
7.05.9  426.5  147.5  0.0  0.30.0   20.6   0  13 c9t15d0
6.66.1  405.0  147.6  0.0  0.30.0   21.1   0  12 c9t16d0
6.66.2  405.2  147.7  0.0  0.30.0   21.1   0  12 c9t17d0
7.16.3  432.9  147.8  0.0  0.30.0   20.9   0  14 c9t18d0
6.76.5  411.6  147.9  0.0  0.30.0   23.6   0  13 c9t19d0
6.46.4  390.3  148.1  0.0  0.30.0   21.7   0  13 c9t20d0
6.96.9  424.4  147.9  0.0  0.30.0   19.8   0  13 c9t21d0
6.26.9  375.3  148.1  0.0  0.30.0   20.2   0  12 c9t22d0
5.76.8  349.5  147.9  0.0  0.30.0   20.9   0  12 c9t23d0
6.26.6  377.5  147.9  0.0  0.30.0   20.6   0  11 c9t24d0
5.46.7  328.2  147.9  0.0  0.20.0   20.7   0  11 c9t25d0
6.76.7  407.3  148.0  0.0  0.30.0   19.8   0  12 c9t26d0
6.56.9  396.7  148.1  0.0  0.30.0   20.4   0  13 c9t27d0
6.46.6  390.4  147.9  0.0  0.30.0   21.3   0  13 c9t28d0
6.86.3  416.0  147.6  0.0  0.40.0   26.8   0  13 c9t29d0
6.86.3  413.9  147.8  0.0  0.30.0   23.5   0  13 c9t30d0
7.5   33.5  446.9  312.0  0.0  1.80.0   45.0   0  18 c9t31d0
8.2   33.6  491.7  312.0  0.0  2.10.0   51.1   0  21 c9t32d0
7.0   34.3  414.9  312.3  0.0  1.90.0   47.0   0  20 c9t33d0
7.6   34.1  463.4  312.2  0.0  2.10.0   51.2   0  21 c9t34d0
7.9   33.5  474.4  312.0  0.0  2.20.0   52.9   0  21 c9t35d0
8.2   33.2  496.0  311.7  0.0  2.40.0   59.1   0  23 c9t36d0
8.0   33.2  481.0  311.9  0.0  2.00.0   48.8   0  21 c9t37d0
7.8   33.4  469.9  311.9  0.0  2.30.0   56.4   0  20 c9t38d0
8.5   34.1  518.7  312.4  0.0  2.30.0   54.3   0  22 c9t39d0
8.4   32.9  510.5  311.8  0.0  2.90.0   70.6   0  27 c9t40d0
8.2   34.3  501.5  312.4  0.0  2.30.0   55.1   0  24 c9t41d0
8.1   34.3  491.1  312.5  0.0  2.30.0   55.4   0  21 c9t42d0
8.5   34.3  510.9  312.7  0.0  2.30.0   53.3   0  23 c9t43d0
7.5   34.3  453.1  312.6  0.0  2.30.0   54.4   0  20 c9t44d0
7.0   33.7  420.9  312.1  0.0  2.30.0   55.7   0  19 c9t45d0
7.0   34.2  420.9  312.4  0.0  2.30.0   55.2   0  20 c9t46d0
7.9   35.1  474.6  312.5  0.0  2.10.0   49.1   0  22 c9t47d0
8.1   35.0  487.4  312.8  0.0  2.30.0   52.8   0  22 c9t48d0
8.1   34.2  491.3  312.2  0.0  2.10.0   50.2   0  20 c9t49d0
7.2   34.6  429.4  312.5  0.0  2.10.0   51.3   0  20 c9t50d0
7.6   35.2  459.3  312.6  0.0  2.30.0   54.1   0  21 c9t51d0
7.7   35.0  463.5  312.6  0.0  2.10.0   49.2   0  21 c9t52d0
7.4   35.1  442.6  312.8  0.0  2.10.0   48.8   0  21 c9t53d0
-- 
This message posted from opensolaris.org

Re: [zfs-discuss] zpool import: Cannot mount,

2009-06-29 Thread Carsten Aulbert
Hi

Mark J Musante wrote:
 
 Do a zpool export first, and then check to see what's in /atlashome.  My
 bet is that the BACKUP directory is still there.  If so, do an rmdir on
 /atlashome/BACKUP and then try the import again.

Sorry, I meant to copy this earlier:

s11 console login: root
Password:
Last login: Mon Jun 29 10:37:47 on console
Sun Microsystems Inc.   SunOS 5.10  Generic January 2005
s11:~# zpool export atlashome
s11:~# ls -l /atlashome
/atlashome: No such file or directory
s11:~# zpool import atlashome
cannot mount '/atlashome/BACKUP': directory is not empty
s11:~# ls -l /atlashome/BACKUP/|wc -l
  33
s11:~#

Thus you see that zpool import probably does the wrong thing(TM) (or does
things in the wrong order).

Any idea?

Cheers

Carsten

PS: I opened a case for that, but am still waiting for the call back. Once this
problem is solved, I can post the case ID for future reference.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cloning a ZFS compact flash card

2009-06-29 Thread Lori Alt

On 06/28/09 08:41, Ross wrote:

Can't you just boot from an OpenSolaris CD, create a ZFS pool on the new 
device, and just do a ZFS send/receive directly to it?  So long as there's 
enough space for the data, a send/receive won't care at all that the systems 
are different sizes.

I don't know what you need to do to a pool to make it bootable I'm afraid, so I 
don't know if this will just work or if it'll need some more tweaking.  However 
if you have any problems you should be able to find more information about how 
ZFS boot works online.
  

A procedure for making a backup of a root pool and then restoring it is at:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#ZFS_Root_Pool_Recovery
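
Very roughly, the procedure in that guide boils down to something like the 
following (an abbreviated sketch only; the pool names are placeholders, and 
the guide should be followed for the real steps, including setting the bootfs 
property and installing the boot blocks with installgrub or installboot):

# zfs snapshot -r rpool@backup
# zfs send -R rpool@backup | zfs recv -Fd newpool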

Lori
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import: Cannot mount,

2009-06-29 Thread Mark J Musante

On Mon, 29 Jun 2009, Carsten Aulbert wrote:


s11 console login: root
Password:
Last login: Mon Jun 29 10:37:47 on console
Sun Microsystems Inc.   SunOS 5.10  Generic January 2005
s11:~# zpool export atlashome
s11:~# ls -l /atlashome
/atlashome: No such file or directory
s11:~# zpool import atlashome
cannot mount '/atlashome/BACKUP': directory is not empty
s11:~# ls -l /atlashome/BACKUP/|wc -l
 33
s11:~#


OK, looks like you're running into CR 6827199.

There's a workaround for that as well.  After the zpool import, manually 
zfs umount all the datasets under /atlashome/BACKUP.  Once you've done 
that, the BACKUP directory will still be there.  Manually mount the 
dataset that corresponds to /atlashome/BACKUP, and then try 'zfs mount 
-a'.
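
A sketch of that sequence, with the pool name taken from earlier messages in 
this thread (the child dataset name is a placeholder):

# zpool import atlashome
# zfs umount atlashome/BACKUP/<child>    (repeat for each dataset mounted under /atlashome/BACKUP)
# zfs mount atlashome/BACKUP
# zfs mount -a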



Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import: Cannot mount,

2009-06-29 Thread Carsten Aulbert
Hi Mark,

Mark J Musante wrote:
 
 OK, looks like you're running into CR 6827199.
 
 There's a workaround for that as well.  After the zpool import, manually
 zfs umount all the datasets under /atlashome/BACKUP.  Once you've done
 that, the BACKUP directory will still be there.  Manually mount the
 dataset that corresponds to /atlashome/BACKUP, and then try 'zfs mount -a'.

I did that (I needed to rmdir the directories under BACKUP) and then
finally it worked - and best of all, even after a reboot it was able to
mount all the file systems again.

Great and a lot of thanks!

One question:

Where can I find more about CR 6827199? I logged into sun.com with my
service contract enabled log-in but I cannot find it there (or the
search function does not like me too much).

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import: Cannot mount,

2009-06-29 Thread Victor Latushkin

On 29.06.09 23:01, Carsten Aulbert wrote:

One question:

Where can I find more about CR 6827199? I logged into sun.com with my
service contract enabled log-in but I cannot find it there (or the
search function does not like me too much).


You can try bugs.opensolaris.org too:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6827199

victor
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS - SWAP and lucreate..

2009-06-29 Thread Cindy . Swearingen

Hi Patrick,

To answer your original question, yes, you can create your root swap
and dump volumes before you run the lucreate operation. LU won't change
them if they are already created.

Keep in mind that you'll need approximately 10 GB of disk space for the
ZFS root BE and the swap/dump volumes.

See the steps below.

Cindy

Patrick Bittner wrote:

So there is no possibility to do this with or before the lucreate command?

hm. well- 
thank you anyway then



# zpool create rpool c0t0d0s0
# zfs create -V 1g rpool/dump
# zfs create -V 2g -b 8k rpool/swap
# zfs list
NAME USED  AVAIL  REFER  MOUNTPOINT
rpool   3.00G  30.2G18K  /rpool
rpool/dump 1G  31.2G16K  -
rpool/swap 2G  32.2G16K  -
# lucreate -c ufsBE -n zfsBE -p rpool
.
.
.
.
Population of boot environment zfsBE successful.
Creation of boot environment zfsBE successful.
# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
rpool 7.62G  25.6G  92.5K  /rpool
rpool/ROOT4.62G  25.6G18K  /rpool/ROOT
rpool/ROOT/zfsBE  4.62G  25.6G  4.62G  /
rpool/dump   1G  26.6G16K  -
rpool/swap   2G  27.6G16K  -

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] L2ARC availability w/ Solaris 10

2009-06-29 Thread Ravi Kota
Hi,
Is there a time frame for when L2ARC will be available in Solaris 10? With the 
latest U7 release, L2ARC appears to be disabled ("operation not supported on 
this type of pool").

Thanks,
Ravi
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Dynamic Stripe

2009-06-29 Thread Rob Logan

 try to be spread across different vdevs.

% zpool iostat -v
   capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
z686G   434G 40  5  2.46M   271K
  c1t0d0s7   250G   194G 14  1   877K  94.2K
  c1t1d0s7   244G   200G 15  2   948K  96.5K
  c0d0   193G  39.1G 10  1   689K  80.2K


note that c0d0 is basically full, but still serving 10
of every 15 reads, and 82% of the writes.

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SXCE, ZFS root, b101 - b103 fails with ERROR: No upgradeable file systems

2009-06-29 Thread Lori Alt

On 06/27/09 23:50, Ian Collins wrote:

Leela wrote:

So no one has any idea?
  

About what?

This was in regards to a question sent to the install-discuss alias on 
6/18 and later copied to zfs-discuss.  I have answered it on the install 
alias, if anyone is following the issue.


Lori
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-29 Thread Bob Friesenhahn

On Wed, 24 Jun 2009, Lejun Zhu wrote:


There is a bug in the database about reads blocked by writes which may be 
related:

http://bugs.opensolaris.org/view_bug.do?bug_id=6471212

The symptom is sometimes reducing queue depth makes read perform better.


I have been banging away at this issue without resolution.  Based on 
Roch Bourbonnais's blog description of the ZFS write throttle code, it 
seems that I am facing a perfect storm.  Both the storage write 
bandwidth (800+ MB/second) and the memory size of my system (20 GB) 
result in the algorithm batching up 2.5 GB of user data to write. 
Since I am using mirrors, this results in 5 GB of data being written 
at full speed to the array on a very precise schedule since my 
application is processing fixed-sized files with a fixed algorithm. 
The huge writes lead to at least 3 seconds of read starvation, 
resulting in a stalled application and a square-wave of system CPU 
utilization.  I could attempt to modify my application to read ahead 
by 3 seconds but that would require gigabytes of memory, lots of 
complexity, and would not be efficient.


Richard Elling thinks that my array is pokey, but based on write speed 
and memory size, ZFS is always going to be batching up data to fill 
the write channel for 5 seconds so it does not really matter how fast 
that write channel is.  If I had 32GB of RAM and 2X the write speed, 
the situation would be identical.


Hopefully someone at Sun is indeed working this read starvation issue 
and it will be resolved soon.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-29 Thread Brent Jones
On Mon, Jun 29, 2009 at 2:48 PM, Bob
Friesenhahnbfrie...@simple.dallas.tx.us wrote:
 On Wed, 24 Jun 2009, Lejun Zhu wrote:

 There is a bug in the database about reads blocked by writes which may be
 related:

 http://bugs.opensolaris.org/view_bug.do?bug_id=6471212

 The symptom is sometimes reducing queue depth makes read perform better.

 I have been banging away at this issue without resolution.  Based on Roch
 Bourbonnais's blog description of the ZFS write throttle code, it seems that
 I am facing a perfect storm.  Both the storage write bandwidth (800+
 MB/second) and the memory size of my system (20 GB) result in the algorithm
 batching up 2.5 GB of user data to write. Since I am using mirrors, this
 results in 5 GB of data being written at full speed to the array on a very
 precise schedule since my application is processing fixed-sized files with a
 fixed algorithm. The huge writes lead to at least 3 seconds of read
 starvation, resulting in a stalled application and a square-wave of system
 CPU utilization.  I could attempt to modify my application to read ahead by
 3 seconds but that would require gigabytes of memory, lots of complexity,
 and would not be efficient.

 Richard Elling thinks that my array is pokey, but based on write speed and
 memory size, ZFS is always going to be batching up data to fill the write
 channel for 5 seconds so it does not really matter how fast that write
 channel is.  If I had 32GB of RAM and 2X the write speed, the situation
 would be identical.

 Hopefully someone at Sun is indeed working this read starvation issue and it
 will be resolved soon.

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


I see similar square-wave performance. However, my load is primarily
write-based; when those commits happen, I see all network activity
pause while the buffer is committed to disk.
I write about 750Mbit/sec over the network to the X4540's during
backup windows, using primarily iSCSI. When those writes occur to my
RAID-Z volume, all activity pauses until the writes are fully flushed.

One thing to note: on build 117 the effect is seemingly reduced and
performance is a bit more even, but it is still there.

-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-29 Thread Lejun Zhu
 On Wed, 24 Jun 2009, Lejun Zhu wrote:
 
  There is a bug in the database about reads blocked
 by writes which may be related:
 
 
 http://bugs.opensolaris.org/view_bug.do?bug_id=6471212
 
  The symptom is sometimes reducing queue depth makes
 read perform better.
 
 I have been banging away at this issue without
 resolution.  Based on 
 Roch Bourbonnais's blog description of the ZFS write
 throttle code, it 
 seems that I am facing a perfect storm.  Both the
 storage write 
 bandwidth (800+ MB/second) and the memory size of my
 system (20 GB) 
 result in the algorithm batching up 2.5 GB of user
 data to write. 

With the ZFS write throttle, the 2.5GB number is tunable. From what I've read in 
the code, it is possible to e.g. set zfs:zfs_write_limit_override = 0x8000000 
(bytes) to make it write 128M instead.
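
A sketch of how such a tunable is typically applied (the value here is just 
the 128M example from above, i.e. 0x8000000 = 134217728 bytes); add the 
following line to /etc/system and reboot:

set zfs:zfs_write_limit_override = 0x8000000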

 Since I am using mirrors, this results in 5 GB of
 data being written 
 at full speed to the array on a very precise schedule
 since my 
 application is processing fixed-sized files with a
 fixed algorithm. 
 The huge writes lead to at least 3 seconds of read
 starvation, 
 resulting in a stalled application and a square-wave
 of system CPU 
 utilization.  I could attempt to modify my
 application to read ahead 
 by 3 seconds but that would require gigabytes of
 memory, lots of 
 complexity, and would not be efficient.
 
 Richard Elling thinks that my array is pokey, but
 based on write speed 
 and memory size, ZFS is always going to be batching
 up data to fill 
 the write channel for 5 seconds so it does not really
 matter how fast 
 that write channel is.  If I had 32GB of RAM and 2X
 the write speed, 
 the situation would be identical.
 
 Hopefully someone at Sun is indeed working this read
 starvation issue 
 and it will be resolved soon.
 
 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us,
 http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,
http://www.GraphicsMagick.org/
 
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discu
 ss
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss