Re: [zfs-discuss] live upgrade with lots of zfs filesystems

2009-08-27 Thread Paul B. Henson
On Thu, 27 Aug 2009, Paul B. Henson wrote:

> However, I went to create a new boot environment to install the patches
> into, and so far that's been running for about an hour and a half :(,
> which was not expected or planned for.
[...]
> I don't think I'm going to make my downtime window :(, and will probably
> need to reschedule the patching. I never considered I might have to start
> the patch process six hours before the window.

Well, so far lucreate took 3.5 hours, lumount took 1.5 hours, applying the
patches took all of 10 minutes, luumount took about 20 minutes, and
luactivate has been running for about 45 minutes. I'm assuming it will
probably take at least the 1.5 hours of the lumount (particularly
considering it appears to be running a lumount process under the hood) if
not the 3.5 hours of lucreate. Add in the 1-1.5 hours to reboot, and, well,
so much for patches this maintenance window.
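
For reference, the sequence being timed here is the standard Live
Upgrade cycle, roughly (the BE name is just an example):

    lucreate -n patchbe        # create the new BE      (~3.5 hours)
    lumount patchbe /mnt       # mount it for patching  (~1.5 hours)
    patchadd -R /mnt ...       # apply the patches      (~10 minutes)
    luumount patchbe           # unmount                (~20 minutes)
    luactivate patchbe         # activate for next boot (still running)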

The lupi_bebasic process seems to be the time killer here. Not sure what
it's doing, but it spent 75 minutes running strcmp. Pretty much nothing but
strcmp: 75 CPU minutes running strcmp. I took a look for the source, but
I guess that component's not a part of opensolaris, or at least I couldn't
find it.

Hopefully I can figure out how to make this perform a little more
acceptably before our next maintenance window.


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Boot error

2009-08-27 Thread Grant Lowe
Hi Cindy,

I tried booting from DVD but nothing showed up.  Thanks for the ideas, though.  
Maybe your other sources might have something?



- Original Message 
From: Cindy Swearingen 
To: Grant Lowe 
Cc: zfs-discuss@opensolaris.org
Sent: Thursday, August 27, 2009 6:24:00 PM
Subject: Re: [zfs-discuss] Boot error

Hi Grant,

I don't have all my usual resources at the moment, but I would 
boot from alternate media and use the format utility to check 
the partitioning on the newly added disk, and look for something 
like overlapping partitions. Or, possibly, a mismatch between
the actual root slice and the one you are trying to boot from.

Cindy

- Original Message -
From: Grant Lowe 
Date: Thursday, August 27, 2009 5:06 pm
Subject: [zfs-discuss] Boot error
To: zfs-discuss@opensolaris.org

> I've got a 240z with Solaris 10 Update 7, all the latest patches from 
> Sunsolve.  I've installed a boot drive with ZFS.  I mirrored the drive 
> with zpool.  I installed the boot block.  The system had been working 
> just fine.  But for some reason, when I try to boot, I get the error: 
> 
> 
> {1} ok boot -s
> Boot device: /p...@1c,60/s...@2/d...@0,0  File and args: -s
> SunOS Release 5.10 Version Generic_141414-08 64-bit
> Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
> Use is subject to license terms.
> Division by Zero
> {1} ok
> 
> Any ideas?
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] live upgrade with lots of zfs filesystems

2009-08-27 Thread Paul B. Henson
On Thu, 27 Aug 2009, Trevor Pretty wrote:

> My S10 virtual machine is not booted, but you can put all the "excluded"
> file systems in a file and use -f, from memory.

Unfortunately, I wasn't that stupid. I saw the -f option, but it's not
applicable to ZFS root:

 -f exclude_list_file

 Use  the  contents  of  exclude_list_file   to   exclude
 specific  files  (including  directories) from the newly
 created BE. exclude_list_file contains a list  of  files
 and directories, one per line. If a line item is a file,
 only that file is excluded; if a directory, that  direc-
 tory  and  all  files  beneath that directory, including
 subdirectories, are excluded.

 This option is not supported when the source BE is on  a
 ZFS file system.

After it finished unmounting everything from the alternative root, it seems
to have spawned *another* lupi_bebasic process, which has eaten up 62
minutes of CPU time so far. Evidently it's doing a lot of string
comparisons (per truss):

/1...@1:   <- libc:strcmp() = 0
/1...@1:   -> libc:strcmp(0x86fceec, 0xfefa1218)
/1...@1:   <- libc:strcmp() = 0
/1...@1:   -> libc:strcmp(0x86fd534, 0xfefa1218)
/1...@1:   <- libc:strcmp() = 0
/1...@1:   -> libc:strcmp(0x86fdccc, 0xfefa1218)
/1...@1:   <- libc:strcmp() = 0
/1...@1:   -> libc:strcmp(0x86fdcfc, 0xfefa1218)
/1...@1:   <- libc:strcmp() = 0
/1...@1:   -> libc:strcmp(0x86fec84, 0xfefa1218)
/1...@1:   <- libc:strcmp() = 0
/1...@1:   -> libc:strcmp(0x86fecb4, 0xfefa1218)
/1...@1:   <- libc:strcmp() = 0
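
(For reference, a trace like the above comes from something along the
lines of:

    truss -t'!all' -u libc:strcmp -p <pid>

where <pid> is the lupi_bebasic process.)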

The first one finished in a bit over an hour; hopefully this one's about
done too, and there's not any more stuff to do.

Thanks...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-27 Thread Trevor Pretty
Dave

This helps: http://defect.opensolaris.org/bz/page.cgi?id=fields.html

The most common thing you will see is "Duplicate": different people
find the same problem at different times in different ways, and when
they searched the database to see if it was "known" they could not find
a bug description that seemed to match their problem. I logged quite a
few of these :-)

The other common state is "Incomplete", typically because the submitter
has not provided enough info for the evaluator to evaluate it.

Oh, and what other company would allow you to see this data? :-
http://defect.opensolaris.org/bz/reports.cgi ("Old Charts" is interesting)

Trevor

Trevor Pretty wrote:

Dave

Yep, that's an RFE (Request for Enhancement): that's how things are
reported to engineers to fix things inside Sun. If it's an honest-to-
goodness CR = bug (however, it normally needs a real support-paying
customer to have a problem to go from RFE to CR), the "responsible
engineer" evaluates it, and eventually gets it fixed, or not. When I
worked at Sun I logged a lot of RFEs; only a few were accepted as bugs
and fixed.

Click on the "New Search" link and look at the type and state menus. It
gives you an idea of the states an RFE and a CR go through. It's
probably documented somewhere, but I can't find it. Part of the joy of
Sun putting out in public something most other vendors would not dream
of doing.

Oh, and it doesn't help that both RFEs and CRs are labelled "bug" at
http://bugs.opensolaris.org/

So, looking at your RFE:

It tells you which version of Nevada it was reported against
(translating this into an OpenSolaris version is easy - NOT!)

Look at "Related Bugs: 6612830"

This will tell you the "Responsible Engineer: Richard Morris"

and when it was fixed:

"Release Fixed: solaris_10u6(s10u6_01) (Bug ID: 2160894)"

Although, as nothing in life is guaranteed, it looks like another bug,
2160894, has been identified, and that's not yet on bugs.opensolaris.org.

Hope that helps.

Trevor
Dave wrote:

Just to make sure we're looking at the same thing:

http://bugs.opensolaris.org/view_bug.do?bug_id=6761786

This is not an issue of auto snapshots. If I have a ZFS server that 
exports 300 zvols via iSCSI and I have daily snapshots retained for 14 
days, that is a total of 4200 snapshots. According to the link/bug 
report above it will take roughly 5.5 hours to import my pool (even when 
the pool is operating perfectly fine and is not degraded or faulted).

This is obviously unacceptable to anyone in an HA environment. Hopefully 
someone close to the issue can clarify.

--
Dave

Blake wrote:

I think the value of auto-snapshotting zvols is debatable.  At least,
there are not many folks who need to do this.

What I'd rather see is a default property of 'auto-snapshot=off' for zvols.

Blake

On Thu, Aug 27, 2009 at 4:29 PM, Tim Cook wrote:

On Thu, Aug 27, 2009 at 3:24 PM, Remco Lengers wrote:

Dave,

It's logged as an RFE (Request for Enhancement), not as a CR (bug).

The status is 3-Accepted/  P1  RFE

RFEs are generally looked at in a much different way than a CR.

..Remco

Seriously?  It's considered "works as designed" for a system to take 5+
hours to boot?  Wow.

--Tim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

www.eagle.co.nz 
This email is confidential and may be legally 
privileged. If received in error please destroy and immediately notify 
us.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] shrink the rpool zpool or increase rpool zpool via add disk.

2009-08-27 Thread Randall Badilla
Hi all:
First: is it possible to modify the boot zpool rpool after OS installation?
I installed the OS on the whole 72GB hard disk. It is mirrored, so if I want
to decrease the rpool, for example resize it to a 36GB slice, can that be done?
As far as I remember, on UFS/SVM I was able to resize the boot OS disk by
detaching a mirror (transforming it to a one-way mirror), adjusting the
partitions, then attaching the mirror. After the sync, boot from the resized
mirror, redo the resize on the remaining mirror, attach the mirror and reboot.
Downtime reduced to a couple of reboots.

Second: if the first can't be done, I was guessing I could increase the rpool
size by adding more hard disks. As you know, that must be done with
SMI-labeled hard disks; well, I have tried changing the start cylinder,
changing the label type, almost everything, and I still get the error:
 zpool add rpool mirror c1t2d0 c1t5d0
cannot label 'c1t2d0': EFI labeled devices are not supported on root pools.

partition> print
Current partition table (original):
Total disk cylinders available: 24620 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       home    wm       0 - 24619       33.92GB    (24620/0/0) 71127180
  1 unassigned    wm       0                  0       (0/0/0)            0
  2     backup    wm       0 - 24619       33.92GB    (24620/0/0) 71127180
  3 unassigned    wm       0                  0       (0/0/0)            0
  4 unassigned    wm       0                  0       (0/0/0)            0
  5 unassigned    wm       0                  0       (0/0/0)            0
  6 unassigned    wm       0                  0       (0/0/0)            0
  7 unassigned    wm       0                  0       (0/0/0)            0
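
If it's the EFI label that's blocking the add, one thing to try (a
sketch, untested here, and it rewrites the disk label) is forcing an
SMI label back on first:

    format -e c1t2d0    # label -> choose "0. SMI label", then re-slice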
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-27 Thread thomas
For whatever it's worth to have someone post on a list.. I would *really*
like to see this improved as well. The time it takes to iterate over
both thousands of filesystems and thousands of snapshots makes me very
cautious about taking advantage of some of the built-in zfs features in
an HA environment.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] live upgrade with lots of zfs filesystems

2009-08-27 Thread Trevor Pretty
Paul

You need to exclude all the file systems that are not the "OS".

My S10 virtual machine is not booted, but you can put all the "excluded"
file systems in a file and use -f, from memory.
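
From memory, something like this (BE name and file path are only
examples):

    lucreate -n newBE -f /var/tmp/lu-exclude-list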

You used to have to do this if there was a DVD in the drive, otherwise
/cdrom got copied to the new boot environment. I know this because I
logged an RFE when Live Upgrade first appeared, and it was put into
state Deferred, as the workaround is to just exclude it. I think it did
get fixed in a later release, however.

trevor




Paul B. Henson wrote:

Well, so I'm getting ready to install the first set of patches on my x4500
since we deployed into production, and have run into an unexpected snag.

I already knew that with about 5-6k file systems the reboot cycle was going
to be over an hour (not happy about, but knew about and planned for).

However, I went to create a new boot environment to install the patches
into, and so far that's been running for about an hour and a half :(,
which was not expected or planned for.

First, it looks like the ludefine script spent about 20 minutes iterating
through all of my zfs file systems, and then something named lupi_bebasic
ran for over an hour, and then it looks like it mounted all of my zfs
filesystems under /.alt.tmp.b-nAe.mnt, and now it looks like it is
unmounting all of them.

I hadn't noticed before, but when I went to check on my test system (with
only a handful of filesystems), evidently when I get to the point of
using lumount to mount the boot environment for patching, it's going to
again mount all of my zfs file systems under the alternative root, and then
need to unmount them all again after I'm done patching, which is going to
add probably another hour or two.

I don't think I'm going to make my downtime window :(, and will probably
need to reschedule the patching. I never considered I might have to start
the patch process six hours before the window.

I poked around a bit, but have not come across any way to exclude zfs
filesystems not part of the boot os pool from the copy and mount process.
I'm really hoping I'm just being stupid and missing something blindingly
obvious. Given a boot pool named ospool, and a data pool named export, is
there any way to make live upgrade completely ignore the data pool? There
is no need for my 6k user file systems to be mounted in the alternative
environment during patching. I only want the file systems in the ospool
copied, processed, and mounted.

 Thanks...


www.eagle.co.nz 
This email is confidential and may be legally 
privileged. If received in error please destroy and immediately notify 
us.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Boot error

2009-08-27 Thread Cindy Swearingen
Hi Grant,

I don't have all my usual resources at the moment, but I would 
boot from alternate media and use the format utility to check 
the partitioning on the newly added disk, and look for something 
like overlapping partitions. Or, possibly, a mismatch between
the actual root slice and the one you are trying to boot from.
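
For example, something like this from the media shell (device name is
illustrative):

    format -e c0t0d0            # partition -> print; look for overlaps
    prtvtoc /dev/rdsk/c0t0d0s2  # sanity-check the slice table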

Cindy

- Original Message -
From: Grant Lowe 
Date: Thursday, August 27, 2009 5:06 pm
Subject: [zfs-discuss] Boot error
To: zfs-discuss@opensolaris.org

> I've got a 240z with Solaris 10 Update 7, all the latest patches from 
> Sunsolve.  I've installed a boot drive with ZFS.  I mirrored the drive 
> with zpool.  I installed the boot block.  The system had been working 
> just fine.  But for some reason, when I try to boot, I get the error: 
> 
> 
> {1} ok boot -s
> Boot device: /p...@1c,60/s...@2/d...@0,0  File and args: -s
> SunOS Release 5.10 Version Generic_141414-08 64-bit
> Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
> Use is subject to license terms.
> Division by Zero
> {1} ok
> 
> Any ideas?
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] live upgrade with lots of zfs filesystems

2009-08-27 Thread Paul B. Henson

Well, so I'm getting ready to install the first set of patches on my x4500
since we deployed into production, and have run into an unexpected snag.

I already knew that with about 5-6k file systems the reboot cycle was going
to be over an hour (not happy about, but knew about and planned for).

However, I went to create a new boot environment to install the patches
into, and so far that's been running for about an hour and a half :(,
which was not expected or planned for.

First, it looks like the ludefine script spent about 20 minutes iterating
through all of my zfs file systems, and then something named lupi_bebasic
ran for over an hour, and then it looks like it mounted all of my zfs
filesystems under /.alt.tmp.b-nAe.mnt, and now it looks like it is
unmounting all of them.

I hadn't noticed before, but when I went to check on my test system (with
only a handful of filesystems), evidently when I get to the point of
using lumount to mount the boot environment for patching, it's going to
again mount all of my zfs file systems under the alternative root, and then
need to unmount them all again after I'm done patching, which is going to
add probably another hour or two.

I don't think I'm going to make my downtime window :(, and will probably
need to reschedule the patching. I never considered I might have to start
the patch process six hours before the window.

I poked around a bit, but have not come across any way to exclude zfs
filesystems not part of the boot os pool from the copy and mount process.
I'm really hoping I'm just being stupid and missing something blindingly
obvious. Given a boot pool named ospool, and a data pool named export, is
there any way to make live upgrade completely ignore the data pool? There
is no need for my 6k user file systems to be mounted in the alternative
environment during patching. I only want the file systems in the ospool
copied, processed, and mounted.

 Thanks...



-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-27 Thread Trevor Pretty
Dave

Yep, that's an RFE (Request for Enhancement): that's how things are
reported to engineers to fix things inside Sun. If it's an honest-to-
goodness CR = bug (however, it normally needs a real support-paying
customer to have a problem to go from RFE to CR), the "responsible
engineer" evaluates it, and eventually gets it fixed, or not. When I
worked at Sun I logged a lot of RFEs; only a few were accepted as bugs
and fixed.

Click on the "New Search" link and look at the type and state menus. It
gives you an idea of the states an RFE and a CR go through. It's
probably documented somewhere, but I can't find it. Part of the joy of
Sun putting out in public something most other vendors would not dream
of doing.

Oh, and it doesn't help that both RFEs and CRs are labelled "bug" at
http://bugs.opensolaris.org/

So, looking at your RFE:

It tells you which version of Nevada it was reported against
(translating this into an OpenSolaris version is easy - NOT!)

Look at "Related Bugs: 6612830"

This will tell you the "Responsible Engineer: Richard Morris"

and when it was fixed:

"Release Fixed: solaris_10u6(s10u6_01) (Bug ID: 2160894)"

Although, as nothing in life is guaranteed, it looks like another bug,
2160894, has been identified, and that's not yet on bugs.opensolaris.org.

Hope that helps.

Trevor


Dave wrote:

Just to make sure we're looking at the same thing:

http://bugs.opensolaris.org/view_bug.do?bug_id=6761786

This is not an issue of auto snapshots. If I have a ZFS server that 
exports 300 zvols via iSCSI and I have daily snapshots retained for 14 
days, that is a total of 4200 snapshots. According to the link/bug 
report above it will take roughly 5.5 hours to import my pool (even when 
the pool is operating perfectly fine and is not degraded or faulted).

This is obviously unacceptable to anyone in an HA environment. Hopefully 
someone close to the issue can clarify.

--
Dave

Blake wrote:

I think the value of auto-snapshotting zvols is debatable.  At least,
there are not many folks who need to do this.

What I'd rather see is a default property of 'auto-snapshot=off' for zvols.

Blake

On Thu, Aug 27, 2009 at 4:29 PM, Tim Cook wrote:

On Thu, Aug 27, 2009 at 3:24 PM, Remco Lengers wrote:

Dave,

It's logged as an RFE (Request for Enhancement), not as a CR (bug).

The status is 3-Accepted/  P1  RFE

RFEs are generally looked at in a much different way than a CR.

..Remco

Seriously?  It's considered "works as designed" for a system to take 5+
hours to boot?  Wow.

--Tim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

www.eagle.co.nz 
This email is confidential and may be legally 
privileged. If received in error please destroy and immediately notify 
us.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] utf8only and normalization properties

2009-08-27 Thread Nicolas Williams
So, the manpage seems to have a bug in it.  The valid values for the
normalization property are:

none | formC | formD | formKC | formKD

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Boot error

2009-08-27 Thread Grant Lowe
I've got a 240z with Solaris 10 Update 7, all the latest patches from Sunsolve. 
 I've installed a boot drive with ZFS.  I mirrored the drive with zpool.  I 
installed the boot block.  The system had been working just fine.  But for some 
reason, when I try to boot, I get the error: 

{1} ok boot -s
Boot device: /p...@1c,60/s...@2/d...@0,0  File and args: -s
SunOS Release 5.10 Version Generic_141414-08 64-bit
Copyright 1983-2009 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Division by Zero
{1} ok

Any ideas?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Connect couple of SATA JBODs to one storage server

2009-08-27 Thread Ron Mexico
This non-RAID SAS controller is $199 and is based on the LSI SAS 1068.

http://accessories.us.dell.com/sna/products/Networking_Communication/productdetail.aspx?c=us&l=en&s=bsd&cs=04&sku=310-8285&~lt=popup&~ck=TopSellers

What kind of chassis do these drives currently reside in? Does the backplane 
have a SATA connector for each drive, or does it have a SAS backplane [i.e. one 
SFF 8087 for every four drive slots]?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-27 Thread Dave

Just to make sure we're looking at the same thing:

http://bugs.opensolaris.org/view_bug.do?bug_id=6761786

This is not an issue of auto snapshots. If I have a ZFS server that 
exports 300 zvols via iSCSI and I have daily snapshots retained for 14 
days, that is a total of 4200 snapshots. According to the link/bug 
report above it will take roughly 5.5 hours to import my pool (even when 
the pool is operating perfectly fine and is not degraded or faulted).


This is obviously unacceptable to anyone in an HA environment. Hopefully 
someone close to the issue can clarify.


--
Dave

Blake wrote:

I think the value of auto-snapshotting zvols is debatable.  At least,
there are not many folks who need to do this.

What I'd rather see is a default property of 'auto-snapshot=off' for zvols.

Blake

On Thu, Aug 27, 2009 at 4:29 PM, Tim Cook wrote:


On Thu, Aug 27, 2009 at 3:24 PM, Remco Lengers  wrote:

Dave,

It's logged as an RFE (Request for Enhancement), not as a CR (bug).

The status is 3-Accepted/  P1  RFE

RFEs are generally looked at in a much different way than a CR.

..Remco


Seriously?  It's considered "works as designed" for a system to take 5+
hours to boot?  Wow.

--Tim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Ross Walker
On Aug 27, 2009, at 11:29 AM, Bob Friesenhahn wrote:



On Thu, 27 Aug 2009, David Bond wrote:


I just noticed that if the server hasn't hit its target arc size,  
the pauses are for maybe .5 seconds, but as soon as it hits its arc  
target, the iops drop to around 50% of what it was and then there  
are the longer pauses, around 4-5 seconds, and then after every  
pause the performance slows even more. So it appears it is  
definitely server side.


This is known behavior of zfs for asynchronous writes.  Recent zfs  
defers/aggregates writes up to one of these limits:


 * 7/8ths of available RAM
 * 5 seconds worth of write I/O (full speed write)
 * 30 seconds aggregation time

Notice the 5 seconds.  This 5 seconds results in the 4-6 second  
pause and it seems that the aggregation time is 10 seconds on your  
system with this write load.  Systems with large amounts of RAM  
encounter this issue more than systems with limited RAM.


I encountered the same problem so I put this in /etc/system:

* Set ZFS maximum TXG group size to 393216
set zfs:zfs_write_limit_override = 0x60000


That's the option. When I was experiencing my writes starving reads I  
set this to 512MB, or the size of my NVRAM cache for my controller, and  
everything was happy again. Write flushes happened in less than a  
second and my IO flattened out nicely.
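
For experimenting, the same tunable can apparently also be poked at  
runtime rather than via /etc/system (untested sketch; 0t536870912 is  
512MB in mdb's decimal notation):

    echo 'zfs_write_limit_override/Z 0t536870912' | mdb -kw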


-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Connect couple of SATA JBODs to one storage server

2009-08-27 Thread Scott Meilicke
Roman, are you saying you want to install OpenSolaris on your old servers, or 
make the servers look like an external JBOD array, that another server will 
then connect to?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-27 Thread Remco Lengers

Tim,

> Seriously?  It's considered "works as designed" for a system to take 5+
> hours to boot?  Wow.

That's not what I am saying... I am merely stating the administrative 
facts, as they may explain the inactivity on this matter. I am unsure if 
it is supposed to be an RFE or it became one by mistake.


Regards,

..Remco

Tim Cook wrote:



On Thu, Aug 27, 2009 at 3:24 PM, Remco Lengers wrote:


Dave,

It's logged as an RFE (Request for Enhancement), not as a CR (bug).

The status is 3-Accepted/  P1  RFE

RFEs are generally looked at in a much different way than a CR.

..Remco



Seriously?  It's considered "works as designed" for a system to take 5+ 
hours to boot?  Wow.


--Tim
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Tristan
I saw similar behavior when I was running under the kernel debugger (-k 
switch to the kernel). It largely went away when I went back to "normal".


T

David Bond wrote:

Hi,

I was directed here after posting in CIFS discuss (as I first thought that it 
could be a CIFS problem).

I posted the following in CIFS:

When using iometer from windows to the file share on opensolaris svn101 and 
svn111 I get pauses every 5 seconds of around 5 seconds (maybe a little less) 
where no data is transferred; when data is transferred it is at a fair speed and 
gets around 1000-2000 iops with 1 thread (depending on the work type). The 
maximum read response time is 200ms and the maximum write response time is 
9824ms, which is very bad, an almost 10 second delay in being able to send 
data to the server.
This has been experienced on 2 test servers; the same servers have also been 
tested with windows server 2008 and they haven't shown this problem (the share 
performance was slightly lower than CIFS, but it was consistent, and the 
average access time and maximums were very close).


I just noticed that if the server hasn't hit its target arc size, the pauses are 
for maybe .5 seconds, but as soon as it hits its arc target, the iops drop to 
around 50% of what it was and then there are the longer pauses, around 4-5 
seconds, and then after every pause the performance slows even more. So it 
appears it is definitely server side.

This is with 100% random io with a spread of 33% write 66% read, 2KB blocks, 
over a 50GB file, no compression, and a 5.5GB target arc size.



Also I have just run some tests with different IO patterns, and 100% sequential 
writes produce a consistent IO of 2100 IOPS, except when it pauses for maybe 
.5 seconds every 10-15 seconds.

100% random writes produce around 200 IOPS with a 4-6 second pause around every 
10 seconds.

100% sequential reads produce around 3700 IOPS with no pauses, just random peaks 
in response time (only 16ms) after about 1 minute of running, so nothing to 
complain about.

100% random reads produce around 200IOPS, with no pauses.

So it appears that writes cause a problem; what is causing these very long 
write delays?

A network capture shows that the server doesn't respond to the write from the 
client when these pauses occur.

Also, when using iometer, the initial file creation doesn't have any pauses, 
so it might only happen when modifying files.

Any help on finding a solution to this would be really appreciated.

David


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-27 Thread Blake
I think the value of auto-snapshotting zvols is debatable.  At least,
there are not many folks who need to do this.

What I'd rather see is a default property of 'auto-snapshot=off' for zvols.
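
In the meantime, zvols can be opted out by hand; the zfs-auto-snapshot
service keys off a user property (dataset name below is just an example):

    zfs set com.sun:auto-snapshot=false tank/myvol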

Blake

On Thu, Aug 27, 2009 at 4:29 PM, Tim Cook wrote:
>
>
> On Thu, Aug 27, 2009 at 3:24 PM, Remco Lengers  wrote:
>>
>> Dave,
>>
>> It's logged as an RFE (Request for Enhancement), not as a CR (bug).
>>
>> The status is 3-Accepted/  P1  RFE
>>
>> RFEs are generally looked at in a much different way than a CR.
>>
>> ..Remco
>
>
> Seriously?  It's considered "works as designed" for a system to take 5+
> hours to boot?  Wow.
>
> --Tim
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-27 Thread Tim Cook
On Thu, Aug 27, 2009 at 3:24 PM, Remco Lengers  wrote:

> Dave,
>
> It's logged as an RFE (Request for Enhancement), not as a CR (bug).
>
> The status is 3-Accepted/  P1  RFE
>
> RFEs are generally looked at in a much different way than a CR.
>
> ..Remco
>


Seriously?  It's considered "works as designed" for a system to take 5+
hours to boot?  Wow.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-27 Thread Remco Lengers

Dave,

It's logged as an RFE (Request for Enhancement), not as a CR (bug).

The status is 3-Accepted/  P1  RFE

RFEs are generally looked at in a much different way than a CR.

..Remco

Dave wrote:
Can anyone from Sun comment on the status/priority of bug ID 6761786? 
Seems like this would be a very high priority bug, but it hasn't been 
updated since Oct 2008.


Has anyone else with thousands of volume snapshots experienced the hours 
long import process?


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Status/priority of 6761786

2009-08-27 Thread Dave
Can anyone from Sun comment on the status/priority of bug ID 6761786? 
Seems like this would be a very high priority bug, but it hasn't been 
updated since Oct 2008.


Has anyone else with thousands of volume snapshots experienced the hours 
long import process?


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Henrik Johansen


Ross Walker wrote:

On Aug 27, 2009, at 4:30 AM, David Bond  wrote:


Hi,

I was directed here after posting in CIFS discuss (as I first  
thought that it could be a CIFS problem).


I posted the following in CIFS:

When using iometer from windows to the file share on opensolaris  
svn101 and svn111 I get pauses every 5 seconds of around 5 seconds  
(maybe a little less) where no data is transferred; when data is  
transferred it is at a fair speed and gets around 1000-2000 iops with  
1 thread (depending on the work type). The maximum read response  
time is 200ms and the maximum write response time is 9824ms, which  
is very bad, an almost 10 second delay in being able to send data  
to the server.
This has been experienced on 2 test servers; the same servers have  
also been tested with windows server 2008 and they haven't shown this  
problem (the share performance was slightly lower than CIFS, but it  
was consistent, and the average access time and maximums were very  
close).



I just noticed that if the server hasn't hit its target arc size, the  
pauses are for maybe .5 seconds, but as soon as it hits its arc  
target, the iops drop to around 50% of what it was and then there  
are the longer pauses, around 4-5 seconds, and then after every pause  
the performance slows even more. So it appears it is definitely  
server side.


This is with 100% random io with a spread of 33% write 66% read, 2KB  
blocks, over a 50GB file, no compression, and a 5.5GB target arc size.




Also I have just run some tests with different IO patterns, and 100%  
sequential writes produce a consistent IO of 2100 IOPS, except when  
it pauses for maybe .5 seconds every 10-15 seconds.


100% random writes produce around 200 IOPS with a 4-6 second pause  
around every 10 seconds.


100% sequential reads produce around 3700 IOPS with no pauses, just  
random peaks in response time (only 16ms) after about 1 minute of  
running, so nothing to complain about.


100% random reads produce around 200IOPS, with no pauses.

So it appears that writes cause a problem; what is causing these  
very long write delays?


A network capture shows that the server doesn't respond to the write  
from the client when these pauses occur.


Also, when using iometer, the initial file creation doesn't have any  
pauses, so it might only happen when modifying files.


Any help on finding a solution to this would be really appreciated.


What version? And system configuration?

I think it might be the issue where ZFS/ARC write caches more than the  
underlying storage can handle writing in a reasonable time.


There is a parameter to control how much is write cached, I believe it  
is zfs_write_override.


You should be able to disable the write throttle mechanism altogether
with the undocumented zfs_no_write_throttle tunable.

I never got around to testing this though ...
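
That would be something like this in /etc/system (an untested sketch,
as noted above):

    * disable the ZFS write throttle entirely
    set zfs:zfs_no_write_throttle = 1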



-Ross
 
___

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Med venlig hilsen / Best Regards

Henrik Johansen
hen...@scannet.dk
Tlf. 75 53 35 00

ScanNet Group
A/S ScanNet 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [cifs-discuss] CIFS pulsing transfers

2009-08-27 Thread Woong Bin Kang
Hi,
It's funny, since I spent hours researching this problem yesterday, too.
I kind of fixed the problem by 
echo 'zfs_txg_synctime/W 0t1' | mdb -kw
(Pasted from  )

I only say "kind of" because zfs response time is sometimes still horrible.

I think it's a zfs flush problem. I hope it helps.
wbkang
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_110 -> snv_121 produces checksum errors on Raid-Z pool

2009-08-27 Thread Adam Leventhal

Hey Gary,

There appears to be a bug in the RAID-Z code that can generate  
spurious checksum errors. I'm looking into it now and hope to have it  
fixed in build 123 or 124. Apologies for the inconvenience.


Adam

On Aug 25, 2009, at 5:29 AM, Gary Gendel wrote:

I have a 5-500GB disk Raid-Z pool that has been producing checksum  
errors right after upgrading SXCE to build 121.  They seem to be  
randomly occurring on all 5 disks, so it doesn't look like a disk  
failure situation.


Repeatingly running a scrub on the pools randomly repairs between 20  
and a few hundred checksum errors.


Since I hadn't physically touched the machine, it seems a very  
strong coincidence that it started right after I upgraded to 121.


This machine is a SunFire v20z with a Marvell SATA 8-port controller  
(the same one as in the original thumper).  I've seen this kind of  
problem way back around build 40-50 ish, but haven't seen it after  
that until now.


Anyone else experiencing this problem or knows how to isolate the  
problem definitively?


Thanks,
Gary
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Adam Leventhal, Fishworks            http://blogs.sun.com/ahl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problem booting with zpool

2009-08-27 Thread Stephen Green

Mark J Musante wrote:


Hi Stephen,

Have you got many zvols (or snapshots of zvols) in your pool?  You could 
be running into CR 6761786 and/or 6693210.


There are four volumes on that pool:

stgr...@blue:/tank/tivo/videos$ zfs list -t volume
NAME   USED  AVAIL  REFER  MOUNTPOINT
rpool/dump6.00G   350G  6.00G  -
rpool/swap6.00G   355G   618M  -
tank/iscsi/mac-backup  119G  1.73T   119G  -
tank/iscsi/macup   423G  2.02T   118G  -
tank/iscsi/macup-newer30.6G  1.73T   143G  -
tank/iscsi/macup-recover  25.1M  1.73T   115G  -

These are the iscsi targets that I use for backing up the mac.  They are 
auto-snapshotted (Time machine + ZFS Time slider!):



stgr...@blue:/tank/iscsi$ zfs list -rt snapshot tank/iscsi | wc
   6243   31215  555636


So, yeah, there's a lot of snapshots.  I'll have a look at those bugs. 
I guess I should turn off the auto snapshots and clear out the old ones, 
but those snapshots saved my behind when the wife's mac went crazy...
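
Something like this would do the clearing out (the snapshot-name pattern 
is a guess at the auto-snapshot naming, so eyeball the list before 
piping it to destroy):

    zfs list -H -o name -t snapshot -r tank/iscsi | \
        grep '@zfs-auto-snap' | xargs -n 1 zfs destroy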


Thanks!

Steve
--
Stephen Green  //   stephen.gr...@sun.com
Principal Investigator \\   http://blogs.sun.com/searchguy
The AURA Project   //   Voice: +1 781-442-0926
Sun Microsystems Labs  \\   Fax:   +1 781-442-0399
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Connect couple of SATA JBODs to one storage server

2009-08-27 Thread Roman Naumenko
Can somebody help me with this?

I'd like to convert old storage servers into JBODs for zfs storage, to use them 
as pools for backups. There are SATA drives in them, and they have backplanes. 
Right now there are 8-port hardware RAID cards in them.

What cables/converters do I need for this? 
What card should I use in a storage server to connect many SATA drives? Is this one ok?

http://www.adaptec.com/en-US/products/Controllers/Hardware/sata/performance/SAS-5805/

--
Roman
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problem booting with zpool

2009-08-27 Thread Mark J Musante


Hi Stephen,

Have you got many zvols (or snapshots of zvols) in your pool?  You could 
be running into CR 6761786 and/or 6693210.


On Thu, 27 Aug 2009, Stephen Green wrote:


I'm having trouble booting with one of my zpools.  It looks like this:

 pool: tank
state: ONLINE
scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
c4d0ONLINE   0 0 0
c4d1ONLINE   0 0 0
c5d0ONLINE   0 0 0
c5d1ONLINE   0 0 0
logs
  c11d0 ONLINE   0 0 0

I'm running OpenSolaris 2009.06 updated to build 118.

Basically the system won't boot until I boot with a CD, zpool import -f the 
pool and then zpool export it.  Even that doesn't really work.  When I do the 
zpool import -f from the CD, the command runs forever (well, I let it run for 
a good hour or so before I stopped it).  The command seems to be running in a 
loop.  If I truss the process I see the following:


door_call(6, 0x0803D390)= 0
lwp_sigmask(SIG_SETMASK, 0x, 0x) = 0xFFBFFEFF [0x]
close(6)= 0
resolvepath("/", "/", 1024) = 1
resolvepath("/", "/", 1024) = 1
open("/etc/dev/.devlink_db", O_RDONLY)= 6
fxstat(2, 6, 0x0803D870)= 0
mmap(0x, 40, PROT_READ, MAP_SHARED, 6, 0) = 0xFE95
mmap(0x, 81920, PROT_READ, MAP_SHARED, 6, 45056) = 0xFE93B000
munmap(0xFE93B000, 81920)   = 0
munmap(0xFE95, 40)  = 0
close(6)= 0
ioctl(3, ZFS_IOC_SNAPSHOT_LIST_NEXT, 0x0803EEF0) = 0
ioctl(3, ZFS_IOC_CREATE_MINOR, 0x0803D9B0)  = 0
getppriv(PRIV_EFFECTIVE, {}) = 0
open("/etc/devname_check_RDONLY", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 6
close(6)= 0
unlink("/etc/devname_check_RDONLY")   = 0
xstat(2, "//etc/dev/.devfsadm_synch_door", 0x0803CF00) = 0
open("//etc/dev/.devfsadm_synch_door", O_RDONLY) = 6
lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0xFFF7) = 0xFFBFFEFF [0x]
door_call(6, 0x0803D390)= 0
lwp_sigmask(SIG_SETMASK, 0x, 0x) = 0xFFBFFEFF [0x]
close(6)= 0
resolvepath("/", "/", 1024) = 1
resolvepath("/", "/", 1024) = 1
open("/etc/dev/.devlink_db", O_RDONLY)= 6
fxstat(2, 6, 0x0803D870)= 0
mmap(0x, 40, PROT_READ, MAP_SHARED, 6, 0) = 0xFE95
mmap(0x, 81920, PROT_READ, MAP_SHARED, 6, 45056) = 0xFE93B000
munmap(0xFE93B000, 81920)   = 0
munmap(0xFE95, 40)  = 0
close(6)= 0
ioctl(3, ZFS_IOC_SNAPSHOT_LIST_NEXT, 0x0803EEF0) = 0
ioctl(3, ZFS_IOC_CREATE_MINOR, 0x0803D9B0)  = 0
getppriv(PRIV_EFFECTIVE, {}) = 0
open("/etc/devname_check_RDONLY", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 6
close(6)= 0
unlink("/etc/devname_check_RDONLY")   = 0
xstat(2, "//etc/dev/.devfsadm_synch_door", 0x0803CF00) = 0
open("//etc/dev/.devfsadm_synch_door", O_RDONLY) = 6
lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0xFFF7) = 0xFFBFFEFF [0x]
door_call(6, 0x0803D390)= 0

This sequence is repeated over and over.  I realize now that I should have 
done a pfiles on the process to figure out what the file descriptors were 
mapping to.  I can do that if it will help diagnose this.
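
(That would just be "pfiles <pid>" against the zpool process; it maps 
each open fd, like the 3 and 6 above, to its underlying file or device.)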


If I kill the zpool import at this point, all of my filesystems are there and 
mounted and everything seems to be fine.  I'm going to scrub the pool over 
night tonight, but I'd appreciate any suggestions as to how to fix this 
problem in the future.


Steve Green
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Roman Naumenko
Hi David,

Just wanted to ask you: how does your windows server behave during these pauses? 
Are there any clients connected to it?

The issue you've described might be related to one I saw on my server; see here:
http://www.opensolaris.org/jive/thread.jspa?threadID=110013&tstart=0

I just wonder how windows behaves during these pauses.

--
Roman Naumenko
ro...@frontline.ca
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Bob Friesenhahn

On Thu, 27 Aug 2009, David Bond wrote:


I just noticed that if the server hasn't hit its target arc size, the 
pauses are for maybe .5 seconds, but as soon as it hits its arc 
target, the iops drop to around 50% of what it was and then there 
are the longer pauses, around 4-5 seconds, and then after every pause 
the performance slows even more. So it appears it is definitely 
server side.


This is known behavior of zfs for asynchronous writes.  Recent zfs 
defers/aggregates writes up to one of these limits:


  * 7/8ths of available RAM
  * 5 seconds worth of write I/O (full speed write)
  * 30 seconds aggregation time

Notice the 5 seconds.  This 5 seconds results in the 4-6 second pause 
and it seems that the aggregation time is 10 seconds on your system 
with this write load.  Systems with large amounts of RAM encounter 
this issue more than systems with limited RAM.


I encountered the same problem so I put this in /etc/system:

* Set ZFS maximum TXG group size to 393216
set zfs:zfs_write_limit_override = 0x60000

By limiting the TXG group size, the size of the data burst is limited, 
but since zfs still writes the TXG as fast as it can, other I/O will 
cease during that time.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pulsing write performance

2009-08-27 Thread Ross Walker

On Aug 27, 2009, at 4:30 AM, David Bond  wrote:


Hi,

I was directed here after posting in CIFS discuss (as I first  
thought that it could be a CIFS problem).


I posted the following in CIFS:

When using iometer from windows to the file share on opensolaris  
svn101 and svn111 I get pauses every 5 seconds of around 5 seconds  
(maybe a little less) where no data is transferred; when data is  
transferred it is at a fair speed and gets around 1000-2000 iops with  
1 thread (depending on the work type). The maximum read response  
time is 200ms and the maximum write response time is 9824ms, which  
is very bad, an almost 10 second delay in being able to send data  
to the server.
This has been experienced on 2 test servers; the same servers have  
also been tested with windows server 2008 and they haven't shown this  
problem (the share performance was slightly lower than CIFS, but it  
was consistent, and the average access time and maximums were very  
close).



I just noticed that if the server hasn't hit its target arc size, the  
pauses are for maybe .5 seconds, but as soon as it hits its arc  
target, the iops drop to around 50% of what it was and then there  
are the longer pauses, around 4-5 seconds, and then after every pause  
the performance slows even more. So it appears it is definitely  
server side.


This is with 100% random io with a spread of 33% write 66% read, 2KB  
blocks, over a 50GB file, no compression, and a 5.5GB target arc size.




Also I have just run some tests with different IO patterns, and 100%  
sequential writes produce a consistent IO of 2100 IOPS, except when  
it pauses for maybe .5 seconds every 10-15 seconds.


100% random writes produce around 200 IOPS with a 4-6 second pause  
around every 10 seconds.


100% sequential reads produce around 3700 IOPS with no pauses, just  
random peaks in response time (only 16ms) after about 1 minute of  
running, so nothing to complain about.


100% random reads produce around 200IOPS, with no pauses.

So it appears that writes cause a problem; what is causing these  
very long write delays?


A network capture shows that the server doesn't respond to the write  
from the client when these pauses occur.


Also, when using iometer, the initial file creation doesn't have any  
pauses, so it might only happen when modifying files.


Any help on finding a solution to this would be really appreciated.


What version? And system configuration?

I think it might be the issue where ZFS/ARC write caches more than the  
underlying storage can handle writing in a reasonable time.


There is a parameter to control how much is write cached, I believe it  
is zfs_write_override.


-Ross
 
___

zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_110 -> snv_121 produces checksum errors on Raid-Z pool

2009-08-27 Thread Casper . Dik

>It looks like it's definitely related to the snv_121 upgrade.  I decided to
>roll back to snv_110 and the checksum errors have disappeared.  I'd like to
>issue a bug report, but I don't have any information that might help track
>this down, just lots of checksum errors.

>Looks like I'm stuck at snv_110 until someone figures out what is broken. 
>If it helps, here is my property list for this pool.

There are many components in the stack: zfs, the device drivers and such.

What hardware are you using?

Perhaps it's an issue with your SATA driver or something else.  I've seen no 
checksum errors on snv_121.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_110 -> snv_121 produces checksum errors on Raid-Z pool

2009-08-27 Thread Albert Chin
On Thu, Aug 27, 2009 at 06:29:52AM -0700, Gary Gendel wrote:
> It looks like it's definitely related to the snv_121 upgrade.  I
> decided to roll back to snv_110 and the checksum errors have
> disappeared.  I'd like to issue a bug report, but I don't have any
> information that might help track this down, just lots of checksum
> errors.

So, on snv_121, can you read the files with checksum errors? Is it
simply the reporting mechanism that is wrong or are the files really
damaged?

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snv_110 -> snv_121 produces checksum errors on Raid-Z pool

2009-08-27 Thread Gary Gendel
It looks like it's definitely related to the snv_121 upgrade.  I decided to 
roll back to snv_110 and the checksum errors have disappeared.  I'd like to 
issue a bug report, but I don't have any information that might help track this 
down, just lots of checksum errors.

Looks like I'm stuck at snv_110 until someone figures out what is broken.  If 
it helps, here is my property list for this pool.

g...@phoenix[~]101>zfs get all archive
NAME PROPERTY  VALUE  SOURCE
archive  type  filesystem -
archive  creation  Mon Jun 18 20:40 2007  -
archive  used  787G   -
archive  available 1.01T  -
archive  referenced125G   -
archive  compressratio 1.13x  -
archive  mounted   yes-
archive  quota none   default
archive  reservation   none   default
archive  recordsize128K   default
archive  mountpoint/archive   default
archive  sharenfs  offdefault
archive  checksum  on default
archive  compression   on local
archive  atime offlocal
archive  devices   on default
archive  exec  on default
archive  setuidon default
archive  readonly  offdefault
archive  zoned offdefault
archive  snapdir   hidden default
archive  aclmode   groupmask  default
archive  aclinheritrestricted default
archive  canmount  on default
archive  shareiscsioffdefault
archive  xattr on default
archive  copies1  default
archive  version   3  -
archive  utf8only  off-
archive  normalization none   -
archive  casesensitivity   sensitive  -
archive  vscan offdefault
archive  nbmandoffdefault
archive  sharesmb  offlocal
archive  refquota  none   default
archive  refreservationnone   default
archive  primarycache  alldefault
archive  secondarycachealldefault

And each of the sub-pools look like this:

g...@phoenix[~]101>zfs get all archive/gary
archive/gary  type  filesystem -
archive/gary  creation  Mon Jun 18 20:56 2007  -
archive/gary  used  141G   -
archive/gary  available 1.01T  -
archive/gary  referenced141G   -
archive/gary  compressratio 1.22x  -
archive/gary  mounted   yes-
archive/gary  quota none   default
archive/gary  reservation   none   default
archive/gary  recordsize128K   default
archive/gary  mountpoint/archive/gary  default
archive/gary  sharenfs  offdefault
archive/gary  checksum  on default
archive/gary  compression   on inherited from 
archive
archive/gary  atime offinherited from 
archive
archive/gary  devices   on default
archive/gary  exec  on default
archive/gary  setuidon default
archive/gary  readonly  offdefault
archive/gary  zoned offdefault
archive/gary  snapdir   hidden default
archive/gary  aclmode   groupmask  default
archive/gary  aclinheritpassthroughlocal
archive/gary  canmount  on default
archive/gary  shareiscsioffdefault
archive/gary  xattr on default
archive/gary  copies1  default
archive/gary  version   3  -
archive/gary  utf8only  off-
archive/gary  normalization none   -
archive/gary  casesensitivity   sensitive  -
archive/gary  vscan offdefault
archive/gary  nbm

[zfs-discuss] ARC limits not obeyed in OSol 2009.06

2009-08-27 Thread Udo Grabowski
Hi,
we've capped the ARC size to 512 MB via set zfs:zfs_arc_max = 0x20000000 in
/etc/system, since the ARC still does not release memory when applications
need it (this is another bug). But this hard limit is not obeyed; instead,
when traversing all files in a large and deep directory, we see the values
below (the ARC started at 300 MB). After a while the machine (an Ultra 20 M2
with 6GB) swaps and then, hours later, freezes completely (no reaction even
to a quick push of the power button, no ping, no mouse; we have to hard
reset). The arc_summary output clearly shows that the limits are not what
they are supposed to be. If this is working as intended, then the intention
must be changed. As poorly as the ARC is working now, it's absolutely
necessary that a hard limit is indeed a hard limit for the ARC. Please fix
this. Is there anything I can do to really limit or switch off the ARC
completely? It's breaking our production work often since we've installed
OSol (we came from SXDE 1/08, which worked better); we must find a way to
stop this problem as fast as possible!
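
To see what the kernel actually took for the limit, something like this 
should work:

    echo 'zfs_arc_max/E' | mdb -k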

arcstat:
Time      read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
13:22:16   95M   23M    24    10M   14   12M   64   22M    24   963M  536M
13:22:17    2K   256    10     79    6   177   15   222     9   965M  536M
13:22:18    2K   490    22    119   10   371   38   482    22   970M  536M
13:22:19    4K   214     4    150    6    64    3   140     3   971M  536M
13:22:20    2K   427    19     57    4   370   37   419    19   971M  536M
13:22:21    1K   208    19    103   17   105   21   202    19   971M  536M

13:23:16    1K   481    27     80    8   401   47   478    27     1G  536M
13:23:17    2K   255    11    125   10   130   13   218    10     1G  536M
and counting...
arc_summary:
System Memory:
 Physical RAM:  6134 MB
 Free Memory :  1739 MB
 LotsFree:  95 MB

ZFS Tunables (/etc/system):
 set zfs:zfs_arc_max = 0x20000000

ARC Size:
 Current Size: 1357 MB (arcsize)
 Target Size (Adaptive):   512 MB (c)
 Min Size (Hard Limit):191 MB (zfs_arc_min)
 Max Size (Hard Limit):512 MB (zfs_arc_max)

ARC Size Breakdown:
 Most Recently Used Cache Size:  93%479 MB (p)
 Most Frequently Used Cache Size: 6%32 MB (c-p)

ARC Efficency:
 Cache Access Total: 97131108
 Cache Hit Ratio:  75%   7321   [Defined State for 
buffer]
 Cache Miss Ratio: 24%   23886667   [Undefined State for 
Buffer]
 REAL Hit Ratio:   67%   65874421   [MRU/MFU Hits Only]

 Data Demand   Efficiency:66%
 Data Prefetch Efficiency: 8%

CACHE HITS BY CACHE LIST:
  Anon:   --%Counter Rolled.
  Most Recently Used: 15%11463028 (mru) [ 
Return Customer ]
  Most Frequently Used:   74%54411393 (mfu) [ 
Frequent Customer ]
  Most Recently Used Ghost:   10%7537123 (mru_ghost)[ 
Return Customer Evicted, Now Back ]
  Most Frequently Used Ghost: 19%14619417 (mfu_ghost)   [ 
Frequent Customer Evicted, Now Back ]
CACHE HITS BY DATA TYPE:
  Demand Data: 3%2716192 
  Prefetch Data:   0%3506 
  Demand Metadata:86%63089419 
  Prefetch Metadata:  10%7435324 
CACHE MISSES BY DATA TYPE:
  Demand Data: 5%1365132 
  Prefetch Data:   0%36544 
  Demand Metadata:40%9664064 
  Prefetch Metadata:  53%12820927
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hangs with indefinite writes

2009-08-27 Thread noguaran
I used the GUI to delete all my snapshots, and after that "zfs list" worked 
without hanging.  I did a "zpool scrub" and will wait to see what happens 
with that.  I DID have automatic snapshots enabled before; they are disabled 
now.  I don't know how the snapshots work, to be honest, so maybe I ran into 
some upper limit on the number of snapshots?  I AM running daily backups of 
all the computers: Windows, Linux, and Mac OS.
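
If anyone wants to check the same thing from the CLI (I used the GUI, but 
on a stock 2009.06 install these should be equivalent; the service names 
below are the standard time-slider ones):

 zfs list -t snapshot | wc -l        # count snapshots
 svcs | grep auto-snapshot           # the time-slider snapshot services
 svcadm disable auto-snapshot:daily  # same for :frequent :hourly :weekly :monthly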
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Pulsing write performance

2009-08-27 Thread David Bond
Hi,

I was directed here after posting in cifs-discuss (as I first thought it 
could be a CIFS problem).

I posted the following there:

When using Iometer from Windows against the file share on OpenSolaris 
snv_101 and snv_111, I get pauses of around 5 seconds (maybe a little less) 
every 5 seconds, during which no data is transferred.  When data is 
transferred it moves at a fair speed, around 1000-2000 IOPS with 1 thread 
(depending on the work type).  The maximum read response time is 200ms, but 
the maximum write response time is 9824ms, which is very bad: an almost 10 
second delay before the client can send data to the server.
This has been seen on 2 test servers.  The same servers have also been 
tested with Windows Server 2008, and they haven't shown this problem (the 
share performance was slightly lower than CIFS, but it was consistent, and 
the average and maximum access times were very close).
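
To take CIFS out of the picture entirely, a purely local test on the 
server should show the same stalls if ZFS itself is pausing; something 
like this (pool name is just an example):

 # write locally on the server, bypassing CIFS
 dd if=/dev/zero of=/tank/testfile bs=2k count=1000000 &
 # watch per-second pool throughput for the stalls
 zpool iostat tank 1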


I just noticed that if the server hasn't hit its target ARC size, the 
pauses last maybe 0.5 seconds, but as soon as it hits the ARC target, the 
IOPS drop to around 50% of what they were and the longer 4-5 second pauses 
begin; after every pause the performance slows even more.  So it appears 
the problem is definitely server side.

This is with 100% random I/O, a spread of 33% write / 66% read, 2KB blocks, 
over a 50GB file, no compression, and a 5.5GB target ARC size.



Also, I have just run some tests with different I/O patterns.  100% 
sequential writes produce a consistent 2100 IOPS, except for a pause of 
maybe 0.5 seconds every 10-15 seconds.

100% random writes produce around 200 IOPS, with a 4-6 second pause roughly 
every 10 seconds.

100% sequential reads produce around 3700 IOPS with no pauses, just random 
peaks in response time (only 16ms) after about 1 minute of running, so 
nothing to complain about.

100% random reads produce around 200 IOPS, with no pauses.

So it appears that writes cause the problem.  What is causing these very 
long write delays?
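
My guess (not verified) is that the stalls line up with ZFS transaction 
group syncs.  If so, a DTrace snippet along these lines should show it; 
spa_sync is the kernel function that writes out each txg:

 # print when each txg sync starts and how long it took
 dtrace -qn '
 fbt::spa_sync:entry  { self->ts = timestamp;
                        printf("%Y sync start\n", walltimestamp); }
 fbt::spa_sync:return /self->ts/ {
                        printf("%Y sync done after %d ms\n",
                            walltimestamp, (timestamp - self->ts) / 1000000);
                        self->ts = 0; }'

If the pauses do coincide with the syncs, the txg timeout / write throttle 
tunables would be the next place to look.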

A network capture shows that the server simply does not respond to the 
client's write requests while these pauses occur.

Also, when using Iometer, the initial file creation does not pause at all, 
so the problem might only occur when modifying existing files.

Any help finding a solution to this would be really appreciated.

David
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import hangs with indefinite writes

2009-08-27 Thread noguaran
Thank you so much for your reply! 
Here are the outputs:

>1. Find PID of the hanging 'zpool import', e.g. with 'ps -ef | grep zpool'
r...@mybox:~# ps -ef|grep zpool
root   915   908   0 03:34:46 pts/3   0:00 grep zpool
root   901   874   1 03:34:09 pts/2   0:00 zpool import drowning

>2. Substitute PID with actual number in the below command
>echo "0tPID::pid2proc|::walk thread|::findstack -v" | mdb -k

r...@mybox:~# echo "0t901::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ff02ed8c7880: ff0010191a10
[ ff0010191a10 _resume_from_idle+0xf1() ]
  ff0010191a40 swtch+0x147()
  ff0010191a70 cv_wait+0x61(ff02eb010dda, ff02eb010d98)
  ff0010191ac0 txg_wait_synced+0x7f(ff02eb010c00, 31983c5)
  ff0010191b00 dsl_sync_task_group_wait+0xee(ff02f1d11bd8)
  ff0010191b80 dsl_sync_task_do+0x65(ff02eb010c00, f78be1f0, 
  f78be250, ff02edc38400, ff0010191b98, 0)
  ff0010191bd0 dsl_dataset_rollback+0x53(ff02edc38400, 2)
  ff0010191c00 dmu_objset_rollback+0x46(ff02eb674b20)
  ff0010191c40 zfs_ioc_rollback+0x10d(ff02f2b58000)
  ff0010191cc0 zfsdev_ioctl+0x10b(b6, 5a1a, 803e240, 13, 
  ff02ee813338, ff0010191de4)
  ff0010191d00 cdev_ioctl+0x45(b6, 5a1a, 803e240, 13, 
  ff02ee813338, ff0010191de4)
  ff0010191d40 spec_ioctl+0x83(ff02df6a7480, 5a1a, 803e240, 13, 
  ff02ee813338, ff0010191de4, 0)
  ff0010191dc0 fop_ioctl+0x7b(ff02df6a7480, 5a1a, 803e240, 13, 
  ff02ee813338, ff0010191de4, 0)
  ff0010191ec0 ioctl+0x18e(3, 5a1a, 803e240)
  ff0010191f10 _sys_sysenter_post_swapgs+0x14b()

>3. Do
>echo "::spa" | mdb -k

r...@mybox:~# echo "::spa" | mdb -k
ADDR STATE NAME
ff02f2b8b800ACTIVE mypool
ff02d589ACTIVE rpool

>4. Find address of your pool in the output of stage 3 and replace ADDR with it
>in the below command (it is single line):
>echo "ADDR::print spa_t spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | 
>mdb -k

r...@mybox:~# echo "ff02f2b8b800::print spa_t 
spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | mdb -k
mdb: spa_t is not a struct or union type

So I decided to remove "spa_t" to see what would happen:

r...@mybox:~# echo "ff02f2b8b800::print 
spa_dsl_pool->dp_tx.tx_sync_thread|::findstack -v" | mdb -k
mdb: failed to look up type spa_dsl_pool->dp_tx.tx_sync_thread: no symbol 
corresponds to address
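
(If someone knows the right incantation here I can try again; maybe the 
struct tag works where the typedef lookup fails, but I have not run it yet:)

echo "ff02f2b8b800::print struct spa spa_dsl_pool->dp_tx.tx_sync_thread |::findstack -v" | mdb -k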

>What do you mean by halt here? Are you able to interrupt 'zpool import' with 
>CTRL-C?
Yes

>Does 'zfs list' provide any output?
JACKPOT!  When I run "zfs list", the import completes!  But now "zfs list" 
hangs just like "zpool import" did.

r...@mybox:~# ps -ef | grep zfs
root   940   874   0 03:49:15 pts/2   0:00 grep zfs
root   936   908   0 03:44:28 pts/3   0:01 zfs list

r...@mybox:~# echo "0t936::pid2proc|::walk thread|::findstack -v" | mdb -k
stack pointer for thread ff02d72ea020: ff000fdeaa10
[ ff000fdeaa10 _resume_from_idle+0xf1() ]
  ff000fdeaa40 swtch+0x147()
  ff000fdeaa70 cv_wait+0x61(ff02eb010dda, ff02eb010d98)
  ff000fdeaac0 txg_wait_synced+0x7f(ff02eb010c00, 31990da)
  ff000fdeab00 dsl_sync_task_group_wait+0xee(ff02f1d11bd8)
  ff000fdeab80 dsl_sync_task_do+0x65(ff02eb010c00, f78be1f0, 
  f78be250, ff02f1d0ce00, ff000fdeab98, 0)
  ff000fdeabd0 dsl_dataset_rollback+0x53(ff02f1d0ce00, 2)
  ff000fdeac00 dmu_objset_rollback+0x46(ff02eb3322a8)
  ff000fdeac40 zfs_ioc_rollback+0x10d(ff02ebf4e000)
  ff000fdeacc0 zfsdev_ioctl+0x10b(b6, 5a1a, 8043a20, 13, 
  ff02ee813e78, ff000fdeade4)
  ff000fdead00 cdev_ioctl+0x45(b6, 5a1a, 8043a20, 13, 
  ff02ee813e78, ff000fdeade4)
  ff000fdead40 spec_ioctl+0x83(ff02df6a7480, 5a1a, 8043a20, 13, 
  ff02ee813e78, ff000fdeade4, 0)
  ff000fdeadc0 fop_ioctl+0x7b(ff02df6a7480, 5a1a, 8043a20, 13, 
  ff02ee813e78, ff000fdeade4, 0)
  ff000fdeaec0 ioctl+0x18e(3, 5a1a, 8043a20)
  ff000fdeaf10 _sys_sysenter_post_swapgs+0x14b()


>Apparently as you have 5TB of data there, it worked fine some time ago. What
>happened to the pool before this issue was noticed?
A reboot?
This box acts as network storage for all of my computers.  All of the PCs in 
the house are set to back up to it daily, and it is like an extra hard drive 
for my wife's netbook and laptop.  We dump all of the pictures off of the 
camera there as well as any HD video we capture.  I NEVER reboot this box 
unless I am prompted to.  I'm running OpenSolaris (uname -a: SunOS mybox 5.11 
snv_111b i86pc i386 i86pc Solaris), and if I remember right, I was prompted to 
update.  I did so, and needed to reboot.  Rebooted, and the box would not 
start.  I used another PC to find out how to start 

[zfs-discuss] Problem booting with zpool

2009-08-27 Thread Stephen Green

I'm having trouble booting with one of my zpools.  It looks like this:

  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  raidz1ONLINE   0 0 0
c4d0ONLINE   0 0 0
c4d1ONLINE   0 0 0
c5d0ONLINE   0 0 0
c5d1ONLINE   0 0 0
logs
  c11d0 ONLINE   0 0 0

I'm running OpenSolaris 2009.06 updated to build 118.

Basically the system won't boot until I boot from a CD, "zpool import -f" 
the pool, and then "zpool export" it.  Even that doesn't really work.  When 
I do the "zpool import -f" from the CD, the command runs forever (well, I 
let it run for a good hour or so before I stopped it).  The command seems 
to be running in a loop.  If I truss the process I see the following:


door_call(6, 0x0803D390)= 0
lwp_sigmask(SIG_SETMASK, 0x, 0x) = 0xFFBFFEFF [0x]
close(6)= 0
resolvepath("/", "/", 1024) = 1
resolvepath("/", "/", 1024) = 1
open("/etc/dev/.devlink_db", O_RDONLY)= 6
fxstat(2, 6, 0x0803D870)= 0
mmap(0x, 40, PROT_READ, MAP_SHARED, 6, 0) = 0xFE95
mmap(0x, 81920, PROT_READ, MAP_SHARED, 6, 45056) = 0xFE93B000
munmap(0xFE93B000, 81920)   = 0
munmap(0xFE95, 40)  = 0
close(6)= 0
ioctl(3, ZFS_IOC_SNAPSHOT_LIST_NEXT, 0x0803EEF0) = 0
ioctl(3, ZFS_IOC_CREATE_MINOR, 0x0803D9B0)  = 0
getppriv(PRIV_EFFECTIVE, {}) = 0
open("/etc/devname_check_RDONLY", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 6
close(6)= 0
unlink("/etc/devname_check_RDONLY")   = 0
xstat(2, "//etc/dev/.devfsadm_synch_door", 0x0803CF00) = 0
open("//etc/dev/.devfsadm_synch_door", O_RDONLY) = 6
lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0xFFF7) = 0xFFBFFEFF [0x]
door_call(6, 0x0803D390)= 0
lwp_sigmask(SIG_SETMASK, 0x, 0x) = 0xFFBFFEFF [0x]
close(6)= 0
resolvepath("/", "/", 1024) = 1
resolvepath("/", "/", 1024) = 1
open("/etc/dev/.devlink_db", O_RDONLY)= 6
fxstat(2, 6, 0x0803D870)= 0
mmap(0x, 40, PROT_READ, MAP_SHARED, 6, 0) = 0xFE95
mmap(0x, 81920, PROT_READ, MAP_SHARED, 6, 45056) = 0xFE93B000
munmap(0xFE93B000, 81920)   = 0
munmap(0xFE95, 40)  = 0
close(6)= 0
ioctl(3, ZFS_IOC_SNAPSHOT_LIST_NEXT, 0x0803EEF0) = 0
ioctl(3, ZFS_IOC_CREATE_MINOR, 0x0803D9B0)  = 0
getppriv(PRIV_EFFECTIVE, {}) = 0
open("/etc/devname_check_RDONLY", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 6
close(6)= 0
unlink("/etc/devname_check_RDONLY")   = 0
xstat(2, "//etc/dev/.devfsadm_synch_door", 0x0803CF00) = 0
open("//etc/dev/.devfsadm_synch_door", O_RDONLY) = 6
lwp_sigmask(SIG_SETMASK, 0xFFBFFEFF, 0xFFF7) = 0xFFBFFEFF [0x]
door_call(6, 0x0803D390)= 0

This sequence is repeated over and over.  I realize now that I should 
have done a pfiles on the process to figure out what the file 
descriptors were mapping to.  I can do that if it will help diagnose this.
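
For next time, a few things that look worth capturing while it loops (the 
repeated ZFS_IOC_SNAPSHOT_LIST_NEXT / ZFS_IOC_CREATE_MINOR ioctls suggest 
it is walking snapshots and creating device minor nodes, so a snapshot 
count seems relevant too):

 pfiles `pgrep -n zpool`         # map fds 3 and 6 to real files/devices
 zfs list -t snapshot | wc -l    # how many snapshots the import must walk
 echo "::spa -v" | mdb -k        # pool/vdev state from the kernel side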


If I kill the zpool import at this point, all of my filesystems are 
there and mounted and everything seems to be fine.  I'm going to scrub 
the pool overnight, but I'd appreciate any suggestions as to how to fix 
this problem in the future.


Steve Green
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss