Re: [zfs-discuss] Re: suggestion: directory promotion to filesystem

2007-02-22 Thread Matthew Ahrens

Adrian Saul wrote:

Any idea on the timeline or future of "zfs split" ?


It isn't a priority for now.

--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Overview (rollup) of recent activity on zfs-discuss

2007-02-22 Thread Eric Boutilier

For background on what this is, see:

http://www.opensolaris.org/jive/message.jspa?messageID=24416#24416
http://www.opensolaris.org/jive/message.jspa?messageID=25200#25200

=
zfs-discuss 02/01 - 02/15
=

Size of all threads during period:

Thread size Topic
--- -
 17   Implementing fbarrier() on ZFS
 14   NFS/ZFS performance problems - txg_wait_open() deadlocks?
 14   How to backup a slice ? - newbie
 14   Downsides to zone roots on ZFS?
 14   118855-36 & ZFS
 13   ZFS vs NFS vs array caches, revisited
 13   Project Proposal: Availability Suite
 11   RACF: SunFire x4500 Thumper Evaluation
  8   ZFS checksums - block or file level
  7   snapdir visable recursively throughout a dataset
  7   ZFS multi-threading
  7   ZFS and thin provisioning
  7   Shrinking a zpool?
  7   Read Only Zpool: ZFS and Replication
  7   Advice on a cheap home NAS machine using ZFS
  6   ZFS inode equivalent)
  6   zfs corruption -- odd inum?
  6   solaris - ata over ethernet - zfs - HPC
  6   se3510 and ZFS
  6   poor zfs performance on my home server
  6   FYI: ZFS on USB sticks (from Germany)
  5   ZFS limits on zpool snapshots
  5   Which label a ZFS/ZPOOL device has ? VTOC or EFI ?
  5   The ZFS MOS and how DNODES are stored
  5   NFS share problem with mac os x client
  5   Best Practises => Keep Pool Below 80%?
  4   zfs send - single threaded
  4   hot spares - in standby?
  4   ZFS panic on B54
  4   ZFS on PC Based Hardware for NAS?
  4   ZFS and question on repetative data migrating to it efficiently...
  4   Why doesn't Solaris remove a faulty disk from operation?
  4   Re[2]: se3510 and ZFS
  4   Peculiar behavior of snapshot after zfs receive
  4   Need help making lsof work with ZFS
  4   Meta data corruptions on ZFS.
  4   Is ZFS file system supports short writes ?
  3   zpool export consumes whole CPU and takes more than 30 minutes to complete
  3   zfs legacy filesystem remounted rw: atime temporary off?
  3   boot disks & controller layout...
  3   ZFS fragmentation
  3   ZFS Degraded Disks
  3   What SATA controllers are people using for ZFS?
  3   Preferred ZFS backup solution
  3   Peculiar behaviour of snapshot after zfs receive
  3   Managing ZFS - perspective from the intended users
  3   Disk Failure Rates and Error Rates -- (Off topic: Jim Gray lost at sea)
  3   Cheap ZFS homeserver.
  2   ZFS volume is hosing BIOS POST on Ultra20 (BIOS 2.1.7)
  2   number of lun's that zfs can handle
  2   Zpool complain about missing devices
  2   ZFS tie-breaking
  2   ZFS or UFS - what to do?
  2   Work arounds for bug #6456888?
  2   VxVM volumes in a zpool.
  2   Thumper Origins Q
  2   SPEC SFS benchmark of NFS/ZFS/B56 - please help to improve it!
  2   Limit ZFS Memory Utilization
  1   zpool lost
  1   zfs rewrite?
  1   zfs magic still missing
  1   zfs crashing
  1   update of zfs boot support
  1   unable to mount legacy vol - panic in zfs:space_map_remove - zdb crashes
  1   rewrite-style scrubbing...
  1   impressive
  1   corrupted files and improved 'zpool status -v'
  1   ZFS mirrored laptop
  1   UPDATE: FROSUG February Meeting (2/22/2007)
  1   UFS on zvol: volblocksize and maxcontig
  1   Shrinking a zpool? (refered to "Meta data corruptions on ZFS.")
  1   SPEC SFS testing of NFS/ZFS/B56
  1   RealNeel : ZFS and DB performance
  1   Re[2]: Why doesn't Solaris remove a faulty disk from operation?
  1   Panic with "really out of space."
  1   Not about Implementing fbarrier() on ZFS
  1   Low-end JBOD - am I nuts?
  1   FROSUG February Meeting Announcement (2/22/2007)
  1   ENOSPC on full FS (was: Meta data corruptions on ZFS.)
  1   Disk Failure Rates and Error Rates


Posting activity by person for period:

# of posts  By
--   --
 36   rmilkowski at task.gda.pl (robert milkowski)
 15   matthew.ahrens at sun.com (matthew ahrens)
 11   richard.elling at sun.com (richard elling)
 10   tmcmahon2 at yahoo.com (torrey mcmahon)
 10   roch.bourbonnais at sun.com (roch - pae)
 10   ddunham at taos.com (darren dunham)
  9   udippel at gmail.com (uwe dippel)
  8   eric.kustarz at sun.com (eric kustarz)
  7 

Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-22 Thread Jason J. W. Williams

Hi Przemol,

I think Casper had a good point bringing up the data integrity
features when using ZFS for RAID. Big companies do a lot of things
"just because that's the certified way" that end up biting them in the
rear. Trusting your SAN arrays is one of them. That all being said,
the need to do migrations is a very valid concern.

Best Regards,
Jason

On 2/22/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

On Wed, Feb 21, 2007 at 04:43:34PM +0100, [EMAIL PROTECTED] wrote:
>
> >I cannot let you say that.
> >Here in my company we are very interested in ZFS, but we do not care
> >about the RAID/mirror features, because we already have a SAN with
> >RAID-5 disks, and dual fabric connection to the hosts.
>
> But you understand that these underlying RAID mechanisms give absolutely
> no guarantee about data integrity, but only that some data was found where
> some (possibly other) data was written?  (RAID5 never verifies the
> checksum is correct on reads; it only uses it to reconstruct data when
> reads fail)

But you understand that he perhaps knows that, but so far nothing wrong has
happened [*], and migration is still a very important feature for him?

[*] almost every big company has its data center with a SAN and FC
connections, with RAID-5 or RAID-10 in its storage arrays,
and these are treated as reliable

przemol


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELIOS and ZFS cache

2007-02-22 Thread Jason J. W. Williams

Hi Eric,

Everything Mark said.

We as a customer ran into this running MySQL on a Thumper (and T2000).
We solved it on the Thumper by limiting the ARC to 4GB:

/etc/system: set zfs:zfs_arc_max = 0x100000000   #4GB

This has worked marvelously over the past 50 days. The ARC stays
around 5-6GB now, leaving 11GB for the DB.
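
(For anyone wanting to sanity-check the cap, the current ARC numbers can be
read live from the same kernel structure Mark's mdb example below walks
through - a sketch, assuming the 'arc' symbol is printable as in his note:)

# echo "arc::print -d size c c_max" | mdb -k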

Best Regards,
Jason

On 2/22/07, Mark Maybee <[EMAIL PROTECTED]> wrote:

This issue has been discussed a number of times in this forum.
To summarize:

ZFS (specifically, the ARC) will try to use *most* of the system's
available memory to cache file system data.  The default is to
max out at physmem-1GB (i.e., use all of physical memory except
for 1GB).  In the face of memory pressure, the ARC will "give up"
memory; however, there are some situations where we are unable to
free up memory fast enough for an application that needs it (see
example in the HELIOS note below).  In these situations, it may
be necessary to lower the ARC's maximum memory footprint, so that
there is a larger amount of memory immediately available for
applications.  This is particularly relevant in situations where
there is a known amount of memory that will always be required for
use by some application (databases often fall into this category).
The tradeoff here is that the ARC will not be able to cache as much
file system data, and that could impact performance.

For example, if you know that an application will need 5GB on a
36GB machine, you could set the arc maximum to 30GB (0x780000000).

In ZFS on on10 prior to update 4, you can only change the arc max
size via explicit actions with mdb(1):

# mdb -kw
> arc::print -a c_max
<address> c_max = <current value>
> <address>/Z <new max size in bytes>

In the current opensolaris nevada bits, and in s10u4, you can use
the system variable 'zfs_arc_max' to set the maximum arc size.  Just
set this in /etc/system.

-Mark

Erik Vanden Meersch wrote:
>
> Could someone please provide comments or a solution for this?
>
> Subject: Solaris 10 ZFS problems with database applications
>
>
> HELIOS TechInfo #106
> 
>
>
> Tue, 20 Feb 2007
>
> Solaris 10 ZFS problems with database applications
> --
>
> We have tested Solaris 10 release 11/06 with ZFS without any problems
> using all HELIOS UB based products, including very high load tests.
>
> However, we learned from customers that some database solutions (Sybase
> and Oracle are known cases), when allocating a large amount of memory, may
> slow down or even freeze the system for up to a minute. This can
> result in RPC timeout messages and service interruptions for HELIOS
> processes. ZFS basically uses most memory for file caching, and
> freeing this ZFS memory for the database memory allocation can result
> in serious delays. This does not occur when using HELIOS products
> only.
>
> The HELIOS test system was using 4GB of memory.
> The customer production machine was using 16GB of memory.
>
>
> Contact your Sun representative about how to limit the ZFS cache and what
> else to consider when using ZFS in your workflow.
>
> Also check with your application vendor for recommendations on using ZFS
> with their applications.
>
>
> Best regards,
>
> HELIOS Support
>
> HELIOS Software GmbH
> Steinriede 3
> 30827 Garbsen (Hannover)
> Germany
>
> Phone:  +49 5131 709320
> FAX:+49 5131 709325
> http://www.helios.de
>
> --
>   * Erik Vanden Meersch *
> Solution Architect
>
> *Sun Microsystems, Inc.*
> Phone x48835/+32-2-704 8835
> Mobile 0479/95 05 98
> Email [EMAIL PROTECTED]
>
>
> 
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another paper

2007-02-22 Thread Eric Schrock
On Thu, Feb 22, 2007 at 10:45:04AM -0800, Olaf Manczak wrote:
> 
> Obviously, scrubbing and correcting "hard" errors that result in
> ZFS checksum errors is very beneficial. However, it won't address the
> case of "soft" errors when the disk returns correct data but
> observes some problems reading it. There are at least two good reasons
> to pay attention to these "soft" errors:
> 
> a) Preemptive detection and rewriting of partially defective but
>still correctable sectors may prevent future data loss. Thus,
>it improves perceived reliability of disk drives, which is
>especially important in the JBOD case (including a single-drive JBOD).

These types of soft errors will be logged, managed, and (eventually)
diagnosed by SCSI FMA work currently in development.  If the SCSI DE
diagnoses a disk as faulty, then the ZFS agent will be able to respond
appropriately.

> b) It is not uncommon for such successful reads of partially defective
>    media to happen only after several retries. It is somewhat unfortunate
>    that there is no simple way to tell the drive how many times to retry.
>    Firmware in ATA/SATA drives, used predominantly in single-disk PCs,
>    will typically make a heroic effort to retrieve the data. It will
>    make numerous attempts to reposition the actuator, recalibrate the
>    head current, etc. It can take up to 20-40 seconds! Such a strategy
>    is reasonable for a desktop PC, but if it happens in a busy
>    enterprise file server it results in a temporary availability loss
>    (the drive freezes for up to 20-40 seconds every time you try to
>    read this sector). Also, this strategy does not make any sense if
>    a RAID group in which the drive participates has redundant data
>    elsewhere, which is why SCSI/FC drives give up after a few retries.
> 
> One can detect (and repair) such problematic areas on disk by monitoring
> the  SMART counters during scrubbing, and/or by monitoring physical
> read timings (looking for abnormally slow ones).

Solaris currently has a disk monitoring FMA module that is specific to
Thumper (x4500) and monitors only the most basic information (overtemp,
self-test fail, predictive failure).  I have separated this out into a
common FMA transport module which will bring this functionality to all
platforms (though support for SCSI devices will depend on the
aforementioned SCSI FMA portfolio).  This should be putback soon.
Future work could expand this beyond the simple indicators into more
detailed analysis of various counters.

All of this is really a common FMA problem, not ZFS-specific.  All that
is needed in ZFS is an agent actively responding to external diagnoses.
I am laying the groundwork for this as part of ongoing ZFS/FMA work
mentioned in other threads.  For more information on ongoing FMA work, I
recommend visiting the FMA discussion forum.
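
(As a practical aside, the error telemetry and diagnoses FMA already has for
a system can be inspected with the stock tools - shown here only as a pointer,
not as part of the upcoming SCSI work:)

# fmadm faulty    # resources FMA currently considers faulty
# fmdump -e       # log of raw error reports (ereports) received so far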

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another paper

2007-02-22 Thread Olaf Manczak

Eric Schrock wrote:


1. Some sort of background process to proactively find errors on disks
   in use by ZFS.  This will be accomplished by a background scrubbing
   option, dependent on the block-rewriting work Matt and Mark are
   working on.  This will allow something like "zpool set scrub=2weeks",
   which will tell ZFS to "scrub my data at an interval such that all
   data is touched over a 2 week period".


Obviously, scrubbing and correcting "hard" errors that result in
ZFS checksum errors is very beneficial. However, it won't address the
case of "soft" errors when the disk returns correct data but
observes some problems reading it. There are at least two good reasons
to pay attention to these "soft" errors:

a) Preemptive detection and rewriting of partially defective but
   still correctable sectors may prevent future data loss. Thus,
   it improves perceived reliability of disk drives, which is
   especially important in the JBOD case (including a single-drive JBOD).

b) It is not uncommon for such successful reads of partially defective
   media to happen only after several retries. It is somewhat unfortunate
   that there is no simple way to tell the drive how many times to retry.
   Firmware in ATA/SATA drives, used predominantly in single-disk PCs,
   will typically make a heroic effort to retrieve the data. It will
   make numerous attempts to reposition the actuator, recalibrate the
   head current, etc. It can take up to 20-40 seconds! Such a strategy
   is reasonable for a desktop PC, but if it happens in a busy
   enterprise file server it results in a temporary availability loss
   (the drive freezes for up to 20-40 seconds every time you try to
   read this sector). Also, this strategy does not make any sense if
   a RAID group in which the drive participates has redundant data
   elsewhere, which is why SCSI/FC drives give up after a few retries.

One can detect (and repair) such problematic areas on disk by monitoring
the  SMART counters during scrubbing, and/or by monitoring physical
read timings (looking for abnormally slow ones).
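
(As a rough illustration of the second approach, the DTrace io provider can
show per-device latency distributions while a scrub runs - a sketch only,
with no particular "slow" threshold implied:)

# dtrace -n '
  io:::start { ts[arg0] = timestamp; }
  io:::done /ts[arg0]/ {
      @[args[1]->dev_statname] = quantize((timestamp - ts[arg0]) / 1000000);
      ts[arg0] = 0;
  }'

A drive that is quietly retrying will show up as a long tail (hundreds of
milliseconds to tens of seconds) on its latency histogram.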

-- Olaf

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS multi-threading

2007-02-22 Thread eric kustarz


On Feb 22, 2007, at 10:01 AM, Carisdad wrote:


eric kustarz wrote:


On Feb 9, 2007, at 8:02 AM, Carisdad wrote:

I've seen very good performance on streaming large files to ZFS  
on a T2000.  We have been looking at using the T2000 as a disk  
storage unit for backups.  I've been able to push over 500MB/s to  
the disks. Setup is EMC Clariion CX3 with 84 500GB SATA drives  
connected w/ 4Gbps all the way to the disk shelves.  The 84  
drives are presented as raw luns to the T2000 -- no HW RAID  
enabled on the Clariion.  The problem we've seen comes when  
enabling compression, as that is single threaded per zpool.   
Enabling compression drops our throughput to 12-15MB/s per pool.
This is bugid: 6460622, the fix is apparently set to be put back  
into Nevada fairly soon.


This was just putback and will be in snv_59.

enjoy your unconstrained compressed I/O,
eric



Awesome, thank you.

I almost hate to ask, but...  Is there any way to track when the  
fix will make its way back into Solaris 10 via update4(?) or a patch?


we're gonna try and squeeze it into s10u4, no guarantees just yet  
though.  If it makes it, i'll update the list...


eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS multi-threading

2007-02-22 Thread Carisdad

eric kustarz wrote:


On Feb 9, 2007, at 8:02 AM, Carisdad wrote:

I've seen very good performance on streaming large files to ZFS on a 
T2000.  We have been looking at using the T2000 as a disk storage 
unit for backups.  I've been able to push over 500MB/s to the disks. 
Setup is EMC Clariion CX3 with 84 500GB SATA drives connected w/ 
4Gbps all the way to the disk shelves.  The 84 drives are presented 
as raw luns to the T2000 -- no HW RAID enabled on the Clariion.  The 
problem we've seen comes when enabling compression, as that is single 
threaded per zpool.  Enabling compression drops our throughput to 
12-15MB/s per pool.
This is bugid: 6460622, the fix is apparently set to be put back into 
Nevada fairly soon.


This was just putback and will be in snv_59.

enjoy your unconstrained compressed I/O,
eric



Awesome, thank you.

I almost hate to ask, but...  Is there any way to track when the fix 
will make its way back into Solaris 10 via update4(?) or a patch?


Thanks again,

-Andy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELIOS and ZFS cache

2007-02-22 Thread Mark Maybee

This issue has been discussed a number of times in this forum.
To summarize:

ZFS (specifically, the ARC) will try to use *most* of the system's
available memory to cache file system data.  The default is to
max out at physmem-1GB (i.e., use all of physical memory except
for 1GB).  In the face of memory pressure, the ARC will "give up"
memory; however, there are some situations where we are unable to
free up memory fast enough for an application that needs it (see
example in the HELIOS note below).  In these situations, it may
be necessary to lower the ARC's maximum memory footprint, so that
there is a larger amount of memory immediately available for
applications.  This is particularly relevant in situations where
there is a known amount of memory that will always be required for
use by some application (databases often fall into this category).
The tradeoff here is that the ARC will not be able to cache as much
file system data, and that could impact performance.

For example, if you know that an application will need 5GB on a
36GB machine, you could set the arc maximum to 30GB (0x780000000).

In ZFS on on10 prior to update 4, you can only change the arc max
size via explicit actions with mdb(1):

# mdb -kw
> arc::print -a c_max
<address> c_max = <current value>
> <address>/Z <new max size in bytes>

In the current opensolaris nevada bits, and in s10u4, you can use
the system variable 'zfs_arc_max' to set the maximum arc size.  Just
set this in /etc/system.
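
(For the 30GB example above, the /etc/system entry would look roughly like
this - a sketch, only meaningful on bits that honor zfs_arc_max:)

* Cap the ZFS ARC at 30GB (30 * 2^30 = 0x780000000 bytes)
set zfs:zfs_arc_max = 0x780000000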

-Mark

Erik Vanden Meersch wrote:


Could someone please provide comments or a solution for this?

Subject: Solaris 10 ZFS problems with database applications


HELIOS TechInfo #106



Tue, 20 Feb 2007

Solaris 10 ZFS problems with database applications
--

We have tested Solaris 10 release 11/06 with ZFS without any problems
using all HELIOS UB based products, including very high load tests.

However, we learned from customers that some database solutions (Sybase
and Oracle are known cases), when allocating a large amount of memory, may
slow down or even freeze the system for up to a minute. This can
result in RPC timeout messages and service interruptions for HELIOS
processes. ZFS basically uses most memory for file caching, and
freeing this ZFS memory for the database memory allocation can result
in serious delays. This does not occur when using HELIOS products
only.

The HELIOS test system was using 4GB of memory.
The customer production machine was using 16GB of memory.


Contact your Sun representative about how to limit the ZFS cache and what
else to consider when using ZFS in your workflow.

Also check with your application vendor for recommendations on using ZFS
with their applications.


Best regards,

HELIOS Support

HELIOS Software GmbH
Steinriede 3
30827 Garbsen (Hannover)
Germany

Phone:  +49 5131 709320
FAX:+49 5131 709325
http://www.helios.de

--
  * Erik Vanden Meersch *
Solution Architect

*Sun Microsystems, Inc.*
Phone x48835/+32-2-704 8835
Mobile 0479/95 05 98
Email [EMAIL PROTECTED]




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS multi-threading

2007-02-22 Thread eric kustarz


On Feb 9, 2007, at 8:02 AM, Carisdad wrote:

I've seen very good performance on streaming large files to ZFS on  
a T2000.  We have been looking at using the T2000 as a disk storage  
unit for backups.  I've been able to push over 500MB/s to the  
disks. Setup is EMC Clariion CX3 with 84 500GB SATA drives  
connected w/ 4Gbps all the way to the disk shelves.  The 84 drives  
are presented as raw luns to the T2000 -- no HW RAID enabled on the  
Clariion.  The problem we've seen comes when enabling compression,  
as that is single threaded per zpool.  Enabling compression drops  
our throughput to 12-15MB/s per pool.
This is bugid: 6460622, the fix is apparently set to be put back  
into Nevada fairly soon.


This was just putback and will be in snv_59.

enjoy your unconstrained compressed I/O,
eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS failed Disk Rebuild time on x4500

2007-02-22 Thread Bart Smaalders





I've measured resync on some slow IDE disks (*not* an X4500) at an average
of 20 MBytes/s.  So if you have a 500 GByte drive, that would resync a 100%
full file system in about 7 hours versus 11 days for some other systems
(who shall remain nameless :-)



My experience is that a set of 80% full 250 GB drives took a bit less
than 2 hours each to replace in a 4x raidz config.  The majority of
space used was taken by large files (isos, music and movie files
(yes, I have teenagers)), although there's a large number of small files
as well.  This makes for a performance of a bit less than 40 MB/sec
during resilvering.  The system was pretty sluggish during this
operation, but it had only 1GB of RAM, half of which firefox wanted :-/.

This was build 55 of Nevada.
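
(If you want to watch a resilver like this yourself, 'zpool status' reports
progress and a rough time-to-go on its scrub line - the pool name below is
just a placeholder:)

# zpool status tank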

- Bart


--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another paper

2007-02-22 Thread Nicolas Williams
On Wed, Feb 21, 2007 at 04:20:58PM -0800, Eric Schrock wrote:
> Seems like there are two pieces you're suggesting here:
> 
> 1. Some sort of background process to proactively find errors on disks
>in use by ZFS.  This will be accomplished by a background scrubbing
>option, dependent on the block-rewriting work Matt and Mark are
>working on.  This will allow something like "zpool set scrub=2weeks",
>which will tell ZFS to "scrub my data at an interval such that all
>data is touched over a 2 week period".  This will test reading from
>every block and verifying checksums.  Stressing write failures is a
>little more difficult.

I got the impression that testing free disk space was also desired.

> 2. Distinguish "slow" drives from "normal" drives and proactively mark
>them faulted.  This shouldn't require an explicit "zpool dft", as
>we should be watching the response times of the various drives and
>keep this as a statistic.  We want to incorporate this information
>to allow better allocation amongst slower and faster drives.
>Determining that a drive is "abnormally slow" is much more difficult,
>though it could theoretically be done if we had some basis - either
>historical performance for the same drive or comparison to identical
>drives (manufacturer/model) within the pool.  While we've thought
>about these same issues, there is currently no active effort to keep
>track of these statistics or do anything with them.

I would imagine that "slow" as in "long average seek times" should be
relatively easy to detect, whereas "slow" as in "low bandwidth" might be
harder (since I/O bandwidth might depend on characteristics of the
device path and how saturated it is).  Are long average seek times an
indication of trouble?

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: suggestion: directory promotion to filesystem

2007-02-22 Thread Adrian Saul
Thanks for the replies - I imagined it would have been discussed, but I must
have been searching for the wrong terms :)

Any idea on the timeline or future of "zfs split" ?

Cheers,
 Adrian
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] HELIOS and ZFS cache

2007-02-22 Thread Erik Vanden Meersch


Could someone please provide comments or a solution for this?

Subject: Solaris 10 ZFS problems with database applications


HELIOS TechInfo #106



Tue, 20 Feb 2007

Solaris 10 ZFS problems with database applications
--

We have tested Solaris 10 release 11/06 with ZFS without any problems
using all HELIOS UB based products, including very high load tests.

However, we learned from customers that some database solutions (Sybase
and Oracle are known cases), when allocating a large amount of memory, may
slow down or even freeze the system for up to a minute. This can
result in RPC timeout messages and service interruptions for HELIOS
processes. ZFS basically uses most memory for file caching, and
freeing this ZFS memory for the database memory allocation can result
in serious delays. This does not occur when using HELIOS products
only.

The HELIOS test system was using 4GB of memory.
The customer production machine was using 16GB of memory.


Contact your Sun representative about how to limit the ZFS cache and what
else to consider when using ZFS in your workflow.

Also check with your application vendor for recommendations on using ZFS
with their applications.


Best regards,

HELIOS Support

HELIOS Software GmbH
Steinriede 3
30827 Garbsen (Hannover)
Germany

Phone:  +49 5131 709320
FAX:+49 5131 709325
http://www.helios.de

--
  * Erik Vanden Meersch *
Solution Architect

*Sun Microsystems, Inc.*
Phone x48835/+32-2-704 8835
Mobile 0479/95 05 98
Email [EMAIL PROTECTED]

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS failed Disk Rebuild time on x4500

2007-02-22 Thread Robert Milkowski
Hello Richard,

Thursday, February 22, 2007, 3:32:07 AM, you wrote:

RE> Nissim Ben Haim wrote:
>> I was asked by a customer considering the x4500 - how much time should 
>> it take to rebuild a failed Disk under RaidZ ?
>> This question keeps popping up because customers perceive software RAID as 
>> substantially inferior to HW raids.
>> I could not find someone who has really measured this under several 
>> scenarios.

While hot-spare support is currently lacking, I can't agree that SW
RAID is inferior to HW RAID - it's actually the opposite in many
situations. I see that the customer is considering the x4500 - so no
centralized SAN solution is needed.

The other factor is what his workload is and what kind of RAID he is
going to use. With raid-z he/she should keep in mind that random reads
on raid-z won't give stellar performance, while writes will.



-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another paper

2007-02-22 Thread Joerg Schilling
Richard Elling <[EMAIL PROTECTED]> wrote:

> > If a disk fitness test were available to verify disk read/write and 
> > performance, future drive problems could be avoided.
> > 
> > Some example tests:
> > - full disk read
> > - 8kb r/w iops
> > - 1mb r/w iops
> > - raw throughput
>
> Some problems can be seen by doing a simple sequential read and comparing
> it to historical data.  It depends on the failure mode, though.
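
(For what it's worth, that simple sequential-read baseline can be gathered
with nothing fancier than dd against the raw device - the device name here is
just a placeholder:)

# timex dd if=/dev/rdsk/c1t0d0s2 of=/dev/null bs=1024k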

Something that people often forget about is the bearings. Sometimes,
the disks write too early, assuming that the head is already on track.
A worn-out bearing, however, causes a track-following problem.

For this reason, you need to run a random write test on old disks.
sformat includes such a test...

Jörg

-- 
 EMail:[EMAIL PROTECTED] (home) Jörg Schilling D-13353 Berlin
   [EMAIL PROTECTED](uni)  
   [EMAIL PROTECTED] (work) Blog: http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/old/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS failed Disk Rebuild time on x4500

2007-02-22 Thread Robert Milkowski
Hello Richard,

Thursday, February 22, 2007, 3:32:07 AM, you wrote:

RE> Nissim Ben Haim wrote:
>> I was asked by a customer considering the x4500 - how much time should 
>> it take to rebuild a failed Disk under RaidZ ?
>> This question keeps popping up because customers perceive software RAID as 
>> substantially inferior to HW raids.
>> I could not find someone who has really measured this under several 
>> scenarios.

RE> It is a function of the amount of space used.  As space used -> 0, it
RE> becomes infinitely fast.  As space used -> 100% it approaches the speed
RE> of the I/O subsystem.  In my experience, no hardware RAID array comes
RE> close, they all throttle the resync, though some of them allow you to
RE> tune it a little bit.  The key advantage over a hardware RAID system is
RE> that ZFS knows where the data is and doesn't need to replicate unused
RE> space.  A hardware RAID array doesn't know anything about the data, so
RE> it must reconstruct the entire disk.

RE> Also, the reconstruction is done in time order.  See Jeff Bonwick's blog:
RE> http://blogs.sun.com/bonwick/entry/smokin_mirrors

RE> I've measured resync on some slow IDE disks (*not* an X4500) at an average
RE> of 20 MBytes/s.  So if you have a 500 GByte drive, that would resync a 100%
RE> full file system in about 7 hours versus 11 days for some other systems
RE> (who shall remain nameless :-)

I wish it worked that well.

This was a raid-z2 made of 11 disks on an x4500.
When the server wasn't doing anything - just the resync - with the pool
almost full of lots of small files, it took about 2 days to re-sync. I think
that with lots of small files and an almost-full pool, the classic approach
to resync is actually faster.

With the server under some load (not that big) it took about two weeks(!)
to re-sync the disk.

Then you've got to remember that you can't create new snapshots until
the resync finishes, or it will start all over again and you will never
finish resyncing. This is a known bug and it is very, very annoying on an
x4500 (or any other server).

You should also keep in mind that current hot-spare support in ZFS is somewhat
lacking - if a write to a given disk fails, the system will panic instead of
putting a hot spare in. Then, after reboot, depending on the type of disk
failure, ZFS will either note that the disk is bad and use a hot spare, or it
will not (if it can still open that disk).

Also, if you have a failing disk (I had one), you can manually initiate a
re-sync; however, until it finishes (two weeks???) you can't replace the
old drive online, as that requires exporting the pool! You also can't detach
a disk from a raid-z[12] group until you re-sync the new one.
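
(For reference, the manual flow described above maps onto roughly these
commands - the pool and device names are placeholders, and behavior on these
builds may differ:)

# zpool status -x                     # identify the failing disk
# zpool replace tank c1t3d0 c1t7d0    # resilver onto a replacement disk
# zpool add tank spare c1t8d0         # or configure a hot spare up front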

IMHO, current hot-spare support in ZFS is very basic and needs a lot
of work.

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-22 Thread przemolicc
On Wed, Feb 21, 2007 at 04:43:34PM +0100, [EMAIL PROTECTED] wrote:
> 
> >I cannot let you say that.
> >Here in my company we are very interested in ZFS, but we do not care
> >about the RAID/mirror features, because we already have a SAN with
> >RAID-5 disks, and dual fabric connection to the hosts.
> 
> But you understand that these underlying RAID mechanisms give absolutely
> no guarantee about data integrity, but only that some data was found where
> some (possibly other) data was written?  (RAID5 never verifies the
> checksum is correct on reads; it only uses it to reconstruct data when
> reads fail)

But you understand that he perhaps knows that, but so far nothing wrong has
happened [*], and migration is still a very important feature for him?

[*] almost every big company has its data center with a SAN and FC
connections, with RAID-5 or RAID-10 in its storage arrays,
and these are treated as reliable

przemol


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] speedup 2-8x of "tar xf" on ZFS

2007-02-22 Thread Thomas Maier-Komor
Hi,

now, as I'm back in Germany, I've got access to my machine at home with ZFS, so
I could test my binary patch for multi-threading with tar on a ZFS filesystem.

Results look like this:
.tar, small files (e.g. gcc source tree): speedup x8
.tar.gz, small files (gcc source tree): speedup x4
.tar, medium-size files (e.g. object files of a compiled binutils tree): speedup x5
.tar.gz, medium-size files: speedup x2-x3

Speedup is a comparison of the wallclock time (timex real) of tar with the
patched multi-threaded tar, where the patched version is 2x-8x faster. Be aware
that on a UFS filesystem it is about 1:1 in speed - you may even suffer a
5%-10% decrease in performance.

This test was on a Blade 2500, with 5GB RAM (i.e. everything in cache) running 
Solaris 10U3, and a ZFS filesystem on two 10k rpm 146G SCSI drives arranged as 
a ZFS mirror.

To me this looks like a pretty good speedup. If you also want to benefit from 
this patch, grab it here (http://www.maier-komor.de/mtwrite.html). The current 
version includes a wrapper for tar called mttar, to ease use, and has some 
enhancements concerning performance and error handling (see the Changelog for 
details).
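
(Assuming mttar is argument-compatible with plain tar, as the wrapper
description suggests, a side-by-side timing would look roughly like this -
the archive name is just a placeholder:)

$ timex tar xf gcc-sources.tar      # stock tar
$ timex mttar xf gcc-sources.tar    # multi-threaded wrapper from the patch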

Have fun with Solaris!

Cheers,
Thomas
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss