Re: [zfs-discuss] [zfs] Re: how to know available disk space, 22% free space missing

2013-02-20 Thread Pasi Kärkkäinen
Hello,

Any comments/suggestions about this would be much appreciated.. 

Thanks!

-- Pasi

On Fri, Feb 08, 2013 at 05:09:56PM +0200, Pasi Kärkkäinen wrote:
 
 I'm seeing weird output as well:
 
 # zpool list foo
 NAME  SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
 foo  5.44T  4.44T  1023G81%  14.49x  ONLINE  -
 
 # zfs list | grep foo
 foo  62.9T  0   250G  /volumes/foo
 foo/.nza-reserve   31K   100M31K  none
 foo/foo  62.6T  0  62.6T  /volumes/foo/foo
 
 # zfs list -o space foo
 NAME AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
 foo  0  62.9T 0250G  0  62.7T
 
 # zfs list -o space foo/foo
 NAME AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
 foo/foo  0  62.6T 0   62.6T  0  0
 
 
 What's the correct way of finding out what actually uses/reserves that 1023G 
 of FREE in the zpool? 
 
 At this point the filesystems are full, and it's not possible to write to 
 them anymore.
 Also creating new filesystems to the pool fail:
 
 Operation completed with error: cannot create 'foo/Test': out of space
 
 So the zpool is full for real.
 
 I'd like to better understand what actually uses that 1023G of FREE space 
 reported by zpool..
 1023G out of 4.32T is around 22% overhead..
 zpool foo consists of 3x mirror vdevs, so there's no raidz involved.
 
 62.6T / 14.49x dedup-ratio = 4.32T 
 Which is pretty close to the ALLOC value reported by zpool.. 
 
 Data on the filesystem is VM images written over NFS.
 
 
 Thanks,
 
 -- Pasi
 
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to know available disk space

2013-02-08 Thread Pasi Kärkkäinen
On Wed, Feb 06, 2013 at 08:03:13PM -0700, Jan Owoc wrote:
 On Wed, Feb 6, 2013 at 4:26 PM, Edward Ned Harvey
 (opensolarisisdeadlongliveopensolaris)
 opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
 
  When I used zpool status after the system crashed, I saw this:
  NAME  SIZE  ALLOC   FREE  EXPANDSZCAP  DEDUP  HEALTH  ALTROOT
  storage   928G   568G   360G -61%  1.00x  ONLINE  -
 
  I did some cleanup, so I could turn things back on ... Freed up about 4G.
 
  Now, when I use zpool status I see this:
  NAME  SIZE  ALLOC   FREE  EXPANDSZCAP  DEDUP  HEALTH  ALTROOT
  storage   928G   564G   364G -60%  1.00x  ONLINE  -
 
  When I use zfs list storage I see this:
  NAME  USED  AVAIL  REFER  MOUNTPOINT
  storage   909G  4.01G  32.5K  /storage
 
  So I guess the lesson is (a) refreservation and zvol alone aren't enough to
  ensure your VM's will stay up.  and (b) if you want to know how much room is
  *actually* available, as in usable, as in, how much can I write before I
  run out of space, you should use zfs list and not zpool status
 
 Could you run zfs list -o space storage? It will show how much is
 used by the data, the snapshots, refreservation, and children (if
 any). I read somewhere that one should always use zfs list to
 determine how much space is actually available to be written on a
 given filesystem.
 
 I have an idea, but it's a long shot. If you created more than one zfs
 on that pool, and added a reservation to each one, then that space is
 still technically unallocated as far as zpool list is concerned, but
 is not available to writing when you do zfs list. I would imagine
 you have one or more of your VMs that grew outside of their
 refreservation and now crashed for lack of free space on their zfs.
 Some of the other VMs aren't using their refreservation (yet), so they
 could, between them, still write 360GB of stuff to the drive.
 

I'm seeing weird output as well:

# zpool list foo
NAME  SIZE  ALLOC   FREECAP  DEDUP  HEALTH  ALTROOT
foo  5.44T  4.44T  1023G81%  14.49x  ONLINE  -

# zfs list | grep foo
foo  62.9T  0   250G  /volumes/foo
foo/.nza-reserve   31K   100M31K  none
foo/foo  62.6T  0  62.6T  /volumes/foo/foo

# zfs list -o space foo
NAME AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
foo  0  62.9T 0250G  0  62.7T

# zfs list -o space foo/foo
NAME AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
foo/foo  0  62.6T 0   62.6T  0  0


What's the correct way of finding out what actually uses/reserves that 1023G of 
FREE in the zpool? 

At this point the filesystems are full, and it's not possible to write to them 
anymore.
Also creating new filesystems to the pool fail:

Operation completed with error: cannot create 'foo/Test': out of space

So the zpool is full for real.

I'd like to better understand what actually uses that 1023G of FREE space 
reported by zpool..
1023G out of 4.32T is around 22% overhead..
zpool foo consists of 3x mirror vdevs, so there's no raidz involved.

62.6T / 14.49x dedup-ratio = 4.32T 
Which is pretty close to the ALLOC value reported by zpool.. 

Data on the filesystem is VM images written over NFS.
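
For the archives, here's roughly what I'm planning to run next to see where
the space actually goes (treat this as a sketch; zdb behaviour is as I
understand it on this illumos-based build, and 'foo' is the real pool name):

# zdb -DD foo               # DDT summary/histogram; shows how much space the dedup table itself takes on disk and in core
# zdb -bb foo               # walk all blocks, print space used per block type (slow on a pool this size)
# zfs list -r -o space foo  # per-dataset USEDSNAP/USEDDS/USEDREFRESERV/USEDCHILD breakdown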


Thanks,

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to know available disk space

2013-02-08 Thread Pasi Kärkkäinen
On Fri, Feb 08, 2013 at 09:47:38PM +, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:
  From: Pasi Kärkkäinen [mailto:pa...@iki.fi]
  
  What's the correct way of finding out what actually uses/reserves that 1023G
  of FREE in the zpool?
 
 Maybe this isn't exactly what you need, but maybe:
 
 for fs in `zfs list -H -o name` ; do
   echo $fs
   zfs get reservation,refreservation,usedbyrefreservation $fs
 done


I checked this and there are no reservations configured (or well, there are the 
100MB defaults, but not more than that). So reservations don't explain this..
 
 
  At this point the filesystems are full, and it's not possible to write to 
  them
  anymore.
 
 You'll have to either reduce your reservations, or destroy old snapshots.  Or 
 add more disks.


There aren't any snapshots either.. I know adding disks will fix the problem,
but I'd like to understand why zpool says there is almost 1TB of FREE space 
when clearly there isn't..


Thanks for the reply!

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-02-03 Thread Pasi Kärkkäinen
On Sun, Jan 20, 2013 at 07:51:15PM -0800, Richard Elling wrote:
 
  2. VAAI support.
 
VAAI has 4 features, 3 of which have been in illumos for a long time. The
remaining
feature (SCSI UNMAP) was done by Nexenta and exists in their NexentaStor
product,
but the CEO made a conscious (and unpopular) decision to keep that code
from the
community. Over the summer, another developer picked up the work in the
community,
but I've lost track of the progress and haven't seen an RTI yet.
 

I assume SCSI UNMAP is implemented in Comstar in NexentaStor? 
Isn't Comstar CDDL licensed? 

There's also this:
https://www.illumos.org/issues/701

.. which says UNMAP support was added to Illumos Comstar 2 years ago.
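
FWIW, from the initiator side it should be easy to check whether a given LU
actually advertises UNMAP, e.g. with sg3_utils on a Linux initiator (the
device name below is just an example, and this is how I understand the tools
to behave):

# sg_vpd -p lbpv /dev/sdX   # Logical Block Provisioning VPD page (0xb2); check the LBPU (unmap supported) bit
# sg_readcap -l /dev/sdX    # READ CAPACITY(16); lbpme=1 means logical block provisioning is enabled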


-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?

2013-01-08 Thread Pasi Kärkkäinen
On Tue, Jan 08, 2013 at 06:36:18AM -0500, Ray Arachelian wrote:
 On 01/07/2013 04:16 PM, Sašo Kiselkov wrote:
  PERC H200 are well behaved cards that are easy to reflash and work
  well (even in JBOD mode) on Illumos - they are essentially a LSI SAS
  9211. If you can get them, they're one heck of a reliable beast, and
  cheap too!
 
 I've had trouble with one of those (Dell PERC H200) in a Z68X-UD3H-B3
 motherboard.  When it was inserted in any slot, the machine wouldn't
 power on.  I put it in a Dell desktop I borrowed for a day and it worked
 there.  Any idea as to what might be the trouble?  Couldn't even get it
 working long enough to attempt to reflash its BIOS.
 
 The machine would power on for a few seconds and immediately turn off.


Wild guess: not enough available PCI option ROM memory for the H200 card on 
that motherboard? 

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Appliance as a general-purpose server question

2012-11-29 Thread Pasi Kärkkäinen
On Tue, Nov 27, 2012 at 08:52:06AM +0100, Grégory Giannoni wrote:
 
 The LSI 9240-4I was not able to connect to the 25-drives bay ; Not tested  
 LSI 9260-16I or LSI 9280-24i.
 

What was the problem connecting LSI 9240-4i to the 25-drives bay?

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Appliance as a general-purpose server question

2012-11-29 Thread Pasi Kärkkäinen
On Thu, Nov 29, 2012 at 09:42:21AM +0100, Grégory Giannoni wrote:
 
 On 29 Nov 2012, at 09:27, Pasi Kärkkäinen wrote:
  The LSI 9240-4I was not able to connect to the 25-drives bay ; Not tested  
  LSI 9260-16I or LSI 9280-24i.
  
  
  What was the problem connecting LSI 9240-4i to the 25-drives bay?
  
 
 The 25-drives backplane needs two SFF-8087 (multilane cables) to work 
 correctly. The LSI 9240-4i has just one SFF-8087 port.


Yeah, that explains it :) 

-- Pasi
 
 Using 2 LSI 9240-4i cards didn't work either.
 
 -- 
 Grégory Giannoni
 http://www.wmaker.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] all in one server

2012-09-18 Thread Pasi Kärkkäinen
On Tue, Sep 18, 2012 at 05:30:56PM +0200, Erik Ableson wrote:
 
 If you're running ESXi with a vSphere license, I'd recommend looking at VDR 
 (free with the vCenter license) for backing up the VMs to the little HPs 
 since you get compressed and deduplicated backups that will minimize the 
 replication bandwidth requirements.
 

Don't look at VDR. It's known to be very buggy and to corrupt itself in no time. 
It's also known to do bad restores, overwriting the *wrong* VMs.

VMware also killed it and replaced it with another product.

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-26 Thread Pasi Kärkkäinen
On Fri, Jun 15, 2012 at 06:23:42PM -0500, Timothy Coalson wrote:
 Sorry, if you meant distinguishing between true 512 and emulated
 512/4k, I don't know, it may be vendor-specific as to whether they
 expose it through device commands at all.
 

At least on Linux you can see the info from:

/sys/block/<disk>/queue/logical_block_size=512
/sys/block/<disk>/queue/physical_block_size=4096


-- Pasi

 Tim
 
 On Fri, Jun 15, 2012 at 6:02 PM, Timothy Coalson tsc...@mst.edu wrote:
  On Fri, Jun 15, 2012 at 5:35 PM, Jim Klimov jimkli...@cos.ru wrote:
  2012-06-16 0:05, John Martin wrote:
 
  Its important to know...
 
  ...whether the drive is really 4096p or 512e/4096p.
 
 
  BTW, is there a surefire way to learn that programmatically
  from Solaris or its derivates
 
  prtvtoc device should show the block size the OS thinks it has.  Or
  you can use format, select the disk from a list that includes the
  model number and size, and use verify.
 
  Tim
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-26 Thread Pasi Kärkkäinen
On Wed, Jun 27, 2012 at 01:42:27AM +0300, Pasi Kärkkäinen wrote:
 On Fri, Jun 15, 2012 at 06:23:42PM -0500, Timothy Coalson wrote:
  Sorry, if you meant distinguishing between true 512 and emulated
  512/4k, I don't know, it may be vendor-specific as to whether they
  expose it through device commands at all.
  
 
 At least on Linux you can see the info from:
 
 /sys/block/<disk>/queue/logical_block_size=512
 /sys/block/<disk>/queue/physical_block_size=4096
 

Oh, and also these methods work on Linux:

# hdparm -I /dev/sdc | grep Sector
Logical  Sector size:   512 bytes
Physical Sector size:  4096 bytes
Logical Sector-0 offset:512 bytes

And then there's the BLKPBSZGET ioctl. 
So I'd be surprised if that stuff isn't implemented on *solaris..
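
For completeness, blockdev from util-linux wraps those ioctls, so on a
reasonably recent Linux box this should show the same thing (/dev/sdc is just
my example device):

# blockdev --getss /dev/sdc     # logical sector size (BLKSSZGET), 512 here
# blockdev --getpbsz /dev/sdc   # physical block size (BLKPBSZGET), 4096 on a 512e/4K drive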

-- Pasi

 
  Tim
  
  On Fri, Jun 15, 2012 at 6:02 PM, Timothy Coalson tsc...@mst.edu wrote:
   On Fri, Jun 15, 2012 at 5:35 PM, Jim Klimov jimkli...@cos.ru wrote:
   2012-06-16 0:05, John Martin wrote:
  
   Its important to know...
  
   ...whether the drive is really 4096p or 512e/4096p.
  
  
   BTW, is there a surefire way to learn that programmatically
   from Solaris or its derivates
  
   prtvtoc device should show the block size the OS thinks it has.  Or
   you can use format, select the disk from a list that includes the
   model number and size, and use verify.
  
   Tim
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and spread-spares (kinda like GPFS declustered RAID)?

2012-01-08 Thread Pasi Kärkkäinen
On Sun, Jan 08, 2012 at 06:59:57AM +0400, Jim Klimov wrote:
 2012-01-08 5:37, Richard Elling wrote:
 The big question is whether they are worth the effort. Spares solve a 
 serviceability
 problem and only impact availability in an indirect manner. For single-parity
 solutions, spares can make a big difference in MTTDL, but have almost no 
 impact
 on MTTDL for double-parity solutions (eg. raidz2).

 Well, regarding this part: in the presentation linked in my OP,
 the IBM presenter suggests that for a 6-disk raid10 (3 mirrors)
 with one spare drive, overall a 7-disk set, there are such
 options for critical hits to data redundancy when one of
 drives dies:

 1) Traditional RAID - one full disk is a mirror of another
full disk; 100% of a disk's size is critical and has to
be replicated into a spare drive ASAP;

 2) Declustered RAID - all 7 disks are used for 2 unique data
blocks from original setup and one spare block (I am not
sure I described it well in words, his diagram shows it
better); if a single disk dies, only 1/7 worth of disk
size is critical (not redundant) and can be fixed faster.

For their typical 47-disk sets of RAID-7-like redundancy,
under 1% of data becomes critical when 3 disks die at once,
which is (deemed) unlikely as is.

 Apparently, in the GPFS layout, MTTDL is much higher than
 in raid10+spare with all other stats being similar.

 I am not sure I'm ready (or qualified) to sit down and present
 the math right now - I just heard some ideas that I considered
 worth sharing and discussing ;)


Thanks for the video link (http://www.youtube.com/watch?v=2g5rx4gP6yU). 
It's very interesting!

GPFS Native RAID seems to be more advanced than current ZFS,
and it even has rebalancing implemented (the infamous missing zfs bp-rewrite).

It'd definitely be interesting to have something like this implemented in ZFS.

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive

2011-11-13 Thread Pasi Kärkkäinen
On Sat, Nov 12, 2011 at 10:08:04AM -0800, Richard Elling wrote:
 
 On Nov 12, 2011, at 8:31 AM, Pasi Kärkkäinen wrote:
 
  On Sat, Nov 12, 2011 at 08:15:31AM -0500, David Magda wrote:
  On Nov 12, 2011, at 00:55, Richard Elling wrote:
  
  Better than ?
  If the disks advertise 512 bytes, the only way around it is with a 
  whitelist. I would
  be rather surprised if Oracle sells 4KB sector disks for Solaris systems?
  
  Solaris 10. OpenSolaris.
  
  But would it be surprising to use SANs with Solaris? Or perhaps run 
  Solaris under some kind of virtualized environment where the virtual disk 
  has a particular block size? Or maybe SSDs, which tend to 
  read/write/delete in certain block sizes?
  
  In these situations simply assuming 512 may slow things down.
  
  And if Solaris 11 is going to be around for a decade or so, I'd hazard to 
  guess that 512B sector disks will become less and less prevalent as time 
  goes on. Might as well enable the functionality now, when 4K is rarer, so 
  you have more time to test and tune things out, rather than later when you 
  can potentially be left scrambling.
  
  As Pasi Kärkkäinen mentions, there's not much you can do if the disks lie 
  (just as has been seen with disks that lie about flushing the cache). This 
  is mostly a temporary kludge for legacy's sake. More and more disks will 
  be truthful as time goes on.
  
  
  Most 4kB/sector disks already today properly report both the physical 
  (4kB) and logical (512b) sector sizes.
  It sounds like *solaris is only checking the logical (512b) sector size, 
  not the physical (4kB) sector size..
 
 ZFS uses the physical block size.
 http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/fs/zfs/vdev_disk.c#294
 

Hmm.. so everything should just work? 
Does some other part of the code use logical block size then, for example to 
calculate the ashift? 

Maybe I should read the code :) 
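
In the meantime, the ashift a vdev actually ended up with can be read straight
from the vdev label (the disk path below is just an example):

# zdb -l /dev/rdsk/c7t0d0s0 | grep ashift   # ashift: 9 = 512-byte alignment, ashift: 12 = 4 kB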

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive

2011-11-12 Thread Pasi Kärkkäinen
On Fri, Nov 11, 2011 at 09:55:29PM -0800, Richard Elling wrote:
 On Nov 10, 2011, at 7:47 PM, David Magda wrote:
 
  On Nov 10, 2011, at 18:41, Daniel Carosone wrote:
  
  On Tue, Oct 11, 2011 at 08:17:55PM -0400, John D Groenveld wrote:
  Under both Solaris 10 and Solaris 11x, I receive the evil message:
  | I/O request is not aligned with 4096 disk sector size.
  | It is handled through Read Modify Write but the performance is very low.
  
  I got similar with 4k sector 'disks' (as a comstar target with
  blk=4096) when trying to use them to force a pool to ashift=12. The
  labels are found at the wrong offset when the block numbers change,
  and maybe the GPT label has issues too. 
  
  Anyone know if Solaris 11 has better support for detecting the native block 
  size of the underlying storage?
 
 Better than ?
 If the disks advertise 512 bytes, the only way around it is with a whitelist. 
 I would
 be rather surprised if Oracle sells 4KB sector disks for Solaris systems?


Afaik the disks advertise both the physical and logical sector size..
at least on Linux you can see that the disk emulates 512 bytes/sector,
but natively it uses 4kB/sector.

/sys/block/<disk>/queue/logical_block_size=512
/sys/block/<disk>/queue/physical_block_size=4096

The info should be available through IDENTIFY DEVICE (ATA) or READ CAPACITY 16 
(SCSI) commands.
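
(On Linux, for example, sg_readcap from sg3_utils issues exactly that
READ CAPACITY(16); the device name is just an example:)

# sg_readcap -l /dev/sdc   # prints the logical block size and the logical-blocks-per-physical-block exponent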

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] weird bug with Seagate 3TB USB3 drive

2011-11-12 Thread Pasi Kärkkäinen
On Sat, Nov 12, 2011 at 08:15:31AM -0500, David Magda wrote:
 On Nov 12, 2011, at 00:55, Richard Elling wrote:
 
  Better than ?
  If the disks advertise 512 bytes, the only way around it is with a 
  whitelist. I would
  be rather surprised if Oracle sells 4KB sector disks for Solaris systems?
 
 Solaris 10. OpenSolaris.
 
 But would it be surprising to use SANs with Solaris? Or perhaps run Solaris 
 under some kind of virtualized environment where the virtual disk has a 
 particular block size? Or maybe SSDs, which tend to read/write/delete in 
 certain block sizes?
 
 In these situations simply assuming 512 may slow things down.
 
 And if Solaris 11 is going to be around for a decade or so, I'd hazard to 
 guess that 512B sector disks will become less and less prevalent as time goes 
 on. Might as well enable the functionality now, when 4K is rarer, so you have 
 more time to test and tune things out, rather than later when you can 
 potentially be left scrambling.
 
 As Pasi Kärkkäinen mentions, there's not much you can do if the disks lie 
 (just as has been seen with disks that lie about flushing the cache). This is 
 mostly a temporary kludge for legacy's sake. More and more disks will be 
 truthful as time goes on.
 

Most 4kB/sector disks already today properly report both the physical (4kB) 
and logical (512b) sector sizes.
It sounds like *solaris is only checking the logical (512b) sector size, not 
the physical (4kB) sector size..

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [OpenIndiana-discuss] Question about WD drives with Super Micro systems

2011-08-08 Thread Pasi Kärkkäinen
On Sat, Aug 06, 2011 at 07:45:31PM +0200, Roy Sigurd Karlsbakk wrote:
  Might this be the SATA drives taking too long to reallocate bad
  sectors? This is a common problem desktop drives have, they will
  stop and basically focus on reallocating the bad sector as long as it
  takes, which causes the raid setup to time out the operation and flag
  the drive as failed. The enterprise sata drives, typically the same
  as the high performing desktop drive, only they have a short timeout
  on how long they are allowed to try and reallocate a bad sector so
  they don't hit the failed drive timeout. Some drive firmwares, such as
  older WD blacks if memory serves, had the ability to be forced to
  behave like the enterprise drive, but WD updated the firmware so this
  is no longer possible.
  
  This is why you see SATA drives that typically have almost identical
  specs, but one will be $69 and the other $139 - the former is a
  desktop model while the latter is an enterprise or raid specific
  model. I believe it's called different things by different brands:
  TLER, ERC, and CCTL (?).
 
 I doubt this is about the lack of TLER et al. Some, or most, of the drives 
 ditched by ZFS have shown to be quite good indeed. I guess this is a WD vs 
 Intel SAS expanders issue
 

What exact chassis / backplane / SAS-expander is that? (with Intel SAS 
expander).

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Question about drive LEDs

2011-06-22 Thread Pasi Kärkkäinen
On Sat, Jun 18, 2011 at 09:49:44PM +0200, Roy Sigurd Karlsbakk wrote:
 Hi all
 
 I have a few machines setup with OI 148, and I can't make the LEDs on the 
 drives work when something goes bad. The chassies are supermicro ones, and 
 work well, normally. Any idea how to make drive LEDs wirk with this setup?
 

Some questions:
- So the Supermicro chassis has SES support? 
- You're able to see which disk in which chassis slot, by the info from SES?
- Are you able to control the LEDs manually through SES? 
- Did you configure FMA in any way? 
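
On the manual control question: with LSI SAS2 HBAs one way to test the slot
LEDs by hand is the sas2ircu utility (the controller number and the
enclosure:slot below are made-up examples):

# sas2ircu LIST              # list controllers
# sas2ircu 0 DISPLAY         # show enclosures, slots and attached drives
# sas2ircu 0 LOCATE 2:5 ON   # light the locate LED on enclosure 2, slot 5
# sas2ircu 0 LOCATE 2:5 OFF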


-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Impact of L2ARC device failure and SSD recommendations

2011-06-12 Thread Pasi Kärkkäinen
On Sat, Jun 11, 2011 at 08:26:34PM +0400, Jim Klimov wrote:
 2011-06-11 19:15, Pasi Kärkkäinen wrote:
 On Sat, Jun 11, 2011 at 08:35:19AM -0500, Edmund White wrote:
 I've had two incidents where performance tanked suddenly, leaving the VM
 guests and Nexenta SSH/Web consoles inaccessible and requiring a full
 reboot of the array to restore functionality. In both cases, it was the
 Intel X-25M L2ARC SSD that failed or was offlined. NexentaStor failed 
 to
 alert me on the cache failure, however the general ZFS FMA alert was
 visible on the (unresponsive) console screen.

 The zpool status output showed:

   cache
   c6t5001517959467B45d0 FAULTED  2   542 0  too many errors

 This did not trigger any alerts from within Nexenta.

 I was under the impression that an L2ARC failure would not impact the
 system. But in this case, it was the culprit. I've never seen any
 recommendations to RAID L2ARC for resiliency. Removing the bad SSD
 entirely from the server got me back running, but I'm concerned about 
 the
 impact of the device failure and the lack of notification from
 NexentaStor.
 IIRC recently there was discussion on this list about firmware bug
 on the Intel X25 SSDs causing them to fail under high disk IO with reset 
 storms.
 Even if so, this does not forgive ZFS hanging - especially
 if it detected the drive failure, and especially if this drive
 is not required for redundant operation.

 I've seen similar bad behaviour on my oi_148a box when
 I tested USB flash devices as L2ARC caches and
 occasionally they died by slightly moving out of the
 USB socket due to vibration or whatever reason ;)

 Similarly, this oi_148a box hung upon loss of SATA
 connection to a drive in the raidz2 disk set due to
 unreliable cable connectors, while it should have
 stalled IOs to that pool but otherwise the system
 should have remained responsive (tested
 failmode=continue and failmode=wait on different
 occasions).

 So I can relate - these things happen, they do annoy,
 and I hope they will be fixed sometime soon so that
 ZFS matches its docs and promises ;)


True, definitely sounds like a bug in ZFS as well..

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Impact of L2ARC device failure and SSD recommendations

2011-06-11 Thread Pasi Kärkkäinen
On Sat, Jun 11, 2011 at 08:35:19AM -0500, Edmund White wrote:
Posted in greater detail at Server Fault
- [1]http://serverfault.com/q/277966/13325
 
I have an HP ProLiant DL380 G7 system running NexentaStor. The server has
36GB RAM, 2 LSI 9211-8i SAS controllers (no SAS expanders), 2 SAS system
drives, 12 SAS data drives, a hot-spare disk, an Intel X25-M L2ARC cache
and a DDRdrive PCI ZIL accelerator. This system serves NFS to multiple
VMWare hosts. I also have about 90-100GB of deduplicated data on the
array.
 
I've had two incidents where performance tanked suddenly, leaving the VM
guests and Nexenta SSH/Web consoles inaccessible and requiring a full
reboot of the array to restore functionality. In both cases, it was the
Intel X-25M L2ARC SSD that failed or was offlined. NexentaStor failed to
alert me on the cache failure, however the general ZFS FMA alert was
visible on the (unresponsive) console screen.
 
The zpool status output showed:
 
  cache
  c6t5001517959467B45d0 FAULTED  2   542 0  too many errors
 
This did not trigger any alerts from within Nexenta.
 
I was under the impression that an L2ARC failure would not impact the
system. But in this case, it was the culprit. I've never seen any
recommendations to RAID L2ARC for resiliency. Removing the bad SSD
entirely from the server got me back running, but I'm concerned about the
impact of the device failure and the lack of notification from
NexentaStor.
 
What's the current best-choice SSD for L2ARC cache applications these
days? It seems as though the Intel units are no longer well-regarded.
 

IIRC recently there was discussion on this list about firmware bug
on the Intel X25 SSDs causing them to fail under high disk IO with reset 
storms.

Maybe you're hitting that firmware bug.

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] best migration path from Solaris 10

2011-03-19 Thread Pasi Kärkkäinen
On Fri, Mar 18, 2011 at 06:26:37PM -0700, Michael DeMan wrote:
 ZFSv28 is in HEAD now and will be out in 8.3.
 
 ZFS + HAST in 9.x means being able to cluster off different hardware.
 
 In regards to OpenSolaris and Indiana - can somebody clarify the relationship 
 there?  It was clear with OpenSolaris that the latest/greatest ZFS would 
 always be available since it was a guinea-pig product for cost conscious 
 folks and served as an excellent area for Sun to get marketplace feedback and 
 bug fixes done before rolling updates into full Solaris.
 
 To me it seems that Open Indiana is basically a green branch off of a dead 
 tree - if I am wrong, please enlighten me.
 

Illumos project was started as a fork of OpenSolaris when Oracle was still 
publishing OpenSolaris sources.

Then Oracle closed OpenSolaris development, and decided to call upcoming 
(closed) versions Solaris 11 Express,
with no source included.

Illumos project continued the development based on the latest published 
OpenSolaris sources, 
and a bit later OpenIndiana *distribution* was announced to deliver a binary 
distro based on OpenSolaris/Illumos.

So in short Illumos is the development project, which hosts the new sources, 
and OpenIndiana is a binary distro based on it.


-- Pasi

 On Mar 18, 2011, at 6:16 PM, Roy Sigurd Karlsbakk wrote:
 
  I think we all feel the same pain with Oracle's purchase of Sun.
  
  FreeBSD that has commercial support for ZFS maybe?
  
  Fbsd currently has a very old zpool version, not suitable for running with 
  SLOGs, since if you lose it, you may lose the pool, which isn't very 
  amusing...
  
  Vennlige hilsener / Best regards
  
  roy
  --
  Roy Sigurd Karlsbakk
  (+47) 97542685
  r...@karlsbakk.net
  http://blogg.karlsbakk.net/
  --
  In all pedagogy it is essential that the curriculum be presented intelligibly. It 
  is an elementary imperative for all pedagogues to avoid excessive use 
  of idioms of foreign origin. In most cases adequate and relevant 
  synonyms exist in Norwegian.
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] native ZFS on Linux

2011-02-13 Thread Pasi Kärkkäinen
On Sat, Feb 12, 2011 at 08:54:26PM +0100, Roy Sigurd Karlsbakk wrote:
  I see that Pinguy OS, an uber-Ubuntu o/s, includes native ZFS support.
  Any pointers to more info on this?
 
 There are some work in progress from http://zfsonlinux.org/, but the posix 
 layer was still lacking last I checked
 

kqstor made the POSIX layer.

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and TRIM

2011-01-31 Thread Pasi Kärkkäinen
On Mon, Jan 31, 2011 at 03:41:52PM +0100, Joerg Schilling wrote:
 Brandon High bh...@freaks.com wrote:
 
  On Sat, Jan 29, 2011 at 8:31 AM, Edward Ned Harvey
  opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
   What is the status of ZFS support for TRIM?
 
  I believe it's been supported for a while now.
  http://www.c0t0d0s0.org/archives/6792-SATA-TRIM-support-in-Opensolaris.html
 
 The command is implemented in the sata driver but there does ot seem to be 
 any 
 user of the code.
 

BTW, is the SCSI equivalent also implemented? IIRC it was called SCSI UNMAP (for 
SAS).

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reliable, enterprise worthy JBODs?

2011-01-25 Thread Pasi Kärkkäinen
On Tue, Jan 25, 2011 at 11:53:49AM -0800, Rocky Shek wrote:
 Philip,
 
 You can consider DataON DNS-1600 4U 24Bay 6Gb/s SAS JBOD Storage. 
 http://dataonstorage.com/dataon-products/dns-1600-4u-6g-sas-to-sas-sata-jbod
 -storage.html
 
 It is the best fit for ZFS Storage application. It can be a good replacement
 of Sun/Oracle J4400 and J4200   
 
 There are also Ultra density DNS-1660 4U 60 Bay 6Gb/s SAS JBOD Storage and
 other form factor JBOD.   
 
 http://dataonstorage.com/dataon-products/6g-sas-jbod/dns-1660-4u-60-bay-6g-3
 5inch-sassata-jbod.html
 

Does (Open)Solaris FMA work with these DataON JBODs? 
.. meaning do the failure LEDs work automatically in the case of disk failure?

I guess that requires the SES chip on the JBOD to include proper drive
identification for all slots.
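
A quick sanity check I'd try first (assuming an illumos-based install where
the FMA topology tools live under /usr/lib/fm): see whether the enclosure and
its bays are enumerated at all:

# /usr/lib/fm/fmd/fmtopo | grep -i bay   # bay/ses nodes should show up if the SES enclosure is recognized
# fmadm faulty                           # current faults FMA knows about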

-- Pasi

 
 Rocky
 
 -Original Message-
 From: zfs-discuss-boun...@opensolaris.org
 [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Philip Brown
 Sent: Tuesday, January 25, 2011 10:05 AM
 To: zfs-discuss@opensolaris.org
 Subject: [zfs-discuss] reliable, enterprise worthy JBODs?
 
 So, another hardware question :)
 
 ZFS has been touted as taking maximal advantage of disk hardware, to the
 point where it can be used efficiently and cost-effectively on JBODs, rather
 than having to throw more expensive RAID arrays at it.
 
 Only trouble is.. JBODs seem to have disappeared :(
 Sun/Oracle has discontinued its j4000 line, with no replacement that I can
 see.
 
 IBM seems to have some nice looking hardware in the form of its EXP3500
 expansion trays... but they only support it connected to an IBM (SAS)
 controller... which is only supported when plugged into IBM server hardware
 :(
 
 Any other suggestions for (large-)enterprise-grade, supported JBOD hardware
 for ZFS these days?
 Either fibre or SAS would be okay.
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A few questions

2011-01-09 Thread Pasi Kärkkäinen
On Sat, Jan 08, 2011 at 12:33:50PM -0500, Edward Ned Harvey wrote:
  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Garrett D'Amore
  
  When you purchase NexentaStor from a top-tier Nexenta Hardware Partner,
  you get a product that has been through a rigorous qualification process
 
 How do I do this, exactly?  I am serious.  Before too long, I'm going to
 need another server, and I would very seriously consider reprovisioning my
 unstable Dell Solaris server to become a linux or some other stable machine.
 The role it's currently fulfilling is the backup server, which basically
 does nothing except zfs receive from the primary Sun solaris 10u9 file
 server.  Since the role is just for backups, it's a perfect opportunity for
 experimentation, hence the Dell hardware with solaris.  I'd be happy to put
 some other configuration in there experimentally instead ... say ...
 nexenta.  Assuming it will be just as good at zfs receive from the primary
 server.
 
 Is there some specific hardware configuration you guys sell?  Or recommend?
 How about a Dell R510/R610/R710?  Buy the hardware separately and buy
 NexentaStor as just a software product?  Or buy a somehow more certified
 hardware  software bundle together?
 
 If I do encounter a bug, where the only known fact is that the system keeps
 crashing intermittently on an approximately weekly basis, and there is
 absolutely no clue what's wrong in hardware or software...  How do you guys
 handle it?
 
 If you'd like to follow up offlist, that's fine.  Then just email me at the
 email address:  nexenta at nedharvey.com
 (I use disposable email addresses on mailing lists like this, so at any
 random unknown time, I'll destroy my present alias and start using a new
 one.)
 

Hey,

Other OSes have had problems with the Broadcom NICs as well..

See for example this RHEL5 bug: 
https://bugzilla.redhat.com/show_bug.cgi?id=520888
Host crashing probably due to MSI-X IRQs with bnx2 NIC..

And VMware vSphere ESX/ESXi 4.1 crashing with bnx2x: 
http://kb.vmware.com/selfservice/microsites/search.do?language=en_UScmd=displayKCexternalId=1029368

So I guess there are firmware/driver problems affecting not just Solaris
but also other operating systems..

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Looking for 3.5 SSD for ZIL

2010-12-22 Thread Pasi Kärkkäinen
On Wed, Dec 22, 2010 at 11:36:48AM +0100, Stephan Budach wrote:
Hello all,
 
I am shopping around for 3.5 SSDs that I can mount into my storage and
use as ZIL drives.
As of yet, I have only found 3.5 models with the Sandforce 1200, which
was not recommended on this list.


I think the recommendation was not to use SSDs at all for ZIL,
not just specifically Sandforce controllers?

-- Pasi

Does anyone maybe know of a model that has the Sandforce 1500 and is 3.5?
Or any other 3.5 SSD that he/she can recommend?
 
Cheers,
budy

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Looking for 3.5 SSD for ZIL

2010-12-22 Thread Pasi Kärkkäinen
On Wed, Dec 22, 2010 at 01:43:35PM +, Jabbar wrote:
Hello,
 
I was thinking of buying a couple of SSD's until I found out that Trim is
only supported with SATA drives. 


Yes, because TRIM is an ATA command. SATA means Serial ATA.
SCSI (SAS) drives have the UNMAP and WRITE SAME (with the unmap bit) commands, 
which are the equivalents there.

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] AHCI or IDE?

2010-12-16 Thread Pasi Kärkkäinen
On Thu, Dec 16, 2010 at 08:43:02PM +0100, Alexander Lesle wrote:
 Hello All,
 
 I want to build a home file and media server now. After experiment with a
 Asus Board and running in unsolve problems I have bought this
 Supermicro Board X8SIA-F with Intel i3-560 and 8 GB Ram
 http://www.supermicro.com/products/motherboard/Xeon3000/3400/X8SIA.cfm?IPMI=Y
 also the LSI HBA SAS 9211-8i
 http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9211-8i/index.html
 
 rpool = 2vdev mirror
 tank = 2 x 2vdev mirror. For the future I want to have the option to
 expand up to 12 x 2vdev mirror.
 
 After reading the board manual I found at page 4-9 where I can set
 SATA#1 from IDE to AHCI.
 
 Can zfs handle AHCI for rpool?
 Can zfs handle AHCI for tank?
 
 Thx for helping.


You definitely want to use AHCI and not the legacy IDE mode.

AHCI enables:
- disk hot-swap.
- NCQ (Native Command Queuing), so the drive can queue and execute multiple 
commands at the same time.
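
If you want to double-check after flipping the BIOS setting, this is roughly
how I'd verify that the ahci driver actually attached (on an
OpenSolaris/illumos box):

# prtconf -D | grep -i ahci   # the SATA controller should be bound to "ahci" instead of the legacy pci-ide driver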


-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool does not like iSCSI ?

2010-11-30 Thread Pasi Kärkkäinen
On Tue, Nov 09, 2010 at 04:18:17AM -0800, Andreas Koppenhoefer wrote:
 From Oracle Support we got the following info:
 
 Bug ID: 6992124 reboot of Sol10 u9 host makes zpool FAULTED when zpool uses 
 iscsi LUNs
 This is a duplicate of:
 Bug ID: 6907687 zfs pool is not automatically fixed when disk are brought 
 back online or after boot
 
 An IDR patch already exists, but no official patch yet.
 

Do you know if these bugs are fixed in Solaris 11 Express?

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-11-17 Thread Pasi Kärkkäinen
On Wed, Nov 17, 2010 at 10:14:10AM +, Bruno Sousa wrote:
Hi all,
 
Let me tell you all that the MC/S *does* make a difference...I had a
windows fileserver using an ISCSI connection to a host running snv_134
with an average speed of 20-35 mb/s...After the upgrade to snv_151a
(Solaris 11 express) this same fileserver got a performance boost and now
has an average speed of 55-60mb/s.
 
Not double performance, but WAY better , specially if we consider that
this performance boost was purely software based :)
 

Did you verify you're using more connections after the update? 
Or was it just *other* COMSTAR (and/or kernel) updates making the difference..

-- Pasi


 
 
Nice...nice job COMSTAR guys!
 
 
 
Bruno
 
 
 
On Tue, 16 Nov 2010 19:49:59 -0500, Jim Dunham james.dun...@oracle.com
wrote:
 
  On Nov 16, 2010, at 6:37 PM, Ross Walker wrote:
 
On Nov 16, 2010, at 4:04 PM, Tim Cook [1]...@cook.ms wrote:
 
  AFAIK, esx/i doesn't support L4 hash, so that's a non-starter.
 
For iSCSI one just needs to have a second (third or fourth...) iSCSI
session on a different IP to the target and run mpio/mpxio/mpath
whatever your OS calls multi-pathing.
 
  MC/S (Multiple Connections per Sessions) support was added to the iSCSI
  Target in COMSTAR, now available in Oracle Solaris 11 Express.
  - Jim
 
-Ross
___
zfs-discuss mailing list
[2]zfs-disc...@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
  --
  This message has been scanned for viruses and
  dangerous content by [3]MailScanner, and is
  believed to be clean.
 
 
 
  --
  Bruno Sousa
 
--
This message has been scanned for viruses and
dangerous content by [4]MailScanner, and is
believed to be clean.
 
 References
 
Visible links
1. mailto:t...@cook.ms
2. mailto:zfs-discuss@opensolaris.org
3. http://www.mailscanner.info/
4. http://www.mailscanner.info/

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Finding corrupted files

2010-10-16 Thread Pasi Kärkkäinen
On Sat, Oct 16, 2010 at 08:38:28AM -0700, Richard Elling wrote:
On Oct 15, 2010, at 6:18 AM, Stephan Budach wrote:
 
  So, what would you suggest, if I wanted to create really big pools? Say
  in the 100 TB range? That would be quite a number of single drives then,
  especially when you want to go with zpool raid-1.
 
For 100 TB, the methods change dramatically.  You can't just reload 100 TB
from CD
or tape. When you get to this scale you need to be thinking about raidz2+
*and*
mirroring.
I will be exploring these issues of scale at the Techniques for Managing
Huge
Amounts of Data tutorial at the USENIX LISA '10 Conference.
[1]http://www.usenix.org/events/lisa10/training/

Hopefully your presentation will be available online after the event!

-- Pasi

 -- richard
 
--
OpenStorage Summit, October 25-27, Palo Alto, CA
[2]http://nexenta-summit2010.eventbrite.com
USENIX LISA '10 Conference November 8-16
ZFS and performance consulting
[3]http://www.RichardElling.com
 
 References
 
Visible links
1. http://www.usenix.org/events/lisa10/training/
2. http://nexenta-summit2010.eventbrite.com/
3. http://www.richardelling.com/

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedicated ZIL/L2ARC

2010-09-14 Thread Pasi Kärkkäinen
On Tue, Sep 14, 2010 at 08:08:42AM -0700, Ray Van Dolson wrote:
 On Tue, Sep 14, 2010 at 06:59:07AM -0700, Wolfraider wrote:
  We are looking into the possibility of adding a dedicated ZIL and/or
  L2ARC devices to our pool. We are looking into getting 4 - 32GB
  Intel X25-E SSD drives. Would this be a good solution to slow write
  speeds? We are currently sharing out different slices of the pool to
  windows servers using comstar and fibrechannel. We are currently
  getting around 300MB/sec performance with 70-100% disk busy.
  
  Opensolaris snv_134
  Dual 3.2GHz quadcores with hyperthreading
  16GB ram
  Pool_1 - 18 raidz2 groups with 5 drives a piece and 2 hot spares
  Disks are around 30% full
  No dedup
 
 It'll probably help.
 
 I'd get two X-25E's for ZIL (and mirror them) and one or two of Intel's
 lower end X-25M for L2ARC.
 
 There are some SSD devices out there with a super-capacitor and
 significantly higher IOPs ratings than the X-25E that might be a better
 choice for a ZIL device, but the X-25E is a solid drive and we have
 many of them deployed as ZIL devices here.
 

I thought Intel SSDs didn't respect the CACHE FLUSH command and thus
are subject to ZIL corruption if the server crashes or loses power?

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] carrying on [was: Legality and the future of zfs...]

2010-07-19 Thread Pasi Kärkkäinen
On Sat, Jul 17, 2010 at 12:57:40AM +0200, Richard Elling wrote:
 
  Because of BTRFS for Linux, Linux's popularity itself and also thanks
  to the Oracle's help.
 
 BTRFS does not matter until it is a primary file system for a dominant 
 distribution.  
 From what I can tell, the dominant Linux distribution file system is ext.  
 That will 
 change some day, but we heard the same story you are replaying about BTRFS 
 from the Reiser file system aficionados and the XFS evangelists. There is 
 absolutely no doubt that Solaris will use ZFS as its primary file system. But 
 there is 
 no internal or external force causing Red Hat to change their primary file 
 system 
 from ext.


Red Hat's Fedora 13 includes BTRFS, but it's not used as the default (yet). 
F13 also supports yum (package management) rollback using BTRFS snapshots.
I'm not sure if Fedora 14 will have BTRFS as the default.. 

The RHEL6 beta also includes BTRFS support (tech preview), but again, 
it's not enabled as the default filesystem.

Upcoming Ubuntu 10.10 will use BTRFS as a default.

That's the status in Linux world, afaik :)

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NexentaStor Community edition 3.0.3 released

2010-07-01 Thread Pasi Kärkkäinen
On Tue, Jun 15, 2010 at 10:57:53PM +0530, Anil Gulecha wrote:
 Hi All,
 
 On behalf of NexentaStor team, I'm happy to announce the release of
 NexentaStor Community Edition 3.0.3. This release is the result of the
 community efforts of Nexenta Partners and users.
 
 Changes over 3.0.2 include
 * Many fixes to ON/ZFS backported to b134.
 * Multiple bug fixes in the appliance.
 
 With the addition of many new features, NexentaStor CE is the *most
 complete*, and feature-rich gratis unified storage solution today.
 
 Quick Summary of Features
 -
 * ZFS additions: Deduplication (based on OpenSolaris b134).
 * Free for upto 12 TB of *used* storage
 * Community edition supports easy upgrades
 * Many new features in the easy to use management interface.
 * Integrated search
 
 Grab the iso from
 http://www.nexentastor.org/projects/site/wiki/CommunityEdition
 
 If you are a storage solution provider, we invite you to join our
 growing social network at http://people.nexenta.com.
 

Hey,

I tried installing Nexenta 3.0.3 on an old HP DL380G4 server,
and it installed OK, but it crashes all the time.. 

Basically 5-30 seconds after the login prompt shows up on the console
the server reboots due to a kernel crash.

The error seems to be about the Broadcom NIC driver..
Is this a known bug? 

See the screenshots for the kernel error message:

http://pasik.reaktio.net/nexenta/nexenta303-crash02.jpg
http://pasik.reaktio.net/nexenta/nexenta303-crash01.jpg

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Erratic behavior on 24T zpool

2010-06-18 Thread Pasi Kärkkäinen
On Fri, Jun 18, 2010 at 01:26:11AM -0700, artiepen wrote:
 Well, I've searched my brains out and I can't seem to find a reason for this.
 
 I'm getting bad to medium performance with my new test storage device. I've 
 got 24 1.5T disks with 2 SSDs configured as a zil log device. I'm using the 
 Areca raid controller, the driver being arcmsr. Quad core AMD with 16 gig of 
 RAM OpenSolaris upgraded to snv_134.
 
 The zpool has 2 11-disk raidz2's and I'm getting anywhere between 1MB/sec to 
 40MB/sec with zpool iostat. On average, though it's more like 5MB/sec if I 
 watch while I'm actively doing some r/w. I know that I should be getting 
 better performance.
 

How are you measuring the performance? 
Do you understand that raidz2 with that many disks in it will give you 
really poor random write performance? 

-- Pasi

 I'm new to OpenSolaris, but I've been using *nix systems for a long time, so 
 if there's any more information that I can provide, please let me know. Am I 
 doing anything wrong with this configuration? Thanks in advance.
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] OCZ Devena line of enterprise SSD

2010-06-18 Thread Pasi Kärkkäinen
On Thu, Jun 17, 2010 at 09:58:25AM -0700, Ray Van Dolson wrote:
 On Thu, Jun 17, 2010 at 09:54:59AM -0700, Ragnar Sundblad wrote:
  
  On 17 jun 2010, at 18.17, Richard Jahnel wrote:
  
   The EX specs page does list the supercap
   
   The pro specs page does not.
  
  They do for both on the Specifications tab on the web page:
  http://www.ocztechnology.com/products/solid-state-drives/2-5--sata-ii/maximum-performance-enterprise-solid-state-drives/ocz-vertex-2-pro-series-sata-ii-2-5--ssd-.html
  But not in the product brief PDFs.
  
  It doesn't say how many rewrites you can do either.
  
  An Intel X25-E 32G has, according to the product manual, a write
  endurance of 1 petabyte. In full write speed, 250 MB/s, that is equal
  to roughly 4,000,000 seconds, or about 46 days. (On the other hand you have a
  five year warranty, and I have been told that you can get them
  replaced if they wear out.)
 
 Do the drives keep any sort of internal counter so you get an idea of
 how much of the rated drive lifetime you've chewed through?
 

Heh.. the marketing stuff on the 'front' page says:
Vertex 2 EX has an ultra-reliable 10 million hour MTBF and comes backed by a 
three-year warranty. 

And then on the specifications:
MTBF: 2 million hours

:)

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Erratic behavior on 24T zpool

2010-06-18 Thread Pasi Kärkkäinen
On Fri, Jun 18, 2010 at 04:52:02AM -0400, Curtis E. Combs Jr. wrote:
 I am new to zfs, so I am still learning. I'm using zpool iostat to
 measure performance. Would you say that smaller raidz2 sets would give
 me more reliable and better performance? I'm willing to give it a
 shot...
 

Yes, more, smaller raid sets will give you better performance, 
since zfs distributes (stripes) data across all of them.

What's your IO pattern? Random writes? Sequential writes? 

Basically, if you have 2x 11-disk raidz2 sets you'll be limited to roughly the
performance of 2 disks in the worst case of small random IO
(each write has to touch the whole stripe including parity, which limits a
raidz/z2/z3 vdev to roughly the random IO performance of a single disk).

This is not really zfs specific at all, it's the same with any raid 
implementation.
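
If you want to experiment, a layout along these lines would give you 4 vdevs'
worth of random IOPS out of 24 disks instead of 2 (device names are made up,
this is just a sketch):

# zpool create tank \
    raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
    raidz2 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0 \
    raidz2 c1t12d0 c1t13d0 c1t14d0 c1t15d0 c1t16d0 c1t17d0 \
    raidz2 c1t18d0 c1t19d0 c1t20d0 c1t21d0 c1t22d0 c1t23d0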

-- Pasi

 On Fri, Jun 18, 2010 at 4:42 AM, Pasi Kärkkäinen pa...@iki.fi wrote:
  On Fri, Jun 18, 2010 at 01:26:11AM -0700, artiepen wrote:
  Well, I've searched my brains out and I can't seem to find a reason for 
  this.
 
  I'm getting bad to medium performance with my new test storage device. 
  I've got 24 1.5T disks with 2 SSDs configured as a zil log device. I'm 
  using the Areca raid controller, the driver being arcmsr. Quad core AMD 
  with 16 gig of RAM OpenSolaris upgraded to snv_134.
 
  The zpool has 2 11-disk raidz2's and I'm getting anywhere between 1MB/sec 
  to 40MB/sec with zpool iostat. On average, though it's more like 5MB/sec 
  if I watch while I'm actively doing some r/w. I know that I should be 
  getting better performance.
 
 
  How are you measuring the performance?
  Do you understand raidz2 with that big amount of disks in it will give you 
  really poor random write performance?
 
  -- Pasi
 
  I'm new to OpenSolaris, but I've been using *nix systems for a long time, 
  so if there's any more information that I can provide, please let me know. 
  Am I doing anything wrong with this configuration? Thanks in advance.
  --
  This message posted from opensolaris.org
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
 
 
 -- 
 Curtis E. Combs Jr.
 System Administrator Associate
 University of Georgia
 High Performance Computing Center
 ceco...@uga.edu
 Office: (706) 542-0186
 Cell: (706) 206-7289
 Gmail Chat: psynoph...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Erratic behavior on 24T zpool

2010-06-18 Thread Pasi Kärkkäinen
On Fri, Jun 18, 2010 at 05:15:44AM -0400, Thomas Burgess wrote:
On Fri, Jun 18, 2010 at 4:42 AM, Pasi Kärkkäinen [1]pa...@iki.fi wrote:
 
  On Fri, Jun 18, 2010 at 01:26:11AM -0700, artiepen wrote:
   Well, I've searched my brains out and I can't seem to find a reason
  for this.
  
   I'm getting bad to medium performance with my new test storage device.
  I've got 24 1.5T disks with 2 SSDs configured as a zil log device. I'm
  using the Areca raid controller, the driver being arcmsr. Quad core AMD
  with 16 gig of RAM OpenSolaris upgraded to snv_134.
  
   The zpool has 2 11-disk raidz2's and I'm getting anywhere between
  1MB/sec to 40MB/sec with zpool iostat. On average, though it's more like
  5MB/sec if I watch while I'm actively doing some r/w. I know that I
  should be getting better performance.
  
 
  How are you measuring the performance?
  Do you understand raidz2 with that big amount of disks in it will give
  you really poor random write performance?
  -- Pasi
 
i have a media server with 2 raidz2 vdevs 10 drives wide myself without a
ZIL (but with a 64 gb l2arc)
I can write to it about 400 MB/s over the network, and scrubs show 600
 MB/s but it really depends on the type of i/o you have... random i/o
across 2 vdevs will be REALLY slow (as slow as the slowest 2 drives in
your pool basically)
 40 MB/s might be right if it's random... though i'd still expect to see
more.
 

A 7200 RPM SATA disk can do around 120 IOPS max (7200/60 = 120), so if you're doing
4 kB random IO you end up getting 4 kB * 120 = 480 kB/sec throughput max from a single disk
(in the worst case).

40 MB/sec of random IO throughput using 4 kB IOs would be around 10240 IOPS..
you'd need 85x SATA 7200 RPM disks in raid-0 (striping) for that :)

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Erratic behavior on 24T zpool

2010-06-18 Thread Pasi Kärkkäinen
On Fri, Jun 18, 2010 at 02:21:15AM -0700, artiepen wrote:
 40MB/sec is the best that it gets. Really, the average is 5. I see 4, 5, 2, 
 and 6 almost 10x as many times as I see 40MB/sec. It really only bumps up to 
 40 very rarely.
 
 As far as random vs. sequential. Correct me if I'm wrong, but if I used dd to 
 make files from /dev/zero, wouldn't that be sequential? I measure with zpool 
 iostat 2 in another ssh session while making files of various sizes.
 

Yep, dd will generate sequential IO. 
Did you specify a block size for dd? (bs=1024k, for example.)

By default dd uses small 512-byte blocks.. which won't be very fast.
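
Something like this is what I usually do for a quick sequential write test
(the pool/path and sizes are just examples, and /dev/zero is only meaningful
if compression is off):

# dd if=/dev/zero of=/tank/ddtest bs=1024k count=8192   # write 8 GB in 1 MB chunks
# zpool iostat tank 2                                   # watch the bandwidth from another session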

-- Pasi

 This is a test system. I'm wondering, now, if I should just reconfigure with 
 maybe 7 disks and add another spare. Seems to be the general consensus that 
 bigger raid pools = worse performance. I thought the opposite was true...
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Homegrown Hybrid Storage

2010-06-11 Thread Pasi Kärkkäinen
On Tue, Jun 08, 2010 at 08:33:40PM -0500, Bob Friesenhahn wrote:
 On Tue, 8 Jun 2010, Miles Nordin wrote:

 re == Richard Elling richard.ell...@gmail.com writes:

re Please don't confuse Ethernet with IP.

 okay, but I'm not.  seriously, if you'll look into it.

 Did you misread where I said FC can exert back-pressure?  I was
 contrasting with Ethernet.

 You're really confused, though I'm sure you're going to deny it.

 I don't think so.  I think that it is time to reset and reboot yourself 
 on the technology curve.  FC semantics have been ported onto ethernet.  
 This is not your grandmother's ethernet but it is capable of supporting 
 both FCoE and normal IP traffic.  The FCoE gets per-stream QOS similar to 
 what you are used to from Fibre Channel. Quite naturally, you get to pay 
 a lot more for the new equipment and you have the opportunity to discard 
 the equipment you bought already.


Yeah, today enterprise iSCSI vendors like Equallogic (bought by Dell)
_recommend_ using flow control. Their iSCSI storage arrays are designed
to work properly with flow control and perform well.

Of course you need proper (certified) switches as well.

Equallogic says the delays from flow control pause frames are shorter
than TCP retransmits, which is why they use and recommend it.

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Homegrown Hybrid Storage

2010-06-11 Thread Pasi Kärkkäinen
On Fri, Jun 11, 2010 at 03:30:26PM -0400, Miles Nordin wrote:
  pk == Pasi Kärkkäinen pa...@iki.fi writes:
 
  You're really confused, though I'm sure you're going to deny
  it.
 
   I don't think so.  I think that it is time to reset and reboot
  yourself on the technology curve.  FC semantics have been
  ported onto ethernet.  This is not your grandmother's ethernet
  but it is capable of supporting both FCoE and normal IP
  traffic.  The FCoE gets per-stream QOS similar to what you are
  used to from Fibre Channel.
 
 FCoE != iSCSI.
 
 FCoE was not being discussed in the part you're trying to contradict.
 If you read my entire post, I talk about FCoE at the end and say more
 or less ``I am talking about FCoE here only so you don't try to throw
 out my entire post by latching onto some corner case not applying to
 the OP by dragging FCoE into the mix'' which is exactly what you did.
 I'm guessing you fired off a reply without reading the whole thing?
 
 pk Yeah, today enterprise iSCSI vendors like Equallogic (bought
 pk by Dell) _recommend_ using flow control. Their iSCSI storage
 pk arrays are designed to work properly with flow control and
 pk perform well.
 
 pk Of course you need a proper (certified) switches aswell.
 
 pk Equallogic says the delays from flow control pause frames are
 pk shorter than tcp retransmits, so that's why they're using and
 pk recommending it.
 
 please have a look at the three links I posted about flow control not
 being used the way you think it is by any serious switch vendor, and
 the explanation of why this limitation is fundamental, not something
 that can be overcome by ``technology curve.''  It will not hurt
 anything to allow autonegotiation of flow control on non-broken
 switches so I'm not surprised they recommend it with ``certified''
 known-non-broken switches, but it also will not help unless your
 switches have input/backplane congestion which they usually don't, or
 your end host is able to generate PAUSE frames for PCIe congestion
 which is maybe more plausible.  In particular it won't help with the
 typical case of the ``incast'' problem in the experiment in the FAST
 incast paper URL I gave, because they narrowed down what was happening
 in their experiment to OUTPUT queue congestion, which (***MODULO
 FCoE*** mr ``reboot yourself on the technology curve'') never invokes
 ethernet flow control.
 
 HTH.
 
 ok let me try again:
 
 yes, I agree it would not be stupid to run iSCSI+TCP over a CoS with
 blocking storage-friendly buffer semantics if your FCoE/CEE switches
 can manage that, but I would like to hear of someone actually DOING it
 before we drag it into the discussion.  I don't think that's happening
 in the wild so far, and it's definitely not the application for which
 these products have been flogged.
 
 I know people run iSCSI over IB (possibly with RDMA for moving the
 bulk data rather than TCP), and I know people run SCSI over FC, and of
 course SCSI (not iSCSI) over FCoE.  Remember the original assertion
 was: please try FC as well as iSCSI if you can afford it.
 
 Are you guys really saying you believe people are running ***iSCSI***
 over the separate HOL-blocking hop-by-hop pause frame CoS's of FCoE
 meshes?  or are you just spewing a bunch of noxious white paper
 vapours at me?  because AIUI people using the
 lossless/small-output-buffer channel of FCoE are running the FC
 protocol over that ``virtual channel'' of the mesh, not iSCSI, are
 they not?

I was talking about iSCSI over TCP over IP over Ethernet. No FCoE. No IB.

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2010-06-10 Thread Pasi Kärkkäinen
On Thu, Jun 10, 2010 at 05:46:19AM -0700, Peter Eriksson wrote:
 Just a quick followup that the same issue still seems to be there on our 
 X4500s with the latest Solaris 10 with all the latest patches and the 
 following SSD disks:
 
 Intel X25-M G1 firmware 8820 (80GB MLC)
 Intel X25-M G2 firmware 02HD (160GB MLC)
 

What problems did you have with the X25-M models?

-- Pasi

 However - things seem to work smoothly with:
 
 Intel X25-E G1 firmware 8850 (32GB SLC)
 OCZ Vertex 2 firmware 1.00 and 1.02 (100GB MLC)
 
 I'm currently testing a setup with dual OCZ Vertex 2 100GB SSD units that 
 will be used both as mirrored boot/root (32GB of the 100GB), and then use the 
 rest of those disks as L2ARC cache devices for the big data zpool. And have 
 two mirrored X25-E as slog devices:
 
 zpool create DATA raidz2 c0t0d0 c0t1d0 c1t0d0 c1t1d0 c2t0d0 c2t1d0 c3t1d0 \
   raidz2 c4t0d0 c4t1d0 c5t0d0 c5t1d0 c0t2d0 c0t3d0 c3t2d0 \
   raidz2 c1t2d0 c1t3d0 c2t2d0 c2t3d0 c4t2d0 c4t3d0 c3t3d0 \
   raidz2 c5t2d0 c5t3d0 c0t4d0 c0t5d0 c1t4d0 c1t5d0 c3t5d0 \
   raidz2 c2t4d0 c2t5d0 c4t4d0 c4t5d0 c5t4d0 c5t5d0 c3t6d0 \
   raidz2 c0t6d0 c0t7d0 c1t6d0 c1t7d0 c2t6d0 c2t7d0 c3t7d0 \
   spare c4t6d0 c5t6d0 \
   cache c3t0d0s3 c3t4d0s3 \
   log mirror c4t7d0 c5t7d0
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] nfs share of nested zfs directories?

2010-06-04 Thread Pasi Kärkkäinen
On Fri, Jun 04, 2010 at 08:43:32AM -0400, Cassandra Pugh wrote:
Thank you. When I manually mount using the mount -t nfs4 option, I am
able to see the entire tree; however, the permissions are set as
nfsnobody.
Warning: rpc.idmapd appears not to be running.
 All uids will be mapped to the nobody uid.
 

Did you actually read the error message? :)
Finding a solution shouldn't be too difficult after that..
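
On the Linux client side that usually means setting the NFSv4 domain and
starting the id-mapping daemon. A rough sketch (the domain, server name,
paths and the exact service name below are placeholders that vary by distro):

  # /etc/idmapd.conf on the client: Domain must match the server's NFSv4 domain
  [General]
  Domain = example.com

  # start the id-mapping daemon (rpcidmapd on RHEL-like distros), then remount
  service rpcidmapd start
  mount -t nfs4 server:/pool/fs /mnt/fs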

-- Pasi

-
Cassandra
(609) 243-2413
Unix Administrator
 
From a little spark may burst a mighty flame.
-Dante Alighieri
 
On Thu, Jun 3, 2010 at 4:33 PM, Brandon High [1]bh...@freaks.com wrote:
 
  On Thu, Jun 3, 2010 at 12:50 PM, Cassandra Pugh [2]cp...@pppl.gov
  wrote:
   The special case here is that I am trying to traverse NESTED zfs
  systems,
   for the purpose of having compressed and uncompressed directories.
 
  Make sure to use mount -t nfs4 on your linux client. The standard
  nfs type only supports nfs v2/v3.
 
  -B
  --
  Brandon High : [3]bh...@freaks.com
 

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [ZIL device brainstorm] intel x25-M G2 has ram cache?

2010-05-25 Thread Pasi Kärkkäinen
On Tue, May 25, 2010 at 10:08:57AM +0100, Karl Pielorz wrote:


 --On 24 May 2010 23:41 -0400 rwali...@washdcmail.com wrote:

 I haven't seen where anyone has tested this, but the MemoRight SSD (sold
 by RocketDisk in the US) seems to claim all the right things:

 http://www.rocketdisk.com/vProduct.aspx?ID=1

 pdf specs:

 http://www.rocketdisk.com/Local/Files/Product-PdfDataSheet-1_MemoRight%20
 SSD%20GT%20Specification.pdf

 They claim to support the cache flush command, and with respect to DRAM
 cache backup they say (p. 14/section 3.9 in that pdf):

 At the risk of this getting a little off-topic (but hey, we're all 
 looking for ZFS ZIL's ;) We've had similar issues when looking at SSD's 
 recently (lack of cache protection during power failure) - the above 
 SSD's look interesting [finally someone's noted you need to protect the 
 cache] - but from what I've read about the Intel X25-E performance - the 
 Intel drive with write cache turned off appears to be as fast, if not 
 faster than those drives anyway...

 I've tried contacting Intel to find out if it's true their enterprise 
 SSD has no cache protection on it, and what the effect of turning the 
 write cache off would have on both performance and write endurance, but 
 not heard anything back yet.


I guess the problem is not the cache by itself, but the fact that they
ignore the CACHE FLUSH command.. and thus the non-battery-backed cache
becomes a problem.

-- Pasi

 Picking apart the Intel benchmarks published - they always have the  
 write-cache enabled, which probably speaks volumes...

 -Karl
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [ZIL device brainstorm] intel x25-M G2 has ram cache?

2010-05-25 Thread Pasi Kärkkäinen
On Tue, May 25, 2010 at 01:52:47PM +0100, Karl Pielorz wrote:

 --On 25 May 2010 15:28 +0300 Pasi Kärkkäinen pa...@iki.fi wrote:

 I've tried contacting Intel to find out if it's true their enterprise
 SSD has no cache protection on it, and what the effect of turning the
 write cache off would have on both performance and write endurance, but
 not heard anything back yet.


 I guess the problem is not the cache by itself, but the fact that they
 ignore the CACHE FLUSH command.. and thus the non-battery-backed cache
 becomes a problem.

 The X25-E's do apparently honour the 'Disable Write Cache' command -  
 without write cache, there is no cache to flush - all data is written to  
 flash immediately - presumably before it's ACK'd to the host.

 I've seen a number of other sites do some testing with this - and found  
 that it 'works' (i.e. with write-cache enabled, you get nasty data loss 
 if the power is lost - with it disabled, it closes that window). But you  
 obviously take quite a sizeable performance hit.


Yeah.. what I meant is: if you have the write cache enabled, and the SSD drive
honours the 'CACHE FLUSH' command, then you should be safe..

From what I've understood, the Intel SSDs ignore the CACHE FLUSH command,
and thus it's not safe to run them with their caches enabled..
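
For anyone testing this, the drive's volatile write cache can also be toggled
from the OS side; on Solaris that's the interactive 'format -e' cache menu,
and on a Linux box a rough sketch looks like this (the device name is a
placeholder):

  hdparm -W  /dev/sdX     # show the current write-cache setting
  hdparm -W0 /dev/sdX     # disable the volatile write cache
  hdparm -W1 /dev/sdX     # re-enable it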

 We've got an X25-E here which we intend to test for ourselves (wisely ;) 
 - to make sure that is the case...


Please let us know how it goes :)

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using WD Green drives?

2010-05-18 Thread Pasi Kärkkäinen
On Mon, May 17, 2010 at 03:12:44PM -0700, Erik Trimble wrote:
 On Mon, 2010-05-17 at 12:54 -0400, Dan Pritts wrote:
  On Mon, May 17, 2010 at 06:25:18PM +0200, Tomas Ögren wrote:
   Resilver does a whole lot of random io itself, not bulk reads.. It reads
   the filesystem tree, not block 0, block 1, block 2... You won't get
   60MB/s sustained, not even close.
  
  Even with large, unfragmented files?  
  
  danno
  --
  Dan Pritts, Sr. Systems Engineer
  Internet2
  office: +1-734-352-4953 | mobile: +1-734-834-7224
 
 Having large, unfragmented files will certainly help keep sustained
 throughput.  But, also, you have to consider the amount of deletions
 done on the pool.
 
 For instance, let's say you wrote files A, B, and C one right after
 another, and they're all big files.  Doing a re-silver, you'd be pretty
 well off on getting reasonable throughput reading A, then B, then C,
 since they're going to be contiguous on the drive (both internally, and
 across the three files).  However, if you have deleted B at some time,
 and say wrote a file D (where D < B in size) into B's old space, then,
 well, you seek to A, read A, seek forward to C, read C, seek back to D,
 etc.
 
 Thus, you'll get good throughput for resilver on these drives pretty
 much in just ONE case:  large files with NO deletions.  If you're using
 them for write-once/read-many/no-delete archives, then you're OK.
 Anything else is going to suck.
 
 :-)
 

So basically, if you have a lot of small files with a lot of changes
and deletions, resilver is going to be really slow.

Sounds like traditional RAID would be better/faster to rebuild in this
case..

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ideal SATA/SAS Controllers for ZFS

2010-05-15 Thread Pasi Kärkkäinen
On Sat, May 15, 2010 at 11:01:00AM +, Marc Bevand wrote:
 I have done quite some research over the past few years on the best (ie. 
 simple, robust, inexpensive, and performant) SATA/SAS controllers for ZFS. 
 Especially in terms of throughput analysis (many of them are designed with an 
 insufficient PCIe link width). I have seen many questions on this list about 
 which one to buy, so I thought I would share my knowledge: 
 http://blog.zorinaq.com/?e=10 Very briefly:
 
 - The best 16-port one is probably the LSI SAS2116, 6Gbps, PCIe (gen2) x8. 
 Because it is quite pricey, it's probably better to buy 2 8-port controllers.
 - The best 8-port is the LSI SAS2008 (faster, more expensive) or SAS1068E 
 (150MB/s/port should be sufficient).
 - The best 2-port is the Marvell 88SE9128 or 88SE9125 or 88SE9120 because of 
 PCIe gen2 allowing a throughput of at least 300MB/s on the PCIe link with 
 Max_Payload_Size=128. And this one is particularly cheap ($35). AFAIK this is
 the _only_ controller on the entire market that allows 2 drives to not
 bottleneck an x1 link.
 
 I hope this helps ZFS users here!
 

Excellent post! It'll definitely help many.

Thanks!

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance of the ZIL

2010-05-06 Thread Pasi Kärkkäinen
On Wed, May 05, 2010 at 11:32:23PM -0400, Edward Ned Harvey wrote:
  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Robert Milkowski
 
  if you can disable ZIL and compare the performance to when it is off it
  will give you an estimate of what's the absolute maximum performance
  increase (if any) by having a dedicated ZIL device.
 
 I'll second this suggestion.  It'll cost you nothing to disable the ZIL
 temporarily.  (You have to dismount the filesystem twice.  Once to disable
 the ZIL, and once to re-enable it.)  Then you can see if performance is
 good.  If performance is good, then you'll know you need to accelerate your
 ZIL.  (Because disabled ZIL is the fastest thing you could possibly ever
 do.)
 
 Generally speaking, you should not disable your ZIL for the long run.  But
 in some cases, it makes sense.
 
 Here's how you determine if you want to disable your ZIL permanently:
 
 First, understand that with the ZIL disabled, all sync writes are treated as
 async writes.  This is buffered in ram before being written to disk, so the
 kernel can optimize and aggregate the write operations into one big chunk.
 
 No matter what, if you have an ungraceful system shutdown, you will lose all
 the async writes that were waiting in ram.
 
 If you have ZIL disabled, you will also lose the sync writes that were
 waiting in ram (because those are being handled as async.)
 
 In neither case do you have data or filesystem corruption.
 

ZFS itself is probably still OK, since it's designed to handle this (?),
but the data can't be OK if you lose 30 seconds of writes.. 30 seconds of
writes that have already been acknowledged to the servers/applications..
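
For reference, in that timeframe the ZIL was disabled globally via the
zil_disable tunable (the per-dataset sync property came later), and the
dataset had to be remounted for the change to take effect. A rough sketch
(the dataset name is a placeholder):

  # disable the ZIL on the running kernel, then remount so it takes effect
  echo zil_disable/W0t1 | mdb -kw
  zfs umount tank/test && zfs mount tank/test

  # ... run the benchmark, then re-enable the ZIL and remount again ...
  echo zil_disable/W0t0 | mdb -kw
  zfs umount tank/test && zfs mount tank/test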

 The risk of running with no ZIL is:  In the case of ungraceful shutdown, in
 addition to the (up to 30 sec) async writes that will be lost, you will also
 lose up to 30 sec of sync writes.
 

-- Pasi

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss