Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Roy Sigurd Karlsbakk
- Original Message -
 On Jun 20, 2010, at 11:55, Roy Sigurd Karlsbakk wrote:
 
  There will also be a few common areas for each department and
  perhaps a backup area.
 
 The backup area should be on a different set of disks.
 
 IMHO, a backup isn't a backup unless it is an /independent/ copy of
 the data. The copy can be made via ZFS send/recv, tar, rsync,
 Legato/NetBackup, etc., but it needs to be on independent media.
 Otherwise, if the original copy goes, so does the backup.

I think you misunderstand me here. The backup area will be a storage area for 
Ahsay (see http://www.ahsay.com/) client and application backups (Oracle, Sybase, 
Exchange, etc.). All datasets will be copied to a secondary node, either with ZFS 
send/receive or (more probably) NexentaStor HA Cluster (http://kurl.no/KzHU).
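
For the send/receive variant, a minimal sketch of the kind of replication cycle 
involved (the pool, dataset and host names here are invented):

# initial full replication of the home datasets to the secondary node
zfs snapshot -r tank/home@2010-06-21
zfs send -R tank/home@2010-06-21 | ssh secondary zfs receive -Fd backup

# later runs only send the changes since the previous snapshot
zfs snapshot -r tank/home@2010-06-22
zfs send -R -i tank/home@2010-06-21 tank/home@2010-06-22 | \
    ssh secondary zfs receive -Fd backup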

  I have read people are having problems with lengthy boot times with
  lots of datasets. We're planning to do extensive snapshotting on
  this system, so there might be close to a hundred snapshots per
  dataset, perhaps more. With 200 users and perhaps 10-20 shared
  department datasets, the number of filesystems, snapshots included,
  will be around 20k or more.
 
 You may also want to consider breaking things up into different pools
 as well. There seems to be an implicit assumption in this conversation
 that everything will be in one pool, and that may not be the best
 course of action.
 
 Perhaps one pool for users' homedirs, and another for the departmental
 stuff? Or perhaps even two different pools for homedirs, with users
 'randomly' distributed between the two (though definitely don't do
 something like alphabetical (it'll be non-even) or departmental
 (people transfer) distribution).
 
 This could add a bit of overhead, but I don't think having 2 or 3 pools
 would be much more of a big deal than one.

So far the plan is to keep it in one pool for design and administration 
simplicity. Why would you want to split up (net) 40TB into more pools? It seems to 
me that would mess things up a bit, having to split up the SSDs for use on 
different pools, losing the flexibility of a common pool, etc. Why?
 
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Roy Sigurd Karlsbakk
 Btw, what did you plan to use as L2ARC/slog?

I was thinking of using four Crucial RealSSD 256GB SSDs, with a small RAID-1+0 for 
the SLOG and the rest for L2ARC. The system will mainly be used for reads, so I 
don't think the SLOG requirements will be too demanding. If you have another 
suggestion, please tell :)

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread David Magda

On Jun 21, 2010, at 05:00, Roy Sigurd Karlsbakk wrote:

So far the plan is to keep it in one pool for design and  
administration simplicity. Why would you want to split up (net) 40TB  
into more pools? Seems to me that'll mess up things a bit, having to  
 split up SSDs for use on different pools, losing the flexibility of  
a common pool etc. Why?


If different groups or areas have different I/O characteristics for  
one. If in one case (users) you want responsiveness, you could go with  
striped-mirrors. However, if departments have lots of data, it may be  
worthwhile to put it on a RAID-Z pool for better storage efficiency.


Just a thought.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Roy Sigurd Karlsbakk
- Original Message -
 On Jun 21, 2010, at 05:00, Roy Sigurd Karlsbakk wrote:
 
  So far the plan is to keep it in one pool for design and
  administration simplicity. Why would you want to split up (net) 40TB
  into more pools? Seems to me that'll mess up things a bit, having to
  split up SSDs for use on different pools, losing the flexibility of
  a common pool etc. Why?
 
 If different groups or areas have different I/O characteristics for
 one. If in one case (users) you want responsiveness, you could go with
 striped-mirrors. However, if departments have lots of data, it may be
 worthwhile to put it on a RAID-Z pool for better storage efficiency.

We have considered RAID-1+0 and concluded that there is no current need for it. 
Close to 1TB of SSD cache will also help boost read speeds, so I think it will be 
sufficient, at least for now. As for different I/O characteristics in different 
groups/areas, that is not something we have data on for now. Do you know a good 
way to check this? The data is located on two different zpools (Solaris 10) today.
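
One low-impact way to get a first impression from the existing Solaris 10 pools 
(a sketch; the pool names and the 10-second sampling interval are placeholders):

# per-vdev bandwidth and IOPS on the two current pools
zpool iostat -v pool1 10
zpool iostat -v pool2 10

# per-device service times and queue depths
iostat -xn 10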

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Arne Jansen
David Magda wrote:
 On Jun 21, 2010, at 05:00, Roy Sigurd Karlsbakk wrote:
 
 So far the plan is to keep it in one pool for design and
 administration simplicity. Why would you want to split up (net) 40TB
 into more pools? Seems to me that'll mess up things a bit, having to
 split up SSDs for use on different pools, losing the flexibility of a
 common pool etc. Why?
 
 If different groups or areas have different I/O characteristics for one.
 If in one case (users) you want responsiveness, you could go with
 striped-mirrors. However, if departments have lots of data, it may be
 worthwhile to put it on a RAID-Z pool for better storage efficiency.
 

Especially if the characteristics are different, I find it a good idea
to mix everything on one set of spindles. This way you have lots of spindles
for fast access and lots of space for the sake of space. If you divide
the available spindles into two sets, you will have far fewer spindles
available for the responsiveness goal. I don't think putting them into
a mirror can compensate for that.

--Arne
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Edward Ned Harvey
 From: James C. McPherson [mailto:j...@opensolaris.org]
 
 On the build systems that I maintain inside the firewall,
 we mandate one filesystem per user, which is a very great
 boon for system administration. 

What's the reasoning behind it?


 My management scripts are
 considerably faster running when I don't have to traverse
 whole directory trees (ala ufs).

That's a good reason.  Why would you have to traverse whole directory 
structures if you had a single zfs filesystem in a single zpool, instead of 
many zfs filesystems in a single zpool?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
 
 Close to 1TB SSD cache will also help to boost read
 speeds, 

Remember, this will not boost large sequential reads.  (It could possibly even 
hurt them.)  It will only boost random reads.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread James C. McPherson

On 21/06/10 10:38 PM, Edward Ned Harvey wrote:

From: James C. McPherson [mailto:j...@opensolaris.org]

On the build systems that I maintain inside the firewall,
we mandate one filesystem per user, which is a very great
boon for system administration.


What's the reasoning behind it?


Politeness, basically. Every user on these machines is expected
to make and use their own disk-space sandpit - having their own
dataset makes that work nicely.


My management scripts are
considerably faster running when I don't have to traverse
whole directory trees (ala ufs).


That's a good reason.  Why would you have to traverse whole

 directory structures if you had a single zfs filesystem in
 a single zpool, instead of many zfs filesystems in a single zpool?

For instance, if I've got users a, b and c, who have their own
datasets, and users z, y and x who do not:

df -h /builds/[abczyx]

will show me the disk usage of /builds as a whole for z, y and x, but of

/builds/a
/builds/b
/builds/c

individually for the ones who do have their own dataset. So when I'm
trying to figure out who I need to yell at because they're
using more than our acceptable limit (30 GB), I have to run
du -s /builds/[zyx]. And that takes time. Lots of time.
Especially on these systems, which are in huge demand from
people all over Solaris-land.
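
In other words, per-user datasets make the accounting instant, because ZFS 
already tracks space per dataset; a sketch, assuming the datasets live under a 
filesystem named builds:

# users with their own dataset: usage is a property lookup
zfs list -r -o name,used,quota builds

# users without one: walk the directory tree
du -sh /builds/z /builds/y /builds/x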



James C. McPherson
--
Senior Software Engineer, Solaris
Oracle
http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Darren J Moffat

On 21/06/2010 13:59, James C. McPherson wrote:

On 21/06/10 10:38 PM, Edward Ned Harvey wrote:

From: James C. McPherson [mailto:j...@opensolaris.org]

On the build systems that I maintain inside the firewall,
we mandate one filesystem per user, which is a very great
boon for system administration.


What's the reasoning behind it?


Politeness, basically. Every user on these machines is expected
to make and use their own disk-space sandpit - having their own
dataset makes that work nicely.


Plus it allows delegation of snapshot/clone/send/recv to the users on 
certain systems.
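
For example, with ZFS delegated administration (a sketch; the dataset and user 
names are invented):

# let the owner snapshot, clone and send/receive their own dataset
zfs allow -u user1 snapshot,clone,create,mount,send,receive builds/user1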


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Fredrich Maney
On Mon, Jun 21, 2010 at 8:59 AM, James C. McPherson
j...@opensolaris.org wrote:
[...]
 So when I'm
 trying to figure out who I need to yell at because they're
 using more than our acceptable limit (30Gb), I have to run
 du -s /builds/[zyx]. And that takes time. Lots of time.
[...]

Why not just use quotas?
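
With one dataset per user, the limit James mentions is a one-liner per dataset; 
on pool/zfs versions that support user quotas, a shared filesystem can get a 
similar effect (a sketch; names invented):

# per-dataset quota
zfs set quota=30G builds/user1

# per-user quota inside a shared filesystem (where supported)
zfs set userquota@user1=30G builds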

fpsm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Bob Friesenhahn

On Mon, 21 Jun 2010, Edward Ned Harvey wrote:


From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk

Close to 1TB SSD cache will also help to boost read speeds,


Remember, this will not boost large sequential reads.  (Could 
possibly maybe even hurt it.)  This will only boost random reads.


Or more accurately, it boosts repeated reads.  It won't help much in 
the case where data is accessed only once.  It is basically a 
poor-man's substitute for caching data in RAM.  The RAM is at least 
20X faster so the system should be stuffed with RAM first as long as 
the budget can afford it.


Luckily, most servers experience mostly repeated reads.
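
A quick way to see how much of the working set the ARC (the RAM cache) is 
already absorbing before spending on L2ARC, using the standard Solaris kstats 
(a sketch):

kstat -p zfs:0:arcstats:size
kstat -p zfs:0:arcstats:hits
kstat -p zfs:0:arcstats:misses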

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Bob Friesenhahn

On Mon, 21 Jun 2010, Arne Jansen wrote:


Especially if the characteristics are different I find it a good 
idea to mix all on one set of spindles. This way you have lots of 
spindles for fast access and lots of space for the sake of space. If 
 you divide the available spindles in two sets you will have much 
fewer spindles available for the responsiveness goal. I don't think 
taking them into a mirror can compensate that.


This is something that I can agree with.  Total vdevs in the pool is 
what primarily determines its responsiveness.  While using the same 
number of devices, splitting the pool might not result in more vdevs 
in either pool.  Mirrors do double the number of readable devices, but 
the side selected to read is random, so the actual read performance 
improvement is perhaps on the order of 50% rather than 100%.  Raidz 
does steal IOPS, so smaller raidz vdevs will help and result in more 
vdevs in the pool.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Many checksum errors during resilver.

2010-06-21 Thread Justin Daniel Meyer
I've decided to upgrade my home server capacity by replacing the disks in one 
of my mirror vdevs.  The procedure appeared to work out, but during resilver, a 
couple million checksum errors were logged on the new device. I've read through 
quite a bit of the archive and searched around a bit, but can not find anything 
definitive to ease my mind on whether to proceed.


SunOS deepthought 5.10 Generic_142901-13 i86pc i386 i86pc

  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.00% done, 691h28m to go
config:

NAME  STATE READ WRITE CKSUM
tank  DEGRADED 0 0 0
  mirror  DEGRADED 0 0 0
replacing DEGRADED   215 0 0
  c1t6d0s0/o  FAULTED  0 0 0  corrupted data
  c1t6d0  ONLINE   0 0   215  3.73M resilvered
c1t2d0ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t1d0ONLINE   0 0 0
c1t5d0ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0ONLINE   0 0 0
c1t4d0ONLINE   0 0 0
logs
  c8t1d0p1ONLINE   0 0 0
cache
  c2t1d0p2ONLINE   0 0 0


During the resilver, the cache device and the zil were both removed for errors 
(1-2k each).  (Despite the c2/c8 discrepancy, they are partitions on the same 
OCZvertexII device.)


# zpool status -xv tank
  pool: tank
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 9h20m with 0 errors on Sat Jun 19 22:07:27 2010
config:

NAMESTATE READ WRITE CKSUM
tankDEGRADED 0 0 0
  mirrorONLINE   0 0 0
c1t6d0  ONLINE   0 0 2.69M  539G resilvered
c1t2d0  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c1t1d0  ONLINE   0 0 0
c1t5d0  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c1t0d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
logs
  c8t1d0p1  REMOVED  0 0 0
cache
  c2t1d0p2  REMOVED  0 0 0

I cleared the errors (about 5000/GB resilvered!), removed the cache device, and 
replaced the zil partition with the whole device.  After 3 pool scrubs with no 
errors, I want to check with someone else that it appears okay to replace the 
second drive in this mirror vdev.  The one thing I have not tried is a large 
file transfer to the server, as I am also dealing with an NFS mount problem 
which popped up suspiciously close to my most recent patch update.
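
(For reference, the clear-and-verify cycle described above boils down to 
repeating something like:

# zpool clear tank
# zpool scrub tank
# zpool status -x tank

until the scrubs come back clean.)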


# zpool status -v tank
  pool: tank
 state: ONLINE
 scrub: scrub completed after 3h26m with 0 errors on Mon Jun 21 01:45:00 2010
config:

NAMESTATE READ WRITE CKSUM
tankONLINE   0 0 0
  mirrorONLINE   0 0 0
c1t6d0  ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c1t1d0  ONLINE   0 0 0
c1t5d0  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c1t0d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
logs
  c0t0d0ONLINE   0 0 0

errors: No known data errors


/var/adm/messages is positively over-run with these triplets/quadruplets, not 
all of which end up as the fatal type.


Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci1043,8...@7/d...@1,0 (sd14):
Jun 19 21:43:19 deepthought Error for Command: write(10)   
Error Level: Retryable
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]   Requested 
Block: 26721062  Error Block: 26721062
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]   Vendor: ATA 
   Serial Number:
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]   Sense Key: 
Aborted Command
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.notice]   ASC: 0x8 (LUN 
communication failure), ASCQ: 0x0, FRU: 0x0
Jun 19 21:43:19 deepthought scsi: [ID 107833 kern.warning] WARNING: 

Re: [zfs-discuss] Many checksum errors during resilver.

2010-06-21 Thread Cindy Swearingen

Hi Justin,

This looks like an older Solaris 10 release. If so, this is probably a
zpool status display bug, where it appears that the checksum errors
are occurring on the replacement device when they are not.

I would review the steps described in the hardware section of the ZFS
troubleshooting wiki to confirm that the new disk is working as
expected:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

Then, follow steps in the Notify FMA That Device Replacement is Complete
section to reset FMA. Then, start monitoring the replacement device
with fmdump to see if any new activity occurs on this device.
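
A sketch of that monitoring step with the stock FMA tools (fmadm faulty lists any 
outstanding fault reports, fmdump -e shows the raw error-report log; the exact 
classes reported will depend on the device):

# fmadm faulty
# fmdump -e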

Thanks,

Cindy



Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread Roy Sigurd Karlsbakk
- Original Message -
  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
 
  Close to 1TB SSD cache will also help to boost read
  speeds,
 
 Remember, this will not boost large sequential reads. (Could possibly
 maybe even hurt it.) This will only boost random reads.

As far as I can see, we mostly have random reads, and not too much large 
sequential I/O, so this is what I'm looking for.
 
Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SLOG striping?

2010-06-21 Thread Roy Sigurd Karlsbakk
Hi all

I plan to set up a new system with four Crucial RealSSD 256GB SSDs for both SLOG 
and L2ARC. The plan is to use four small slices for the SLOG, striping two 
mirrors. I have seen questions in here about the theoretical benefit of doing 
this, but I haven't seen any answers, just some doubt about the effect.

Does anyone know if this will help performance, or will it be bad?
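
For reference, a sketch of what that layout would look like when adding the 
devices to an existing pool (the pool, controller and slice names are purely 
illustrative):

# two mirrored slog pairs, plus the remaining slices as cache
zpool add tank log mirror c4t0d0s0 c4t1d0s0 mirror c4t2d0s0 c4t3d0s0
zpool add tank cache c4t0d0s1 c4t1d0s1 c4t2d0s1 c4t3d0s1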

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperativ for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] VXFS to ZFS Quota

2010-06-21 Thread Roy Sigurd Karlsbakk
- Original Message -
 Hi
 Currently I have 400+ users with quotas set to a 500MB limit. The
 file system currently in use is Veritas (VxFS).
 
 I am planning to migrate all these home directories to a new server with
 ZFS. How can I migrate the quotas?
 
 I can create 400+ file systems, one for each user,
 but will this affect my system performance during system boot?
 Is this recommended, or is there a better alternative for this issue?

There's a lot of info in a thread I started quite recently, "One dataset per 
user?". Take a look in there.
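
For what it's worth, a minimal sketch of the per-user-dataset migration shape 
being discussed in that thread (the pool, paths and user list are invented):

# one dataset per user, carrying the old 500MB quota across
for u in $(cat userlist); do
    zfs create -o quota=500M tank/home/$u
    rsync -aH /oldserver/home/$u/ /tank/home/$u/
done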

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SLOG striping?

2010-06-21 Thread Bob Friesenhahn

On Mon, 21 Jun 2010, Roy Sigurd Karlsbakk wrote:

I plan to setup a new system with four Crucial RealSSD 256MB SSDs 
for both SLOG and L2ARC. The plan is to use four small slices for 
the SLOG, striping two mirrors. I have seen questions in here about 
the theoretical benefit of doing this, but I haven't seen any 
answers, just some doubt about the effect.


Does anyone know if this will help gaining performance? Or will it be bad?


I don't know anything about these SSDs but if they might lose the last 
record or two then striping would be bad since it would cause 
more writes to be lost.  Data would only be recovered up to the first 
point of loss, even though some newer data is still available on a 
different SSD.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SLOG striping?

2010-06-21 Thread Arne Jansen

Roy Sigurd Karlsbakk wrote:

Hi all

I plan to set up a new system with four Crucial RealSSD 256GB SSDs for both SLOG 
and L2ARC. The plan is to use four small slices for the SLOG, striping two 
mirrors. I have seen questions in here about the theoretical benefit of doing 
this, but I haven't seen any answers, just some doubt about the effect.

Does anyone know if this will help gaining performance? Or will it be bad?


I'm planning to do something similar, though I only want to install 2 devices.
Some thoughts I had so far:

 - mirroring l2arc won't gain anything, as it doesn't contain any information
   that cannot be rebuilt if a device is lost. Further, if a device is lost,
   the system just uses the remaining devices. So I wouldn't waste any space
   mirroring l2arc, I'll just stripe them.
 - the purpose of a zil device is to reduce latency. Throughput is probably not
   an issue, especially if you configure your pool so that large writes go to
   the main pool. As 2 devices don't have a lower latency than one, I see no
   real point in striping slog devices.
 - For slog you need an SSD with a supercap, which is significantly more expensive
   than one without. I'll try the OCZ Vertex 2 Pro in the next few days and can
   give a report on how it performs. For L2ARC, cheap MLC SSDs will do.

So if I had the chance to buy 4 devices, I'd probably buy 2 different sets.
2 cheap large L2ARC devices, 2 fast supercapped small ones. The 2 slog devices
would go into a mirror, the L2ARC devices in a stripe. I'd probably take the
remaining space of the slog devices into the stripe, too, though this might
affect write performance.

Just my thoughts...

--
Arne






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SLOG striping?

2010-06-21 Thread Roy Sigurd Karlsbakk
 - mirroring l2arc won't gain anything, as it doesn't contain any
 information that cannot be rebuilt if a device is lost. Further, if a
 device is lost,
 the system just uses the remaining devices. So I wouldn't waste any
 space mirroring l2arc, I'll just stripe them.

I don't plan to attempt to mirror L2ARC. Even the docs say it's unsupported, so 
no point in that.

 - the purpose of a zil device is to reduce latency. Throughput is
 probably not
 an issue, especially if you configure your pool so that large writes
 go to
 the main pool. As 2 devices don't have a lower latency than one, I see
 no real point in striping slog devices.

My guess was a striped setup could give me 2xIOPS if SLOG is designed to do 
this. Any idea if it is?

 - For slog you need SSD with supercap which are significantly more
 expensive than without. I'll try the OCZ Vertex 2 Pro in the next few
 days and can
 give a report how it performs. For L2ARC cheap MLC SSDs will do.

hm... Last I checked those OCZ Vertexes were on both the large and expensive 
side. What do you pay for a couple of small ones? We'll be installing 48 gigs 
of memory in this box, but I doubt we'll need more than 4GB SLOG in terms of 
traffic.
 
 So if I had the chance to buy 4 devices, I'd probably buy 2 different
 sets. 2 cheap large L2ARC devices, 2 fast supercapped small ones. The
 2 slog devices
 would go into a mirror, the L2ARC devices in a stripe. I'd probably
 take the
 remaining space of the slog devices into the stripe, too, though this
 might affect write performance.

Any idea if something like a small, decently priced, supercapped SLC SSD exists?

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of idioms of 
foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SLOG striping?

2010-06-21 Thread Brandon High
On Mon, Jun 21, 2010 at 11:24 AM, Roy Sigurd Karlsbakk
r...@karlsbakk.net wrote:
 Any idea if something like a small, decently priced, supercapped SLC SSD 
 exist?

The new OCZ Deneva drives (or others based on the SF-1500) should work
well, but I don't know if there's pricing available yet.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] does sharing an SSD as slog and l2arc reduces its life span?

2010-06-21 Thread Wes Felter

On 6/19/10 3:56 AM, Arne Jansen wrote:

while
thinking about using the OCZ Vertex 2 Pro SSD (which according
to spec page has supercaps built in) as a shared slog and L2ARC
device


IMO it might be better to use the smallest (50GB, maybe overprovisioned 
down to ~20GB) Vertex 2 Pro as slog and a much cheaper SSD (X25-M) as L2ARC.



But if 90% of the device are nearly statically allocated, the
devices possibilities for wear-leveling are very restricted.
If the ZIL is heavily used, the same 10% of the device get
written over and over again, reducing the life span by 90%.


As Bob Friesenhahn said, you're assuming dynamic wear leveling but 
modern SSDs also use static wear leveling, so this problem doesn't 
exist. (Note that in this context the terms dynamic and static may 
not mean what you think they mean.)


Wes Felter

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SLOG striping?

2010-06-21 Thread Arne Jansen

Roy Sigurd Karlsbakk wrote:

- mirroring l2arc won't gain anything, as it doesn't contain any
information that cannot be rebuilt if a device is lost. Further, if a
device is lost,
the system just uses the remaining devices. So I wouldn't waste any
space mirroring l2arc, I'll just stripe them.


I don't plan to attempt to mirror L2ARC. Even the docs say it's unsupported, so 
no point of that.


Oops, makes sense ;)




- For slog you need SSD with supercap which are significantly more
expensive than without. I'll try the OCZ Vertex 2 Pro in the next few
days and can
give a report how it performs. For L2ARC cheap MLC SSDs will do.


hm... Last I checked those OCZ Vertexes were on the both large and expensive 
side. What do you pay for a couple of small ones? We'll be installing 48 gigs 
of memory in this box, but I doubt we'll need more than 4GB SLOG in terms of 
traffic.


50GB for 400 Euro. They are MLC flash, but, as someone in a different
thread pointed out, they have 3 years warranty ;) My hope is that they
last long enough until cheaper options become available. My major concern
is that if I buy two identical models they'll break the same day. This
is not purely hypothetical. If they internally just count the write cycles
and trigger a SMART fail if a certain threshold is reached, exactly this
will happen.


--
Arne
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Seriously degraded SAS multipathing performance

2010-06-21 Thread Josh Simon
I'm seeing seriously degraded performance with round-robin SAS 
multipathing. I'm hoping you guys can help me achieve full throughput 
across both paths.


My System Config:
OpenSolaris snv_134
2 x E5520 2.4 GHz Xeon Quad-Core Processors
48 GB RAM
2 x LSI SAS 9200-8e (eight-port external 6Gb/s SATA and SAS PCIe 2.0 HBA)
1 X Mellanox 40 Gb/s dual port card PCIe 2.0
1 x JBOD: Supermicro SC846E2-R900B (Dual LSI SASX36 3Gb/s Expander 
Backplane, 24 Hot Swap drives)

22 x Seagate Constellation ES SAS drives

Performance I'm seeing with Multipathing Enabled (driver: mpt_sas):

With only one of the two paths connected:
1 drive connected: 137 MB/s sustained write, asvc_t: 8 ms
22 drives connected: 1.1 GB/s sustained write, asvc_t: 12 ms

With two paths connected, round-robin enabled:
1 drive connected: 13.7 MB/s sustained write, asvc_t: 25 ms
22 drives: 235 MB/s sustained write, asvc_t: 99 ms

With two paths connected, round-robin disabled, pin half the drives to 
one path (path A), the other half of the drives to the other path (path B):

22 drives: 2.2 GB/s sustained write (1.1 GB/s per path), asvc_t: 12 ms

Multipath support info:
mpathadm show mpath-support libmpscsi_vhci.so
mpath-support:  libmpscsi_vhci.so
Vendor:  Sun Microsystems
Driver Name:  scsi_vhci
Default Load Balance:  round-robin
Supported Load Balance Types:
round-robin
logical-block
Allows To Activate Target Port Group Access:  yes
Allows Path Override:  no
Supported Auto Failback Config:  1
Auto Failback:  on
Failback Polling Rate (current/max):  0/0
Supported Auto Probing Config:  0
Auto Probing:  NA
Probing Polling Rate (current/max):  NA/NA
Supported Devices:

Do I have to add an entry to this section of /kernel/drv/scsi_vhci.conf 
(and if so, how do I find the information to add)?


#
# For a device that has a GUID, discovered on a pHCI with mpxio enabled,
# vHCI access also depends on one of the scsi_vhci failover modules
# accepting the device.  The default way this occurs is by a failover
# module's probe implementation (sfo_device_probe) indicating the device
# is supported under scsi_vhci.  To override this default probe-oriented
# configuration in order to
#
#    1) establish support for a device not currently accepted under scsi_vhci
#
# or 2) override the module selected by probe
#
# or 3) disable scsi_vhci support for a device
#
# you can add a 'scsi-vhci-failover-override' tuple, as documented in
# scsi_get_device_type_string(9F). For each tuple, the first part provides
# basic device identity information (vid/pid) and the second part selects
# the failover module by failover-module-name. If you want to disable
# scsi_vhci support for a device, use the special failover-module-name "NONE".
#
# Currently, for each failover-module-name in 'scsi-vhci-failover-override'
# (except "NONE") there needs to be a
# misc/scsi_vhci/scsi_vhci_<failover-module-name> in 'ddi-forceload' above.
#
#                       "                  11"
#                       "012345670123456789012345", "failover-module-name" or "NONE"
#                       "|-VID--||-----PID------|",
# scsi-vhci-failover-override =
#                       "STK     FLEXLINE 400",   "f_asym_lsi",
#                       "SUN     T4",             "f_tpgs",
#                       "CME     XIRTEMMYS",      "NONE";
#
#END:   FAILOVER_MODULE_BLOCK   (DO NOT MOVE OR DELETE).
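
For reference, an actual override entry would follow the pattern in the block 
above. A sketch only: the product string here is invented, and the real 
vendor/product IDs for the Constellation drives can be read from iostat -En or 
format; f_sym is the standard module for symmetric (active/active) devices:

# in /kernel/drv/scsi_vhci.conf (VID field padded to 8 characters)
scsi-vhci-failover-override =
        "SEAGATE ST32000444SS",  "f_sym";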

How can I get this working as expected (2.2 GB/s round-robin 
load-balanced across both paths)?


Thanks in advance for your assistance!

Josh Simon
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] does sharing an SSD as slog and l2arc reduces its life span?

2010-06-21 Thread Arne Jansen

Wes Felter wrote:

On 6/19/10 3:56 AM, Arne Jansen wrote:

while
thinking about using the OCZ Vertex 2 Pro SSD (which according
to spec page has supercaps built in) as a shared slog and L2ARC
device


IMO it might be better to use the smallest (50GB, maybe overprovisioned 
down to ~20GB) Vertex 2 Pro as slog and a much cheaper SSD (X25-M) as 
L2ARC.




No budget for this. Lucky if I can get the budget for the Vertex 2 Pro.
But if this sharing works (thanks to static wear leveling) it should be
sufficient to leave 10-20% space.


As Bob Friesenhahn said, you're assuming dynamic wear leveling but 
modern SSDs also use static wear leveling, so this problem doesn't 
exist. (Note that in this context the terms dynamic and static may 
not mean what you think they mean.)


Thanks for the term. Yes, this makes sense.

--
Arne
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs periodic writes on idle system [Re: Getting desktop to auto sleep]

2010-06-21 Thread Jürgen Keil
 Why does zfs produce a batch of writes every 30 seconds on opensolaris b134
 (5 seconds on a post b142 kernel), when the system is idle?

It was caused by b134 gnome-terminal. I had an iostat
running in a gnome-terminal window, and the periodic
iostat output is written to a temporary file by gnome-terminal.
This kept the hdd busy. Older gnome-terminals (b111)
didn't write terminal output to a disk file. The workaround is
to use xterm instead of the b134 gnome-terminal for a
command that periodically produces output.
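
For anyone chasing a similar mystery, a stock DTrace one-liner shows which 
processes are issuing the writes (a sketch; assumes DTrace privileges):

# count write(2) calls per process for 30 seconds
dtrace -n 'syscall::write:entry { @[execname] = count(); } tick-30s { exit(0); }'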
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One dataset per user?

2010-06-21 Thread James C. McPherson

On 22/06/10 01:05 AM, Fredrich Maney wrote:

On Mon, Jun 21, 2010 at 8:59 AM, James C. McPherson
j...@opensolaris.org  wrote:
[...]

So when I'm
trying to figure out who I need to yell at because they're
using more than our acceptable limit (30Gb), I have to run
du -s /builds/[zyx]. And that takes time. Lots of time.

[...]

Why not just use quotas?


Quotas are not always appropriate.

Also, given our usage model, and wanting to provide
a service that gatelings can use to work on multiple
changesets concurrently, we figure that telling people


"your limit is X GB, and we will publicly shame you if
you exceed it, then go and remove old stuff for you"


is sufficiently hands-off. We're adults here, not children
or kiddies with no regard for our fellow engineers.


James C. McPherson
--
Senior Software Engineer, Solaris
Oracle
http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SLOG striping?

2010-06-21 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Roy Sigurd Karlsbakk
 
 I plan to set up a new system with four Crucial RealSSD 256GB SSDs for
 both SLOG and L2ARC. The plan is to use four small slices for the SLOG,
 striping two mirrors. I have seen questions in here about the
 theoretical benefit of doing this, but I haven't seen any answers, just
 some doubt about the effect.
 
 Does anyone know if this will help gaining performance? Or will it be
 bad?

log and cache devices don't stripe.  You can add more than one, and the syntax 
is the same as if you were going to create a stripe in the main pool, but the 
end result is round-robin for log and cache devices.  It's tough to say how much 
this will benefit performance, unless you try it.

PS.  I'm only repeating what I believe to have learned by older posts on this 
list.  I didn't go read the code myself or anything like that, so I acknowledge 
some possibility that something I've said is wrong.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SLOG striping?

2010-06-21 Thread Richard Elling
On Jun 21, 2010, at 11:17 AM, Arne Jansen wrote:

 Roy Sigurd Karlsbakk wrote:
 Hi all
 I plan to set up a new system with four Crucial RealSSD 256GB SSDs for both 
 SLOG and L2ARC. The plan is to use four small slices for the SLOG, striping 
 two mirrors. I have seen questions in here about the theoretical benefit of 
 doing this, but I haven't seen any answers, just some doubt about the effect.
 Does anyone know if this will help gaining performance? Or will it be bad?
 
 I'm planning to do something similar, though I only want to install 2 devices.
 Some thoughts I had so far:
 
 - mirroring l2arc won't gain anything, as it doesn't contain any information
   that cannot be rebuilt if a device is lost.

This is not properly stated.  There is nothing in the L2ARC that is not also
in the pool, so it is simply a cache.  You are correct in that the L2ARC is
not rebuilt, but you miss the reason why the rebuild is not necessary (or
desired).

 Further, if a device is lost,
   the system just uses the remaining devices. So I wouldn't waste any space
   mirroring l2arc, I'll just stripe them.

Yes, this is the most economical solution.

 - the purpose of a zil device is to reduce latency.

The purpose of a separate device is to reduce latency for synchronous writes.

 Throughput is probably not
   an issue, especially if you configure your pool so that large writes go to
   the main pool. As 2 devices don't have a lower latency than one, I see no
   real point in striping slog devices.

Striping separate log devices can help for high transaction rate environments
or where the latency is constrained by bandwidth.

 - For slog you need SSD with supercap which are significantly more expensive
   than without. I'll try the OCZ Vertex 2 Pro in the next few days and can
   give a report how it performs. For L2ARC cheap MLC SSDs will do.

For a separate log, choose a device which honors cache flush commands.
Nonvolatile caches are nice, but rare -- HDDs do act well as ZIL devices, but
almost never have nonvolatile caches.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SLOG striping?

2010-06-21 Thread Bob Friesenhahn

On Mon, 21 Jun 2010, Edward Ned Harvey wrote:


log and cache devices don't stripe.  You can add more than one, and


The term 'stripe' has been so outrageously severely abused in this 
forum that it is impossible to know what someone is talking about when 
they use the term.  Seemingly intelligent people continue to use wrong 
terminology because they think that protracting the confusion somehow 
helps new users.  We are left with no useful definition of 'striping'.


The slog does actually 'stripe' in some cases (depending on what 
'stripe' means) if you study how it works in sufficient detail.  At 
least that is what we have been told.


The slog does not do micro-striping, nano-striping, pico-striping, or 
femto-striping (not even at the sub-bit level) but it does do 
mega-striping.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss