[zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored

2008-06-22 Thread Ralf Bertling

Hi list,
since this matter pops up every now and then in posts on this list, I just
want to clarify that the real-world performance of RAID-Z (in its current
implementation) is NOT an inherent consequence of raidz-style space-efficient
redundancy or of the copy-on-write design used in ZFS.


In an M-way mirrored setup of N disks you get the write performance of the
worst disk and a read performance that is the sum of all disks (for both
streaming and random workloads; latency is not improved).
Apart from the limited write performance, you also get very poor disk-space
utilization in that scenario.


In RAID-Z we currently have to distinguish random reads from streaming
reads:
- Write performance (with COW) is (N-M) times the worst single-disk write
performance, since all writes are streaming writes by design of ZFS
(which is N-M-1 times faster than mirrored).
- Streaming read performance is N times the worst single-disk read
performance (identical to mirrored if all disks have the same speed).
- The problem with the current implementation is that all N-M data disks in
a vdev currently take part in reading even a single byte from it, which in
turn limits the vdev's random-read performance to that of the slowest of the
N-M disks in question.
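As a back-of-the-envelope illustration of the model above (a sketch only;
N, M and the per-disk numbers are invented, real hardware will differ):

  # toy model, not a benchmark: N disks in the vdev, M of them redundancy,
  # 80 MB/s streaming and 120 IOPS per disk
  N=6; M=1; MBS=80; IOPS=120
  echo "raid-z streaming write: $(( (N - M) * MBS )) MB/s"
  echo "raid-z streaming read:  $(( N * MBS )) MB/s"
  echo "raid-z random read:     $IOPS IOPS (all N-M data disks seek for every block)"
  echo "mirror streaming write: $MBS MB/s (worst disk)"
  echo "mirror streaming read:  $(( N * MBS )) MB/s"
  echo "mirror random read:     $(( N * IOPS )) IOPS (any idle copy can serve a read)"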


Now let's see if this really has to be this way (which rather implies the
answer is no, doesn't it ;-)
When reading small blocks of data (as opposed to the streams discussed
earlier), the requested data resides on a single disk, so reading it does not
require sending read commands to all disks in the vdev.
Without detailed knowledge of the ZFS code, I suspect the problem is that the
logical block size of any ZFS operation always spans the full stripe. If
true, I think this is a design error.
Without that, random reads on a RAID-Z would be almost as fast as on mirrored
data.
The theoretical disadvantages come from disks that have different speeds
(probably insignificant in any real-life scenario) and from the statistical
chance that a few particular random reads happen to need the same disk drive
to be fulfilled. (In a mirrored setup, ZFS can choose from all idle devices,
whereas in RAID-Z it has to wait for the disk that holds the data to finish
processing its current requests.)
Looking more closely, this effect mostly hurts latency (not throughput),
since incoming random read requests should be spread across all devices ever
more evenly as the request queue gets longer (this would, however, require
ZFS to reorder requests for maximum performance).


Since this seems to be a real issue for many ZFS users, it would be nice if
someone with more time than I have to look into the code could comment on
the amount of work required to boost RAID-Z read performance.


Doing so would significantly improve the trade-off between read/write
performance and disk utilization.
Obviously, if disk space (and the resulting electricity cost) does not
matter compared to getting maximum read performance, you will always be best
off with 3-way or even wider mirrors and a very large number of vdevs in
your pool.


A further question that springs to mind is whether copies=N is also used to
improve read performance. If so, you could have some read-optimized
filesystems in a pool while others use maximum storage efficiency (e.g. for
backups).
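(Whether the extra copies help reads is exactly the open question; the knob
itself is just a per-dataset property. A possible layout, with made-up
dataset names, might be:)

  zfs create tank/hot
  zfs set copies=2 tank/hot        # every block stored twice, possibly extra read sources
  zfs create tank/backup           # default copies=1, maximum space efficiency
  zfs get copies tank/hot tank/backup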


Regards,
ralf
--
Ralf Bertling ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool create behaviour

2008-06-22 Thread Cesare
Hi,

I'm facing a problem when I configure and create a zpool on my test bed.
The hardware is a T-5120 running Solaris 10 with the latest patches and a
CLARiiON CX3 attached via two HBAs. In this configuration every LUN
exported by the CLARiiON is seen four times by the operating system.

If I configure the last disk through one particular controller, zpool create
does not work, telling me that one or more devices is currently unavailable.
If I use a different controller (but the same LUN on the CLARiiON) I do not
hit the problem and the raidz pool is created. I would like to use that
controller in order to balance the I/O between the HBAs and the storage
processors.

The output of zpool create is:

- Not Working:
 zpool create -f tank raidz c2t5006016041E0222Ed3
c3t5006016141E0222Ed0 c2t5006016041E0222Ed1 c3t5006016141E0222Ed2
cannot create 'tank': one or more devices is currently unavailable

- Working
zpool create -f tank raidz c2t5006016041E0222Ed3 c3t5006016141E0222Ed0
c2t5006016041E0222Ed1 c2t5006016841E0222Ed2

zpool status
  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAME   STATE READ WRITE CKSUM
tank   ONLINE   0 0 0
  raidz1   ONLINE   0 0 0
c2t5006016041E0222Ed3  ONLINE   0 0 0
c3t5006016141E0222Ed0  ONLINE   0 0 0
c2t5006016041E0222Ed1  ONLINE   0 0 0
c2t5006016841E0222Ed2  ONLINE   0 0 0

errors: No known data errors
rmims03#
-

Thanks for any suggestion.

Cesare


The disks and format output are:

-- powermt display dev=all
Pseudo name=emcpower2a
CLARiiON ID=CK [Storage (T-5120)]
Logical device ID=YY0002D230CB033EDD11 [LUN 12]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP B, current=SP A   Array failover mode: 1
==
 Host ---   - Stor -   -- I/O Path -  -- Stats ---
###  HW PathI/O PathsInterf.   ModeState  Q-IOs Errors
==
3074 [EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/SUNW,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0 c2tXX6041XXd3s0 SP A0
active  alive  0  0
3074 [EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/SUNW,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0 c2tXX6841XXd3s0 SP B0
active  alive  0  0
3072 [EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/SUNW,[EMAIL 
PROTECTED],1/[EMAIL PROTECTED],0 c3tXX6141XXd3s0 SP
A1 active  alive  0  0
3072 [EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/SUNW,[EMAIL 
PROTECTED],1/[EMAIL PROTECTED],0 c3tXX6941XXd3s0 SP
B1 active  alive  0  0

Pseudo name=emcpower0a
CLARiiON ID=CK [Storage (T-5120)]
Logical device ID=YY0070885B63033EDD11 [LUN 1]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP A, current=SP A   Array failover mode: 1
==
 Host ---   - Stor -   -- I/O Path -  -- Stats ---
###  HW PathI/O PathsInterf.   ModeState  Q-IOs Errors
==
3074 [EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/SUNW,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0 c2tXX6041XXd0s0 SP A0
active  alive  0  0
3074 [EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/SUNW,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0 c2tXX6841XXd0s0 SP B0
active  alive  0  0
3072 [EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/SUNW,[EMAIL 
PROTECTED],1/[EMAIL PROTECTED],0 c3tXX6141XXd0s0 SP
A1 active  alive  0  0
3072 [EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/SUNW,[EMAIL 
PROTECTED],1/[EMAIL PROTECTED],0 c3tXX6941XXd0s0 SP
B1 active  alive  0  0

Pseudo name=emcpower3a
CLARiiON ID=CK [Storage (T-5120)]
Logical device ID=YY009419CFDA033EDD11 [LUN 21]
state=alive; policy=CLAROpt; priority=0; queued-IOs=0
Owner: default=SP A, current=SP A   Array failover mode: 1
==
 Host ---   - Stor -   -- I/O Path -  -- Stats ---
###  HW PathI/O PathsInterf.   ModeState  Q-IOs Errors
==
3074 [EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/SUNW,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0 c2tXX6041XXd1s0 SP A0
active  alive  0  0
3074 [EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/SUNW,[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0 

Re: [zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored

2008-06-22 Thread Bob Friesenhahn
On Sun, 22 Jun 2008, Ralf Bertling wrote:

 Now let's see if this really has to be this way (which rather implies the
 answer is no, doesn't it ;-)
 When reading small blocks of data (as opposed to the streams discussed
 earlier), the requested data resides on a single disk, so reading it does not
 require sending read commands to all disks in the vdev. Without detailed
 knowledge of the ZFS code, I suspect the problem is that the logical block
 size of any ZFS operation always spans the full stripe. If true, I think this
 is a design error.
 Without that, random reads on a RAID-Z would be almost as fast as on mirrored
 data.

Keep in mind that ZFS checksums all data, the checksum is stored in a 
different block than the data, and that if ZFS were to checksum on the 
stripe segment level, a lot more checksums would need to be stored. 
All these extra checksums would require more data access, more 
checksum computations, and more stress on the free block allocator 
since ZFS uses copy-on-write in all cases.

Perhaps the solution is to install more RAM in the system so that the 
stripe is fully cached and ZFS does not need to go back to disk prior 
to writing an update.  The need to read prior to write is clearly what 
kills ZFS update performance.  That is why using 8K blocks helps 
database performance.
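(For the record, the 8K-block trick is normally applied through the
recordsize property, set before the database files are created; a sketch
with an invented dataset name:)

  zfs create tank/db
  zfs set recordsize=8k tank/db    # match the database page size so updates are full-block
  zfs get recordsize tank/db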

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored

2008-06-22 Thread Brian Hechinger
On Sun, Jun 22, 2008 at 10:37:34AM -0500, Bob Friesenhahn wrote:
 
 Perhaps the solution is to install more RAM in the system so that the 
 stripe is fully cached and ZFS does not need to go back to disk prior 
 to writing an update.  The need to read prior to write is clearly what 
 kills ZFS update performance.  That is why using 8K blocks helps 
 database performance.

How much do slogs/cache disks help in this case?  I'm thinking fast SSD or
fast iRAM style devices (I really wish Gigabyte would update the iRAM to
SATA 3.0 and more ram, but I keep saying that, and it keeps not happening).
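(For anyone who wants to experiment: on builds that support separate log and
cache vdevs, attaching them is a one-liner each; the device names below are
placeholders:)

  zpool add tank log c4t0d0      # separate intent log (slog) absorbs synchronous writes
  zpool add tank cache c4t1d0    # L2ARC cache device for reads
  zpool status tank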

-brian
-- 
Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool create behaviour

2008-06-22 Thread Peter Tribble
On Sun, Jun 22, 2008 at 2:06 PM, Cesare [EMAIL PROTECTED] wrote:
 Hi,

 I'm facing a problem when I configure and create a zpool on my test bed.
 The hardware is a T-5120 running Solaris 10 with the latest patches and a
 CLARiiON CX3 attached via two HBAs. In this configuration every LUN
 exported by the CLARiiON is seen four times by the operating system.

 If I configure the last disk through one particular controller, zpool
 create does not work, telling me that one or more devices is currently
 unavailable. If I use a different controller (but the same LUN on the
 CLARiiON) I do not hit the problem and the raidz pool is created. I would
 like to use that controller in order to balance the I/O between the HBAs
 and the storage processors.

My experience is that zfs + powerpath + clariion doesn't work.

(Try a 'zpool export' followed by 'zpool import' - do you get your pool back?)

For this I've had to get rid of powerpath and use mpxio instead.

The problem seems to be that the clariion arrays are active/passive and
zfs trips up if it tries to use one of the passive links. Using mpxio hides
this and works fine. And powerpath on the (active/active) DMX-4 seems
to be OK too.
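(Roughly what that migration looks like, from memory; untested here, so
check the stmsboot(1M) docs first:)

  stmsboot -e        # enable Solaris I/O multipathing (mpxio); needs a reboot
  # after the reboot the LUNs appear once, under scsi_vhci device names
  zpool import       # list pools visible on the new paths
  zpool import tank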

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] What is this, who is doing it, and how do I get you to stop?

2008-06-22 Thread Brian Hechinger
Every time I post to this list, I get an AUTOREPLY from somebody who, if
you ask me, is up to no good; otherwise they would set a proper From: address
instead of spoofing my domain.

 Received: from mail01.csw-datensysteme.de ([62.153.225.98])
  by wiggum.4amlunch.net
  (Sun Java(tm) System Messaging Server 6.3-6.03 (built Mar 14 2008; 32bit))
  with ESMTP id [EMAIL PROTECTED] for [EMAIL PROTECTED];
  Sun, 22 Jun 2008 11:46:14 -0400 (EDT)
 Original-recipient: rfc822;[EMAIL PROTECTED]
 From: [EMAIL PROTECTED]
 Subject: AUTOREPLY Re: [zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 v...
 To: [EMAIL PROTECTED]
 Date: Sun, 22 Jun 2008 15:44:50 +
 Priority: normal
 X-Priority: 3 (Normal)
 Importance: normal
 X-Mailer: DvISE by Tobit Software, Germany (0241.444A46454D464F4D4E50),
  Mime Converter 101.20
 X-David-Sym: 0
 X-David-Flags: 0
 Message-id: [EMAIL PROTECTED]
 MIME-version: 1.0
 Content-type: text/plain; charset=iso-8859-1
 Content-transfer-encoding: 7Bit

I don't know who you are, and honestly I don't think I care; I'm just going to
start firewalling you.  I recommend everyone else on the list do the same.

-brian
-- 
Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored

2008-06-22 Thread Will Murnane
On Sun, Jun 22, 2008 at 15:37, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
 Keep in mind that ZFS checksums all data, the checksum is stored in a
 different block than the data, and that if ZFS were to checksum on the
 stripe segment level, a lot more checksums would need to be stored.
 All these extra checksums would require more data access, more
I think the question is more "why segment in the first place?"  If
ZFS kept everything in recordsize blocks that reside on one disk each
(or in two places, if there is mirroring going on) and made parity just
another recordsize block, one could avoid the penalty of seeking
every disk for every read.

The downside of this scheme would be deletes: if you actually free
blocks, then the parity is useless.  So you'd need to do something like
keep the old, now-useless block around and put its parity neighbors on a
list of blocks to be re-paritied.  Then, once the new parity has been
generated, you can actually free the block.

An advantage this would have is changing the width of raidz/raidz2
groups: if another disk is added, one can mark every block as needing
new parity of width N+1 and let the re-parity process do its thing.
This would take a while, of course, but it would add the expandability
that people have been asking for.

 Perhaps the solution is to install more RAM in the system so that the
 stripe is fully cached and ZFS does not need to go back to disk prior
 to writing an update.
I don't think the problem is that the stripe is falling out of cache,
but that it costs so much to get it into memory in the first place.

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored

2008-06-22 Thread Bob Friesenhahn
On Sun, 22 Jun 2008, Brian Hechinger wrote:

 On Sun, Jun 22, 2008 at 10:37:34AM -0500, Bob Friesenhahn wrote:

 Perhaps the solution is to install more RAM in the system so that the
 stripe is fully cached and ZFS does not need to go back to disk prior
 to writing an update.  The need to read prior to write is clearly what
 kills ZFS update performance.  That is why using 8K blocks helps
 database performance.

 How much do slogs/cache disks help in this case?  I'm thinking fast SSD or
 fast iRAM style devices (I really wish Gigabyte would update the iRAM to
 SATA 3.0 and more ram, but I keep saying that, and it keeps not happening).

To clarify, there are really two issues.  One is with updating small 
parts of a disk block without synchronous commit, while the other is 
updating parts of a disk block with synchronous commit.  Databases 
always want to sync their data.  When synchronous write is requested, 
the zfs in-memory recollection of that write can not be used for other 
purposes until the write is reported as completed since otherwise 
results could be incoherent.

More memory helps quite a lot in the cases where files are updated 
without requesting synchronization but is much less useful for the 
cases where the data needs to be committed to disk before proceeding.

Applications which want to update ZFS blocks and go fast at the same 
time will take care to make sure that the I/O is aligned to the start 
of the ZFS block, and that the I/O size is in multiples of the ZFS 
block size.  Testing shows that performance falls off a cliff for 
random I/O when the available ARC cache size is too small and the 
write is not properly aligned or the write is smaller than the ZFS 
block size.  If everything is perfectly aligned then ZFS still goes 
quite fast since it has no need to read the underlying data first. 
What this means for applications is that if they "own" the file, it may
be worthwhile to read/write full ZFS blocks and do the final block update
within the application rather than force ZFS to do it.  However, if a
small part of the file is read and then immediately updated (i.e. a record
update), ZFS does a good job of caching in that case.
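(A quick way to see the difference for yourself; a rough sketch with
invented paths, assuming the default 128K recordsize:)

  zfs get recordsize tank/fs
  # full-record, aligned writes: ZFS never needs to read what is on disk
  dd if=/dev/zero of=/tank/fs/big bs=128k count=8192
  # a single 8K update in the middle of an uncached 128K record forces a read-modify-write
  dd if=/dev/zero of=/tank/fs/big bs=8k count=1 seek=12345 conv=notrunc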

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What is this, who is doing it, and how do I get you to stop?

2008-06-22 Thread Brian Hechinger
On Sun, Jun 22, 2008 at 06:11:21PM +0200, Volker A. Brandt wrote:
 
 Everyone who posts gets this autoreply.

So what do the rest of you do?  Ignore it?

  From: [EMAIL PROTECTED]
 
 These people are not spoofing your domain, they set a From: header
 with no @domain.  Many MTAs append the local domain in this case.
 Maybe it's because they use a German umlaut in the From: string.

That's not a valid email address either, which is still wrong.

 Judging from the word Irrläufer, someone at their site has subscribed
 to zfs-discuss but does not exist there any more.

Then they should send back a message pointing out that the user is no longer
there, not just send the whole message back.

  Received: from mail01.csw-datensysteme.de ([62.153.225.98])
 It seems to be a German company (not too far away from me, too. :-)

Hmmm, are you for hire?  Maybe you could take a trip out there and deliver
some clue. ;)

   X-Mailer: DvISE by Tobit Software, Germany (0241.444A46454D464F4D4E50),
 
 As you can see, they use a commercial mail appliance.  It's probably
 just misconfigured.

Email appliances and Exchange servers are the bane of the internet.  You don't
need to know how to properly set up an email server because now anyone can do
it! :)

  I don't know who you are, and honestly I don't think I care; I'm just going
  to start firewalling you.  I recommend everyone else on the list do the same.
 
 No need to get uptight, just tell them politely.  Eventually they
 will figure it out.  :-)

If they had sent some sort of "this user is no longer here" message or some
such, I would have been less likely to get all jumpy about it.  I'll redirect
my misguided anger at yet another poorly managed mail server to the poor sods
who admin it.

-brian
-- 
Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored

2008-06-22 Thread Bob Friesenhahn
On Sun, 22 Jun 2008, Will Murnane wrote:

 Perhaps the solution is to install more RAM in the system so that the
 stripe is fully cached and ZFS does not need to go back to disk prior
 to writing an update.
 I don't think the problem is that the stripe is falling out of cache,
 but that it costs so much to get it into memory in the first place.

That makes sense and is demonstrated by measurements.

The following iozone Kbytes/sec throughput numbers are from a mirrored
array rather than RAID-Z, but they show how sensitive ZFS becomes to
block size once cache memory requirements start to exceed available
memory.  Since throughput is a function of record size and latency, this
presentation tends to amplify the situation.

                                           random   random     bkwd   record   stride
reclen    write  rewrite     read   reread     read    write     read  rewrite     read
     4   367953   143777   496378   488186     6242     2521   836293   786866    30269
     8   249827   166847   621371   489279    12520     4130   929394  1508139    41568
    16   273266   160537   555350   513444    24895     6991   928915  2473915    32016
    32   293463   168727   595128   678359    48666    15831   818962  3708512    43561
    64   284213   168007   694747   514942    99565    95703   705144  3774777   270612
   128   273797   271583  1260035  1366050   187042   512312  1175683  4616660   861089
   256   273265   272916  1259814  1394034   250743   480186   219927  4708927   587602
   512   260630   262145   713797   743914   313429   535920   343209  2603492   583120

Clearly random-read and random-write suffer the most.  Since 
sub-block updates cause ZFS to have to read the existing block, the 
random-write performance becomes bottlenecked by the random-read 
performance.  When the write is aligned and a multiple of the ZFS 
block size, then ZFS does not care what is already on disk and writes 
very quickly.  Notice that in the above results, random write became 
much faster than sequential write.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What is this, who is doing it, and how do I get you to stop?

2008-06-22 Thread Volker A. Brandt
  Everyone who posts gets this autoreply.

 So what do the rest of you do?  Ignore it?

I for one do ignore it. :-)

   From: [EMAIL PROTECTED]
 
  These people are not spoofing your domain, they set a From: header
  with no @domain.  Many MTAs append the local domain in this case.
  Maybe it's because they use a German umlaut in the From: string.

 That's not a valid email address either, which is still wrong.

You're right, it's wrong.  But I didn't say it was right, right?

  Judging from the word Irrläufer, someone at their site has subscribed
  to zfs-discuss but does not exist there any more.

 Then they should send back a message pointing out that the user is no longer
 there, not just send the whole message back.

Preaching to the choir...

   Received: from mail01.csw-datensysteme.de ([62.153.225.98])
  It seems to be a German company (not too far away from me, too. :-)

 Hmmm, are you for hire?  Maybe you could take a trip out there and deliver
 some clue. ;)

I am, and I could, but my clue-bat's on loan to a Windows guy. :-)))

X-Mailer: DvISE by Tobit Software, Germany (0241.444A46454D464F4D4E50),
 
  As you can see, they use a commercial mail appliance.  It's probably
  just misconfigured.

 Email appliances and Exchange servers are the bane of the internet.

Amen!

 If they had sent some sort of "this user is no longer here" message or some
 such, I would have been less likely to get all jumpy about it.

Certainly!  See above, look for choir.

  I'll redirect my misguided
 anger at yet another poorly managed mail server to the poor sods who admin it.

Fair enough.


Regards -- Volker
-- 

Volker A. Brandt  Consulting and Support for Sun Solaris
Brandt  Brandt Computer GmbH   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim Email: [EMAIL PROTECTED]
Handelsregister: Amtsgericht Bonn, HRB 10513  Schuhgröße: 45
Geschäftsführer: Rainer J. H. Brandt und Volker A. Brandt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool i/o error

2008-06-22 Thread Tomas Ögren
On 21 June, 2008 - Victor Pajor sent me these 0,9K bytes:

 Another thing
 
 config:
 
 zfs FAULTED   corrupted data
   raidz1ONLINE
 c1t1d0  ONLINE
 c7t0d0  UNAVAIL   corrupted data
 c7t1d0  UNAVAIL   corrupted data
 
 c7t0d0 & c7t1d0 don't exist, which is normal: they are really c2t0d0 & c2t1d0
 
 AVAILABLE DISK SELECTIONS:
0. c1t0d0 DEFAULT cyl 4424 alt 2 hd 255 sec 63
   /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci10f1,[EMAIL 
 PROTECTED]/[EMAIL PROTECTED],0
1. c1t1d0 SEAGATE-ST336754LW-0005-34.18GB
   /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci10f1,[EMAIL 
 PROTECTED]/[EMAIL PROTECTED],0
2. c2t0d0 SEAGATE-ST336753LW-0005-34.18GB
   /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci10f1,[EMAIL 
 PROTECTED],1/[EMAIL PROTECTED],0
3. c2t1d0 SEAGATE-ST336753LW-HPS2-33.92GB
   /[EMAIL PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci10f1,[EMAIL 
 PROTECTED],1/[EMAIL PROTECTED],0

zpool export zfs;zpool import zfs

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] getting inodeno for zfs from vnode in vfs kernel

2008-06-22 Thread Anton B. Rang
If you really need the inode number, you should use the semi-public interface
to retrieve it, i.e. call VOP_GETATTR.  This is what the rest of the kernel
does when it needs the attributes of a vnode.

See for example

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/syscall/stat.c
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mv between ZFSs on same zpool

2008-06-22 Thread Richard Elling
Yaniv Aknin wrote:
 Hi,

 Obviously, moving ('renaming') files between ZFS filesystems in the same
 zpool is just like a move between any other two filesystems, requiring a
 full copy of the data and deletion of the old file.

 I was wondering whether there is (and if not, why there isn't) an
 optimization inside ZFS such that a copy between ZFS filesystems in the same
 zpool would be instantaneous or nearly so. After all, no blocks /really/
 need to be copied; it's more a matter of ZFS noticing that it is a move
 within the same zpool and making inode modifications at the source/target.

 Thoughts?

   

We beat this one into the dust last December.  See thread:
http://mail.opensolaris.org/pipermail/zfs-discuss/2007-December/044975.html
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re-2: What is this, who is doing it, and how do I get you to stop?

2008-06-22 Thread Volker A. Brandt
[EMAIL PROTECTED] writes:
 ..sorry, there was a misconfiguration in our email-system. I've fixed it in 
 this moment...
 We apologize for any problems you had

 Andreas Gaida

Wow, that was fast!  And on a Sunday evening, too...
So, everything is fixed, and we are all happy now :-)


Regards -- Volker
-- 

Volker A. Brandt  Consulting and Support for Sun Solaris
Brandt  Brandt Computer GmbH   WWW: http://www.bb-c.de/
Am Wiesenpfad 6, 53340 Meckenheim Email: [EMAIL PROTECTED]
Handelsregister: Amtsgericht Bonn, HRB 10513  Schuhgröße: 45
Geschäftsführer: Rainer J. H. Brandt und Volker A. Brandt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CIFS HA service with solaris 10 and SC 3.2

2008-06-22 Thread Marcelo Leal
Hello all,
 I would like to continue with this topic. After doing some research on it, I
still have some (many) doubts, and maybe we can use this thread to give some
answers to me and to other users who may have the same questions.
 First, sorry for CCing so many forums, but I think the topic is relevant to
all of them.
 Second, it would be nice to clear up my understanding of a few points:
 1) What is the difference between the SMB server in Solaris/OpenSolaris and
the new CIFS project?
 2) I think samba.org has an implementation of the CIFS protocol that lets a
Unix-like operating system act as an SMB/CIFS server. Why not use that?
Licensing problems? Is the SMB server that is already in Solaris/OpenSolaris
not a samba.org implementation?
 3) One of the goals of the CIFS Server project on OpenSolaris is to support
OpenSolaris as a storage operating system. Can we not do that with the
samba.org implementation, or with the SMB server implementation that is
already there?
 4) And the last one: ZFS has smb/cifs share on/off capabilities; how do
those relate to all of the above?
 5) OK, one more question: there is a new project (data migration
manager/dmm) that is intended to migrate NFS (GNU/Linux) services and
CIFS (MS/Windows) services to Solaris/OpenSolaris and ZFS. That project is in
the storage community, I think. But how can we create a migration plan if we
cannot handle the services yet? Or can we?
 OK, I am quite confused, but it is not only my fault; all these efforts
without some glue are a little complicated, don't you agree?
 And on top of it all, there is the need for an agent to implement HA
services on it. I want to implement an SMB/CIFS server on
Solaris/OpenSolaris, and I don't know whether we already have the solution in
our community, and whether there is an agent to provide HA or whether we need
to create a project to implement one.
 See, I need help :-)

 Ok, that's all!

 Leal.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CIFS HA service with solaris 10 and SC 3.2

2008-06-22 Thread Tim
Samba CIFS has been in OpenSolaris from day one.

No, it cannot be used to meet Sun's end goal, which is CIFS INTEGRATION
with the core kernel.  Sun's CIFS server supports Windows ACLs from the
kernel up.  Samba does not.




On 6/22/08, Marcelo Leal [EMAIL PROTECTED] wrote:
 Hello all,
 I would like to continue with this topic. After doing some research on it,
 I still have some (many) doubts, and maybe we can use this thread to give
 some answers to me and to other users who may have the same questions.
  First, sorry for CCing so many forums, but I think the topic is relevant
 to all of them.
  Second, it would be nice to clear up my understanding of a few points:
  1) What is the difference between the SMB server in Solaris/OpenSolaris
 and the new CIFS project?
  2) I think samba.org has an implementation of the CIFS protocol that lets
 a Unix-like operating system act as an SMB/CIFS server. Why not use that?
 Licensing problems? Is the SMB server that is already in
 Solaris/OpenSolaris not a samba.org implementation?
  3) One of the goals of the CIFS Server project on OpenSolaris is to
 support OpenSolaris as a storage operating system. Can we not do that with
 the samba.org implementation, or with the SMB server implementation that is
 already there?
  4) And the last one: ZFS has smb/cifs share on/off capabilities; how do
 those relate to all of the above?
  5) OK, one more question: there is a new project (data migration
 manager/dmm) that is intended to migrate NFS (GNU/Linux) services and
 CIFS (MS/Windows) services to Solaris/OpenSolaris and ZFS. That project is
 in the storage community, I think. But how can we create a migration plan
 if we cannot handle the services yet? Or can we?
  OK, I am quite confused, but it is not only my fault; all these efforts
 without some glue are a little complicated, don't you agree?
  And on top of it all, there is the need for an agent to implement HA
 services on it. I want to implement an SMB/CIFS server on
 Solaris/OpenSolaris, and I don't know whether we already have the solution
 in our community, and whether there is an agent to provide HA or whether we
 need to create a project to implement one.
  See, I need help :-)

  Ok, that's all!

  Leal.


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mv between ZFSs on same zpool

2008-06-22 Thread Yaniv Aknin
Thanks for the reference.

I read that thread to the end and saw that there are some complex
considerations regarding changing st_dev on an open file, but no decision.
Despite this complexity, I think the situation is quite brain-damaged: I'm
moving large files between ZFS filesystems all the time (otherwise I can't
separate the tree as I'd like to), and it's fairly annoying to watch blocks
that don't really need to go anywhere being copied at 50 MB/s.

I think even a hack will do for a start (do I hear 'zmv'?).

Thoughts? Objections?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] raid card vs zfs

2008-06-22 Thread kevin williams
digg linked to an article related to the Apple port of ZFS
(http://www.dell.com/content/products/productdetails.aspx/print_1125?c=uscs=19l=ens=dhss).
I don't have a Mac but was interested in ZFS.

The article says that ZFS eliminates the need for a RAID card and is faster
because the striping runs on the main CPU rather than on an old chipset on a
card.  My question is: is this true?  Can I install OpenSolaris with ZFS and
stripe and mirror a bunch of SATA disks for a home NAS server?  I sure would
like to do that, but the cost of good RAID cards has put me off; maybe this
is the solution.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raid card vs zfs

2008-06-22 Thread ian
kevin williams writes: 

 digg linked to an article related to the Apple port of ZFS
 (http://www.dell.com/content/products/productdetails.aspx/print_1125?c=uscs=19l=ens=dhss).
 I don't have a Mac but was interested in ZFS.

 The article says that ZFS eliminates the need for a RAID card and is faster
 because the striping runs on the main CPU rather than on an old chipset on a
 card.  My question is: is this true?  Can I install OpenSolaris with ZFS and
 stripe and mirror a bunch of SATA disks for a home NAS server?  I sure would
 like to do that, but the cost of good RAID cards has put me off; maybe this
 is the solution.

The cache may give RAID cards an edge, but ZFS gives near-platter speeds for
its various configurations.  The Thumper is a perfect example of a ZFS
appliance.

So yes, you can use OpenSolaris for a home NAS server. 
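(Something like the following is all it takes once the disks are visible;
the device and dataset names are placeholders:)

  zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0   # two mirrored pairs, striped
  zfs create tank/media
  zfs set sharenfs=on tank/media    # export it to the home network over NFS
  zpool status tank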

Ian 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raid card vs zfs

2008-06-22 Thread James C. McPherson
kevin williams wrote:
 digg linked to an article related to the Apple port of ZFS
 (http://www.dell.com/content/products/productdetails.aspx/print_1125?c=uscs=19l=ens=dhss).
 I don't have a Mac but was interested in ZFS.

 The article says that ZFS eliminates the need for a RAID card and is faster
 because the striping runs on the main CPU rather than on an old chipset on a
 card.  My question is: is this true?  Can I install OpenSolaris with ZFS and
 stripe and mirror a bunch of SATA disks for a home NAS server?  I sure would
 like to do that, but the cost of good RAID cards has put me off; maybe this
 is the solution.

Hi Kevin,
Personally, I'd argue that if you've got a RAID card or array
to use, you should take advantage of it _in conjunction_ with
using ZFS.

If you don't have a RAID card or array, then still use ZFS so
you get the speed and data integrity benefits. There are several
threads on the zfs-discuss mailing list which talk about the
configs that people have used to setup home NAS servers. The
searchable pages are at http://www.opensolaris.org/jive/forum.jspa?forumID=80


You might want to have a look at these two wikidocs:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Configuration_Guide


cheers,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raid card vs zfs

2008-06-22 Thread Tim
It is indeed true, and you can.



On 6/22/08, kevin williams [EMAIL PROTECTED] wrote:
 digg linked to an article related to the Apple port of ZFS
 (http://www.dell.com/content/products/productdetails.aspx/print_1125?c=uscs=19l=ens=dhss).
 I don't have a Mac but was interested in ZFS.

 The article says that ZFS eliminates the need for a RAID card and is faster
 because the striping runs on the main CPU rather than on an old chipset on a
 card.  My question is: is this true?  Can I install OpenSolaris with ZFS and
 stripe and mirror a bunch of SATA disks for a home NAS server?  I sure would
 like to do that, but the cost of good RAID cards has put me off; maybe this
 is the solution.


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raid card vs zfs

2008-06-22 Thread Brian Hechinger
On Mon, Jun 23, 2008 at 11:13:49AM +1200, [EMAIL PROTECTED] wrote:
 
 The cache may give RAID cards an edge, but ZFS gives near platter speeds for 
 its various configurations.  The Thumper is a perfect example of a ZFS 
 appliance. 

I get very acceptable performance out of my Sun Ultra-80 with 4x 450Mhz US-II
CPUs and 4GB RAM.  I can't wait to upgrade to something a tad faster. :)

 So yes, you can use OpenSolaris for a home NAS server. 

Absolutely, yes.  And you don't need the newest, shiniest hardware to do it
either.  If you are building some super media-streaming monster box, then,
well, sure, you do.  If you are building your average home NAS box, though,
it really isn't necessary to get the latest and greatest hardware.

That being said, the best thing you can do for a machine running ZFS is to
give it as much ram as it is able to hold.

-brian
-- 
Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CIFS HA service with solaris 10 and SC 3.2

2008-06-22 Thread Boyd Adamson
Marcelo Leal [EMAIL PROTECTED] writes:
 Hello all,

 [..]
 
  1) What is the difference between the SMB server in Solaris/OpenSolaris
  and the new CIFS project?

What you refer to as the smb server in solaris/opensolaris is in fact
Samba, which sits on top of a plain unix system. This has limitations in
the areas of user accounts and ACLs, among others. The new CIFS project
provides a CIFS server that's integrated from the ground up, including
the filesystem itself.

  2) I think samba.org has an implementation of the CIFS protocol that lets
  a Unix-like operating system act as an SMB/CIFS server. Why not use that?
  Licensing problems? Is the SMB server that is already in
  Solaris/OpenSolaris not a samba.org implementation?

See above

  3) One of the goals of the CIFS Server project on OpenSolaris is to
  support OpenSolaris as a storage operating system. Can we not do that with
  the samba.org implementation, or with the SMB server implementation that
  is already there?

See above

  4) And the last one: ZFS has smb/cifs share on/off capabilities; how do
  those relate to all of the above?

Those properties are part of the administrative interface for the new
in-kernel CIFS server.
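(With the in-kernel server installed, enabling a share looks roughly like
this; the dataset name is made up:)

  svcadm enable -r smb/server      # start the in-kernel CIFS/SMB service
  zfs set sharesmb=on tank/home    # publish the dataset as an SMB share
  sharemgr show -vp                # verify the active shares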

  5) OK, one more question: there is a new project (data migration
  manager/dmm) that is intended to migrate NFS (GNU/Linux) services and
  CIFS (MS/Windows) services to Solaris/OpenSolaris and ZFS. That project is
  in the storage community, I think. But how can we create a migration plan
  if we cannot handle the services yet? Or can we?

I'm not sure what you mean by "we cannot handle the services yet". As
mentioned above, OpenSolaris now has two separate ways to provide SMB/CIFS
services, and has had NFS support since... oh, about when Sun invented
NFS, I'd guess. :) And it's way more solid than Linux's.

  OK, I am quite confused, but it is not only my fault; all these efforts
  without some glue are a little complicated, don't you agree?

  And on top of it all, there is the need for an agent to implement HA
  services on it. I want to implement an SMB/CIFS server on
  Solaris/OpenSolaris, and I don't know whether we already have the solution
  in our community, and whether there is an agent to provide HA or whether
  we need to create a project to implement one.

Have you seen this?
http://opensolaris.org/os/community/ha-clusters/ohac/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SMC Webconsole 3.1 and ZFS Administration 1.0 - stacktraces in snv_b89

2008-06-22 Thread Jean-Paul Rivet
Just a note:

Setting compression to gzip on a zpool breaks the GUI with a similar type of 
error -

Application Error
com.iplanet.jato.NavigationException: Exception encountered during forward
Root cause = [java.lang.IllegalArgumentException: No enum const class 
com.sun.zfs.common.model.CompressionProperty$Compression.gzip]

With compression set to on/lzjb, this error does not occur and the GUI works 
fine.
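(To reproduce, roughly; the pool/dataset name is invented:)

  zfs set compression=gzip tank/fs    # web console then throws the enum error above
  zfs set compression=lzjb tank/fs    # web console works again
  zfs get compression tank/fs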

Cheers, JP
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss