[zfs-discuss] questions about block sizes

2008-04-20 Thread [EMAIL PROTECTED]
Hi,
ZFS can use block sizes up to 128k.  If the data is compressed, then 
this size will be larger when decompressed.
So, can the decompressed data be larger than 128k?  If so, does this 
also hold for metadata?  In other words,
can I have a 128k block on disk holding, for instance, indirect-block data
(compressed blkptr_t entries) that decompresses into more than 1024 blkptr_t?
If I had a very large amount of free space I could try this and see, but
since I don't, I thought I'd ask here.

thanks,
max
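
For reference, the 1024 figure follows from the size of a block pointer; a quick
back-of-the-envelope check, assuming the on-disk blkptr_t is 128 bytes:

  # 128K indirect block / 128 bytes per blkptr_t = 1024 block pointers
  echo $((131072 / 128))        # prints 1024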


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup for x4500?

2008-04-20 Thread Peter Tribble
On Wed, Apr 16, 2008 at 2:12 PM, Anna Langley [EMAIL PROTECTED] wrote:

  I've just joined this list, and am trying to understand the state of
  play with using free backup solutions for ZFS, specifically on a Sun
  x4500.
...
  Does anyone here have experience of this with multi-TB filesystems and
  any of these solutions that they'd be willing to share with me please?

My experience so far is that once you get past a terabyte and 10 million files,
any backup software struggles.

(I've largely been involved with commercial solutions, as we already have them.
They struggle as well.)

Generally, handling data volumes on this scale seems to require some way
of partitioning them into more easily digestible chunks: either splitting the
data into separate filesystems (zfs makes this easy) or, if that isn't possible,
structuring it on a large filesystem into some sort of hierarchy whose top-level
directories break it up into smaller chunks.

(Some sort of hashing scheme appears to be indicated. Unfortunately our
applications fall into two classes: everything in one huge directory,
or a hashing
scheme that results in many thousands of top-level directories.)

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
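
For what it's worth, a minimal sketch of the separate-filesystem approach
suggested above, with made-up pool and dataset names - one child filesystem per
chunk, so each can be snapshotted and backed up on its own:

  zfs create tank/data
  for d in a b c d; do
      zfs create tank/data/$d
  done
  zfs list -r tank/data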
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] questions about block sizes

2008-04-20 Thread Mario Goebbels
 ZFS can use block sizes up to 128k.  If the data is compressed, then 
 this size will be larger when decompressed.

ZFS allows you to use variable block sizes (powers of 2 from 512 bytes
to 128k), and as far as I know, a compressed block is put into the
smallest one it fits in.

-mg
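
To see (and set) those properties on a live dataset - the dataset name here is
only an example:

  # recordsize accepts powers of two from 512 bytes to 128k on this ZFS version
  zfs set recordsize=128k tank/myfs
  zfs set compression=on tank/myfs
  zfs get recordsize,compression,compressratio tank/myfs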
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 10U5 ZFS features?

2008-04-20 Thread Peter Tribble
On Sun, Apr 20, 2008 at 5:48 AM, Vincent Fox [EMAIL PROTECTED] wrote:
 I would hope at least it has that giant FSYNC patch for ZFS already present?

  We ran into this issue and it nearly killed Solaris as a product here in our
 Data Center; it was such a bad experience.

  Fix was in 127728 (x86) and 127729 (Sparc).

I think you have sparc and x86 swapped over.

Looking at an S10U5 box I have here, 127728-06 is integrated.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
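
A quick way to check for the fix on a given Solaris 10 box, using the patch IDs
from this thread:

  # list installed patches and grep for either revision of the fsync fix
  showrev -p | egrep '12772[89]'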
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup for x4500?

2008-04-20 Thread Bob Friesenhahn
On Sun, 20 Apr 2008, Peter Tribble wrote:
  Does anyone here have experience of this with multi-TB filesystems and
  any of these solutions that they'd be willing to share with me please?

 My experience so far is that once you get past a terabyte and 10 million files,
 any backup software struggles.

What is the cause of the struggling?  Does the backup host run short 
of RAM or CPU?  If backups are incremental, is a large portion of time 
spent determining the changes to be backed up?  What is the relative 
cost of many small files vs large files?

How does 'zfs send' performance compare with a traditional incremental 
backup system?

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
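
One rough way to start answering these questions is to watch the backup host
while a run is in progress; a sketch, assuming a Solaris host and a snapshot
name that is purely illustrative:

  # is the backup CPU-, memory-, or disk-bound?
  vmstat 5
  iostat -xn 5
  prstat
  # and for the 'zfs send' comparison, time a full send thrown away locally
  time zfs send tank/fs@today > /dev/null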

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS raidz write performance: what to expect from SATA drives on ICH9R (A

2008-04-20 Thread Pascal Vandeputte
Hi,

First of all, my apologies for some of my posts appearing two or even three times
here; the forum seems to be acting up. Although I received a Java exception for
those double postings and they never appeared yesterday, they apparently still
made it through eventually.

Back on topic: I fruitlessly tried to extract higher write speeds from the 
Seagate drives using an Addonics Silicon Image 3124 based SATA controller. I 
got exactly the same 21 MB/s for each drive (booted from a Knoppix cd).

I was planning on contacting Seagate support about this, but in the meantime I
absolutely had to start using this system, even if it meant low write speeds. 
So I installed Solaris on a 1GB CF card and wanted to start configuring ZFS. I 
noticed that the first SATA disk was still shown with a different label by the 
format command (see my other post somewhere here). I tried to get rid of all 
disk labels (unsuccessfully), so I decided to boot Knoppix again and zero out 
the start and end sectors manually (erasing all GPT data).
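
For reference, wiping the GPT structures by hand from Linux looks roughly like
this (destructive, and /dev/sda is just an example):

  dd if=/dev/zero of=/dev/sda bs=512 count=34      # primary GPT (LBA 0-33)
  dd if=/dev/zero of=/dev/sda bs=512 count=34 \
     seek=$(( $(blockdev --getsz /dev/sda) - 34 )) # backup GPT at the end of the disk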

Back to Solaris. I ran "zpool create tank raidz c1t0d0 c1t1d0 c1t2d0" and tried
a dd while monitoring with "iostat -xn 1" to see the effect of not having a slice
as part of the zpool (write cache, etc.). I was seeing write speeds in excess of 
50MB/s per drive! Whoa! I didn't understand this at all, because 5 minutes 
earlier I couldn't get more than 21MB/s in Linux using block sizes up to 
1048576 bytes. How could this be?

I decided to destroy the zpool and try to dd from Linux once more. This is when 
my jaw dropped to the floor:

[EMAIL PROTECTED]:~# dd if=/dev/zero of=/dev/sda bs=4096
250916+0 records in
250915+0 records out
1027747840 bytes (1.0 GB) copied, 10.0172 s, 103 MB/s

Finally, the write speed one should expect from these drives, according to 
various reviews around the web.

I still get a healthy 52MB/s at the end of the disk:

# dd if=/dev/zero of=/dev/sda bs=4096 seek=18300
dd: writing `/dev/sda': No space left on device
143647+0 records in
143646+0 records out
588374016 bytes (588 MB) copied, 11.2223 s, 52.4 MB/s

But how is it possible that I didn't get these speeds earlier? This may be part 
of the explanation:

[EMAIL PROTECTED]:~# dd if=/dev/zero of=/dev/sda bs=2048
101909+0 records in
101909+0 records out
208709632 bytes (209 MB) copied, 9.32228 s, 22.4 MB/s

Could it be that the firmware in these drives has issues with write requests of 
2048 bytes and smaller?

There must be more to it though, because I'm absolutely sure that I used larger 
block sizes when testing with Linux earlier (like 16384, 65536 and 1048576). 
It's impossible to tell, but maybe there was something fishy going on which was 
fixed by zero'ing parts of the drives. I absolutely cannot explain it otherwise.

Anyway, I'm still not seeing much more than 50MB/s per drive from ZFS, but I
suspect the 2048- vs 4096-byte write block size effect may be influencing this.
Having a slice as part of the pool earlier perhaps magnified this behavior as
well. Caching or swap problems are certainly not an issue now.
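
One way to pin down the small-request theory would be a sweep over write block
sizes against the raw device (destructive, so only on a disk holding no data;
/dev/sda and the 256 MB sample size are arbitrary):

  for bs in 512 1024 2048 4096 8192 65536 1048576; do
      count=$(( 268435456 / bs ))                  # ~256 MB per test
      echo "bs=$bs"
      dd if=/dev/zero of=/dev/sda bs=$bs count=$count 2>&1 | tail -1
  done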

Any thoughts? I certainly want to thank everyone once more for your 
co-operation!

Greetings,

Pascal
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup for x4500?

2008-04-20 Thread Peter Tribble
On Sun, Apr 20, 2008 at 4:39 PM, Bob Friesenhahn
[EMAIL PROTECTED] wrote:
 On Sun, 20 Apr 2008, Peter Tribble wrote:
 
  My experience so far is that once you get past a terabyte and 10 million
 files, any backup software struggles.
 

  What is the cause of the struggling?  Does the backup host run short of
 RAM or CPU?  If backups are incremental, is a large portion of time spent
 determining the changes to be backed up?  What is the relative cost of many
 small files vs large files?

It's just the fact that, while the backup completes, it can take over 24 hours.
Clearly this takes you well over any backup window. It's not so much that the
backup software is defective; it's an indication that traditional notions of
backup need to be rethought.

I have one small (200G) filesystem that takes an hour to do an incremental
with no changes. (After a while, it was obvious we don't need to do that
every night.)

The real killer, I think, is the sheer number of files. For us, 10 million
files isn't excessive. I have one filesystem that's likely to have getting on
for 200 million files by the time the project finishes. (Gulp!)

  How does 'zfs send' performance compare with a traditional incremental
 backup system?

I haven't done that particular comparison. (zfs send isn't useful for backup:
it doesn't span tapes and doesn't hold an index of the files.) But I have compared
it against various varieties of tar for moving data between machines, and
the performance of 'zfs send' wasn't particularly good - I ended up using
tar instead. (Maybe lots of smallish files again.)

For incrementals, it may be useful. But that presumes a replicated
configuration (preferably with the other node at a DR site), rather than
use in backups.

-- 
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
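
For reference, the two transfer approaches being compared look roughly like
this; host and dataset names are made up:

  # replication-style transfer of a snapshot
  zfs snapshot tank/data@xfer
  zfs send tank/data@xfer | ssh otherhost zfs recv backup/data
  # versus a plain file copy with tar over ssh
  cd /tank/data && tar cf - . | ssh otherhost 'cd /backup/data && tar xf -'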
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] backup for x4500?

2008-04-20 Thread Bob Friesenhahn
On Sun, 20 Apr 2008, Peter Tribble wrote:

  What is the cause of the struggling?  Does the backup host run short of
 RAM or CPU?  If backups are incremental, is a large portion of time spent
 determining the changes to be backed up?  What is the relative cost of many
 small files vs large files?

 It's just the fact that, while the backup completes, it can take over 24 
 hours.
 Clearly this takes you well over any backup window. It's not so much that the
 backup software is defective; it's an indication that traditional notions of
 backup need to be rethought.

There is no doubt about that.  However, there are organizations with 
hundreds of terabytes online and they manage to survive somehow.  I 
receive bug reports from people with 600K files in a single 
subdirectory. Terabyte-sized USB drives are available now. When you 
say that the backup can take over 24 hours, are you talking only about 
the initial backup, or incrementals as well?

 I have one small (200G) filesystem that takes an hour to do an incremental
 with no changes. (After a while, it was obvious we don't need to do that
 every night.)

That is pretty outrageous.  It seems that your backup software is 
suspect since it must be severely assaulting the filesystem.  I am 
using 'rsync' (version 3.0) to do disk-to-disk network backups (with 
differencing) to a large Firewire type drive and have not noticed any 
performance issues.  I do not have 10 million files though (I have 
about half of that).

Since zfs supports really efficient snapshots, a backup system which 
is aware of snapshots can take snapshots and then backup safely even 
if the initial dump takes several days.  Really smart software could 
perform both initial dump and incremental dump simultaneously.  The 
minimum useful incremental backup interval would still be limited 
to the time required to do one incremental backup.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
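
A sketch of the snapshot-aware approach described above, using the .zfs/snapshot
directory so the source stays consistent however long the copy takes (dataset
and target paths are invented):

  zfs snapshot tank/home@nightly
  rsync -a /tank/home/.zfs/snapshot/nightly/ /backup/home/
  zfs destroy tank/home@nightly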

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Periodic ZFS maintenance?

2008-04-20 Thread Sam
I have a 10x500 disc file server with ZFS+; do I need to perform any sort of 
periodic maintenance to the filesystem to keep it in tip-top shape?

Sam
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 10U5 ZFS features?

2008-04-20 Thread Rob Windsor
Peter Tribble wrote:
 On Sun, Apr 20, 2008 at 5:48 AM, Vincent Fox [EMAIL PROTECTED] wrote:
 I would hope at least it has that giant FSYNC patch for ZFS already present?

  We ran into this issue and it nearly killed Solaris as a product here in our
 Data Center; it was such a bad experience.

  Fix was in 127728 (x86) and 127729 (Sparc).
 
 I think you have sparc and x86 swapped over.
 
 Looking at an S10U5 box I have here, 127728-06 is integrated.

Correct (127728 is sparc and 127729 is x86).

They're in the respective patch clusters now, as well as 10u5.

Rob++
-- 
|
|Internet: [EMAIL PROTECTED] __o
|Life: [EMAIL PROTECTED]_`\,_
|   (_)/ (_)
|They couldn't hit an elephant at this distance.
|  -- Major General John Sedgwick
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Periodic ZFS maintenance?

2008-04-20 Thread Ian Collins
Sam wrote:
 I have a 10x500 disc file server with ZFS+; do I need to perform any sort of 
 periodic maintenance to the filesystem to keep it in tip-top shape?

   
No, but if there are problems, a periodic scrub will tip you off sooner
rather than later.

Ian

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs filesystem metadata checksum

2008-04-20 Thread asa
Thank you, this is exactly what I was looking for.
This is for remote replication, so it looks like I am out of luck.
Bummer.

Asa

On Apr 14, 2008, at 4:09 PM, Jeff Bonwick wrote:

 Not at present, but it's a good RFE.  Unfortunately it won't be
 quite as simple as just adding an ioctl to report the dnode checksum.
 To see why, consider a file with one level of indirection: that is,
 it consists of a dnode, a single indirect block, and several data blocks.
 The indirect block contains the checksums of all the data blocks -- handy.
 The dnode contains the checksum of the indirect block -- but that's not
 so handy, because the indirect block contains more than just checksums;
 it also contains pointers to blocks, which are specific to the physical
 layout of the data on your machine.  If you did remote replication using
 zfs send | ssh elsewhere zfs recv, the dnode checksum on 'elsewhere'
 would not be the same.

 Jeff
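
For anyone who wants to look at the structure Jeff describes, zdb can dump an
object's dnode along with the block pointers in its indirect blocks; a sketch
with a made-up dataset and object number:

  # five -d's prints the dnode, its indirect blocks and the blkptr checksums they hold
  zdb -ddddd tank/myfs 8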

 On Tue, Apr 08, 2008 at 01:45:16PM -0700, asa wrote:
 Hello all. I am looking to be able to verify my zfs backups in the
 most minimal way, ie without having to md5 the whole volume.

 Is there a way to get a checksum for a snapshot and compare it to
 another zfs volume, containing all the same blocks and verify they
 contain the same information? Even when I destroy the snapshot on the
 source?

 kind of like:

 zfs create tank/myfs
 dd if=/dev/urandom bs=128k count=1000 of=/tank/myfs/TESTFILE
 zfs snapshot tank/[EMAIL PROTECTED]
 zfs send tank/[EMAIL PROTECTED] | zfs recv tank/myfs_BACKUP

 zfs destroy tank/[EMAIL PROTECTED]

 zfs snapshot tank/[EMAIL PROTECTED]


 someCheckSumVodooFunc(tank/myfs)
 someCheckSumVodooFunc(tank/myfs_BACKUP)

 is there some zdb hackery which results in a metadata checksum usable
 in this scenario?

 Thank you all!

 Asa
 zfs worshiper
 Berkeley, CA
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Periodic ZFS maintenance?

2008-04-20 Thread Adam Leventhal
On Mon, Apr 21, 2008 at 10:41:35AM +1200, Ian Collins wrote:
 Sam wrote:
  I have a 10x500 disc file server with ZFS+; do I need to perform any sort 
  of periodic maintenance to the filesystem to keep it in tip-top shape?
 
 No, but if there are problems, a periodic scrub will tip you off sooner
 rather than later.

Well, tip you off _and_ correct the problems if possible. I believe a long-
standing RFE has been to scrub periodically in the background to ensure that
correctable problems don't turn into uncorrectable ones.

Adam

-- 
Adam Leventhal, Fishworks            http://blogs.sun.com/ahl
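
A crude manual stand-in for that, assuming a pool named tank and root's crontab:

  # crontab entry: start a scrub every Sunday at 03:00
  0 3 * * 0 /usr/sbin/zpool scrub tank
  # check the outcome afterwards
  zpool status -v tank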
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss