Re: [zfs-discuss] dedupratio riddle

2010-03-18 Thread Henrik Johansson

Hello,

On 17 mar 2010, at 16.22, Paul van der Zwan paul.vanderz...@sun.com  
wrote:




On 16 mrt 2010, at 19:48, valrh...@gmail.com wrote:

Someone correct me if I'm wrong, but it could just be a  
coincidence. That is, perhaps the data that you copied happens to  
lead to a dedup ratio relative to the data that's already on there.  
You could test this out by copying a few gigabytes of data you know  
is unique (like maybe a DVD video file or something), and that  
should change the dedup ratio.


The first copy of that data was unique, and dedup is even switched off for the
entire pool, so it seems to be a bug in the calculation of the dedupratio, or it
uses a method that gives unexpected results.


I wonder whether the dedup ratio is calculated from the contents of the DDT
or from all the data in the whole pool; I've only looked at the ratio for
datasets which had dedup on for their whole lifetime. If the former, data added
while it's switched off will never alter the ratio (until rewritten with dedup
on). The source should have the answer, but I'm on mail only for a few weeks.
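
For reference, a quick way to check this empirically (assuming a pool named "tank"; the name is just an example):

  # zpool get dedupratio tank   # the pool-wide ratio property
  # zdb -DD tank                # dumps DDT statistics; only meaningful if dedup has been enabled at some point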


It's probably for the whole dataset; that makes the most sense. Just a thought.


Regards

Henrik
http://sparcv9.blogspot.com


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Joerg Schilling
Damon Atkins damon_atk...@yahoo.com.au wrote:

 I vote for zfs needing a backup and restore command against a snapshot.

 backup command should output on stderr at least 
 Full_Filename SizeBytes Modification_Date_1970secSigned
 so backup software can build indexes and stdout contains the data.

This is something that does not belong on stderr but on a separate stream that
exists only for building such a database. Stderr is used for error messages and
warnings.


 The advantage of zfs providing the command is that as ZFS is upgraded or new
 features are added, backup vendors do not need to re-test their code. It could
 also mean that when encryption comes along, a property on the pool could indicate
 whether it is OK to decrypt only the filenames as part of a backup.

 restore would work the same way except you would pass a filename or a 
 directory to restore etc. And backup software would send back the stream to 
 zfs restore command.

 The other alternative is for zfs to provide a standard API for backups like 
 Oracle does for RMAN.

You need to decide what you would like to get.

From what I've read so far, zfs send is a block-level API and thus cannot be
used for real backups. As a result of being block-level oriented, the
interpretation of the data is done by ZFS, and thus every new feature can be
copied without changing the format.

If you would like a backup that is able to retrieve arbitrary single files,
you need a backup API at the file level. If you have such an API, you need to
enhance the backup tool in many cases where the file metadata is enhanced in
the filesystem.

We need to discuss and find the best archive formats for ZFS (NTFS-style) ACLs and for
extended file attributes.

I invite everybody to join star development at:

https://lists.berlios.de/mailman/listinfo/star-developers

and

http://mail.opensolaris.org/mailman/listinfo/star-discuss


Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Joerg Schilling
Svein Skogen sv...@stillbilde.net wrote:

 Please, don't compare proper backup drives to that rotating-head
 non-standard catastrophe... DDS was (in)famous for being a delayed-fuse
 tape-shredder.

DDS was a WOM (write only memory) type device. It did not report write errors
and it had many read errors.

Jörg



Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Svein Skogen

On 18.03.2010 10:31, Joerg Schilling wrote:
 Svein Skogen sv...@stillbilde.net wrote:
 
 Please, don't compare proper backup drives to that rotating-head
 non-standard catastrophe... DDS was (in)famous for being a delayed-fuse
 tape-shredder.
 
 DDS was a WOM (write only memory) type device. It did not report write errors
 and it had many read errors.

Kind of like /dev/null. And about as useful in a restore situation.

//Svein



[zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread erik.ableson
An interesting thing I just noticed here testing out some Firewire drives with 
OpenSolaris. 

Setup:
- OpenSolaris 2009.06 and a dev version (snv_129)
- two 500 GB FireWire 400 drives with integrated hubs for daisy-chaining (net: 4 devices on the chain)
  - one SATA bridge
  - one PATA bridge

Created a zpool with both drives as simple vdevs.
Started a zfs send/recv to back up a local filesystem.

Watching zpool iostat, I see that the total throughput maxes out at about
10 MB/s.  Thinking that one of the drives might be at fault, I stopped, destroyed
the pool and created two separate pools, one from each drive. I restarted the
send/recv to one disk and saw the same maximum throughput.  I tried the other and
got the same thing.

Then I started one send/recv to one disk, got the maximum right away, then started
a send/recv to the second one and got about 4 MB/second while the first
operation dropped to about 6 MB/second.

It would appear that the bus bandwidth is limited to about 10MB/sec (~80Mbps) 
which is well below the theoretical 400Mbps that 1394 is supposed to be able to 
handle.  I know that these two disks can go significantly higher since I was 
seeing 30MB/sec when they were used on Macs previously in the same daisy-chain 
configuration.

I get the same symptoms on both the 2009.06 and the b129 machines.
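
For reference, the per-device numbers can be watched directly; a small sketch, assuming the pool is called "fwpool":

  $ zpool iostat -v fwpool 5    # -v breaks throughput down per vdev, sampled every 5 seconds

If each drive alone tops out near 10 MB/s and the two together never exceed that, the shared 1394 bus (or the bridge chipset) is the likely limit.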

It's not a critical issue to me since these drives will eventually just be used 
for send/recv backups over a slow link, but it doesn't augur well for the day I 
need to restore data...

Anyone else seen this behaviour with Firewire devices and OpenSolaris?

Erik


Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-18 Thread Matt
Ultimately this could have 3 TB of data on it, and it is difficult to estimate
the volume of changed data.  It would be nice to have changes mirrored
immediately but asynchronously, so as not to impede the master.

The second box is likely to have a lower spec with fewer spindles, for cost
reasons.  Immediate failover takes second place to data preservation in the
event of a failure of the master.  I had looked at this:

http://hub.opensolaris.org/bin/view/Project+avs/WebHome

But it did seem overkill to me, and doesn't that mean that a resilver on the
master will be replicated on the slave even if it is not required?

A zfs send/receive every 15 minutes might well have to do.
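
For what it's worth, a minimal sketch of such a scheduled incremental replication, assuming a pool "tank", a standby host "standby" reachable over ssh, and a state file holding the previous snapshot name (all names are examples):

  #!/bin/sh
  # run from cron every 15 minutes
  LAST=`cat /var/run/last-repl-snap`
  NOW=repl-`date +%Y%m%d%H%M`
  zfs snapshot -r tank@$NOW
  zfs send -R -I tank@$LAST tank@$NOW | ssh standby zfs receive -F -d backup &&
      echo $NOW > /var/run/last-repl-snap
  # old snapshots still have to be pruned on both sides

The very first run needs a full send (no -I) into an empty target on the standby box.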

Matt.


Re: [zfs-discuss] dedupratio riddle

2010-03-18 Thread Paul van der Zwan

On 18 mrt 2010, at 10:07, Henrik Johansson wrote:

 Hello,
 
 On 17 mar 2010, at 16.22, Paul van der Zwan paul.vanderz...@sun.com wrote:
 
 
 On 16 mrt 2010, at 19:48, valrh...@gmail.com wrote:
 
 Someone correct me if I'm wrong, but it could just be a coincidence. That 
 is, perhaps the data that you copied happens to lead to a dedup ratio 
 relative to the data that's already on there. You could test this out by 
 copying a few gigabytes of data you know is unique (like maybe a DVD video 
 file or something), and that should change the dedup ratio.
 
 The first copy of that data was unique, and dedup is even switched off for
 the entire pool, so it seems to be a bug in the calculation of the
 dedupratio, or it uses a method that gives unexpected results.
 
 I wonder whether the dedup ratio is calculated from the contents of the DDT or
 from all the data in the whole pool; I've only looked at the ratio for
 datasets which had dedup on for their whole lifetime. If the former, data added
 while it's switched off will never alter the ratio (until rewritten with
 dedup on). The source should have the answer, but I'm on mail only for a few
 weeks.
 
 It's probably for the whole dataset; that makes the most sense. Just a
 thought.
 

It looks like the ratio only gets updated when dedup is switched on and freezes 
if you switch dedup off for the entire pool, like I did.

I tried to have a look at the source but it was way too complex to figure it 
out in the time I had available so far.

Best regards,
Paul van der Zwan
Sun Microsystems Nederland

 Regards
 
 Henrik
 http://sparcv9.blogspot.com



Re: [zfs-discuss] ZFS Performance on SATA Drive

2010-03-18 Thread Kashif Mumtaz
Hi,
I did another test on both machines, and write performance on ZFS is extraordinarily slow.
I ran the following tests on both machines:
For write:
time dd if=/dev/zero of=test.dbf bs=8k count=1048576
For read:
time dd if=/testpool/test.dbf of=/dev/null bs=8k

ZFS machine has 32GB memory
UFS machine  has 16GB memory


### UFS machine test ###

time dd if=/dev/zero of=test.dbf bs=8k count=1048576

1048576+0 records in
1048576+0 records out

real    2m18.352s
user    0m5.080s
sys     1m44.388s

#iostat -xnmpz 10

    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.6  107.9    4.8 62668.4   0.0   6.7     0.1    61.9    1  83  c0t0d0
    0.0    0.2    0.0     0.2   0.0   0.0     0.0     0.8    0   0  c0t0d0s5
    0.6  107.7    4.8 62668.2   0.0   6.7     0.1    62.0    1  83  c0t0d0s7


For read 
# time dd if=test.dbf of=/dev/null bs=8k
1048576+0 records in
1048576+0 records out

real    1m21.285s
user    0m4.701s
sys     1m15.322s


The write took 2 minutes 18 seconds, and the read took 1 minute 21 seconds.

## ZFS machine test ##

# time dd if=/dev/zero of=test.dbf bs=8k count=1048576

1048576+0 records in
1048576+0 records out

real    140m33.590s
user    0m5.182s
sys     2m33.025s


                 extended device statistics
    r/s    w/s   kr/s    kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0    8.2    0.0  1037.0   0.0  33.3     0.0  4062.3    0 100  c0t0d0
    0.0    8.2    0.0  1037.0   0.0  33.3     0.0  4062.3    0 100  c0t0d0s0


-
For read
#time dd if=test.dbf of=/dev/null bs=8k
1048576+0 records in
1048576+0 records out

real    0m59.177s
user    0m4.471s
sys     0m54.723s

The write took 140 minutes, and the read took 59 seconds (less than on UFS).



-
On ZFS, data was being written at around 1037 kw/s while the disk remained 100% busy.

On UFS, data was being written at around 62668 kw/s while the disk was 83% busy.


Could you kindly help me tune the write performance on ZFS?


Re: [zfs-discuss] dedupratio riddle

2010-03-18 Thread Craig Alder
I remembered reading a post about this a couple of months back.  This post by 
Jeff Bonwick confirms that the dedupratio is calculated only on the data that 
you've attempted to deduplicate, i.e. only the data written whilst dedup is 
turned on - 
http://mail.opensolaris.org/pipermail/zfs-discuss/2009-December/034721.html.

Regards,

Craig


Re: [zfs-discuss] ZFS Performance on SATA Drive

2010-03-18 Thread James C. McPherson

On 18/03/10 08:36 PM, Kashif Mumtaz wrote:

Hi,
I did another test on both machines, and write performance on ZFS is extraordinarily slow.


Which build are you running?

On snv_134, with 2 dual-core CPUs @ 3 GHz and 8 GB RAM (my desktop), I
see these results:


$ time dd if=/dev/zero of=test.dbf bs=8k count=1048576
1048576+0 records in
1048576+0 records out

real    0m28.224s
user    0m0.490s
sys     0m19.061s

This is a dataset on a straight mirrored pool, using two SATA2
drives (320Gb Seagate).

$ time dd if=test.dbf bs=8k of=/dev/null
1048576+0 records in
1048576+0 records out

real    0m5.749s
user    0m0.458s
sys     0m5.260s


James C. McPherson
--
Senior Software Engineer, Solaris
Sun Microsystems
http://www.jmcp.homeunix.com/blog


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-18 Thread Richard Elling
On Mar 16, 2010, at 4:41 PM, Tonmaus wrote:
 Are you sure that you didn't also enable
 something which 
 does consume lots of CPU such as enabling some sort
 of compression, 
 sha256 checksums, or deduplication?
 
 None of them is active on that pool or in any existing file system. Maybe the
 issue is particular to RAIDZ2, which is comparatively recent. On that occasion:
 does anybody know if ZFS reads all parities during a scrub?

Yes

 Wouldn't it be sufficient for stale corruption detection to read only one 
 parity set unless an error occurs there?

No, because the parity itself is not verified.

 The main concern that one should have is I/O
 bandwidth rather than CPU 
 consumption since software based RAID must handle
 the work using the 
 system's CPU rather than expecting it to be done by
 some other CPU. 
 There are more I/Os and (in the case of mirroring)
 more data 
 transferred.
 
 What I am trying to say is that CPU may become the bottleneck for I/O in the case
 of parity-secured stripe sets. Mirrors and simple stripe sets have almost zero
 impact on CPU; those are at least my observations so far. Moreover, x86 processors are not
 optimized for that kind of work as much as, e.g., an Areca controller with a
 dedicated XOR chip is, in its targeted field.

All x86 processors you care about do XOR at memory bandwidth speed.
XOR is one of the simplest instructions to implement on a microprocessor.
The need for a dedicated XOR chip for older hardware RAID systems is
because they use very slow processors with low memory bandwidth. Cheap
is as cheap does :-)

However, the issue for raidz2 and above (including RAID-6) is that the 
second parity is a more computationally complex Reed-Solomon code, 
not a simple XOR. So there is more computing required and that would be 
reflected in the CPU usage.
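
For anyone who wants to see this on their own box, a simple check is to watch CPU while a scrub runs (pool name is an example):

  # zpool scrub tank
  # mpstat 10           # watch the sys column while the scrub is running
  # zpool status tank   # shows scrub progress and any errors found

If sys time stays low while the disks sit near 100% busy, the bottleneck is I/O rather than parity computation.
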
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Atlanta, March 16-18, 2010 http://nexenta-atlanta.eventbrite.com 
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 



Re: [zfs-discuss] ZFS Performance on SATA Drive

2010-03-18 Thread Kashif Mumtaz
Hi, thanks for your reply.

Both are Sun SPARC T1000 machines, each with a 1 TB SATA hard disk.

ZFS machine:
  Memory: 32 GB, processor: 1 GHz, 6 cores
  OS: Solaris 10 10/09 s10s_u8wos_08a SPARC
  Patch cluster level: 142900-02 (Dec 09)

UFS machine:
  Memory: 16 GB, processor: 1 GHz, 6 cores
  OS: Solaris 10 8/07 s10s_u4wos_12b SPARC


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-18 Thread Tonmaus
 On that
 occasion: does anybody know if ZFS reads all parities
 during a scrub?
 
 Yes
 
  Wouldn't it be sufficient for stale corruption
 detection to read only one parity set unless an error
 occurs there?
 
 No, because the parity itself is not verified.

Aha. Well, my understanding was that a scrub basically means reading all data
and comparing it with the parities, which means that these have to be re-computed.
Is that correct?

Regards,

Tonmaus


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Edward Ned Harvey
 My own stuff is intended to be backed up by a short-cut combination --
 zfs send/receive to an external drive, which I then rotate off-site (I
 have three of a suitable size).  However, the only way that actually
 works so far is to destroy the pool (not just the filesystem) and
 recreate it from scratch, and then do a full replication stream.  That
 works most of the time, hangs about 1/5.  Anything else I've tried is
 much worse, with hangs approaching 100%.

Interesting, that's precisely what we do at work, and it works 100% of the
time.  Solaris 10u8
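
For reference, a minimal sketch of that workflow, assuming a source pool "tank" and an external drive seen as c5t0d0 (pool, snapshot and device names are examples):

  # zpool create -f backup c5t0d0            # recreate the backup pool from scratch
  # zfs snapshot -r tank@offsite-20100318
  # zfs send -R tank@offsite-20100318 | zfs receive -F -d backup
  # zpool export backup                      # then rotate the drive off-site

zfs send -R produces a full replication stream including descendant filesystems, snapshots and properties.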



Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Edward Ned Harvey
 From what I've read so far, zfs send is a block level api and thus
 cannot be
 used for real backups. As a result of being block level oriented, the

Weirdo.  The above "cannot be used for real backups" is obviously
subjective, is incorrect, and has been widely discussed here, so I just say weirdo.
I'm tired of correcting this constantly.


 I invite everybody to join star development at:

We know, you have an axe to grind.  Don't insult some other product just
because it's not the one you personally work on.  Yours is better in some
ways, and zfs send is better in some ways.



Re: [zfs-discuss] dedupratio riddle

2010-03-18 Thread Henrik Johansson

On 18 mar 2010, at 18.38, Craig Alder craig.al...@sun.com wrote:

I remembered reading a post about this a couple of months back.   
This post by Jeff Bonwick confirms that the dedupratio is calculated  
only on the data that you've attempted to deduplicate, i.e. only the  
data written whilst dedup is turned on - http://mail.opensolaris.org/pipermail/zfs-discuss/2009-December/034721.html 
.


Ah, I was on the right track with the DDT then :) I guess most people will have
it turned on (or off) from the beginning until BP rewrite arrives, to
ensure everything is deduplicated (which is probably a good idea).


Regards

Henrik
http://sparcv9.blogspot.com



Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Joerg Schilling
Edward Ned Harvey solar...@nedharvey.com wrote:

  I invite everybody to join star development at:

 We know, you have an axe to grind.  Don't insult some other product just
 because it's not the one you personally work on.  Yours is better in some
 ways, and zfs send is better in some ways.

If you have no technical issues to discuss, please stop insulting 
people/products.

We are on OpenSolaris and we don't like this kind of discussion on the mailing
lists. Please act collaboratively.

It has been widely discussed here already that the output of zfs send cannot be 
used as a backup.

Jörg



Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Carsten Aulbert
Hi all

On Thursday 18 March 2010 13:54:52 Joerg Schilling wrote:
 If you have no technical issues to discuss, please stop insulting
 people/products.
 
 We are on OpenSolaris and we don't like this kind of discussions on the
  mailing lists. Please act collaborative.
 

May I suggest this to both of you.


 It has been widely discussed here already that the output of zfs send
  cannot be used as a backup.

Depends on the exact definition of backup; if I may take this from Wikipedia:

"In information technology, a backup or the process of backing up refers to
making copies of data so that these additional copies may be used to restore
the original after a data loss event."

In this regard zfs send *could* be a tool for backup, provided you have the
means of decrypting/deciphering the blob coming out of it. OTOH, if I used zfs
send together with zfs receive to replicate data to another machine/location
and put a "backup" label on the receiver, this would also count as a backup
from which you can restore everything and/or parts of it.

In the case of 'star' the blob coming out of it might also be useless if you don't
have star (or other tools) around for deciphering it - very unlikely, but
still possible ;)

Of course your (plural!) definition of backup may vary, thus I would propose 
first to settle on this before exchanging blows...

Cheers

Carsten


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Joerg Schilling
Carsten Aulbert carsten.aulb...@aei.mpg.de wrote:

 In case of 'star' the blob coming out of it might also be useless if you 
 don't 
 have star (or other tools) around for deciphering it - very unlikely, but 
 still possible ;)

I invite you to inform yourself about star and to test it yourself.

Star's backups are completely based on POSIX standard archive formats. If you
don't have star (which is not very probable, as star is open source), you may
extract the incremental dumps from star using any standard POSIX-compliant
archiver. You just lose the information and the ability to do incremental restores.
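
As a hedged illustration of that last point (archive and file names are made up): a level-0 star archive in a POSIX format can be read by any conforming archiver, e.g.

  $ pax -r -f /backup/home-level0.tar home/user/report.odt

which extracts the single file, just without star's incremental-restore bookkeeping.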

Jörg



Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Joerg Schilling
Darren J Moffat darren.mof...@oracle.com wrote:

 So exactly what makes it unsuitable for backup ?

 Is it the file format or the way the utility works ?

   If it is the format what is wrong with it ?

   If it is the utility what is needed to fix that ?

This  has been discussed many times in the past already.

If you archive the incremental star send data streams, you cannot
extract single files, and it seems that this cannot be fixed without
introducing a different archive format.

Star implements incremental backups and restores based on POSIX compliant 
archives.

Jörg



Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Svein Skogen

On 18.03.2010 14:12, Joerg Schilling wrote:
 Darren J Moffat darren.mof...@oracle.com wrote:
 
 So exactly what makes it unsuitable for backup ?

 Is it the file format or the way the utility works ?

  If it is the format what is wrong with it ?

  If it is the utility what is needed to fix that ?
 
 This  has been discussed many times in the past already.
 
 If you archive the incremental star send data streams, you cannot
 extract single files andit seems that this cannot be fixed without
 introducing a different archive format. 
 
 Star implements incremental backups and restores based on POSIX compliant 
 archives.

And how does your favourite tool handle zvols?

//Svein



Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Joerg Schilling
joerg.schill...@fokus.fraunhofer.de (Joerg Schilling) wrote:


 This  has been discussed many times in the past already.

 If you archive the incremental star send data streams, you cannot
 extract single files andit seems that this cannot be fixed without
 introducing a different archive format. 

Sorry for the typo: this should be zfs send

Jörg



Re: [zfs-discuss] lazy zfs destroy

2010-03-18 Thread Giovanni Tirloni
On Thu, Mar 18, 2010 at 1:19 AM, Chris Paul chris.p...@rexconsulting.net wrote:

 OK I have a very large zfs snapshot I want to destroy. When I do this, the
 system nearly freezes during the zfs destroy. This is a Sun Fire X4600 with
 128GB of memory. Now this may be more of a function of the IO device, but
 let's say I don't care that this zfs destroy finishes quickly. I actually
 don't care, as long as it finishes before I run out of disk space.

 So a suggestion for room for growth for the zfs suite is the ability to
 lazily destroy snapshots, such that the destroy goes to sleep if the cpu
 idle time falls under a certain percentage.


What build of OpenSolaris are you using?

Is it nearly freezing during the whole process or just at the end?

There was another thread where a similar issue was discussed a week ago.

-- 
Giovanni


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Mike Gerdts
On Wed, Mar 17, 2010 at 9:15 AM, Edward Ned Harvey
solar...@nedharvey.com wrote:
 I think what you're saying is: Why bother trying to back up with zfs send
 when the recommended practice, fully supportable, is to use other tools for
 backup, such as tar, star, Amanda, bacula, etc.  Right?

 The answer to this is very simple.
 #1  ...
 #2  ...

 Oh, one more thing.  zfs send is only discouraged if you plan to store the
 data stream and do zfs receive at a later date.

 If instead, you are doing zfs send | zfs receive onto removable media, or
 another server, where the data is immediately fed through zfs receive then
 it's an entirely viable backup technique.

Richard Elling made an interesting observation that suggests that
storing a zfs send data stream on tape is a quite reasonable thing to
do.  Richard's background makes me trust his analysis of this much
more than I trust the typical person that says that zfs send output is
poison.

http://opensolaris.org/jive/thread.jspa?messageID=465973&tstart=0#465861

I think that a similar argument could be made for storing the zfs send
data streams on a zfs file system.  However, it is not clear why you
would do this instead of just zfs send | zfs receive.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Damon Atkins
Consider a system with 100 TB of data that is 80% full, and a user asks their local system
admin to restore a directory with large files, as it was 30 days ago, with all
Windows/CIFS ACLs and NFSv4 ACLs etc.

If we used zfs send, we would need to go back to a zfs send from some 30 days ago, and find
80 TB of disk space to be able to restore it.

zfs send/recv is great for copying ZFS data from one zfs file system to another file
system, even across servers.

But there needs to be a tool:
* To restore an individual file or a zvol (with all ACLs/properties)
* That allows backup vendors (which place backups on tape or disk or CD or ...) to
build indexes of what is contained in the backup (e.g. filename, owner, size,
modification dates, type (dir/file/etc.))
* Whose stream output is suitable for devices like tape drives
* That can tell if a file is corrupted when it is being restored
* That may support recovery of corrupt data blocks within the stream
* That is preferably gnutar command-line compatible
* That admins can use to back up and transfer a subset of files, e.g. a user home
directory (which is not a file system), to another server or onto CD to be sent
to their new office location, or ...

For backup vendors, is the idea for them to use the NDMP protocol to back up ZFS and
all its properties/ACLs? Or is a new tool required to achieve the above?

Cheers


Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-18 Thread Scott Meilicke
It is hard, as you note, to recommend a box without knowing the load. How many 
linux boxes are you talking about?

I think having a lot of space for your L2ARC is a great idea.

Will you mirror your SLOG, or load balance them? I ask because perhaps one will 
be enough, IO wise. My box has one SLOG (X25-E) and can support about 2600 IOPS 
using an iometer profile that closely approximates my work load. My ~100 VMs on 
8 ESX boxes average around 1000 IOPS, but can peak 2-3x that during backups.

Don't discount NFS. I absolutely love NFS for management and thin provisioning 
reasons. Much easier (to me) than managing iSCSI, and performance is similar. I 
highly recommend load testing both iSCSI and NFS before you go live. Crash 
consistent backups of your VMs are possible using NFS, and recovering a VM from 
a snapshot is a little easier using NFS, I find.

Why not larger capacity disks?

Hopefully your switches support NIC aggregation?

The only issue I have had on 2009.06 using iSCSI (I had a windows VM directly 
attaching to an iSCSI 4T volume) was solved and back ported to 2009.06 (bug 
6794994).

-Scott


Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread David Dyer-Bennet

On Thu, March 18, 2010 04:50, erik.ableson wrote:


 It would appear that the bus bandwidth is limited to about 10MB/sec
 (~80Mbps) which is well below the theoretical 400Mbps that 1394 is
 supposed to be able to handle.  I know that these two disks can go
 significantly higher since I was seeing 30MB/sec when they were used on
 Macs previously in the same daisy-chain configuration.

 I get the same symptoms on both the 2009.06 and the b129 machines.

While it wasn't on Solaris, I must say that I've been consistently
disappointed by the performance of external 1394 drives on various Linux
boxes.  I invested in the interface cards for the boxes, and in the
external drives that supported Firewire, because everything said it
performed much better for disk IO, but in fact I  have never found it to
be the case.

Sort-of-glad to hear I don't have to wonder if I should be trying it on
Solaris.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Darren J Moffat

On 18/03/2010 12:54, joerg.schill...@fokus.fraunhofer.de wrote:

It has been widely discussed here already that the output of zfs send cannot be
used as a backup.


First define exactly what you mean by backup.  Please don't confuse
backup and archival; they aren't the same thing.


It would also help if the storage medium for the backup were defined, and
what the required access to it is, e.g.:

full restore only, incremental restores, per-file restore.

The format is now committed and versioned.  It is the only format that
saves all of the information about a ZFS dataset (including its dataset
properties), not just the data files, ACLs, extended attributes and
system attributes.  The stream itself even supports deduplication of the
data blocks within it.
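
On builds that ship zstreamdump(1M) you can inspect that header yourself; a small hedged example (dataset name is an example):

  # zfs send tank/home@snap | zstreamdump

prints the stream version and a record summary without receiving anything.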


So exactly what makes it unsuitable for backup ?

Is it the file format or the way the utility works ?

If it is the format what is wrong with it ?

If it is the utility what is needed to fix that ?

--
Darren J Moffat


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Darren J Moffat



On 18/03/2010 13:12, joerg.schill...@fokus.fraunhofer.de wrote:

Darren J Moffatdarren.mof...@oracle.com  wrote:


So exactly what makes it unsuitable for backup ?

Is it the file format or the way the utility works ?

If it is the format what is wrong with it ?

If it is the utility what is needed to fix that ?


This  has been discussed many times in the past already.



If you archive the incremental star send data streams, you cannot
extract single files andit seems that this cannot be fixed without
introducing a different archive format.


That assumes you are writing the 'zfs send' stream to a file or file-like
media.  In many cases people using 'zfs send' for their backup
strategy are writing it back out using 'zfs recv' into another
pool.  In those cases the files can even be restored over NFS/CIFS by
using the .zfs/snapshot directory.


For example:

http://hub.opensolaris.org/bin/download/User+Group+losug/w%2D2009/Open%2DBackup%2Dwith%2DNotes.pdf
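
A hedged example of that kind of per-file restore, with made-up paths, assuming the received filesystems are mounted under /backup:

  # ls /backup/tank/home/.zfs/snapshot/
  # cp /backup/tank/home/.zfs/snapshot/daily-2010-03-01/alice/report.odt /tank/home/alice/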


Star implements incremental backups and restores based on POSIX compliant
archives.


ZFS filesystems have functionality beyond POSIX, and some of that is
really very important for some people (especially those using CIFS).


Does Star (or any other POSIX archiver) back up:
ZFS ACLs?
ZFS system attributes (as used by the CIFS server and locally)?
ZFS dataset properties (compression, checksum, etc.)?

If it doesn't, then it is providing an archive of the data in the
filesystem, not a full/incremental copy of the ZFS dataset.  Which,
depending on the requirements of the backup, may not be enough.  In
other words you have data/metadata missing from your backup.


The only tool I'm aware of today that provides a copy of the data, and 
all of the ZPL metadata and all the ZFS dataset properties is 'zfs send'.


Just like (s)tar alone is not an enterprise backup tool, neither is 'zfs
send'.  Both of them need some scripting and infrastructure management
around them to make a backup solution suitable for a given deployment.
In some deployments maybe the correct answer is both.


Each has its place: (s)tar is a file/directory archiver; 'zfs send', on
the other hand, is a ZFS dataset replication tool (not just for ZPL filesystems,
since it works on ZVOLs and all future dataset types too) that happens
to write out a stream.


--
Darren J Moffat


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Svein Skogen

On 18.03.2010 14:28, Darren J Moffat wrote:
 
 
 On 18/03/2010 13:12, joerg.schill...@fokus.fraunhofer.de wrote:
 Darren J Moffatdarren.mof...@oracle.com  wrote:

 So exactly what makes it unsuitable for backup ?

 Is it the file format or the way the utility works ?

 If it is the format what is wrong with it ?

 If it is the utility what is needed to fix that ?

 This  has been discussed many times in the past already.
 
 If you archive the incremental star send data streams, you cannot
 extract single files andit seems that this cannot be fixed without
 introducing a different archive format.
 
 That assumes you are writing the 'zfs send' stream to a file or file
 like media.  In many cases people using 'zfs send' for they backup
 strategy are they are writing it back out using 'zfs recv' into another
 pool.  In those cases the files can even be restored over NFS/CIFS by
 using the .zfs/snapshot directory

For the archival of files, most utilities can be ... converted (probably
by including additional metadata) to store those. The problem arises
with zvols (which is where I'm considering zfs send for backup anyway).
Since these volumes are already an all-or-nothing scenario restore-wise,
that argument against using send/receive is flawed from the get-go. (To
restore individual files from a zvol exported as an iSCSI disk, the
backup software would have to run on the machine mounting the iSCSI disk,
not work from a backup of the zvol itself.) Which basically means that apart
from the rollback of snapshots, the send/receive backup stream is only
likely to be used in a disaster-rebuild situation, where a
restore-everything-in-one-batch behaviour is a feature, not a bug. In that
scenario restoring everything _IS_ a feature, not a bug.

As to your two questions above, I'll try to answer them from my limited
understanding of the issue.

The format: Isn't fault tolerant. In the least. One single bit wrong and
the entire stream is invalid. A FEC wrapper would fix this.

The utility: Can't handle streams being split (in case of streams being
larger than a single backup medium).
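
A hedged workaround sketch for the splitting part (names and sizes are examples; note this only detects damage via checksums, it adds no FEC):

  # zfs send -R tank@weekly | split -b 1024m - /backup/tank-weekly.
  # for f in /backup/tank-weekly.*; do digest -a sha256 $f > $f.sha256; done
  # restore later with:  cat /backup/tank-weekly.* | zfs receive -F -d restorepool

One damaged piece still invalidates the whole stream, which is exactly the complaint above.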

Both of these usually get fended off with "it was never meant as a
backup solution", "you're trying to use it as ufsdump, which it isn't, on
purpose", "ufsdump is old-fashioned" and similar arguments, often
accompanied by creative suggestions such as using USB disks (have you
ever tried getting several terabytes back and forth over USB?), and then
a helpful pointer to multiple thousands of dollars' worth of backup software,
as an excuse for why no one should be considering adding proper backup
features to ZFS itself.

The last paragraph may sound like I'm taking a jab at specific people;
I'm not, really. But I've had my share of helpful people who have been
anything but helpful, most of them taking care not to put their answers
on the lists, and quite a lot of them wanting to sell me services or
software (or rainwear such as macintoshes).

//Svein



Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-18 Thread Matt
 It is hard, as you note, to recommend a box without
 knowing the load. How many linux boxes are you
 talking about?

This box will act as a backing store for a cluster of 3 or 4 XenServers with 
upwards of 50 VMs running at any one time.

 Will you mirror your SLOG, or load balance them? I
 ask because perhaps one will be enough, IO wise. My
 box has one SLOG (X25-E) and can support about 2600
 IOPS using an iometer profile that closely
 approximates my work load. My ~100 VMs on 8 ESX boxes
 average around 1000 IOPS, but can peak 2-3x that
 during backups.

I was planning to mirror them - mainly in the hope that I could hot swap a new 
one in the event that an existing one started to degrade.  I suppose I could 
start with one of each and convert to a mirror later although the prospect of 
losing either disk fills me with dread.
 
 Don't discount NFS. I absolutely love NFS for
 management and thin provisioning reasons. Much easier
 (to me) than managing iSCSI, and performance is
 similar. I highly recommend load testing both iSCSI
 and NFS before you go live. Crash consistent backups
 of your VMs are possible using NFS, and recovering a
 VM from a snapshot is a little easier using NFS, I
 find.

That's interesting feedback.  Given how easy it is to create NFS and iSCSI 
shares in osol, I'll definitely try both and see how they compare.
 
 Why not larger capacity disks?

We will run out of IOPS before we run out of space.  It is more likely that we
will gradually replace some of the SATA drives with 6 Gbps SAS drives to help
with that, and we've been mulling over using an LSI SAS 9211-8i controller to
provide that upgrade path:

http://www.lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas9211-8i/index.html

 Hopefully your switches support NIC aggregation?

Yes, we're hoping that a bond of 4 x NICs will cope.
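
For reference, a hedged sketch of the host side on a recent OpenSolaris build (interface names and the address are examples; older releases use dladm create-aggr -d with a key instead of -l):

  # dladm create-aggr -L active -l e1000g0 -l e1000g1 -l e1000g2 -l e1000g3 aggr1
  # ifconfig aggr1 plumb 192.168.10.10 netmask 255.255.255.0 up

The switch ports need matching LACP configuration, and a single iSCSI or NFS session still rides one link, so the gain shows up across many clients rather than within one stream.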

Any opinions on the use of battery-backed SAS adapters? It also occurred to
me after writing this that perhaps we could use one and configure it to report
writes as being flushed to disk before they actually were.  That might give a
slight edge in performance in some cases, but I would prefer to have the data
security instead, tbh.

Matt.


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Svein Skogen

On 18.03.2010 18:21, Darren J Moffat wrote:
 As to your two questions above, I'll try to answer them from my limited
 understanding of the issue.

 The format: Isn't fault tolerant. In the least. One single bit wrong and
 the entire stream is invalid. A FEC wrapper would fix this.
 
 I've logged CR# 6936195 ZFS send stream while checksumed isn't fault
 tollerant to keep track of that.
 
 The utility: Can't handle streams being split (in case of streams being
 larger that a single backup media).
 
 I think it should be possible to store the 'zfs send' stream via NDMP
 and let NDMP deal with the tape splitting.  Though that may need
 additional software that isn't free (or cheap) to drive the parts of
 NDMP that are in Solaris.  I don't know enough about NDMP to be sure but
 I think that should be possible.
 

And here I was thinking that the NDMP stack basically was tapedev and/or
autoloader device via network? (i.e. not a backup utility at all but a
method for the software managing the backup to attach the devices)

//Svein



Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Svein Skogen

On 18.03.2010 18:28, Darren J Moffat wrote:
 On 18/03/2010 17:26, Svein Skogen wrote:
 The utility: Can't handle streams being split (in case of streams being
 larger that a single backup media).

 I think it should be possible to store the 'zfs send' stream via NDMP
 and let NDMP deal with the tape splitting.  Though that may need
 additional software that isn't free (or cheap) to drive the parts of
 NDMP that are in Solaris.  I don't know enough about NDMP to be sure but
 I think that should be possible.


 And here I was thinking that the NDMP stack basically was tapedev and/or
 autoloader device via network? (i.e. not a backup utility at all but a
 method for the software managing the backup to attach the devices)
 
 NDMP doesn't define the format of what goes on the tape so it can help
 put the 'zfs send' stream on the tape and thus deal with the lack of
 'zfs send' being able to handle tape media smaller than its stream size.

How would NDMP help with this any more than running a local pipe
splitting the stream (and handling the robotics for feeding in the next
tape)?

I can see the point of NDMP when the tape library isn't physically
connected to the same box as the zpools, but feeding local data via a
network service seems to me to be just complicating things...

But that's just my opinion.

//Svein



Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Darren J Moffat

On 18/03/2010 17:34, Svein Skogen wrote:

How would NDMP help with this any more than running a local pipe
splitting the stream (and handling the robotics for feeding in the next
tape)?


Probably doesn't in that case.


I can see the point of NDMP when the tape library isn't physically
connected to the same box as the zpools, but feeding local data via a
network servce seems to me to be just complicating things...


Indeed, if the drive is local then it may be adding a layer you don't
need.


--
Darren J Moffat


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Svein Skogen

On 18.03.2010 18:37, Darren J Moffat wrote:
 On 18/03/2010 17:34, Svein Skogen wrote:
 How would NDMP help with this any more than running a local pipe
 splitting the stream (and handling the robotics for feeding in the next
 tape)?
 
 Probably doesn't in that case.
 
 I can see the point of NDMP when the tape library isn't physically
 connected to the same box as the zpools, but feeding local data via a
 network servce seems to me to be just complicating things...
 
 Indeed if the drive is local then you it may be adding a layer you don't
 need.

I'd think getting it to work locally, making the utility able to use a local
library OR a remote NDMP one, would be a priority. Maybe more so for mid-sized
database customers that don't have the luxury of 64 datacenters on multiple
continents. ;) Having the option of catastrophe-restore backups being picked up
every morning (as a stack of 8 tapes?) and brought off-site for safekeeping
should probably be in the operating instructions for ... all
Oracle customers not having the luxury of a multisite setup. ;)

I'd suspect there are a whole lot more small customers than large ones. :p

//Svein



Re: [zfs-discuss] Is this a sensible spec for an iSCSI storage box?

2010-03-18 Thread Scott Meilicke
I was planning to mirror them - mainly in the hope that I could hot swap a new 
one in the event that an existing one started to degrade. I suppose I could 
start with one of each and convert to a mirror later although the prospect of 
losing either disk fills me with dread.

You do not need to mirror the L2ARC devices, as the system will just hit disk 
as necessary. Mirroring sounds like a good idea on the SLOG, but this has been 
much discussed on the forums.
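
For reference, adding these devices after the fact is straightforward; a sketch with example device names:

  # zpool add tank log mirror c2t0d0 c2t1d0   # mirrored SLOG
  # zpool add tank cache c2t2d0               # L2ARC, no redundancy needed

Cache devices can also be removed again with zpool remove, which keeps experimenting with them fairly low-risk.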

 Why not larger capacity disks?

We will run out of iops before we run out of space.

Interesting. I find IOPS is more proportional to the number of VMs vs disk 
space. 

User: I need a VM that will consume up to 80G in two years, so give me an 80G 
disk.
Me: OK, but recall we can expand disks and filesystems on the fly, without 
downtime.
User: Well, that is cool, but 80G to start with please.
Me: sigh 

I also believe the SLOG and L2ARC will make high-RPM disks less
necessary. But, from what I have read, higher-RPM disks will greatly help with
scrubs and resilvers. Maybe two pools: one with fast mirrored SAS, another
with big SATA. Or all SATA, but one pool with mirrors, another with raidz2.
Many options. But measure to see what works for you. iometer is great for that,
I find.

Any opinions on the use of battery backed SAS adapters?

Surely these will help with performance in write-back mode, but I have not done
any hard measurements. Anecdotally, my PERC5i in a Dell 2950 seemed to greatly
help with IOPS on a five-disk raidz. There are pros and cons. Search the
forums, but off the top of my head: 1) SLOGs are much larger than controller
caches; 2) only synced write activity is cached in a ZIL, whereas a controller
cache will cache everything, needed or not, thus running out of space sooner;
3) SLOGs and L2ARC devices are specialized caches for read and write loads, vs.
the all-in-one cache of a controller; 4) a controller *may* be faster, since it
uses RAM for the cache.

One of the benefits of a SLOG on the SAS/SATA bus is for a cluster. If one node 
goes down, the other can bring up the pool, check the ZIL for any necessary 
transactions, and apply them. To do this with battery backed cache, you would 
need fancy interconnects between the nodes, cache mirroring, etc. All of those 
things that SAN array products do. 

Sounds like you have a fun project.


Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread erik.ableson
On 18 mars 2010, at 16:58, David Dyer-Bennet wrote:
 On Thu, March 18, 2010 04:50, erik.ableson wrote:
 
 It would appear that the bus bandwidth is limited to about 10MB/sec
 (~80Mbps) which is well below the theoretical 400Mbps that 1394 is
 supposed to be able to handle.  I know that these two disks can go
 significantly higher since I was seeing 30MB/sec when they were used on
 Macs previously in the same daisy-chain configuration.
 
 I get the same symptoms on both the 2009.06 and the b129 machines.
 
 While it wasn't on Solaris, I must say that I've been consistently
 disappointed by the performance of external 1394 drives on various Linux
 boxes.  I invested in the interface cards for the boxes, and in the
 external drives that supported Firewire, because everything said it
 performed much better for disk IO, but in fact I  have never found it to
 be the case.
 
 Sort-of-glad to hear I don't have to wonder if I should be trying it on
 Solaris.

Ditto on the Linux front.  I was hoping that Solaris would be the exception, 
but no luck.  I wonder if Apple wouldn't mind lending one of the driver 
engineers to OpenSolaris for a few months...

Hmmm - that makes me wonder about the Darwin drivers - they're open sourced if 
I remember correctly.

Erik


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread erik.ableson
On 18 mars 2010, at 15:51, Damon Atkins wrote:

 A system with 100TB of data is 80% full, and a user asks their local 
 system admin to restore a directory with large files, as it was 30 days ago, 
 with all Windows/CIFS ACLs and NFSv4 ACLs etc.
 
 If we used zfs send, we would need to go back to a zfs send from some 30 days 
 ago, and find 80TB of disk space to be able to restore it.
 
 zfs send/recv is great for copying a zfs file system to another file 
 system, even across servers. 

Bingo! The zfs send/recv scenario is for backup to another site or server - 
backup in this context being a second copy stored independently from the 
original/master.

In one scenario here, we have individual sites that have zvol-backed iSCSI 
volumes based on small, high-performance 15K disks in mirror vdevs for the best 
performance.  I only keep about a week of daily snapshots locally.  I use zfs 
send/recv to a backup system where I have lots of cheap, slow SATA drives in 
raidz2 where I can afford to accumulate a lot more historical snapshots.

The advantage is that you can use the same tools asymmetrically, with 
high-performance primary systems and one or a few big, slow systems to store 
your backups.

Now for instances where I need to go back and retrieve a file off an NFS-published 
filesystem, I can just go browse the .zfs/snapshot directory as 
required - or search it, or whatever I want. It's a live filesystem, not an 
inert object dependent on external indices and hardware. I think that this is 
the fundamental disconnect in these discussions, where people's ideas (or 
requirements) of what constitutes a backup conflict.
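
To make the browse-the-snapshot bit concrete, a minimal sketch of what a 
restore looks like from the client side (dataset, snapshot and file names 
here are invented for illustration; yours will depend on how the snapshots 
are taken):

  cd /tank/projects/.zfs/snapshot
  ls                                   # every retained snapshot shows up as a directory
  cp daily-2010-03-11/reports/q1-summary.ods /tank/projects/reports/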

There are two major reasons and types of backups : one is to be able to 
minimize your downtime and get systems running again as quickly as possible. 
(the server's dead - make it come back!). The other is the ability to go back 
in time and rescue data that has become lost, corrupted or otherwise 
unavailable often with very granular requirements. (I need this particular 12K 
file from August 12, 2009) 

For my purposes, most of my backup strategies are oriented towards Business 
Uptime and minimal RTO. Given the data volume I work with, using lots of virtual 
machines, tape is strictly an archival tool.  I just can't restore fast enough, 
and it introduces way too many mechanical dependencies into the process (well, I 
could if I had an unlimited budget).  I can restart entire sites from a backup 
system by cloning a filesystem off a backup snapshot and presenting the volumes 
to the servers that need them. Granted, I won't have the performance of a primary 
site, but it will work and people can get work done. This responds to the first 
requirement of minimal downtime.

Going back in time is accomplished via lots of snapshots on the backup storage 
system. Which I can afford since I'm not using expensive disks here.
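
For anyone curious, the nightly cycle behind that is nothing more exotic than a 
cron job doing roughly the following (pool, dataset and host names are made up 
for illustration, the backup dataset is assumed to already hold last night's 
snapshot, and the real script obviously needs error handling):

  # take tonight's snapshot on the primary
  zfs snapshot tank/vols/site1@2010-03-18
  # ship only the delta since last night to the backup box
  zfs send -i @2010-03-17 tank/vols/site1@2010-03-18 | \
      ssh backup1 zfs receive backup/site1
  # the primary keeps only a week of snapshots; the backup box keeps them all
  zfs destroy tank/vols/site1@2010-03-11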

Then you move up the stack into the contents of the volumes, and here's where 
you use your traditional backup tools to get data off the top of the stack - 
out of the OS that's handling the contents of the volume, which understands its 
particularities regarding ACLs and private volume formats like VMFS. 

zfs send/recv is for cloning data off the bottom of the stack without requiring 
the least bit of knowledge about what's happening on top. It's just like using 
any of the asynchronous replication tools that are used in SANs. And they make 
no bones about the fact that they are strictly a block-level thing and don't 
even ask them about the contents. At best, they will try to coordinate 
filesystem snapshots and quiescing operations with the block level snapshots.

Other backup tools take your data off the top of the stack in the context where 
it is used with a fuller understanding of the issues of stuff like ACLs.

When dealing with zvols, ZFS should have no responsibility in trying to 
understand what you do in there other than supplying the blocks.  VMFS, NTFS, 
btrfs, ext4, HFS+, XFS, JFS, ReiserFS and that's just the tip of the iceberg...

ZFS has muddied the waters by straddling the SAN and NAS worlds.

 But there needs to be a tool:
 * to restore an individual file or a zvol (with all ACLs/properties)
 * that allows backup vendors (which place backups on tape or disk or CD or 
 ...) to build indexes of what is contained in the backup (e.g. filename, owner, 
 size, modification dates, type (dir/file/etc.))
 * with stream output suitable for devices like tape drives
 * that can tell if a file is corrupted when being restored
 * that may support recovery of corrupt data blocks within the stream
 * preferably gnutar command-line compatible
 * that admins can use to back up and transfer a subset of files, e.g. a user's home 
 directory (which is not a file system), to another server or onto a CD to be 
 sent to their new office location, or ...

Highly incomplete and in no particular order:
Backup Exec
NetBackup
Bacula
Amanda/Zmanda
Retrospect
Avamar
Arkeia
Teradactyl

Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Darren J Moffat

As to your two questions above, I'll try to answer them from my limited
understanding of the issue.

The format: isn't fault tolerant in the least. One single bit wrong and
the entire stream is invalid. A FEC wrapper would fix this.


I've logged CR# 6936195 "ZFS send stream while checksummed isn't fault 
tolerant" to keep track of that.



The utility: Can't handle streams being split (in case of streams being
larger that a single backup media).


I think it should be possible to store the 'zfs send' stream via NDMP 
and let NDMP deal with the tape splitting, though that may need 
additional software that isn't free (or cheap) to drive the parts of 
NDMP that are in Solaris.  I don't know enough about NDMP to be certain, 
but I think that should be possible.


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread Bob Friesenhahn

On Thu, 18 Mar 2010, erik.ableson wrote:


Ditto on the Linux front.  I was hoping that Solaris would be the 
exception, but no luck.  I wonder if Apple wouldn't mind lending one 
of the driver engineers to OpenSolaris for a few months...


Perhaps the issue is the filesystem rather than the drivers.  Apple 
users have different expectations regarding data loss than Solaris and 
Linux users do.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Svein Skogen

On 18.03.2010 17:49, erik.ableson wrote:
 Conceptually, think of a ZFS system as a SAN Box with built-in asynchronous 
 replication (free!) with block-level granularity.  Then look at your other 
 backup requirements and attach whatever is required to the top of the stack. 
 Remembering that everyone's requirements can be wildly or subtly different so 
 doing it differently is just adapting to the environment.  e.g. - I use ZFS 
 systems at home and work and the tools and scale are wildly different and 
 therefore so are the backup strategies – but that's mostly a budget issue at 
 home... :-)

I'll answer this with some perspective from my own usage, so bear with
me if my setup isn't exactly enterprise-grade.


Which is exactly what I'm considering it: basically a semi-intelligent
box that has a lot of disks, and an attached tape autoloader for dumping
"the entire box died!" data to. In my case the zvols contain VMFS (one
VMFS per VM, actually), so to the "just died" you can add "and brought the
rest of the network down with it". A typical one-man setup with as cheap
a budget as possible. And as I posted some weeks ago, I've ... had to
test the restore bit. A "hit F8 during boot, and boot directly
off the tape" feature I can live without (using a boot CD instead). But the
"I've now started the restore, it'll manage itself and switch tapes when
necessary" part isn't something I really want to give up. For the record,
restore in my case takes approx. 12 hours fully automated, managing a
good 60 MB/s all the way.

Now, I'm willing to miss out on some of the fun, and add an extra server
for simply handling the iSCSI bit, and then set that up to dump itself
onto the Windows server (which has the backup). That's a NOK 12K investment
(I've checked), with an option for becoming NOK 16K if I need more than two
1000BaseT links bundled (if two isn't enough, adding another four sounds
about right). The main storage box has an LSI 8308ELP with eight 1.5 TB
7200 rpm Barracudas in RAID50 with cold spares, and it's ... near enough my
bed to hear the alarm should a disk die on me. So strictly speaking I don't
need ZFS feature-wise (the 8308 has patrol reads/scrubbing, and it pushes
sufficient I/O speeds to outperform the four NICs in the box). So
probably for my needs QFS/SAM would have been a better solution. I really
don't know, since the available installer for QFS/SAM only plays nice
with Solaris 10 u8, and the four Marvell 88E8056 NICs on the mainboard
only have dladm aggregation support in the yge driver (not the
yukonx legacy driver from Marvell's site). Maybe someone who knows which
parts of QFS/SAM were opened up could tell me whether that would work
at all for my needs. ;)

For day-to-day things, snapshots perform admirably. Most of the bulk
storage isn't the iSCSI zvols, but SMB/CIFS shares containing mostly
.nef files. And all of those files are irreplaceable. Not having a tape
backup of them is simply not an option. It's a hobbyist setup for
someone who ... used to be in the IT/telco business, ended up on a
disability pension for his troubles, and is trying to climb out of that
pension by starting over as a photographer. So backup software costing
me more than my total living budget for six months isn't likely to be a
viable solution for the problem either.

However, still looking from my own standpoint, there are several people
still in the business who consult with me on solutions on a much larger
scale than mine, and with budgets that would make my eyes water. I'd really
like to point them towards (Open)Solaris, but without having personally
seen (Open)Solaris actually do the job I'm advertising it for, that
recommendation starts sounding increasingly hollow even to me. I suspect
a lot of such one-man shops are (often less than officially)
consulted by larger shops (old-hand experience from the business counts
for something), and thus having a setup that works could (by Oracle) be
considered a good advertisement. If one-man shops happen to be in the
situation that they really don't see the need to consider
alternatives because (Open)Solaris just plain works, then the
signal:noise ratio between people recommending (Open)Solaris and those
talking about other things gets tweaked in (Open)Solaris's favor.

//Svein
- -- 
- +---+---
  /\   |Svein Skogen   | sv...@d80.iso100.no
  \ /   |Solberg Østli 9| PGP Key:  0xE5E76831
   X|2020 Skedsmokorset | sv...@jernhuset.no
  / \   |Norway | PGP Key:  0xCE96CE13
|   | sv...@stillbilde.net
 ascii  |   | PGP Key:  0x58CD33B6
 ribbon |System Admin   | svein-listm...@stillbilde.net
Campaign|stillbilde.net | PGP Key:  0x22D494A4
+---+---
|msn messenger: | Mobile Phone: +47 907 03 575
|sv...@jernhuset.no | RIPE handle:SS16503-RIPE
- 

Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread Carson Gaspar

Bob Friesenhahn wrote:

On Thu, 18 Mar 2010, erik.ableson wrote:


Ditto on the Linux front.  I was hoping that Solaris would be the 
exception, but no luck.  I wonder if Apple wouldn't mind lending one 
of the driver engineers to OpenSolaris for a few months...


Perhaps the issue is the filesystem rather than the drivers.  Apple 
users have different expectations regarding data loss than Solaris and 
Linux users do.


No, the Solaris firewire drivers are just broken. There is a long trail 
of bug reports that nobody has sufficient interest to fix.


And really, you think Linux is better about data loss than OS X? Please 
cite your sources, because given my experience with Linux, I call bullshit.


--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Miles Nordin
 djm == Darren J Moffat darren.mof...@oracle.com writes:

   djm I've logged CR# 6936195 ZFS send stream while checksumed
   djm isn't fault tollerant to keep track of that.

Other tar/cpio-like tools are also able to:

 * verify the checksums without extracting (like scrub)

 * verify or even extract the stream using a small userland tool that
   writes files using POSIX functions, so that you can build the tool
   on not-Solaris or extract the data onto not-ZFS.  The 'zfs send'
   stream can't be extracted without the Solaris kernel, although yes
   the promise that newer kernels can extract older streams is a very
   helpful one.

   For example, ufsdump | ufsrestore could move UFS data into ZFS.
   but zfs send | zfs recv leaves us trapped on ZFS, even though
   migrating/restoring ZFS data onto a pNFS or Lustre backend is a
   realistic desire in the near term.

 * partial extract

Personally, I could give up the third bullet point.

Admittedly the second bullet is hard to manage while still backing up
zvol's, pNFS / Lustre data-node datasets, windows ACL's, properties,
snapshots/clones, u.s.w., so it's kind of...if you want both vanilla
and chocolate cake at once, you're both going to be unhappy.  But
there should at least be *a* tool that can copy from zfs to NFSv4
while preserving windows ACL's, and the tool should build on other
OS's that support NFSv4 and be capable of faithfully copying one NFSv4
tree to another preserving all the magical metadata.

I know it sounds like ACL-aware rsync is unrelated to your (Darren)
goal of tweaking 'zfs send' to be appropriate for backups, but for
example before ZFS I could make a backup on the machine with disks
attached to it or on an NFS client, and get exactly the same stream
out.  Likewise, I could restore into an NFS client.  Sticking to a
clean API instead of dumping the guts of the filesystem, made the old
stream formats more archival.

The ``I need to extract a ZFS dataset so large that my only available
container is a distributed Lustre filesystem'' use-case is pretty
squarely within the archival realm, is going to be urgent in a year or
so if it isn't already, and is accommodated by GNUtar, cpio, Amanda
(even old ufsrestore Amanda), and all the big commercial backup tools.

I admit it would be pretty damn cool if someone could write a purely
userland version of 'zfs send' and 'zfs recv' that interact with the
outside world using only POSIX file i/o and unix pipes but produce the
standard deduped-ZFS-stream format, even if the hypothetical userland
tool accomplishes this by including a FUSE-like amount of ZFS code and
thus being quite hard to build.  However, so far I don't think the
goals of a replication tool:

 ``make a faithful and complete copy, efficiently, or else give an
   error,''

are compatible with the goals of an archival tool:

 ``extract robustly far into the future even in non-ideal and hard to
   predict circumstances such as different host kernel, different
   destination filesystem, corrupted stream, limited restore space.''


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread Scott Meilicke
Apple users have different expectations regarding data loss than Solaris and 
Linux users do.

Come on, no Apple user bashing. Not true, not fair.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks

2010-03-18 Thread Chris Murray
Please excuse my pitiful example. :-)

I meant to say *less* overlap between virtual machines, as clearly
block AABB occurs in both.

-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Chris Murray
Sent: 18 March 2010 18:45
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks

Good evening,
I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was
wondering if anyone has any experience of checking the alignment of data
blocks through that stack?

I have a VMware ESX 4.0 host using storage presented over NFS from ZFS
filesystems (recordsize 4KB). Within virtual machine VMDK files, I have
formatted NTFS filesystems, block size 4KB. Dedup is turned on. When I
run zdb -DD, I see a figure for unique blocks which is higher than I
expect, which makes me wonder whether any given 4KB in the NTFS
filesystem is perfectly aligned with a 4KB block in ZFS? 

e.g. consider two virtual machines sharing lots of the same blocks.
Assuming there /is/ a misalignment between NTFS & VMDK / VMDK & ZFS, if
they're not in the same order within NTFS, they don't align, and will
actually produce different blocks in ZFS:

VM1
NTFS  1---2---3---
ZFS 1---2---3---4---

ZFS blocks are "  AA", "AABB" and so on ...
Then in another virtual machine, the blocks are in a different order:

VM2
NTFS  1---2---3---
ZFS 1---2---3---4---

ZFS blocks for this VM would be "  CC", "CCAA", "AABB" etc. So, no
overlap between virtual machines, and no benefit from dedup.

I may have it wrong, and there are indeed 30,785,627 unique blocks in my
setup, but if there's a mechanism for checking alignment, I'd find that
very helpful.

Thanks,
Chris
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Miles Nordin
 c == Miles Nordin car...@ivy.net writes:
 mg == Mike Gerdts mger...@gmail.com writes:

 c are compatible with the goals of an archival tool:

sorry, obviously I meant ``not compatible''.

mg Richard Elling made an interesting observation that suggests
mg that storing a zfs send data stream on tape is a quite
mg reasonable thing to do.  Richard's background makes me trust
mg his analysis of this much more than I trust the typical person
mg that says that zfs send output is poison.

ssh and tape are perfect, yet whenever ZFS pools become corrupt
Richard talks about scars on his knees from weak TCP checksums and
lying disk drives and about creating a ``single protection domain'' of
zfs checksums and redundancy instead of a bucket-brigade of fail of
tcp into ssh into $blackbox_backup_Solution(likely involving
unchecksummed disk storage) into SCSI/FC into ECC tapes.  At worst,
lying then or lying now?  At best, the whole thing still strikes me as
a pattern of banging a bunch of arcana into whatever shape's needed
to fit the conclusion that ZFS is glorious and no further work is
required to make it perfect.

and there is still no way to validate a tape without extracting it,
which is, last I worked with them, an optional but suggested part of
$blackbox_backup_Solution (and one which, incidentally, helps with the
bucket-brigade problem Richard likes to point out).

and the other archival problems of constraining the restore
environment, and the fundamental incompatibility of goals between
faithful replication and robust, future-proof archiving from my last
post.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks

2010-03-18 Thread Joseph Mocker
Not having specific knowledge of the VMDK format, I think what you are 
seeing is that there is extra data associated with maintaining the VMDK. 
If you are seeing lower dedup ratios than you would expect, it sounds 
like some of this extra data could be added to each block.


The VMDK spec appears to be open, perhaps a read through the spec might 
help understand what VMware is doing to the NTFS data - 
http://en.wikipedia.org/wiki/VMDK


  --joe

On 3/18/2010 11:44 AM, Chris Murray wrote:

Good evening,
I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was 
wondering if anyone has any experience of checking the alignment of data blocks 
through that stack?

I have a VMware ESX 4.0 host using storage presented over NFS from ZFS 
filesystems (recordsize 4KB). Within virtual machine VMDK files, I have 
formatted NTFS filesystems, block size 4KB. Dedup is turned on. When I run ZDB 
-DD, i see a figure of unique blocks which is higher than I expect, which makes 
me wonder whether any given 4KB in the NTFS filesystem is perfectly aligned 
with a 4KB block in ZFS?

e.g. consider two virtual machines sharing lots of the same blocks. Assuming there /is/ 
a misalignment between NTFS & VMDK / VMDK & ZFS, if they're not in the same order 
within NTFS, they don't align, and will actually produce different blocks in ZFS:

VM1
NTFS  1---2---3---
ZFS 1---2---3---4---

ZFS blocks are "  AA", "AABB" and so on ...
Then in another virtual machine, the blocks are in a different order:

VM2
NTFS  1---2---3---
ZFS 1---2---3---4---

ZFS blocks for this VM would be "  CC", "CCAA", "AABB" etc. So, no overlap 
between virtual machines, and no benefit from dedup.

I may have it wrong, and there are indeed 30,785,627 unique blocks in my setup, 
but if there's a mechanism for checking alignment, I'd find that very helpful.

Thanks,
Chris
   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread Daniel Carosone
On Thu, Mar 18, 2010 at 03:36:22AM -0700, Kashif Mumtaz wrote:
 I did another test on both machine. And write performance on ZFS 
 extraordinary slow.
 -
 In ZFS data was being write around 1037 kw/s while disk remain busy 100%.

That is, as you say, such an extraordinarily slow number that we have
to start at the very basics and eliminate fundamental problems. 

I have seen disks go bad in a way that they simply become very very
slow. You need to be sure that this isn't your problem.  Or perhaps
there's some hardware issue when the disks are used in parallel?

Check all the cables and connectors. Check logs for any errors.

Do you have the opportunity to try testing write speed with dd to the
raw disks?  If the pool is mirrored, can you detach one side at a
time? Test the detached disk with dd, and the pool with the other
disk, one at a time and then concurrently.  One slow disk will slow
down the mirror (but I don't recall seeing such an imbalance in your
iostat output either).
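
If it helps, the kind of raw-disk test I have in mind is roughly this (device 
names are only examples, and writing to the raw device is destructive, so only 
do it to the disk you have detached):

  # detach one side of the mirror
  zpool detach tank c1t1d0
  # raw sequential write test on the detached disk (destroys its contents)
  dd if=/dev/zero of=/dev/rdsk/c1t1d0s0 bs=1024k count=1024
  # raw sequential read test
  dd if=/dev/rdsk/c1t1d0s0 of=/dev/null bs=1024k count=1024
  # reattach and let it resilver when you're done
  zpool attach tank c1t0d0 c1t1d0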

Do you have some spare disks to try other tests with? Try a ZFS
install on those, and see if they also have the problem. Try a UFS
install on the current disks, and see if they still have the
problem.  Can you swap the disks between the T1000s and see if the
problem stays with the disks or the chassis?

You have a gremlin to hunt...

--
Dan.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks

2010-03-18 Thread Marc Nicholas
On Thu, Mar 18, 2010 at 2:44 PM, Chris Murray chrismurra...@gmail.comwrote:

 Good evening,
 I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was
 wondering if anyone has any experience of checking the alignment of data
 blocks through that stack?


NetApp has a great little tool called mbrscan/mbralign - it's free, but I'm
not sure if NetApp customers are supposed to distribute it.

-marc
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread Svein Skogen

On 18.03.2010 21:31, Daniel Carosone wrote:

 You have a gremlin to hunt...

Wouldn't Sun help here? ;)

(sorry couldn't help myself, I've spent a week hunting gremlins until I
hit the brick wall of the MPT problem)

//Svein

- -- 
- +---+---
  /\   |Svein Skogen   | sv...@d80.iso100.no
  \ /   |Solberg Østli 9| PGP Key:  0xE5E76831
   X|2020 Skedsmokorset | sv...@jernhuset.no
  / \   |Norway | PGP Key:  0xCE96CE13
|   | sv...@stillbilde.net
 ascii  |   | PGP Key:  0x58CD33B6
 ribbon |System Admin   | svein-listm...@stillbilde.net
Campaign|stillbilde.net | PGP Key:  0x22D494A4
+---+---
|msn messenger: | Mobile Phone: +47 907 03 575
|sv...@jernhuset.no | RIPE handle:SS16503-RIPE
- +---+---
 If you really are in a hurry, mail me at
   svein-mob...@stillbilde.net
 This mailbox goes directly to my cellphone and is checked
even when I'm not in front of my computer.
- 
 Picture Gallery:
  https://gallery.stillbilde.net/v/svein/
- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Heap corruption, possibly hotswap related (snv_134 with imr_sas, nvdisk drivers)

2010-03-18 Thread Kaya Bekiroğlu
2010/3/18 Kaya Bekiroğlu k...@bekiroglu.com:
 I first noticed this panic when conducting hot-swap tests.  However,
 now I see it every hour or so, even when all drives are attached and
 no ZFS resilvering is in progress.

It appears that these panics recur on my system when the
zfs-auto-snapshot service runs.  Disabling the hourly
zfs-auto-snapshot service prevents the panic.  The panic appears to be
load-related, which explains why it can also occur around hot swap,
but perhaps drivers are not to blame.

 Repro:
 - Pull a drive
 - Wait for drive absence to be acknowledged by fm
 - Physically re-add the drive

 This machine contains two LSI 9240-8i SAS controllers running imr_sas
 (the driver from LSI's website) and a umem NVRAM card running the
 nvdisk driver.  It also contains an SSD L2ARC.

 Mar 17 16:00:10 storage genunix: [ID 478202 kern.notice] kernel memory
 allocator:
 Mar 17 16:00:10 storage genunix: [ID 432124 kern.notice] buffer freed
 to wrong cache
 Mar 17 16:00:10 storage genunix: [ID 815666 kern.notice] buffer was
 allocated from kmem_alloc_160,
 Mar 17 16:00:10 storage genunix: [ID 530907 kern.notice] caller
 attempting free to kmem_alloc_48.
 Mar 17 16:00:10 storage genunix: [ID 563406 kern.notice]
 buffer=ff0715c74510  bufctl=0  cache: kmem_alloc_48
 Mar 17 16:00:10 storage unix: [ID 836849 kern.notice]
 Mar 17 16:00:10 storage ^Mpanic[cpu7]/thread=ff002de17c60:
 Mar 17 16:00:10 storage genunix: [ID 812275 kern.notice] kernel heap
 corruption detected
 Mar 17 16:00:10 storage unix: [ID 10 kern.notice]
 Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice]
 ff002de17a70 genunix:kmem_error+501 ()
 Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice]
 ff002de17ac0 genunix:kmem_slab_free+2d5 ()
 Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice]
 ff002de17b20 genunix:kmem_magazine_destroy+fe ()
 Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice]
 ff002de17b70 genunix:kmem_cache_magazine_purge+a0 ()
 Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice]
 ff002de17ba0 genunix:kmem_cache_magazine_resize+32 ()
 Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice]
 ff002de17c40 genunix:taskq_thread+248 ()
 Mar 17 16:00:10 storage genunix: [ID 655072 kern.notice]
 ff002de17c50 unix:thread_start+8 ()
 Mar 17 16:00:10 storage unix: [ID 10 kern.notice]
 Mar 17 16:00:10 storage genunix: [ID 672855 kern.notice] syncing file 
 systems...
 Mar 17 16:00:10 storage genunix: [ID 904073 kern.notice]  done
 Mar 17 16:00:11 storage genunix: [ID 111219 kern.notice] dumping to
 /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
 Mar 17 16:00:11 storage ahci: [ID 405573 kern.info] NOTICE: ahci0:
 ahci_tran_reset_dport port 0 reset port

 I'd file this directly to the bug database but I'm waiting for my
 account to be reactivated.

 zpool status:
  pool: tank
  state: ONLINE
  scrub: resilver completed after 0h0m with 0 errors on Thu Mar 18 10:07:12 
 2010
 config:

        NAME         STATE     READ WRITE CKSUM
        tank         ONLINE       0     0     0
          raidz1-0   ONLINE       0     0     0
            c6t15d1  ONLINE       0     0     0
            c6t14d1  ONLINE       0     0     0
            c6t13d1  ONLINE       0     0     0
          raidz1-1   ONLINE       0     0     0
            c6t12d1  ONLINE       0     0     0
            c6t11d1  ONLINE       0     0     0
            c6t10d1  ONLINE       0     0     0
          raidz1-2   ONLINE       0     0     0
            c6t9d1   ONLINE       0     0     0
            c6t8d1   ONLINE       0     0     0
            c5t9d1   ONLINE       0     0     0
        logs
          c7d1p0     ONLINE       0     0     0
        cache
          c4t0d0p2   ONLINE       0     0     0
        spares
          c5t8d1     AVAIL

 --
 Kaya




-- 
Kaya
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Ian Collins

On 03/18/10 12:07 PM, Khyron wrote:

Ian,

When you say you spool to tape for off-site archival, what software do 
you

use?


NetVault.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-18 Thread Daniel Carosone
On Thu, Mar 18, 2010 at 05:21:17AM -0700, Tonmaus wrote:
  No, because the parity itself is not verified.
 
 Aha. Well, my understanding was that a scrub basically means reading
 all data, and compare with the parities, which means that these have
 to be re-computed. Is that correct? 

A scrub does, yes. It reads all data and metadata and checksums and
verifies they're correct.

A read of the pool might not - for example, it might:
 - read only one side of a mirror
 - read only one instance of a ditto block (metadata or copies>1)
 - use cached copies of data or metadata; for a long-running system it
   might be a long time since some metadata blocks were ever read, if
   they're frequently used.

Roughly speaking, reading through the filesystem does the least work
possible to return the data. A scrub does the most work possible to
check the disks (and returns none of the data). 
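
If you want to see the difference for yourself, compare something like the 
following (the pool name is just an example):

  # read every file's contents via the filesystem - this only exercises
  # whatever blocks that read path happens to touch
  find /tank -type f -exec cat {} + > /dev/null
  # read and verify every copy of every block against its checksum
  zpool scrub tank
  zpool status -v tank     # progress, and any errors found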

For the OP:  scrub issues low-priority IO (and the details of how much
and how low have changed a few times along the version trail).
However, that prioritisation applies only within the kernel; sata disks
don't understand the prioritisation, so once the requests are with the
disk they can still saturate out other IOs that made it to the front
of the kernel's queue faster.  If you're looking for something to
tune, you may want to look at limiting the number of concurrent IO's
handed to the disk to try and avoid saturating the heads.  

You also want to confirm that your disks are on an NCQ-capable
controller (eg sata rather than cmdk) otherwise they will be severely
limited to processing one request at a time, at least for reads if you
have write-cache on (they will be saturated at the stop-and-wait
channel, long before the heads). 

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedupratio riddle

2010-03-18 Thread Daniel Carosone
As noted, the ratio calculation applies over the data ZFS attempted to
dedup, not the whole pool.  However, I saw a commit go by just in the
last couple of days about the dedupratio calculation being misleading,
though I didn't check the details.   Presumably this will be reported
differently from the next builds.  

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks

2010-03-18 Thread Brian H. Nelson
I have only heard of alignment being discussed in reference to 
block-based storage (like DASD/iSCSI/FC). I'm not really sure how it 
would work out over NFS. I do see why you are asking though.


My understanding is that VMDK files are basically 'aligned' but the 
partitions inside of them may not be. You don't state what OS you are 
using in your guests. Windows XP/2003 and older create mis-aligned 
partitions by default (within a VMDK). You would need to manually 
create/adjust NTFS partitions in those cases in order for them to 
properly fall on a 4k boundary. This could be a cause of the problem you 
are describing.


This doc from VMware is aimed at block-based storage but it has some 
concepts that might be helpful as well as info on aligning guest OS 
partitions:

http://www.vmware.com/pdf/esx3_partition_align.pdf

-Brian


Chris Murray wrote:

Good evening,
I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was 
wondering if anyone has any experience of checking the alignment of data blocks 
through that stack?

I have a VMware ESX 4.0 host using storage presented over NFS from ZFS filesystems (recordsize 4KB). Within virtual machine VMDK files, I have formatted NTFS filesystems, block size 4KB. Dedup is turned on. When I run ZDB -DD, i see a figure of unique blocks which is higher than I expect, which makes me wonder whether any given 4KB in the NTFS filesystem is perfectly aligned with a 4KB block in ZFS? 


e.g. consider two virtual machines sharing lots of the same blocks. Assuming there /is/ 
a misalignment between NTFS & VMDK / VMDK & ZFS, if they're not in the same order 
within NTFS, they don't align, and will actually produce different blocks in ZFS:

VM1
NTFS  1---2---3---
ZFS 1---2---3---4---

ZFS blocks are "  AA", "AABB" and so on ...
Then in another virtual machine, the blocks are in a different order:

VM2
NTFS  1---2---3---
ZFS 1---2---3---4---

ZFS blocks for this VM would be "  CC", "CCAA", "AABB" etc. So, no overlap 
between virtual machines, and no benefit from dedup.

I may have it wrong, and there are indeed 30,785,627 unique blocks in my setup, 
but if there's a mechanism for checking alignment, I'd find that very helpful.

Thanks,
Chris
  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] lazy zfs destroy

2010-03-18 Thread Brandon High
On Wed, Mar 17, 2010 at 9:19 PM, Chris Paul chris.p...@rexconsulting.netwrote:

 OK I have a very large zfs snapshot I want to destroy. When I do this, the
 system nearly freezes during the zfs destroy. This is a Sun Fire X4600 with
 128GB of memory. Now this may be more of a function of the IO device, but
 let's say I don't care that this zfs destroy finishes quickly. I actually
 don't care, as long as it finishes before I run out of disk space.


Destroys are very slow with dedup enabled, and worse with larger data sets
when the dedup table doesn't fit into RAM. Adding an L2ARC may help if
that's the case.
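
If that turns out to be the case, adding one is a one-liner (the device name is
just an example):

  # add an SSD as a cache device so more of the dedup table stays warm
  zpool add tank cache c2t0d0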

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS/OSOL/Firewire...

2010-03-18 Thread David Magda

On Mar 18, 2010, at 14:23, Bob Friesenhahn wrote:


On Thu, 18 Mar 2010, erik.ableson wrote:


Ditto on the Linux front.  I was hoping that Solaris would be the  
exception, but no luck.  I wonder if Apple wouldn't mind lending  
one of the driver engineers to OpenSolaris for a few months...


Perhaps the issue is the filesystem rather than the drivers.  Apple  
users have different expectations regarding data loss than Solaris  
and Linux users do.


Apple users (of which I am one) expect things to Just Work. :)

And there are Apple users and Apple users:

http://daringfireball.net/2010/03/ode_to_diskwarrior_superduper_dropbox

If anyone at Apple is paying attention, perhaps you could re-open  
discussions with now-Oracle about getting ZFS into Mac OS. :)


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread James C. McPherson

On 18/03/10 10:05 PM, Kashif Mumtaz wrote:

Hi, Thanks for your reply

BOTH are Sun Sparc T1000 machines.

Hard disk  1 TB sata on both

ZFS system Memory32 GB , Processor 1GH 6 core
os  Solaris 10 10/09 s10s_u8wos_08a SPARC
PatchCluster  level 142900-02(Dec 09 )


UFS machine
Hard disk 1 TB sata
Memory 16 GB
Processor Processor 1GH 6 core

  Solaris 10 8/07 s10s_u4wos_12b SPARC


Since you are seeing this on a Solaris 10 update
release, you should log a call with your support
provider to get this investigated.


James C. McPherson
--
Senior Software Engineer, Solaris
Sun Microsystems
http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread David Magda

On Mar 18, 2010, at 15:00, Miles Nordin wrote:


Admittedly the second bullet is hard to manage while still backing up
zvol's, pNFS / Lustre data-node datasets, windows ACL's, properties,


Some commercial backup products are able to parse VMware's VMDK files  
to get file system information out of them. The product sits on the VMware  
host, slurps in the files (which can be snapshotted for quiesced  
backups), and if you want to restore, you can either put back the  
entire VMDK or simply restore just the particular file(s) that are of  
interest.


Currently NetBackup only supports parsing NTFS for individual file  
restoration.


Theoretically, zvols could be added to the list of parsable container  
formats.


Though there would probably have to be some kind of API akin to  
VMware's VCB or vStorage.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread Erik Trimble

James C. McPherson wrote:

On 18/03/10 10:05 PM, Kashif Mumtaz wrote:

Hi, Thanks for your reply

BOTH are Sun Sparc T1000 machines.

Hard disk  1 TB sata on both

ZFS system Memory32 GB , Processor 1GH 6 core
os  Solaris 10 10/09 s10s_u8wos_08a SPARC
PatchCluster  level 142900-02(Dec 09 )


UFS machine
Hard disk 1 TB sata
Memory 16 GB
Processor Processor 1GH 6 core

  Solaris 10 8/07 s10s_u4wos_12b SPARC


Since you are seeing this on a Solaris 10 update
release, you should log a call with your support
provider to get this investigated.


James C. McPherson
--
Senior Software Engineer, Solaris
Sun Microsystems
http://www.jmcp.homeunix.com/blog
I would generally agree with James, with the caveat that you could try 
to update to something a bit later than Update 4.  That's pretty 
early on in the ZFS deployment in Solaris 10.


At the minimum, grab the latest Recommended Patch set and apply that, 
then see what your issues are.




--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Performance on SATA Deive

2010-03-18 Thread Erik Trimble

Erik Trimble wrote:

James C. McPherson wrote:

On 18/03/10 10:05 PM, Kashif Mumtaz wrote:

Hi, Thanks for your reply

BOTH are Sun Sparc T1000 machines.

Hard disk  1 TB sata on both

ZFS system Memory32 GB , Processor 1GH 6 core
os  Solaris 10 10/09 s10s_u8wos_08a SPARC
PatchCluster  level 142900-02(Dec 09 )


UFS machine
Hard disk 1 TB sata
Memory 16 GB
Processor Processor 1GH 6 core

  Solaris 10 8/07 s10s_u4wos_12b SPARC


Since you are seeing this on a Solaris 10 update
release, you should log a call with your support
provider to get this investigated.


James C. McPherson
--
Senior Software Engineer, Solaris
Sun Microsystems
http://www.jmcp.homeunix.com/blog
I would generally agree with James, with the caveat that you could 
try to update to something a bit later than Update 4.  That's pretty 
early on in the ZFS deployment in Solaris 10.


At the minimum, grab the latest Recommended Patch set and apply that, 
then see what your issues are.






Oh, nevermind. I'm an idiot.  I was looking at the UFS machine.



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Validating alignment of NTFS/VMDK/ZFS blocks

2010-03-18 Thread Will Murnane
On Thu, Mar 18, 2010 at 14:44, Chris Murray chrismurra...@gmail.com wrote:
 Good evening,
 I understand that NTFS & VMDK do not relate to Solaris or ZFS, but I was 
 wondering if anyone has any experience of checking the alignment of data 
 blocks through that stack?
It seems to me there's a simple way to check.  Pick 4k of random data
(say, dd if=/dev/urandom of=newfile bs=4k count=1) and copy that onto
the VM filesystem.  Now write a little program to read the .vmdk file
and find that 4k of data.  Report the offset, and check offset % 4096
== 0.  This won't help you fix things, but it'll at least tell you
that something is wrong.
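
A quick-and-dirty shell version of the same idea, using a recognizable text
marker instead of random bytes so an ordinary search can find it (this assumes
GNU grep is available on the storage host, and the file and path names are
invented). As long as NTFS stores the marker file non-resident, its data starts
on a cluster boundary, so the remainder shows the skew:

  # inside the guest: create a 4 KiB file consisting of a unique 16-byte
  # string (e.g. "ALIGNMENTMARKER1") repeated 256 times
  # then, on the ZFS host:
  off=$(grep -abo ALIGNMENTMARKER1 /tank/nfs/vm1/vm1-flat.vmdk | head -1 | cut -d: -f1)
  echo "offset: $off  remainder: $((off % 4096))"   # 0 means 4K-aligned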

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Edward Ned Harvey
  From what I've read so far, zfs send is a block level api and thus
  cannot be
  used for real backups. As a result of being block level oriented, the
 
  Weirdo.  The above "cannot be used for real backups" is obviously
  subjective, is incorrect and widely discussed here, so I just say
  weirdo.
 I'm tired of correcting this constantly.

I apologize if I was insulting, and it's clear that I was.  Seriously, I
apologize.  I should have thought about that more before I sent it, and I
should have been more considerate.

To clarify, more accurately, from a technical standpoint, what I meant:

There are circumstances, such as backup to removable disks or time-critical
incremental data streams, where incremental zfs send clearly outperforms
star, rsync, or any other file-based backup mechanism ... there are
circumstances where zfs send is enormously the winner.

There are other circumstances, such as writing to tape, where star or tar
may be the winner, and others still where rsync or other tools may be.
And I don't claim to know all the circumstances where something else beats
zfs send.  There probably are many circumstances where some other tool
beats zfs send in some way.  

The only point which I wish to emphasize is that it's not fair to say
unilaterally that one technique is always better than another technique.
Each one has their own pros/cons.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-18 Thread Tonmaus
Hello Dan,

Thank you very much for this interesting reply.

 Roughly speaking, reading through the filesystem does the least work
 possible to return the data. A scrub does the most work possible to
 check the disks (and returns none of the data).

Thanks for the clarification. That's what I had thought.

 
 For the OP: scrub issues low-priority IO (and the details of how much
 and how low have changed a few times along the version trail).

Is there any documentation about this, besides source code?

 However, that prioritisation applies only within the kernel; sata disks
 don't understand the prioritisation, so once the requests are with the
 disk they can still saturate out other IOs that made it to the front
 of the kernel's queue faster.

I am not sure what you are hinting at. I initially thought about TCQ vs. NCQ 
when I read this, but I am not sure which detail of TCQ would allow for I/O 
discrimination that NCQ doesn't have. All I know about command queueing is that 
it is about optimising DMA strategies and optimising the handling of the I/O 
requests currently issued, with respect to what to do first to return all data 
in the least possible time. (??)

 If you're looking for something to tune, you may want to look at
 limiting the number of concurrent IO's handed to the disk to try and
 avoid saturating the heads.

Indeed, that was what I had in mind, with the addition that I think it is 
also necessary to avoid saturating other components, such as the CPU.
 
 
 You also want to confirm that your disks are on an NCQ-capable
 controller (eg sata rather than cmdk) otherwise they will be severely
 limited to processing one request at a time, at least for reads if you
 have write-cache on (they will be saturated at the stop-and-wait
 channel, long before the heads).

I have two systems here: a production system on LSI SAS (mpt) controllers, and 
another one on ICH-9 (ahci). Disks are SATA-2, so the plan was that this combo 
would have NCQ support. On the other hand, do you know of a method to verify 
that it's functioning?

Best regards,

Tonmaus
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to manage scrub priority or defer scrub?

2010-03-18 Thread Daniel Carosone
On Thu, Mar 18, 2010 at 09:54:28PM -0700, Tonmaus wrote:
  (and the details of how much and how low have changed a few times
  along the version trail).  
 
 Is there any documentation about this, besides source code?

There are change logs and release notes, and random blog postings
along the way - they're less structured but often more informative.
There were some good descriptions about the scrub improvements 6-12
months ago.  The bugid's listed in change logs that mention scrub
should be pretty simple to find and sequence with versions.

  However, that prioritisation applies only within the kernel; sata
  disks don't understand the prioritisation, so once the requests
  are with the disk they can still saturate out other IOs that made
  it to the front of the kernel's queue faster.  
 
 I am not sure what you are hinting at. I initially thought about TCQ
 vs. NCQ when I read this. But I am not sure which detail of TCQ
 would allow for I/O discrimination that NCQ doesn't have. 

Er, the point was exactly that there is no discrimination, once the
request is handed to the disk.  If the internal-to-disk queue is
enough to keep the heads saturated / seek bound, then a new
high-priority-in-the-kernel request will get to the disk sooner, but
may languish once there.   

You'll get best overall disk throughput by letting the disk firmware
optimise seeks, but your priority request won't get any further
preference. 

Shortening the list of requests handed to the disk in parallel may
help, and still keep the channel mostly busy, perhaps at the expense
of some extra seek length and lower overall throughput.

You can shorten the number of outstanding IO's per vdev for the pool
overall, or preferably the number scrub will generate (to avoid
penalising all IO).  The tunables for each of these should be found
readily, probably in the Evil Tuning Guide.
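
From memory, and strictly as a sketch to be checked against the Evil Tuning
Guide before use (the tunable names and values below are what I recall, not
gospel):

  # /etc/system: cap per-vdev queue depth and scrub I/Os per leaf vdev
  set zfs:zfs_vdev_max_pending = 10
  set zfs:zfs_scrub_limit = 5

  # or adjust the live kernel without a reboot
  echo "zfs_vdev_max_pending/W0t10" | mdb -kw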

 All I know about command cueing is that it is about optimising DMA
 strategies and optimising the handling of the I/O requests currently
 issued in respect to what to do first to return all data in the
 least possible time. (??)  

Mostly, as above, it's about giving the disk controller more than one
thing to work on at a time, and having the issuance of a request and
its completion overlap with others, so the head movement can be
optimised and the controller channel can be busy transferring data for
one request while another is seeking.

Disks with write cache effectively do this for writes, by pretending
they complete immediately, but reads would block the channel until
satisfied.  (This is all for ATA which lacked this, before NCQ. SCSI
has had these capabilities for a long time).

  If you're looking for something to tune, you may want to look at
  limiting the number of concurrent IO's handed to the disk to try
  and avoid saturating the heads.
 
 Indeed, that was what I had in mind. With the addition that I think
 it is as well necessary to avoid saturating other components, such
 as CPU.  

Less important, since prioritisation can be applied there too, but
potentially also an issue.  Perhaps you want to keep the cpu fan
speed/noise down for a home server, even if the scrub runs longer.

 I have two systems here, a production system that is on LSI SAS
 (mpt) controllers, and another one that is on ICH-9 (ahci). Disks
 are SATA-2. The plan was that this combo will have NCQ support. On
 the other hand, do you know if there a method to verify if its
 functioning? 

AHCI should be fine.  In practice, if you see actv > 1 (with a small
margin for sampling error) then NCQ is working.
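
Something like the following while a scrub or other load is running; if actv
sits well above 1 for the busy disks, queued commands are reaching the drives:

  iostat -xnz 5     # extended stats, 5-second samples, idle devices suppressed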

--
Dan.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS send and receive corruption across a WAN link?

2010-03-18 Thread Rob
Can a ZFS send stream become corrupt when piped between two hosts across a WAN 
link using 'ssh'?

For example a host in Australia sends a stream to a host in the UK as follows:

# zfs send tank/f...@now | ssh host.uk zfs receive tank/bar
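
For reference, one way to check this end to end is to checksum the stream
independently on both sides before receiving it (a rough sketch with made-up
names; I gather recent builds also ship zstreamdump, which verifies a stream's
own embedded checksums):

  # sender: dump the stream to a file and checksum it
  zfs send tank/foo@now > /var/tmp/foo.zstream
  digest -a sha256 /var/tmp/foo.zstream
  scp /var/tmp/foo.zstream host.uk:/var/tmp/
  # receiver: compare the checksum, then receive
  digest -a sha256 /var/tmp/foo.zstream
  zfs receive tank/bar < /var/tmp/foo.zstream
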
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss