Re: [zfs-discuss] Deduplication Memory Requirements

2011-05-05 Thread Constantin Gonzalez

Hi,

On 05/ 5/11 03:02 PM, Edward Ned Harvey wrote:

From: Garrett D'Amore [mailto:garr...@nexenta.com]

We have customers using dedup with lots of vm images... in one extreme
case they are getting dedup ratios of over 200:1!


I assume you're talking about a situation where there is an initial VM image, 
and then to clone the machine, the customers copy the VM, correct?
If that is correct, have you considered ZFS cloning instead?

When I said dedup wasn't good for VM's, what I'm talking about is:  If there is data 
inside the VM which is cloned...  For example if somebody logs into the guest OS and then 
does a "cp" operation...  Then dedup of the host is unlikely to be able to 
recognize that data as cloned data inside the virtual disk.


ZFS cloning and ZFS dedup are solving two problems that are related, but
different:

- Through Cloning, a lot of space can be saved in situations where it is
  known beforehand that data is going to be used multiple times from multiple
  different "views". Virtualization is a perfect example of this.

- Through Dedup, space can be saved in situations where the duplicate nature
  of data is not known, or not known beforehand. Again, in virtualization
  scenarios, this could be common modifications to VM images that are
  performed multiple times, but not anticipated, such as extra software,
  OS patches, or simply many users saving the same files to their local
  desktops.

To go back to the "cp" example: If someone logs into a VM that is backed by
ZFS with dedup enabled, then copies a file, the extra space that the copy will
take will be minimal: the copy operation breaks down into a series of block
writes, and dedup will recognize those blocks as duplicates of the originals.

This is completely independent of the clone nature of the underlying VM's
backing store.
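To make the two mechanisms concrete, here's a minimal sketch (pool and dataset
names are made up): cloning handles the duplication you know about up front,
dedup catches the rest later.

  # Cloning: deploy guests from a golden image (known duplication).
  zfs snapshot tank/vm/golden@deploy
  zfs clone tank/vm/golden@deploy tank/vm/guest01
  zfs clone tank/vm/golden@deploy tank/vm/guest02

  # Dedup: catch duplicate blocks created later inside the guests.
  zfs set dedup=on tank/vm
  zpool get dedupratio tank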

But I agree that the biggest savings are to be expected from cloning first,
as they typically translate into n GB (for the base image) x # of users,
which is a _lot_.

Dedup is still the icing on the cake for all those data blocks that were
unforeseen. And that can be a lot, too, as everyone who has seen cluttered
desktops full of downloaded files can probably confirm.


Cheers,
   Constantin


--

Constantin Gonzalez Schmitz, Sales Consultant,
Oracle Hardware Presales Germany
Phone: +49 89 460 08 25 91  | Mobile: +49 172 834 90 30
Blog: http://constantin.glez.de/ | Twitter: @zalez

ORACLE Deutschland B.V. & Co. KG, Sonnenallee 1, 85551 Kirchheim-Heimstetten

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstraße 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Jürgen Kunz, Marcel van de Molen, Alexander van der Ven

Oracle is committed to developing practices and products that help protect the
environment
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Deleting large amounts of files

2010-07-29 Thread Constantin Gonzalez

Hi,


Is there a way to see which files have been deduped, so I can copy them again 
and un-dedupe them?


unfortunately, that's not easy (I've tried it :) ).

The issue is that the dedup table (which knows which blocks have been deduped)
doesn't know about files.

And if you pull block pointers for deduped blocks from the dedup table,
you'll need to backtrack from there through the filesystem structure
to figure out what files are associated with those blocks.

(remember: Deduplication happens at the block level, not the file level.)

So, in order to compile a list of deduped _files_, one would need to extract
the list of deduped _blocks_ from the dedup table, then chase the pointers
from the root of the zpool to the blocks in order to figure out what files
they're associated with.

Unless there's a different way that I'm not aware of (and I hope someone can
correct me here), the only way to do that is to run a scrub-like process and
build up a table of files and their blocks.
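What you can do today is look at the dedup table itself; a rough sketch,
assuming a zdb that is new enough to know about dedup (pool name made up). It
gives you block-level statistics, but still no file names:

  # Summary of the dedup table (DDT) for the pool:
  zdb -DD tank
  # More detail, including a histogram of reference counts:
  zdb -DDD tank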

Cheers,
  Constantin

--

Constantin Gonzalez Schmitz | Principal Field Technologist
Phone: +49 89 460 08 25 91 || Mobile: +49 172 834 90 30
Oracle Hardware Presales Germany

ORACLE Deutschland B.V. & Co. KG | Sonnenallee 1 | 85551 Kirchheim-Heimstetten

ORACLE Deutschland B.V. & Co. KG
Hauptverwaltung: Riesstraße 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Rijnzathe 6, 3454PV De Meern, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Jürgen Kunz, Marcel van de Molen, Alexander van der Ven

Oracle is committed to developing practices and products that help protect the
environment
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-26 Thread Constantin Gonzalez

Hi Tim,

thanks for sharing your dedup experience. Especially for Virtualization, having
a good pool of experience will help a lot of people.

So you see a dedup ratio of 1.29 for two installations of Windows Server 2008 on
the same ZFS backing store, if I understand you correctly.

What dedup ratios do you see for the third, fourth and fifth server
installation?

Also, maybe dedup is not the only way to save space. What compression ratio
do you get?

And: Have you tried setting up a Windows system, then setting up the next one
based on a ZFS clone of the first one?
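If it helps, both numbers can be read straight off the pool and the dataset
(names here are placeholders):

  # The dedup ratio is tracked per pool:
  zpool get dedupratio tank
  # Compression ratio and space used are tracked per dataset:
  zfs get compressratio,used tank/backups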


Hope this helps,
   Constantin

On 04/23/10 08:13 PM, tim Kries wrote:

Dedup is a key element for my purpose, because I am planning a central 
repository for about 150 Windows Server 2008 (R2) servers, which would take a 
lot less storage if they dedup well.


--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist   Blog: constantin.glez.de
Tel.: +49 89/4 60 08-25 91  Twitter: @zalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Jürgen Kunz
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-22 Thread Constantin Gonzalez

Hi,

I agree 100% with Chris.

Notice the "on their own" part of the original post. Yes, nobody wants
to run zfs send or (s)tar by hand.

That's why Chris's script is so useful: You set it up, forget about it, and it
gets the job done for 80% of home users.
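For those who want to see what such a script boils down to, here's a minimal
sketch of the rolling-incremental idea (pool, dataset and snapshot names are
made up):

  # One-time seeding of the backup pool on an external drive:
  zpool create backup c5t0d0
  zfs snapshot tank/home@2010-03-20
  zfs send tank/home@2010-03-20 | zfs receive backup/home

  # Each subsequent run only sends the delta since the last common snapshot:
  zfs snapshot tank/home@2010-03-21
  zfs send -i 2010-03-20 tank/home@2010-03-21 | zfs receive -F backup/home
  zpool export backup    # before unplugging the drive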

On another note, I was positively surprised by the availability of Crash Plan
for OpenSolaris:

  http://crashplan.com/

Their free service lets you back up your data to a friend's system over the
net in an encrypted way; the paid-for service uses CrashPlan's data centers at
less-than-Amazon-S3 pricing.

While this may not be everyone's solution, I find it significant that they
explicitly support OpenSolaris. This either means they're OpenSolaris fans
or that they see potential in OpenSolaris home server users.


Cheers,
  Constantin

On 03/20/10 01:31 PM, Chris Gerhard wrote:


I'll say it again: neither 'zfs send' nor (s)tar is an enterprise (or even
home) backup system on their own; one or both can be components of the full
solution.



Up to a point. zfs send | zfs receive does make a very good backup scheme for 
the home user with a moderate amount of storage, especially when the entire 
backup will fit on a single drive, which I think covers the majority of 
home users.

Using external drives and incremental zfs streams allows for extremely quick 
backups of large amounts of data.

It certainly does for me. 
http://chrisgerhard.wordpress.com/2007/06/01/rolling-incremental-backups/


--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist   Blog: constantin.glez.de
Tel.: +49 89/4 60 08-25 91  Twitter: @zalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-01-20 Thread Constantin Gonzalez

Hi,

I'm using 2 x 1.5 TB drives from Samsung (EcoGreen, I believe) in my current
home server. One reported 14 read errors a few weeks ago, roughly six months after
install, which went away during the next scrub/resilver.

This reminded me to order a third drive, a 2.0 TB WD20EADS from Western
Digital, and I now have a 3-way mirror, which is effectively a 2-way mirror
with its hot spare already synced in.

The idea behind notching up the capacity is threefold:

- No "sorry, this disk happens to have 1 block too few" problems on attach.

- When the 1.5 TB disks _really_ break, I'll just order another 2 TB one and
  use the opportunity to upgrade pool capacity. Since at least one of the 1.5TB
  drives will still be attached, there won't be any "slightly smaller drive"
  problems either when attaching the second 2TB drive.

- After installing two bigger drives, it becomes easy to figure out which of
  the drives to phase out: just go for the smaller ones. This solves the
  headache of trying to identify the right drive to pull when you replace
  drives that aren't hot spares and don't have blinking lights (see the sketch
  below).
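The mechanics of the 3-way mirror trick are simple (device names are made up):

  # Attach the new, larger disk to the existing mirror and let it resilver:
  zpool attach tank c1t0d0 c2t0d0
  zpool status tank            # wait until the resilver has finished
  # When one of the smaller disks finally dies, detach it and attach its
  # 2 TB replacement the same way:
  zpool detach tank c1t1d0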

Frankly, I don't care whether the Samsung or the WD drives are better or worse,
they're both consumer drives and they're both dirt cheap. Just assume that
they'll break soon (since you're probably using them harder than they were
designed for) and make sure their replacements are already there.

It also helps to mix vendors, so that a glitch affecting multiple disks from
the same batch won't hurt your setup too much. (And yes, I broke that rule with
my initial two Samsung drives, but I'm now glad I have both vendors :)).

Hope this helps,
   Constantin


Simon Breden wrote:

I see also that Samsung have very recently released the HD203WI 2TB 4-platter 
model.

It seems to have good customer ratings so far at newegg.com, but currently 
there are only 13 reviews so it's a bit early to tell if it's reliable.

Has anyone tried this model with ZFS?

Cheers,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/


--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting default user/group quotas?

2009-11-19 Thread Constantin Gonzalez

Hi,


IMHO, it would be useful to have something like:

  zfs set userquota=5G tank/home

...

I think that would be a great feature.


thanks. I just created CR 6902902 to track this. I hope it becomes viewable
soon here:

  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6902902

Cheers,
  Constantin

--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Setting default user/group quotas?

2009-11-19 Thread Constantin Gonzalez

Hi,

first of all, many thanks to those who made user/group quotas possible. This
is a huge improvement for many users of ZFS!

While presenting on this new feature at the Munich OpenSolaris User Group meeting
yesterday, a question came up that I couldn't find an answer for: Can you set
a default user/group quota?

Apparently,

  zfs set userquota@user1=5G tank/home/user1

is the only way to set user quotas, and the "@user1" part seems to be mandatory,
at least according to the snv_126 version of the ZFS manpage and to my own
attempts with ZFS:

  The {user|group}{used|quota}@ properties must be appended with
  a user or group specifier of one of these forms:
      POSIX name      (eg: "matt")
      POSIX id        (eg: "126829")
      SMB name@domain (eg: "matt@sun")
      SMB SID         (eg: "S-1-234-567-89")

Imagine a system that needs to handle thousands of users. Setting quotas
individually for all of these users would quickly become unwieldy, in much the
same way that having a filesystem for each user was unwieldy.

Which was the reason to introduce user/group quotas in the first place.

IMHO, it would be useful to have something like:

  zfs set userquota=5G tank/home

and that would mean that all users who don't have an individual user quota
assigned to them would see a default 5G quota.
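Until something like that exists, the only workaround I can think of is to loop
over the user database and set individual quotas, roughly like this (the UID
cutoff of 100 is just an example to skip system accounts):

  for user in $(getent passwd | awk -F: '$3 >= 100 { print $1 }'); do
      zfs set userquota@${user}=5G tank/home
  done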

I haven't found an RFE for this yet. Is this planned? Should I file an RFE?
Or did I overlook something?


Thanks,
   Constantin

--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS commands hang after several zfs receives

2009-09-15 Thread Constantin Gonzalez

Hi,

I think I've run into the same issue on OpenSolaris 2009.06.

Does anybody know when this issue will be solved in OpenSolaris?
What's the BugID?

Thanks,
   Constantin

Gary Mills wrote:

On Tue, Sep 15, 2009 at 08:48:20PM +1200, Ian Collins wrote:

Ian Collins wrote:

I have a case open for this problem on Solaris 10u7.

The case has been identified and I've just received an IDR, which I 
will test next week.  I've been told the issue is fixed in update 8, 
but I'm not sure if there is an nv fix target.


I'll post back once I've abused a test system for a while.

The IDR I was sent appears to have fixed the problem.  I have been 
abusing the box for a couple of weeks without any lockups.  Roll on 
update 8!


Was that IDR140221-17?  That one fixed a deadlock bug for us back
in May.



--
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Crypto Updates [PSARC/2009/443 FastTrack timeout 08/24/2009]

2009-08-18 Thread Constantin Gonzalez

Hi,

Brian Hechinger wrote:

On Tue, Aug 18, 2009 at 12:37:23AM +0100, Robert Milkowski wrote:

Hi Darren,

Thank you for the update.
Have you got any ETA (build number) for the crypto project?


Also, is there any word on if this will support the hardware crypto stuff
in the VIA CPUs natively?  That would be nice. :)


ZFS Crypto uses the Solaris Cryptographic Framework to do the actual
encryption work, so ZFS itself is agnostic to any hardware crypto acceleration.

The Cryptographic Framework project on OpenSolaris.org is looking for help
in implementing VIA Padlock support for the Solaris Cryptographic Framework:

  http://www.opensolaris.org/os/project/crypto/inprogress/padlock/

Cheers,
  Constantin

--
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Motherboard for home zfs/solaris file server

2009-07-29 Thread Constantin Gonzalez

Hi,

thank you so much for this post. This is exactly what I was looking for.
I've been eyeing the M3A76-CM board, but will now look at the M3A78 and M4A
boards as well.

Actually, not that many Asus M3A, let alone M4A boards show up yet on the
OpenSolaris HCL, so I'd like to encourage everyone to share their hardware
experience by clicking on the "submit hardware" link on:

  http://www.sun.com/bigadmin/hcl/data/os/

I've done it a couple of times and it's really just a matter of 5-10 minutes,
and it helps others know whether a certain component works and whether a
special driver or /etc/driver_aliases setting is required.

I'm also interested in getting power consumption down. Right now, I have the
Athlon X2 5050e (45W TDP) on my list, but I'd also like to know more about
the Athlon II X2 250 and whether it has better potential for power savings.

Neal, the M3A78 seems to have a RealTek RTL8111/8168B NIC chip. I pulled
this off a Gentoo Wiki, because strangely this information doesn't show up
on the Asus website.

Also, thanks for the CF-to-PATA hint for the root pool mirror. I will try to
find fast CF cards to boot from. The performance problems you see when writing
may be related to master/slave issues, but I'm not enough of a PC tweaker to
back that up.

Cheers,
   Constantin


F. Wessels wrote:

Hi,

I'm using Asus M3A78 boards (with the SB700) for OpenSolaris and M2A* boards
(with the SB600) for Linux, some of them with 4x1GB and others with 4x2GB ECC
memory. ECC faults will be detected and reported. I tested it with a small
tungsten light: by moving the light source slowly towards the memory banks
you'll heat them up in a controlled way, and at a certain point bit flips will
occur. I recommend you go for an M4A board, since they support up to 16 GB.

I don't know if you can run OpenSolaris without a video card after
installation; I think you can disable the "halt on no video card" option in the
BIOS, but Simon Breden had some trouble with it, see his home server blog. You
can go for one of the three M4A boards with a 780G onboard; those will give
you 2 PCIe x16 connectors. I don't think the onboard NIC is supported, so I
always put an Intel (e1000) card in, just to prevent any trouble.

I don't have any trouble with the SB700 in AHCI mode. Hotplugging works like a
charm, and transferring a couple of GBs over eSATA takes considerably less time
than via USB. I have a PATA-to-dual-CF adapter and two industrial 16GB CF cards
as a mirrored root pool. It takes forever to install Nevada, at least 14 hours;
I suspect the CF cards lack caches. But I don't update that regularly, still on
snv_104. I also have 2 mirrors and a hot spare; the sixth port is an eSATA port
I use to transfer large amounts of data.

This system consumes about 73 watts idle and 82 watts under I/O load (5 disks,
a separate NIC, 8 GB RAM and a BE-2400, all using just 73 watts!). Please note
that frequency scaling is only supported on the K10 architecture, and don't
expect too much power saving from it: a lower voltage yields far greater
savings than a lower frequency. In September I'll do a post about the
aforementioned M4A boards and an LSI SAS controller in one of the PCIe x16
slots.


--
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

Bob Friesenhahn wrote:
> On Thu, 23 Oct 2008, Constantin Gonzalez wrote:
>>
>> Yes, we're both aware of this. In this particular situation, the customer
>> would restart his backup job (and thus the client application) in case 
>> the
>> server dies.
> 
> So it is ok for this customer if their backup becomes silently corrupted 
> and the backup software continues running?  Consider that some of the 
> backup files may have missing or corrupted data in the middle.  Your 
> customer is quite dedicated in that he will monitor the situation very 
> well and remember to reboot the backup system, correct any corrupted 
> files, and restart the backup software whenever the server panics and 
> reboots.

This is what the customer told me. He uses rsync and he is ok with restarting
the rsync whenever the NFS server restarts.

> A properly built server should be able to handle NFS writes at gigabit 
> wire-speed.

I'm advocating for a properly built system, believe me :).

Cheers,
Constantin

-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

yes, using slogs is the best solution.

Meanwhile, using mirrored slogs built from other servers' RAM disks running on
UPSs seems like an interesting idea, if UPS-backed RAM is deemed reliable
enough for the purposes of the NFS server.
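Roughly, such a setup might look like this (device names are made up; the RAM
disks would have to be exported to the NFS server via iSCSI or similar):

  # On each helper server: create a 512 MB RAM disk to export.
  ramdiskadm -a zil0 512m
  # On the NFS server: add the two imported RAM-disk LUNs as a mirrored slog.
  zpool add tank log mirror c2t1d0 c3t1d0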

Thanks for suggesting this!

Cheers,
Constantin

Ross wrote:
> Well, it might be even more of a bodge than disabling the ZIL, but how about:
> 
> - Create a 512MB ramdisk, use that for the ZIL
> - Buy a Micro Memory nvram PCI card for £100 or so.
> - Wait 3-6 months, hopefully buy a fully supported PCI-e SSD to replace the 
> Micro Memory card.
> 
> The ramdisk isn't an ideal solution, but provided you don't export the pool 
> with it offline, it does work.  We used it as a stop gap solution for a 
> couple of weeks while waiting for a Micro Memory nvram card.
> 
> Our reasoning was that our server's on a UPS and we figured if something 
> crashed badly enough to take out something like the UPS, the motherboard, 
> etc, we'd be loosing data anyway.  We just made sure we had good backups in 
> case the pool got corrupted and crossed our fingers.
> 
> The reason I say wait 3-6 months is that there's a huge amount of activity 
> with SSD's at the moment.  Sun said that they were planning to have flash 
> storage launched by Christmas, so I figure there's a fair chance that we'll 
> see some supported PCIe cards by next Spring.
> --
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

Bob Friesenhahn wrote:
> On Wed, 22 Oct 2008, Neil Perrin wrote:
>> On 10/22/08 10:26, Constantin Gonzalez wrote:
>>> 3. Disable ZIL[1]. This is of course evil, but one customer pointed out to 
>>> me
>>> that if a tar xvf were writing locally to a ZFS file system, the writes
>>> wouldn't be synchronous either, so there's no point in forcing NFS users
>>> to have a better availability experience at the expense of 
>>> performance.
> 
> The conclusion reached here is quite seriously wrong and no Sun 
> employee should suggest it to a customer.  If the system writing to a 

I'm not suggesting it to any customer. Actually, I argued for quite a long time
with the customer, trying to convince him that "slow but correct" is better.

The conclusion above is a conscious decision by the customer. He says that he
does not want NFS to turn any write into a synchronous write; he's happy if
all writes are asynchronous, because in this case the NFS server is a
backup-to-disk device, and if power fails he simply restarts the backup since
he has the data in multiple copies anyway.

> local filesystem reboots then the applications which were running are 
> also lost and will see the new filesystem state when they are 
> restarted.  If an NFS server sponteneously reboots, the applications 
> on the many clients are still running and the client systems are using 
> cached data.  This means that clients could do very bad things if the 
> filesystem state (as seen by NFS) is suddenly not consistent.  One of 
> the joys of NFS is that the client continues unhindered once the 
> server returns.

Yes, we're both aware of this. In this particular situation, the customer
would restart his backup job (and thus the client application) in case the
server dies.

Thanks for pointing out the difference, this is indeed an important distinction.

Cheers,
   Constantin

-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

>> - The ZIL exists on a per filesystem basis in ZFS. Is there an RFE 
>> already
>>that asks for the ability to disable the ZIL on a per filesystem 
>> basis?
> 
> Yes: 6280630 zil synchronicity

good, thanks for the pointer!

> Though personally I've been unhappy with the exposure that zil_disable 
> has got.
> It was originally meant for debug purposes only. So providing an official
> way to make synchronous behaviour asynchronous is to me dangerous.

IMHO, the need here is to give admins control over the way they want their
file servers to behave. In this particular case, the admin argues that he knows
what he's doing, that he doesn't want his NFS server to provide stronger
guarantees than a local filesystem, and that he deserves control over that
behaviour.

Ideally, there would be an NFS option that lets customers choose whether they
want to honor COMMIT requests or not.

Disabling the ZIL on a per-filesystem basis is only the second-best solution,
but since that CR already exists, it seems to be the more realistic route.
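For completeness, the pool-wide switch people reach for today is the
zil_disable tunable, which is global and debug-only, and that is exactly the
problem a per-filesystem property would solve. Roughly:

  # /etc/system -- affects every pool and filesystem on the host, after a reboot:
  set zfs:zil_disable = 1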

Thanks,
Constantin


-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Constantin Gonzalez
Hi,

On a busy NFS server, performance tends to be very modest for large numbers
of small files, due to the well-known effects of ZFS and the ZIL honoring the
NFS COMMIT operation[1].

For the mature sysadmin who knows what (s)he is doing, there are three
possibilities:

1. Live with it. Hard, if you see 10x less performance than you could get and
your users complain a lot.

2. Use a flash disk for the ZIL, a slog (see the sketch after this list). This
can add considerable extra cost, especially if you're using an X4500/X4540 and
can't swap out fast SAS drives for cheap SATA drives to free the budget for
flash ZIL drives.[2]

3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
that if a tar xvf were writing locally to a ZFS file system, the writes
wouldn't be synchronous either, so there's no point in forcing NFS users
to have a better availability experience at the expense of performance.
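For reference, option 2 boils down to a one-liner once you have the flash
device (device names are made up):

  # Dedicate a fast device as a separate intent log (slog):
  zpool add tank log c4t0d0
  # Or mirrored, if you don't want a single slog device to be a weak spot:
  zpool add tank log mirror c4t0d0 c4t1d0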


So, if the sysadmin draws the informed and conscious conclusion that (s)he
doesn't want to honor NFS COMMIT operations, what options are less disruptive
than disabling the ZIL completely?

- I checked the NFS tunables from:
   http://dlc.sun.com/osol/docs/content/SOLTUNEPARAMREF/chapter3-1.html
   But could not find a tunable that would disable COMMIT honoring.
   Is there already an RFE asking for a share option that disables the
   translation of COMMIT to synchronous writes?

- The ZIL exists on a per filesystem basis in ZFS. Is there an RFE already
   that asks for the ability to disable the ZIL on a per filesystem basis?

   Once admins start to disable the ZIL for whole pools because the extra
   performance is too tempting, wouldn't it be the lesser evil to let them
   disable it on a per-filesystem basis?

Comments?


Cheers,
Constantin

[1]: http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
[2]: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist        http://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Start with desired end state in mind...

2008-02-29 Thread Constantin Gonzalez
Hi,

great, thank you. So ZFS isn't picky about finding the target fs already
created, with its properties already set, when replicating data into it.

This is very cool!

Best regards,
Constantin

Darren J Moffat wrote:
> Constantin Gonzalez wrote:
>> Hi Darren,
>>
>> thank you for the clarification, I didn't know that.
>>
>>> See the man page for zfs(1) where the -R option for send is discussed.
> 
> 
>> Back to Brad's RFE, what would one need to do to send a stream from a
>> compressed filesystem to one with a different compression setting, if
>> the source file system has the compression attribute set to a specific
>> algorithm (i.e. not inherited)?
> 
> $ zfs create -o compression=gzip-1 tank/gz1
> # put in your data
> $ zfs snapshot tank/gz1@snap
> $ zfs create -o compression=gzip-9 tank/gz9
> $ zfs send tank/gz1@snap | zfs recv -d tank/gz9
> 
>> Will leaving out -R just create a new, but plain unencrypted fs on the
>> receiving side?
> 
> Depends on inheritance.
> 
>> What if one wants to replicate a whole package of filesystems via
>> -R, but change properties on the receiving side before it happens?
> 
> If they are all getting the same properties, use inheritance; if they
> aren't, then you (by the very nature of what you want to do) need to
> precreate them with the appropriate options.
> 
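So the inheritance route from above, spelled out (dataset and snapshot names
are made up): set the property on a parent and receive below it with -d, and
the received filesystem picks it up.

  zfs create -o compression=gzip-9 tank/copies
  zfs snapshot tank/data@xfer
  zfs send tank/data@xfer | zfs receive -d tank/copies   # arrives as tank/copies/data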

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91    www.google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Start with desired end state in mind...

2008-02-29 Thread Constantin Gonzalez
Hi Darren,

thank you for the clarification, I didn't know that.

> See the man page for zfs(1) where the -R option for send is discussed.

oh, this is new. Thank you for bringing us -R.

Back to Brad's RFE, what would one need to do to send a stream from a
compressed filesystem to one with a different compression setting, if
the source file system has the compression attribute set to a specific
algorithm (i.e. not inherited)?

Will leaving out -R just create a new, but plain unencrypted fs on the
receiving side?

What if one wants to replicate a whole package of filesystems via
-R, but change properties on the receiving side before it happens?

Best regards,
Constantin

> 
>> But for the sake of implementing the RFE, one could extend the ZFS
>> send/receive framework with a module that permits manipulation of the
>> data on the fly, specifically in order to allow for things like
>> recompression, en/decryption, change of attributes at the dataset level,
>> etc.
> 
> No need this already works this way.
> 

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91    www.google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Start with desired end state in mind...

2008-02-29 Thread Constantin Gonzalez
Hi Brad,

this is indeed a good idea.

But I assume that it will be difficult to do, due to the low-level nature
of zfs send/receive.

In your compression example, you're asking for zfs send/receive to
decompress the blocks on the fly. But send/receive operates on a lower
level: It doesn't care much what is actually inside the blocks, it
just copies the block structure "as is".

So, unless zfs send/receive starts looking inside the blocks it copies,
it is probably better to use good old tar or cpio.

I would also assume that zfs send/receive doesn't know anything (and doesn't
have to) about ZFS dataset properties. It just starts at the snapshot level
(which works independently of whether it's used for a filesystem, a ZVOL, or
whatever) and copies the subtree.

But for the sake of implementing the RFE, one could extend the ZFS
send/receive framework with a module that permits manipulation of the
data on the fly, specifically in order to allow for things like
recompression, en/decryption, change of attributes at the dataset level,
etc.

Best regards,
Constantin

Brad Diggs wrote:
> I love the send and receive feature of zfs.  However, the one feature 
> that it lacks is that I can't specify on the receive end how I want 
> the destination zfs filesystem to be created before receiving the
> data being sent.  
> 
> For example, lets say that I would like to do a compression study to
> determine which level of compression of the gzip algorithm would save
> the most space for my data.  One of the easiest ways to do that 
> locally or remotely would be to use send/receive like so.
> 
> zfs snapshot zpool/data@snap
> gz=1
> while [ ${gz} -le 9 ]
> do
>    zfs send zpool/data@snap | \
>      zfs receive -o compression=gzip-${gz} zpool/gz${gz}data
>    zfs list zpool/gz${gz}data
>    gz=$((gz+1))
> done
> zfs destroy zpool/data@snap
> 
> Another example.  Let's assume that the zfs encryption feature was
> available today.  Further, lets assume that I have a filesystem that
> has compression and encryption enabled.  I want to duplicate that exact
> zfs filesystem on another system through send/receive.  Today the
> receive feature does not give me the ability to specify the desired end
> state configuration of the destination zfs filesystem before receiving
> the data.  I think that would be a great feature.
> 
> Just some food for thought.
> 
> Thanks in advance,
> Brad
> 
> _______
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91    www.google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Memory Sticks

2007-12-11 Thread Constantin Gonzalez
Hi Paul,

> # fdisk -E /dev/rdsk/c7t0d0s2

then

> # zpool create -f Radical-Vol /dev/dsk/c7t0d0

should work. The warnings you see are just there to double-check that you don't
overwrite any previously used pool, which you might regret. -f overrides that.

Hope this helps,
Constantin

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Memory Sticks

2007-12-06 Thread Constantin Gonzalez
Hi,

> # /usr/sbin/zpool import
>   pool: Radical-Vol
> id: 3051993120652382125
>  state: FAULTED
> status: One or more devices contains corrupted data.
> action: The pool cannot be imported due to damaged devices or data.
>see: http://www.sun.com/msg/ZFS-8000-5E
> config:
> 
> Radical-Vol  UNAVAIL   insufficient replicas
>   c7t0d0s0  UNAVAIL   corrupted data

ok, ZFS did recognize the disk, but the pool is corrupted. Did you remove
it without exporting the pool first?

> Following your command:
> 
> $ /opt/sfw/bin/sudo /usr/sbin/zpool status
>   pool: Rad_Disk_1
>  state: ONLINE
> status: The pool is formatted using an older on-disk format.  The pool can
> still be used, but some features are unavailable.
> action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
> pool will no longer be accessible on older software versions.
>  scrub: none requested
> config:
> 
> NAMESTATE READ WRITE CKSUM
> Rad_Disk_1  ONLINE   0 0 0
>   c0t1d0ONLINE   0 0 0
> 
> errors: No known data errors

But this pool should be accessible, since you can run zpool status on it. Have
you checked "zfs get all Rad_Disk_1"? Does it show a mountpoint and whether
it should be mounted?

> But this device works currently on my Solaris PC's, the W2100z and a 
> laptop of mine.

Strange. Maybe it's a USB issue. Have you checked:

   http://www.sun.com/io_technologies/usb/USB-Faq.html#Storage

Especially #19?

Best regards,
Constantin

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Memory Sticks

2007-12-05 Thread Constantin Gonzalez
Hi Paul,

yes, ZFS is platform agnostic and I know it works in SANs.

For the USB stick case, you may have run into labeling issues. Maybe
Solaris SPARC did not recognize the x64 type label on the disk (which
is strange, because it should...).

Did you try making sure that ZFS creates an EFI label on the disk?
You can check this by running zpool status; the devices should then show up
as c6t0d0, without the s0 part.

If you want to force this, you can create an EFI label on the USB disk
by hand by running fdisk -E /dev/rdsk/cxtxdx.

Hope this helps,
Constantin


Paul Gress wrote:
> OK, I've been putting off this question for a while now, but it's eating
> at me, so I can't hold off any more.  I have a nice 8 gig memory stick
> I've formatted with the ZFS file system.  Works great on all my Solaris
> PC's, but refuses to work on my Sparc processor.  So I've formatted it on
> my Sparc machine (Blade 2500), works great there now, but not on my
> PC's.  Re-formatted it on my PC, doesn't work on Sparc, and so on and so on.
> 
> I thought it was a file system that could go back and forth between both architectures.
> is the secret?
> 
> Paul
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Best practice for moving FS between pool on same machine?

2007-06-21 Thread Constantin Gonzalez
Hi,

Chris Quenelle wrote:
> Thanks, Constantin!  That sounds like the right answer for me.
> Can I use send and/or snapshot at the pool level?  Or do I have
> to use it on one filesystem at a time?  I couldn't quite figure this
> out from the man pages.

the ZFS team is working on a zfs send -r (recursive) option to be able
to recursively send and receive hierarchies of ZFS filesystems in one go,
including pools.

So for now, you'll need to do it one filesystem at a time.

This is not always trivial: If you send a full snapshot, then an incremental
one, and the target filesystem is mounted, you'll likely get an error saying
the target filesystem was modified. Make sure the target filesystems are
unmounted, and ideally marked as unmountable, while performing the
send/receives. Also, you may want to use the -F option to receive, which forces
a rollback of the target filesystem to the most recent snapshot.
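Per filesystem, that amounts to something like this (pool, filesystem and
snapshot names are made up):

  zfs unmount dstpool/fs            # avoid "destination has been modified"
  zfs set canmount=off dstpool/fs   # keep it from being remounted in between
  zfs send -i snap1 srcpool/fs@snap2 | zfs receive -F dstpool/fs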

I've written a script to do all of this, but it's only "works on my system"
certified.

I'd like to get some feedback and validation before I post it on my blog, so
if anyone wants to try it out, let me know.

Best regards,
   Constantin

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Scalability/performance

2007-06-20 Thread Constantin Gonzalez
Hi Mike,

> If I was to plan for a 16 disk ZFS-based system, you would probably
> suggest me to configure it as something like 5+1, 4+1, 4+1 all raid-z
> (I don't need the double parity concept)
> 
> I would prefer something like 15+1 :) I want ZFS to be able to detect
> and correct errors, but I do not need to squeeze all the performance
> out of it (I'll be using it as a home storage server for my DVDs and
> other audio/video stuff. So only a few clients at the most streaming
> off of it)

this is possible. ZFS in theory does not significantly limit the n, so 15+1
can indeed be done.

But for a number of reasons (among them performance), people generally
advise using no more than 10+1.

A lot of ZFS configuration wisdom can be found on the Solaris internals
ZFS Best Practices Guide Wiki at:

  http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Richard Elling has done a great job of thoroughly analyzing different
reliability concepts for ZFS in his blog. One good introduction is the
following entry:

  http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance

That may help you find the right tradeoff between space and reliability.
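For the 16-disk case above, the often-recommended 5+1 / 4+1 / 4+1 layout would
be created in one go, for example like this (device names are made up):

  zpool create tank \
      raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
      raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 \
      raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0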

Hope this helps,
   Constantin


-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Scalability/performance

2007-06-20 Thread Constantin Gonzalez
Hi,

> How are paired mirrors more flexiable?

well, I'm talking about a small home system. If the pool gets full, the
way to expand with RAID-Z would be to add 3+ disks (typically 4-5).

With mirror only, you just add two. So in my case it's just about
the granularity of expansion.

The reasoning is that I value the three factors reliability, performance and
space in that order. Space comes last, since disk space is cheap.

If I had a bigger number of disks (12+), I'd be using them in RAID-Z2
sets (4+2 plus 4+2, etc.). With that many disks, speed and reliability are
both OK, so I could use RAID-Z2 instead of mirroring and get some extra
space as well.

Right now, I have a 3-disk RAID-5 running with the Linux DM driver. One
of the most recent additions was RAID-5 expansion, so I could pop in a
matching disk and expand my RAID-5 to 4 disks instead of 3 (which is
always interesting, as you're cutting down on your parity loss). I think though
that in RAID-5 you shouldn't put more than 6-8 disks afaik, so I wouldn't be
expanding this endlessly.
> 
So how would this translate to ZFS? I have learned so far that ZFS

ZFS does not yet support rearranging the disk configuration. Right now,
you can expand a single disk to a mirror, or an n-way mirror to an n+1-way
mirror.

RAID-Z vdevs can't be changed right now. But you can add more disks
to a pool by adding more vdevs (you have a 1+1 mirror, add another 1+1
pair and get more space; you have a 3+2 RAID-Z2, add another 5+2 RAID-Z2, etc.).

> basically is raid + LVM. e.g. the mirrored raid-z pairs go into the
> pool, just like one would use LVM to bind all the raid pairs. The
> difference being I suppose, that you can't use a zfs mirror/raid-z
> without having a pool to use it from?

Here's the basic idea:

- You first construct vdevs from disks:

  One disk can be one vdev.
  A 1+1 mirror can be a vdev, too.
  An n+1 or n+2 RAID-Z (RAID-Z2) set can be a vdev, too.

- Then you combine vdevs to create a pool; ZFS stripes data dynamically across
  them. Pools can be extended by adding more vdevs.

- Then you create ZFS file systems that draw their block usage from the
  resources supplied by the pool. Very flexible.

> Wondering now is if I can simply add a new disk to my raid-z and have it
> 'just work', e.g. the raid-z would be expanded to use the new
> disk(partition of matching size)

If you have a RAID-Z based pool in ZFS, you can add another group of disks
that are organized in a RAID-Z manner (a vdev) to expand the storage capacity
of the pool.
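In commands, the whole idea looks roughly like this (device names are made up):

  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0   # one 3+1 RAID-Z vdev forms the pool
  zfs create tank/media                                 # filesystems draw blocks from the pool
  zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0      # expand by adding a second RAID-Z vdev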

Hope this clarifies things a bit. And yes, please check out the admin guide and
the other collateral available on ZFS. It's full of new concepts, and it takes
some getting used to before you can explore all the possibilities.

Cheers,
   Constantin

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Scalability/performance

2007-06-20 Thread Constantin Gonzalez
Hi,

> I'm quite interested in ZFS, like everybody else I suppose, and am about
> to install FBSD with ZFS.

welcome to ZFS!

> Anyway, back to business :)
I have a whole bunch of different sized disks/speeds. E.g. 3 300GB disks
@ 40MB/s, a 320GB disk @ 60MB/s, 3 120GB disks @ 50MB/s and so on.
> 
> Raid-Z and ZFS claims to be uber scalable and all that, but would it
> 'just work' with a setup like that too?

Yes. If you dump a set of variable-size disks into a mirror or RAID-Z
configuration, you'll get the same result as if all of them had the smallest
of their sizes. The pool will then grow when you exchange the smaller disks
for larger ones.

I used to run a ZFS pool on 1x250GB, 1x200GB, 1x85 GB and 1x80 GB the following
way:

- Set up an 80 GB slice on all 4 disks and make a 4 disk RAID-Z vdev
- Set up a 5 GB slice on the 250, 200 and 85 GB disks and make a 3 disk RAID-Z
- Set up a 115GB slice on the 200 and the 250 GB disk and make a 2 disk mirror.
- Concatenate all 3 vdevs into one pool. (You need zpool add -f for that).

Not something to be done on a professional production system, but it worked
for my home setup just fine. The remaining 50GB from the 250GB drive then
went into a scratch pool.

Kinda like playing Tetris with RAID-Z...
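In case it helps anyone replay that Tetris game, it looked roughly like this
(slice names are made up; -f is needed because the vdevs have mismatched
replication levels):

  zpool create -f tank \
      raidz  c0d0s0 c1d0s0 c2d0s0 c3d0s0 \
      raidz  c0d0s1 c1d0s1 c2d0s1 \
      mirror c0d0s3 c1d0s3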

Later, I decided that just using paired disks as mirrors is really more
flexible and easier to expand, since disk space is cheap.

Hope this helps,
   Constantin

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice for moving FS between pool on same machine?

2007-06-20 Thread Constantin Gonzalez
Hi Chris,

> What is the best (meaning fastest) way to move a large file system 
> from one pool to another pool on the same machine.  I have a machine
> with two pools.  One pool currently has all my data (4 filesystems), but it's
> misconfigured. Another pool is configured correctly, and I want to move the 
> file systems to the new pool.  Should I use 'rsync' or 'zfs send'?

zfs send/receive is the fastest and most efficient way.

I've used it multiple times on my home server until I had my configuration
right :).

> What happens is I forgot I couldn't incrementally add raid devices.  I want
> to end up with two raidz(x4) vdevs in the same pool.  Here's what I have now:

For this reason, I decided to go with mirrors. Yes, they use more raw storage
space, but they are also much more flexible to expand. Just add two disks when
the pool is full and you're done.

If you have a lot of disks or can afford to add 4-5 disks at a time, then
RAID-Z may be just as easy to do, but remember that two-disk failures in RAID-5
variants can be quite common; you may want RAID-Z2 instead.

> 1. move data to dbxpool2
> 2. remount using dbxpool2
> 3. destroy dbxpool1
> 4. create new proper raidz vdev inside dbxpool2 using devices from dbxpool1

Add:

0. Snapshot data in dbxpool1 so you can use zfs send/receive

Then the above should work fine.

> I'm constrained by trying to minimize the downtime for the group
> of people using this as their file server.  So I ended up with
> an ad-hoc assignment of devices.  I'm not worried about
> optimizing my controller traffic at the moment.

Ok. If you want to really be thorough, I'd recommend:

0. Run a backup, just in case. It never hurts.
1. Do a snapshot of dbxpool1
2. zfs send/receive dbxpool1 -> dbxpool2
   (This happens while users are still using dbxpool1, so no downtime).
3. Unmount dbxpool1
4. Do a second snapshot of dbxpool1
5. Do an incremental zfs send/receive of dbxpool1 -> dbxpool2.
   (This should take only a small amount of time)
6. Mount dbxpool2 where dbxpool1 used to be.
7. Check everything is fine with the new mounted pool.
8. Destroy dbxpool1
9. Use disks from dbxpool1 to expand dbxpool2 (be careful :) ).

You might want to exercise the above steps on an extra spare disk with
two pools just to gain some confidence before doing it in production.
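The core of steps 1-6 in commands, with made-up filesystem, snapshot and
mountpoint names:

  zfs snapshot dbxpool1/data@move1
  zfs send dbxpool1/data@move1 | zfs receive dbxpool2/data   # users keep working
  zfs unmount dbxpool1/data
  zfs snapshot dbxpool1/data@move2
  zfs send -i move1 dbxpool1/data@move2 | zfs receive -F dbxpool2/data
  zfs set mountpoint=/export/dbx dbxpool2/data               # take over the old mountpoint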

I have a script that automatically does steps 1-6 and is looking for beta
testers. If you're interested, let me know.

Hope this helps,
   Constantin

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] New german white paper on ZFS

2007-06-19 Thread Constantin Gonzalez
Hi,

if you understand German or want to brush it up a little, I have a new ZFS
white paper in German for you:

  http://blogs.sun.com/constantin/entry/new_zfs_white_paper_in

Since there's already so much collateral on ZFS in English, I thought it was
time for some localized material for my country.

There are also some new ZFS slides that go with it, also in German.

Let me know if you have any suggestions.

Hope this helps,
   Constantin

-- 
Constantin Gonzalez                        Sun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?

2007-05-25 Thread Constantin Gonzalez
Hi Malachi,

Malachi de Ælfweald wrote:
> I'm actually wondering the same thing because I have b62 w/ the ZFS
> bits; but need the snapshot's "-r" functionality.

you're lucky, it's already there. From my b62 machine's "man zfs":

 zfs snapshot [-r] filesystem@snapname|volume@snapname

 Creates  a  snapshot  with  the  given  name.  See   the
 "Snapshots" section for details.

 -r    Recursively create  snapshots  of  all  descendant
   datasets.  Snapshots are taken atomically, so that
   all recursive snapshots  correspond  to  the  same
   moment in time.
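So, for example (pool name made up):

  # zfs snapshot -r tank@2007-05-25

creates tank@2007-05-25 plus a snapshot of the same name for every dataset
below tank, all taken at the same point in time.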

Or did you mean send -r?

Best regards,
   Constantin


-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?

2007-05-25 Thread Constantin Gonzalez
Hi,

> Our upgrade story isn't great right now.  In the meantime,
> you might check out Tim Haley's blog entry on using
> bfu with zfs root.

thanks.

But doesn't live upgrade just start the installer from the new OS
DVD with the right options? Can't I just do that too?

Cheers,
   Constantin

> 
> http://blogs.sun.com/timh/entry/friday_fun_with_bfu_and
> 
> lori
> 
> Constantin Gonzalez wrote:
>> Hi,
>>
>> I'm a big fan of live upgrade. I'm also a big fan of ZFS boot. The
>> latter is
>> more important for me. And yes, I'm looking forward to both being
>> integrated
>> with each other.
>>
>> Meanwhile, what is the best way to upgrade a post-b61 system that is
>> booted
>> from ZFS?
>>
>>
>> I'm thinking:
>>
>> 1. Boot from ZFS
>> 2. Use Tim's excellent multiple boot datasets script to create a new
>> cloned ZFS
>>boot environment:
>>http://blogs.sun.com/timf/entry/an_easy_way_to_manage
>> 3. Loopback mount the new OS ISO image
>> 4. Run the installer from the loopbacked ISO image in upgrade mode on
>> the clone
>> 5. Mark the clone to be booted the next time
>> 6. Reboot into the upgraded OS.
>>
>>
>> Questions:
>>
>> - How exactly do I do step 4? Before, luupgrade did everything for me,
>> now
>>   what manpage do I need to do this?
>>
>> - Did I forget something above? I'm ok with losing some logfiles and
>> stuff that
>>   maybe changed between the clone and the reboot, but is there
>> anything else?
>>
>> - Did someone already blog about this and I haven't noticed yet?
>>
>>
>> Cheers,
>>Constantin
>>
>>   
> 

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?

2007-05-25 Thread Constantin Gonzalez
Hi,

I'm a big fan of live upgrade. I'm also a big fan of ZFS boot. The latter is
more important for me. And yes, I'm looking forward to both being integrated
with each other.

Meanwhile, what is the best way to upgrade a post-b61 system that is booted
from ZFS?


I'm thinking:

1. Boot from ZFS
2. Use Tim's excellent multiple boot datasets script to create a new cloned ZFS
   boot environment:
   http://blogs.sun.com/timf/entry/an_easy_way_to_manage
3. Loopback mount the new OS ISO image (see the sketch right after this list)
4. Run the installer from the loopbacked ISO image in upgrade mode on the clone
5. Mark the clone to be booted the next time
6. Reboot into the upgraded OS.
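A sketch of step 3 (the ISO path and lofi device number are just examples):

  # lofiadm -a /export/iso/solaris-express-b64.iso
  /dev/lofi/1
  # mount -F hsfs -o ro /dev/lofi/1 /mnt

Step 4 would then run against the bits under /mnt.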


Questions:

- How exactly do I do step 4? Before, luupgrade did everything for me, now
  what manpage do I need to do this?

- Did I forget something above? I'm ok with losing some logfiles and stuff that
  maybe changed between the clone and the reboot, but is there anything else?

- Did someone already blog about this and I haven't noticed yet?


Cheers,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz pool with a slice on the boot disk

2007-04-24 Thread Constantin Gonzalez
Hi Russell,

Russell Baird wrote:
> I created a pool with the following command…
> 
> zpool create -f batches raidz c0t0d0s7 c0t1d0 c0t2d0 c0t3d0
> 
> Notice that I specified slice 7, which is an unused slice on my boot disk
> t0.  No part of the operating system exists on slice 7.  Is this acceptable?
> Will this jeopardize the redundancy of my raidz pool with part of it on the
> boot disk?

no, this should work fine from a ZFS perspective. I've been running a much more
complicated setup at home for over a year and it worked well.

That said, you still might want to reconsider: If your boot disk fails, you'll
need to replace it and repartition it to fit your raid scheme before you
can replace the s7 part of your RAID-Z. That might be a hassle you want to
avoid.

Coming out of a similar situation, I decided to invest in more disks and
then host my data and my OS on separate disks. The goal is to keep everything
that makes your server unique (data + copies of any config changes etc.)
on a separate pool that can easily be plugged into a different, plain vanilla
server, while leaving the boot filesystem as untouched as possible.

This is just a recommendation with the goal of reducing administration
and recovery complexity. As said, there should be no real harm from your
config.

There is a slight performance impact though: ZFS will enable the disk
write cache on c0t1-t3 but not on c0t0, so, effectively, c0t0 is going to
be the slowest drive in the RAID-Z set under certain circumstances and will
therefore slightly affect the performance of the whole RAID-Z vdev.
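If you later decide to go the separate-disks route, the data side could be as
simple as this (device names made up), leaving c0t0 entirely to the OS so ZFS
gets whole disks and can enable their write caches:

  # zpool create batches raidz c0t1d0 c0t2d0 c0t3d0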

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] A big Thank You to the ZFS team!

2007-04-24 Thread Constantin Gonzalez
Hi,

I just got ZFS boot up and running on my laptop. This being a major milestone
in the history of ZFS, I thought I'd reflect a bit on what ZFS brought to my
life so far:


- I'm using ZFS at home and on my laptop since January 2006 for mission
  critical purposes:

  - Backups of my wife's and my own Macs.

  - Storing family photos (I have a baby now, so they _are_ mission critical
:) ).

  - Storing my ca. 400 CDs that were carefully ripped and metadata'ed, which
took a lot of work.

  - Providing fast and reliable storage for my PVR.

  - And of course all the rough stuff that happens to laptops on the road.

- ZFS has already saved me from bit rot once. I could see that it fixed a bad
  block during a weekly scrub. What a great feeling to know that your data is
  much safer than it was before and to be able to see how and when it is being
  protected!

  It is kinda weird to talk to customers about adopting ZFS while knowing that
  my family pictures at home are probably stored safer than their company
  data...

- ZFS enabled me to just take a bunch of differently sized drives that have
  been lying around somewhere and turn them into an easy to manage,
  consistent and redundant pool of storage that effortlessly handles very
  diverse workloads (File server, audio streaming, video streaming).

- During the frequent migrations (Couldn't make up my mind first on how to
  slice and dice my 4 disks), zfs send/receive has been my best friend. It
  enabled me to painlessly migrate whole filesystems between pools in
  minutes.
  I'm now writing a script to further automate recursive and updating
  zfs send/receive orgies for backups and other purposes.

- Disk storage is cheap, and thanks to ZFS it became reliable at zero cost.
  Therefore, I can snapshot a lot, not think about whether to delete stuff
  or not, or simply delete stuff I don't need now, while knowing it is
  still preserved in my snapshots.

- As a result of all of this, I learned a great deal about Solaris 10 and
  its other features, which is a big help in my day-to-day job.


I know there's still a lot to do and that we're still working on some bugs,
but I can safely say that ZFS is the best thing that happened to my data
so far.


So here's a big

  THANK YOU!

to the ZFS team for making all of this and more possible for my little home
system.


Down the road, I've now migrated my pools to external mirrored USB disks
(mirrored because it's fast and lowers complexity; USB, because it's
pluggable and host-independent) and I'm thinking of how to backup them
(I realize I still need a backup) onto other external disks or preferably
another system. Again, zfs send/receive will be my friend here.

ZFS boot on my home server is the other next big thing, enabling me to
mirror my root file system more reliably than SVM can while saving space
for live upgrade and enabling other cool stuff.

I'm also thinking of using iSCSI zvols as Mac OS X storage for audio/video
editing and whole-disk backups, but that requires some waiting until
the Mac OS X iSCSI support has matured a bit.

And then I can start to really archive stuff: Older backups that sit on CDs
and are threatened by CD-rot, old photo CDs that have been sitting there and
hopefully haven't begun to rot yet, maybe scan in some older photos,
migrating my CD collection to a lossless format, etc.


This sounds like I've been drinking too much koolaid, and I probably have,
but I guess all the above points remain valid even if I didn't work for Sun.
So please take this email as being written by a private ZFS user and not
a Sun employee.


So, again, thank you so much ZFS team and keep up the good work!


Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive question

2007-04-20 Thread Constantin Gonzalez
Hi,

Krzys wrote:
> Ok, so -F option is not in U3, is there any way to replicate file system
> and not be able to mount it automatically? so when I do zfs send/receive
> it wont be mounted and changes would not be made so that further
> replications could be possible? What I did notice was that if I am doing
> zfs send/receive right one after another I am able to replicate all my
> snaps, but when I wait a day or even few hours I get notice that file
> system got changed, and that is because it was mounted and I guess
> because of that I am not able to perform any more snaps to be send...
> any idea what I could do meanwhile I am waiting for -F?

this should work:

  zfs unmount pool/filesystem
  zfs rollback (latest snapshot)
  zfs send ... | zfs receive
  zfs mount pool/filesystem

Better yet: Assuming you don't actually want to use the filesystem you
replicate to, but just use it as a sink for backup purposes, you can mark
it unmountable, then just send stuff to it.

  zfs set canmount=off pool/filesystem
  zfs rollback (latest snapshot, one last time)

Then, whenever you want to access the receiving filesystem, clone it.
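For example (snapshot and clone names made up):

  # zfs clone pool/filesystem@2007-04-20 pool/filesystem-restore

gives you a writable copy of the replica as of that snapshot, without
touching the receive target itself.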

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS boot: 3 smaller glitches with console, /etc/dfs/sharetab and /dev/random

2007-04-20 Thread Constantin Gonzalez
Hi,

>> 2. After going through the zfs-bootification, Solaris complains on
>> reboot that
>>/etc/dfs/sharetab is missing. Somehow this seems to have been
>> fallen through
>>the cracks of the find command. Well, touching /etc/dfs/sharetab
>> just fixes
>>the issue.
> 
> This is unrelated to ZFS boot issues, and sounds like this bug:
> 
> 6542481 No sharetab after BFU from snv_55
> 
> It's fixed in build 62.

hmm, that doesn't fit what I saw:

- Upgraded from snv_61 to snv_62
- snv_62 booted with no problems (other than the t_optmgmt bug)
- Then migrated to ZFS boot
- Now the sharetab issues shows up.

So why did the sharetab issue only show up after the ZFSification of the
boot process?

Best regards,
Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS boot: 3 smaller glitches with console, /etc/dfs/sharetab and /dev/random

2007-04-19 Thread Constantin Gonzalez Schmitz
Hi,

I've now gone through both the opensolaris instructions:

  http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/

and Tim Foster's script:

  http://blogs.sun.com/timf/entry/zfs_bootable_datasets_happily_rumbling

for making my laptop ZFS bootable.


Both work well and here's a big THANK YOU to the ZFS boot team!


There seem to be 3 smaller glitches with these approaches:

1. The instructions on opensolaris.org assume that one wants console output
   to show up in /dev/tty. This may be true for a server, but it isn't for a
   laptop or workstation user. Therefore, I suggest marking those steps as
   optional, since not everybody knows that they can be left out.

2. After going through the zfs-bootification, Solaris complains on reboot that
   /etc/dfs/sharetab is missing. Somehow this seems to have fallen through
   the cracks of the find command. Well, touching /etc/dfs/sharetab just fixes
   the issue.

3. But here's a more serious one: While booting, Solaris complains:

   Apr 19 15:00:37 foeni kcf: [ID 415456 kern.warning] WARNING: No randomness
   provider enabled for /dev/random. Use cryptoadm(1M) to enable a provider.

   Somehow, /dev/random and/or its counterpart in /devices seems to have
   suffered from the migration procedure.

Does anybody know how to fix the /dev/random issue? I'm not very fluent in
cryptoadm(1M), and some superficial reading of its manpage did not enlighten
me too much (cryptoadm list -p claims all is well...).

Best regards and again, congratulations to the ZFS boot team!

   Constantin


-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who modified my ZFS receive destination?

2007-04-12 Thread Constantin Gonzalez
Hi Trev,

Trevor Watson wrote:
> Hi Constantin,
> 
> I had the same problem, and the solution was to make sure that the
> filesystem is not mounted on the destination system when you perform the
> zfs recv (zfs set mountpoint=none santiago/home).

thanks! This time it worked:

# zfs unmount santiago/home/constant
# zfs rollback santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
# zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive
santiago/home/constant
#

Still, this is kinda strange. This means that we'll need to zfs unmount, then
zfs rollback a lot when doing send/receive on a regular basis
(as in weekly, daily, hourly or even minutely cron jobs) to be sure. Or keep any
replicated filesystems unmounted _all_ the time.
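In practice, each replication run would then boil down to something like this
(the snapshot labels @old and @new stand in for the last replicated and the
newly taken snapshot):

  # zfs unmount santiago/home/constant
  # zfs rollback santiago/home/constant@old
  # zfs send -i @old pelotillehue/constant@new | zfs receive santiago/home/constant

or, once -F is available, a plain zfs receive -F without the unmount/rollback
dance.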

Best regards,
   Constantin


> 
> Trev
> 
> Constantin Gonzalez wrote:
>> Hi,
>>
>> I'm currently migrating a filesystem from one pool to the other through
>> a series of zfs send/receive commands in order to preserve all snapshots.
>>
>> But at some point, zfs receive says "cannot receive: destination has been
>> modified since most recent snapshot". I am pretty sure nobody changed
>> anything
>> at my destination filesystem and I also tried rolling back to an earlier
>> snapshot on the destination filesystem to make it clean again.
>>
>> Here's an excerpt of the snapshots on my source filesystem:
>>
>> # zfs list -rt snapshot pelotillehue/constant
>> NAME  
>> USED  AVAIL
>> REFER  MOUNTPOINT
>> pelotillehue/[EMAIL PROTECTED]
>> 236K  -
>> 33.6G  -
>> pelotillehue/[EMAIL PROTECTED]
>> 747K  -
>> 46.0G  -
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06 
>> 3.07G  -
>>  116G  -
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00 
>> 18.9M  -
>>  115G  -
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03 
>> 10.9M  -
>>  115G  -
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00  
>> 606M  -
>>  105G  -
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01  
>> 167M  -
>>  105G  -
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00 
>> 5.31M  -
>>  105G  -
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01 
>> 1.90M  -
>>  105G  -
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01 
>> 1.26M  -
>>  105G  -
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 
>> 15.2M  -
>>  109G  -
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-15-00:00:00 
>> 17.5M  -
>>  109G  -
>>
>> ... (further lines omitted)
>>
>>
>> On the destination filesystem, snapshots have been replicated through
>> zfs send/receive up to the 2007-01-01 snapshot, so I do the following:
>>
>> # zfs send -i
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 | zfs
>> receive
>> santiago/home/constant
>>
>> This worked, but now, only seconds later:
>>
>> # zfs send -i
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs
>> receive
>> santiago/home/constant
>> cannot receive: destination has been modified since most recent snapshot
>>
>> Fails. So I try rolling back to the 2007-01-08 snapshot on the
>> destination
>> filesystem to be clean again, but:
>>
>> # zfs rollback
>> santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
>> # zfs send -i
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
>> pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs
>> receive
>> santiago/home/constant
>> cannot receive: destination has been modified since most recent snapshot
>>
>> Hmm, why does ZFS think my destination has been modified, although I
>> didn't
>> do anything?
>>
>> Another peculiar thing: zfs list on the destination snapshots says:
>>
>> # zfs list -rt snapshot santiago/home/constant
>> NAME   
>> USED  AVAIL
>>  REFER  MOUNTPOINT
>> santiago/home/[EMAIL PROTECTED]
>> 189K  -
>>  33.6G  -
>> santiago/home/[EMAIL PROTECTED]

Summary: [zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?

2007-04-12 Thread Constantin Gonzalez
Hi,

here's a quick summary of the answers I've seen so far:

- Splitting mirrors is a current practice with traditional volume
  management. The goal is to quickly and effortlessly create a clone of a
  storage volume/pool.

- Splitting mirrors with ZFS can be done, but it has to be done the
  hard way by resilvering, then unplugging the disk, then trying to
  import it somewhere else. zpool detach would render the detached disk
  unimportable.

- Another, cleaner way of splitting a mirror would be to export the
  pool, disconnect one drive, then re-import it. After that, the
  disconnected drive needs to be zpool detach'ed from the original
  pool, while the clone can be imported elsewhere and its missing
  mirror halves detached as well (sketched below). But this involves
  unmounting the pool, so it can't be done without downtime.

- The supported alternative would be zfs snapshot, then zfs send/receive,
  but this introduces the complexity of snapshot management which
  makes it less simple, thus less appealing to the clone-addicted admin.

- There's an RFE for supporting splitting mirrors: 5097228
  http://bugs.opensolaris.org/view_bug.do?bug_id=5097228
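To make the export/import variant concrete, the sequence would be roughly
this (device names made up, and I haven't tested this exact flow):

  On the original machine:
  # zpool export tank
    (physically disconnect one half of the mirror, say disk2, then re-import)
  # zpool import tank
  # zpool detach tank disk2

  On the machine that gets disk2:
  # zpool import tank
  # zpool detach tank disk1
    (disk1 isn't present there, so the detach merely drops it from the
     clone's configuration)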

IMHO, we should investigate if something like zpool clone would be useful.
It could be implemented as a script that recursively snapshots the source
pool, then zfs send/receives it to the destination pool, then copies all
properties, but the actual reason why people do mirror splitting in the
first place is because of its simplicity.

A zpool clone or a zpool send/receive command would be even simpler and less
error-prone than the tradition of splitting mirrors, plus it could be
implemented more efficiently and more reliably than a script, thus bringing
real additional value to administrators.

Maybe zpool clone or zpool send/receive would be the better way of implementing
5097228 in the first place?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
[EMAIL PROTECTED]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Who modified my ZFS receive destination?

2007-04-12 Thread Constantin Gonzalez
Hi,

I'm currently migrating a filesystem from one pool to the other through
a series of zfs send/receive commands in order to preserve all snapshots.

But at some point, zfs receive says "cannot receive: destination has been
modified since most recent snapshot". I am pretty sure nobody changed anything
at my destination filesystem and I also tried rolling back to an earlier
snapshot on the destination filesystem to make it clean again.

Here's an excerpt of the snapshots on my source filesystem:

# zfs list -rt snapshot pelotillehue/constant
NAME                                                          USED  AVAIL  REFER  MOUNTPOINT
pelotillehue/[EMAIL PROTECTED]                                236K      -  33.6G  -
pelotillehue/[EMAIL PROTECTED]                                747K      -  46.0G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06  3.07G      -   116G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00  18.9M      -   115G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03  10.9M      -   115G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00   606M      -   105G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01   167M      -   105G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00  5.31M      -   105G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01  1.90M      -   105G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01  1.26M      -   105G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00  15.2M      -   109G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-15-00:00:00  17.5M      -   109G  -

... (further lines omitted)


On the destination filesystem, snapshots have been replicated through
zfs send/receive up to the 2007-01-01 snapshot, so I do the following:

# zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 | zfs receive
santiago/home/constant

This worked, but now, only seconds later:

# zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive
santiago/home/constant
cannot receive: destination has been modified since most recent snapshot

Fails. So I try rolling back to the 2007-01-08 snapshot on the destination
filesystem to be clean again, but:

# zfs rollback santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
# zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive
santiago/home/constant
cannot receive: destination has been modified since most recent snapshot

Hmm, why does ZFS think my destination has been modified, although I didn't
do anything?

Another peculiar thing: zfs list on the destination snapshots says:

# zfs list -rt snapshot santiago/home/constant
NAME                                                            USED  AVAIL  REFER  MOUNTPOINT
santiago/home/[EMAIL PROTECTED]                                 189K      -  33.6G  -
santiago/home/[EMAIL PROTECTED]                                 670K      -  46.0G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06   3.07G      -   116G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00   18.4M      -   115G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03   10.5M      -   115G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00    603M      -   105G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01    163M      -   105G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00   4.87M      -   105G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01   1.79M      -   106G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01   1.16M      -   106G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00     57K      -   109G  -

Note that the Used column for the 2007-01-08 snapshot says 57K on the
destination, but 15.2M on the source. Could it be that the reception of
the 2007-01-08 failed and ZFS didn't notice?

I've tried this multiple times, including destroying snapshots and rolling
back on the destination to the 2007-01-01 state, so what you see above is
already a second try of the same.

The other values vary too, but only slightly. Compression is turned on on
both pools. The source pool has been scrubbed on Monday with no known data
errors and the destination pool is brand new and I'm scrubbing it as we speak.

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer

Re: [zfs-discuss] Re: Poor man's backup by attaching/detaching mirror

2007-04-11 Thread Constantin Gonzalez Schmitz
Hi,

>> How would you access the data on that device?
> 
> Presumably, zpool import.

yes.

> This is basically what everyone does today with mirrors, isn't it? :-)

sure. This may not be pretty, but it's what customers are doing all the time
with regular mirrors, 'cause it's quick, easy and reliable.

Cheers,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?

2007-04-11 Thread Constantin Gonzalez Schmitz
Hi Mark,

Mark J Musante wrote:
> On Tue, 10 Apr 2007, Constantin Gonzalez wrote:
> 
>> Has anybody tried it yet with a striped mirror? What if the pool is
>> composed out of two mirrors? Can I attach devices to both mirrors, let
>> them resilver, then detach them and import the pool from those?
> 
> You'd want to export them, not detach them.  Detaching will overwrite the
> vdev labels and make it un-importable.

thank you for the export/import idea, it does sound cleaner from a ZFS
perspective, but comes at the expense of temporarily unmounting the filesystems.

So, instead of detaching, would unplugging, then detaching work?

I'm thinking something like this:

 - zpool create tank mirror disk1 disk2 disk3
 - {physically move disk3 to the new box}
 - zpool detach tank disk3

On the new box:
 - zpool import tank
 - zpool detach tank disk1
 - zpool detach tank disk2

This should work for one disk, and I assume this would also work for multiple
disks?

Thinking along similar lines, would it be a useful RFE to allow asymmetric
mirroring like this:

- dev1, dev2 are both 250GB, dev3 is 500GB
- zpool create tank mirror dev1+dev2, dev3

This means that half of dev3 would mirror dev1, the other half would mirror dev2,
while dev1 and dev2 form a regular stripe.

The utility of this would be for cases where customer have set up mirrors, then
need to replace disks or upgrade the mirror after a long time, when bigger disks
are easier to get than smaller ones and while reusing older disks.

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs. rmvolmgr

2007-04-11 Thread Constantin Gonzalez Schmitz
Hi,

sorry, I needed to be more clear:

Here's what I did:

1. Connect USB storage device (a disk) to machine
2. Find USB device through rmformat
3. Try zpool create on that device. It fails with:
   can't open "/dev/rdsk/cNt0d0p0", device busy
4. svcadm disable rmvolmgr
5. Now zpool create works with that device and the pool gets created.
6. svcadm enable rmvolmgr
7. After that, everything works as expected, the device stays under control
   of the pool.
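In command form, the sequence above was roughly (device name made up):

  # svcadm disable -t rmvolmgr
  # zpool create usbpool c2t0d0p0
  # svcadm enable rmvolmgr

although, as discussed below, it may really be hal rather than rmvolmgr that
holds the device open.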

>>   can't open "/dev/rdsk/cNt0d0p0", device busy
> 
> Do you remember exactly what command/operation resulted in this error?

See above, it comes right after trying to create a zpool on that device.

> It is something that tries to open device exclusively.

So after ZFS opens the device exclusively, hald and rmvolmgr will ignore it?
What happens at boot time, is zfs then quicker in grabbing the device than
hald and rmvolmgr are?

>> So far, I've just said svcadm disable -t rmvolmgr, did my thing, then
>> said svcadm enable rmvolmgr.
> 
> This can't possibly be true, because rmvolmgr does not open devices.

Hmm. I really do remember doing the above. Actually, I had been pulling my
hair out trying to create zpools on external devices until I got the idea
of disabling rmvolmgr; then it worked.

> You'd need to also disable the 'hal' service. Run fuser on your device
> and you'll see it's one of the hal addons that keeps it open:

Perhaps something that depended on rmvolmgr released the device after I
disabled the service?

>> For instance, I'm now running several USB disks with ZFS pools on
>> them, and
>> even after restarting rmvolmgr or rebooting, ZFS, the disks and rmvolmgr
>> get along with each other just fine.
> 
> I'm confused here. In the beginning you said that something got in the
> way, but now you're saying they get along just fine. Could you clarify.

After creating the pool, the device now belongs to ZFS. Now, ZFS seems to
be able to grab the device before anybody else.

> One possible workaround would be to match against USB disk's serial
> number and tell hal to ignore it using fdi(4) file. For instance, find
> your USB disk in lshal(1M) output, it will look like this:
> 
> udi = '/org/freedesktop/Hal/devices/pci_0_0/pci1028_12c_1d_7/storage_5_0'
>   usb_device.serial = 'DEF1061F7B62'  (string)
>   usb_device.product_id = 26672  (0x6830)  (int)
>   usb_device.vendor_id = 1204  (0x4b4)  (int)
>   usb_device.vendor = 'Cypress Semiconductor'  (string)
>   usb_device.product = 'USB2.0 Storage Device'  (string)
>   info.bus = 'usb_device'  (string)
>   info.solaris.driver = 'scsa2usb'  (string)
>   solaris.devfs_path = '/[EMAIL PROTECTED],0/pci1028,[EMAIL 
> PROTECTED],7/[EMAIL PROTECTED]'  (string)
> 
> You want to match an object with this usb_device.serial property and set
> info.ignore property to true. The fdi(4) would look like this:

thanks, this sounds just like what I was looking for.

So the correct way of having a zpool out of external USB drives is to:

1. Attach the drives
2. Find their USB serial numbers with lshal
3. Set up an fdi file that matches the disks and tells hal to ignore them

The naming of the file

  /etc/hal/fdi/preprobe/30user/10-ignore-usb.fdi

sounds like init.d-style directory and file naming - is this correct?
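For reference, here is my guess at what the complete file would contain,
using the serial number from the lshal output above (I haven't double-checked
the exact fdi syntax):

  # cat > /etc/hal/fdi/preprobe/30user/10-ignore-usb.fdi <<'EOF'
  <?xml version="1.0" encoding="UTF-8"?>
  <deviceinfo version="0.2">
    <device>
      <match key="usb_device.serial" string="DEF1061F7B62">
        <merge key="info.ignore" type="bool">true</merge>
      </match>
    </device>
  </deviceinfo>
  EOF
  # svcadm restart hal
    (assuming a restart is enough for hal to pick up the new file)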

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?

2007-04-10 Thread Constantin Gonzalez
Hi,

one quick&dirty way of backing up a pool that is a mirror of two devices is to
zpool attach a third one, wait for the resilvering to finish, then zpool detach
it again.

The third device then can be used as a poor man's simple backup.
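With device names made up, the sequence looks like this:

  # zpool attach tank c1t0d0 c1t2d0
    (c1t2d0 becomes a third side of the existing c1t0d0/c1t1d0 mirror)
  # zpool status tank
    (wait until the resilver has completed)
  # zpool detach tank c1t2d0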

Has anybody tried it yet with a striped mirror? What if the pool is
composed out of two mirrors? Can I attach devices to both mirrors, let them
resilver, then detach them and import the pool from those?

Best regards,
  Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS vs. rmvolmgr

2007-04-10 Thread Constantin Gonzalez
Hi,

while playing around with ZFS and USB memory sticks or USB harddisks,
rmvolmgr tends to get in the way, which results in a

  can't open "/dev/rdsk/cNt0d0p0", device busy

error.

So far, I've just said svcadm disable -t rmvolmgr, did my thing, then
said svcadm enable rmvolmgr.

Is there a more elegant approach that tells rmvolmgr to leave certain
devices alone on a per disk basis?

For instance, I'm now running several USB disks with ZFS pools on them, and
even after restarting rmvolmgr or rebooting, ZFS, the disks and rmvolmgr
get along with each other just fine.

What and how does ZFS tell rmvolmgr that a particular set of disks belongs
to ZFS and should not be treated as removable?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up for zfsboot

2007-04-05 Thread Constantin Gonzalez
Hi,

>>> - RAID-Z is _very_ slow when one disk is broken.
>> Do you have data on this? The reconstruction should be relatively cheap
>> especially when compared with the initial disk access.
> 
> Also, what is your definition of "broken"?  Does this mean the device
> appears as FAULTED in the pool status, or that the drive is present and
> not responding?  If it's the latter, this will be fixed by my upcoming
> FMA work.

sorry, the _very_ may be exaggerated and depends a lot on the load of
the system and the config.

I'm referring to a couple of posts and anecdotal experience from colleagues.
This means that indeed "slow" or "very slow" may be a mixture of
reconstruction overhead and device timeout issues.

So, it's nice to see that the upcoming FMA code will fix some of the slowness
issues.

Did anybody measure how much CPU overhead RAID-Z and RAID-Z2 parity
computation induces, both for writes and for reads (assuming a data disk
is broken)? This data would be useful when arguing for a "software RAID"
scheme in front of hardware-RAID addicted customers.

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Constantin Gonzalez
Hi,

Manoj Joseph wrote:

> Can write-cache not be turned on manually as the user is sure that it is
> only ZFS that is using the entire disk?

yes it can be turned on. But I don't know if ZFS would then know about it.

I'd still feel more comfortable with it being turned off unless ZFS itself
does it.

But maybe someone from the ZFS team can clarify this.

Cheers,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Constantin Gonzalez
Hi,

> Now that zfsboot is becoming available, I'm wondering how to put it to
> use. Imagine a system with 4 identical disks. Of course I'd like to use

you lucky one :).

> raidz, but zfsboot doesn't do raidz. What if I were to partition the
> drives, such that I have 4 small partitions that make up a zfsboot
> partition (4 way mirror), and the remainder of each drive becomes part
> of a raidz?

Sounds good. Performance will suffer a bit, as ZFS thinks it has two pools
with 4 spindles each, but it should still perform better than the same setup
on UFS.

You may also want to have two 2-way mirrors and keep the second for other
purposes such as a scratch space for zfs migration or as spare disks for
other stuff.

> Do I still have the advantages of having the whole disk
> 'owned' by zfs, even though it's split into two parts?

I'm pretty sure that this is not the case:

- ZFS has no guarantee that nobody will do something else with that other
  partition, so it can't assume the right to turn on the disk cache for the
  whole disk.

- Yes, it could be smart and realize that it does have the whole disk, only
  split up across two pools, but then I assume that this is not your typical
  enterprise class configuration and so it probably didn't get implemented
  that way.

I'd say that not being able to benefit from the disk drive's cache is not
as bad in the face of ZFS' other advantages, so you can probably live with
that.

> Swap would probably have to go on a zvol - would that be best placed on
> the n-way mirror, or on the raidz?

I'd place it onto the mirror for performance reasons. Also, it feels cleaner
to have all your OS stuff on one pool and all your user/app/data stuff on
another. This is also recommended by the ZFS Best Practices Wiki on
www.solarisinternals.com.

Now back to the 4 disk RAID-Z: Does it have to be RAID-Z? Maybe you might want
to reconsider using 2 2-way mirrors:

- RAID-Z is slow when writing, you basically get only one disk's bandwidth.
  (Yes, with variable block sizes this might be slightly better...)

- RAID-Z is _very_ slow when one disk is broken.

- Using mirrors is more convenient for growing the pool: You run out of space,
  you add two disks, and get better performance too. No need to buy 4 extra
  disks for another RAID-Z set.

- When using disks, you need to consider availability, performance and space.
  Of all the three, space is the cheapest. Therefore it's best to sacrifice
  space and you'll get better availability and better performance.

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pathological ZFS performance

2007-03-30 Thread Constantin Gonzalez
ut watching I/O throughput.
> 
> For yet-another-fallback, I am thinking about using SATA-to-IDE
> converters here:
> 
> http://www.newegg.com/product/product.asp?item=N82E16812156010
> 
> It feels kind of nuts, but I have to think this would perform
> better than what I have now.  This would cost me the one SATA
> drive I'm using now in a smaller pool.
> 
> Rob T
> _______
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Migrating a pool

2007-03-28 Thread Constantin Gonzalez
Hi Matt,

cool, thank you for doing this!

I'll still write my script since today my two shiny new 320GB USB
disks will arrive :).

I'll add to it the ability to first send all current snapshots, then
bring down the services that depend on the filesystem, unmount the old
fs, send a final incremental snapshot, zfs set mountpoint=x on the new
filesystem, and then bring up the services again.

Hope this works as I imagine.

Cheers,
   Constantin

Matthew Ahrens wrote:
> Constantin Gonzalez wrote:
>> What is the most elegant way of migrating all filesystems to the new
>> pool,
>> including snapshots?
>>
>> Can I do a master snapshot of the whole pool, including
>> sub-filesystems and
>> their snapshots, then send/receive them to the new pool?
>>
>> Or do I have to write a script that will individually snapshot all
>> filesystems
>> within my old pool, then run a send (-i) orgy?
> 
> Unfortunately, you will need to make/find a script to do the various
> 'zfs send -i' to send each snapshot of each filesystem.
> 
> I am working on 'zfs send -r', which will make this a snap:
> 
> # zfs snapshot -r pool@today
> # zfs send -r pool@today | zfs recv ...
> 
> You'll also be able to do 'zfs send -r -i @yesterday pool@today'.
> 
> See RFE 6421958.
> 
> --matt

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Migrating a pool

2007-03-27 Thread Constantin Gonzalez
Hi,

> Today I have about half a dozen filesystems in the old pool plus dozens of
> snapshots thanks to Tim Bray's excellent SMF snapshotting service.

I'm sorry I mixed up Tim's last name. The fine guy who wrote the SMF snapshot
service is Tim Foster. And here's the link:

  http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_8

There doesn't seem to be an easy answer to the original question of how to
migrate a complete pool. Writing a script with a snapshot send/receive
party seems to be the only approach.

I wish I could zfs snapshot pool then zfs send pool | zfs receive dest and
all blocks would be transferred as they are, including all embedded snapshots.

Is that already an RFE?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Migrating a pool

2007-03-26 Thread Constantin Gonzalez
Hi,

soon it'll be time to migrate my patchwork pool onto a real pair of
mirrored (albeit USB-based) external disks.

Today I have about half a dozen filesystems in the old pool plus dozens of
snapshots thanks to Tim Bray's excellent SMF snapshotting service.

What is the most elegant way of migrating all filesystems to the new pool,
including snapshots?

Can I do a master snapshot of the whole pool, including sub-filesystems and
their snapshots, then send/receive them to the new pool?

Or do I have to write a script that will individually snapshot all filesystems
within my old pool, then run a send (-i) orgy?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] CSI:Munich - How to save the world with ZFS and 12 USB Sticks

2007-03-09 Thread Constantin Gonzalez
Hi,

a few weeks ago, Richard Elling noticed our ZFS video:

  http://www.opensolaris.org/jive/thread.jspa?threadID=23220&tstart=120

Finally, we got the english version done. Many thanks to Marc Baumann, our
brave video editor for making this possible.

Here's the video and some comments, both in english:

  http://blogs.sun.com/constantin/entry/csi_munich_how_to_save

The video alone is available here:

  http://video.google.com/videoplay?docid=8100808442979626078

Please forgive the occasional soundless lip movements; it turns out that
the English language has less redundancy than the German one :).

Have fun,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 2-way mirror or RAIDZ?

2007-02-27 Thread Constantin Gonzalez
Hi,

> I have a shiny new Ultra 40 running S10U3 with 2 x 250Gb disks.

congratulations, this is a great machine!

> I want to make best use of the available disk space and have some level
> of redundancy without impacting performance too much.
> 
> What I am trying to figure out is: would it be better to have a simple
> mirror of an identical 200Gb slice from each disk or split each disk
> into 2 x 80Gb slices plus one extra 80Gb slice on one of the disks to
> make a 4 + 1 RAIDZ configuration?

you probably want to mirror the OS slice of the disk to protect your OS and
its configuration from the loss of a whole disk. Do it with SVM today and
upgrade to a bootable ZFS mirror in the future.

The OS slice only needs to be 5 GB in size if you follow the standard
recommendation, but 10 GB is probably a safe and easy-to-remember bet, leaving
you some extra space for apps etc.

Plan to be able to live upgrade into new OS versions. You may break up the
mirror to do so, but this is kinda complicated and error-prone.
Disk space is cheap, so I'd rather recommend you save two slices per disk for
creating 2 mirrored boot environments where you can LU back and forth.

For swap, allocate an extra slice per disk and of course mirror swap too.
1GB swap should be sufficient.

Now, you can use the rest for ZFS. Having only two physical disks, there is
no good reason to do something other than mirroring. If you created 4+1
slices for RAID-Z, you would always lose the whole pool if one disk broke.
Not good. You could play Russian roulette by having 2+3 slices and RAID-Z2
and hoping that the right disk fails, but that isn't a good practice and it
wouldn't buy you any redundant space either; it would just leave an extra
unprotected scratch slice.

So, go for the mirror, it gives you good performance and less headaches.
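In concrete terms, something like this (slice names made up):

  # zpool create tank mirror c1t0d0s7 c2t0d0s7

using the big slices you set aside after carving out boot, live upgrade and
swap.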

If you can spare the money, try increasing the number of disks. You'd still
need to mirror boot and swap slices, but then you would be able to use a real
RAID-Z config for the rest, enabling you to leverage more disk capacity at a good
redundancy/performance compromise.

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)

2007-02-05 Thread Constantin Gonzalez
Hi,

Artem: Thanks. And yes, Peter S. is a great actor!

Christian Mueller wrote:
> who is peter stormare? (sorry, i'm from old europe...)

as usual, Wikipedia knows it:

  http://en.wikipedia.org/wiki/Peter_Stormare

and he's european too :). Great actor, great movies. I particularly like
Constantine, not just because of the name, of course :)

Our budget is quite limited at the moment, but after the 1,000,000th view on
YouTube/Google Video we might want to reconsider our cast for the next
episode :).

But first, we need to get the english version finished...

Cheers,
   Constantin

> 
> thx & bye
> christian
> 
> Artem Kachitchkine schrieb:
>>
>>> Brilliant video, guys.
>>
>> Totally agreed, great work.
>>
>> Boy, would I like to see Peter Stormare in that video %)
>>
>> -Artem.
> 

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)

2007-02-02 Thread Constantin Gonzalez
Hi Richard,

Richard Elling wrote:
> FYI,
> here is an interesting blog on using ZFS with a dozen USB drives from
> Constantin.
> http://blogs.sun.com/solarium/entry/solaris_zfs_auf_12_usb

thank you for spotting it :).

We're working on translating the video (hope we get the lip-syncing right...)
and will then re-release it in an english version. BTW, we've now hosted
the video on YouTube so it can be embedded in the blog.

Of course, I'll then write an english version of the blog entry with the
tech details.

Please hang on for a week or two... :).

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-31 Thread Constantin Gonzalez
Hi,

I need to be a little bit more precise in how I formulate comments:

1. Yes, zpool remove is a desirable feature, no doubt about that.

2. Most of the cases where customers ask for "zpool remove" can be solved
   with zfs send/receive or with zpool replace. Think Pareto's 80-20 rule.

   2a. The cost of doing 2., including extra scratch storage space or scheduling
   related work into planned downtimes, is smaller than the cost of not using
   ZFS at all.

   2b. Even in the remaining 20% of cases (figuratively speaking, YMMV) where
   zpool remove would be the only solution, I feel that the cost of
   sacrificing the extra storage space that would have become available
   through "zpool remove" is smaller than the cost of the project not
   benefitting from the rest of ZFS' features.

3. Bottom line: Everybody wants zpool remove as early as possible, but IMHO
   this is not an objective barrier to entry for ZFS.

Note my use of the word "objective". I do feel that we have to implement
zpool remove for subjective reasons, but that is a non technical matter.

Is this an agreeable summary of the situation?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: can I use zfs on just a partition?

2007-01-26 Thread Constantin Gonzalez Schmitz
Hi,

> When you do the initial install, how do you do the slicing?
> 
> Just create like:
> / 10G
> swap 2G
> /altroot 10G
> /zfs restofdisk

yes.

> Or do you just create the first three slices and leave the rest of the
> disk untouched?  I understand the concept at this point, just trying to
> explain to a third party exactly what they need to do to prep the system
> disk for me :)

No. You need to be able to tell ZFS what to use. Hence, if your pool is
created at the slice level, you need to create a slice for it.

So the above is the way to go.
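
Once the slice reserved for ZFS exists, creating the pool on it is a one-liner
(the pool and slice names here are made up for illustration):

  # zpool create tank c0d0s7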

And yes, you should only do this on laptops and other machines where you only
have one disk or are otherwise very disk-limited :).

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: How much do we really want zpool remove?

2007-01-26 Thread Constantin Gonzalez Schmitz
Hi,

I do agree that zpool remove is a _very_ desirable feature, no doubt about
that.

Here are a couple of thoughts and workarounds, in random order, that might
give us some more perspective:

- My home machine has 4 disks and a big zpool across them. Fine. But what
  if a controller fails or worse, a CPU? Right, I need a second machine, if
  I'm really honest with myself and serious with my data. Don't laugh, ZFS
  on a Solaris server is becoming my mission-critical home storage solution
  that is supposed to last beyond CDs and DVDs and other vulnerable media.

  So, if I was an enterprise, I'd be willing to keep enough empty LUNs
  available to facilitate at least the migration of one or more filesystems
  if not complete pools. With a little bit of scripting, this can be done
  quite easily and efficiently through zfs send/receive and some LUN
  juggling.

  If I was an enterprise's server admin and the storage guys wouldn't have
  enough space for migrations, I'd be worried.

- We need to avoid customers thinking "Veritas can shrink, ZFS can't". That
  is wrong. ZFS _filesystems_ grow and shrink all the time; it's just the
  pools below them that can only grow. And Veritas does not even have pools.

  People have started to follow a one-pool-to-store-them-all approach, which I
  think is not always appropriate. Some alternatives:

  - One pool per zone might be a good idea if you want to migrate zones
across systems which then becomes easy through zpool export/import in
a SAN.

  - One pool per service level (mirror, RAID-Z2, fast, slow, cheap, expensive)
might be another idea. Keep some cheap mirrored storage handy for your pool
migration needs and you can wiggle your way around zpool remove.

Switching between Mirror, RAID-Z, RAID-Z2 then becomes just a zfs
send/receive pair.

Shrinking a pool requires some more zfs send/receiving and maybe some
scripting, but these are IMHO less painful than living without ZFS'
data integrity and the other joys of ZFS.
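
To illustrate, with made-up pool and filesystem names, such a migration boils
down to something like:

  # zfs snapshot bigpool/data@migrate
  # zfs send bigpool/data@migrate | zfs receive cheappool/data

plus re-setting properties and mountpoints on the receiving side, and
destroying the source filesystem once the copy has been verified.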

Sorry if I'm stating the obvious or stuff that has been discussed before,
but the more I think about zpool remove, the more I think it's a question
of willingness to plan/work/script/provision vs. a real show stopper.

Best regards,
   Constantin

P.S.: Now with my big mouth I hope I'll survive a customer confcall next
week with a customer asking for exactly zpool remove :).

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: can I use zfs on just a partition?

2007-01-25 Thread Constantin Gonzalez Schmitz
Hi Tim,

> Essentially I'd like to have the / and swap on the first 60GB of the disk.  
> Then use the remaining 100GB as a zfs partition to setup zones on.  Obviously 
> the snapshots are extremely useful in such a setup :)
> 
> Does my plan sound feasible from both a usability and performance standpoint?

yes, it works, I do it on my laptop all the time:

# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c0d0 
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
Specify disk (enter its number): 0
selecting c0d0
Controller working list found
[disk formatted, defect list found]
Warning: Current Disk has mounted partitions.
/dev/dsk/c0d0s0 is currently mounted on /. Please see umount(1M).
/dev/dsk/c0d0s1 is currently used by swap. Please see swap(1M).
/dev/dsk/c0d0s3 is part of active ZFS pool poolchen. Please see zpool(1M).
/dev/dsk/c0d0s4 is in use for live upgrade /. Please see ludelete(1M).

c0d0s5 is also free and can be used as a third live upgrade partition.

My recommendation: Use at least 2 slices for the OS so you can enjoy live
upgrade, one for swap and the rest for ZFS.

Performance-wise, this is of course not optimal, but perfectly feasible. I have
an Acer Ferrari 4000 which is known to have a slow disk, but it still works
great for what I do (email, web, Solaris demos, presentations, occasional
video).

More complicated things are possible as well. The following blog entry:

  http://blogs.sun.com/solarium/entry/tetris_spielen_mit_zfs

(sorry, it's German) illustrates how my 4 disks at home are sliced in order
to get OS partitions on multiple disks, Swap and as much ZFS space as
possible at acceptable redundancy despite differently-sized disks. Check out
the graphic in the above entry to see what I mean. Works great (but I had to
use -f to zpool create :) ) and gives me enough performance for all my
home-serving needs.

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] poor NFS/ZFS performance

2006-11-23 Thread Constantin Gonzalez
Hi Roch,

thanks, now I better understand the issue :).

> Nope.  NFS  is slow   for single threaded  tar  extract. The
> conservative approach of NFS is needed with the NFS protocol
> in order to ensure client's side data integrity. Nothing ZFS 
> related.

...

> NFS is plenty fast in a throughput context (not that it does 
> not need work). The complaints we have here are about single 
> threaded code.

ok, then it's "just" a single thread client latency of request issue, which
(as increasingly often) software vendors need to realize. The proper way to
deal with this, then, is to multi-thread on the application layer.

Reminds me of many UltraSPARC T1 issues, which sit neither in the hardware nor
the OS, but in the way applications have been developed for years :).

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] poor NFS/ZFS performance

2006-11-23 Thread Constantin Gonzalez
Hi,

I haven't followed all the details in this discussion, but it seems to me
that it all breaks down to:

- NFS on ZFS is slow due to NFS being very conservative when sending
  ACK to clients only after writes have definitely committed to disk.

- Therefore, the problem is not that much ZFS specific, it's just a
  conscious focus on data correctness vs. speed on ZFS/NFS' part.

- Currently known workarounds include:

  - Sacrifice correctness for speed by disabling ZIL or using a less
conservative network file system.

  - Optimize NFS/ZFS to get as much speed as possible within the constraints
of the NFS protocol.

But one aspect I haven't seen so far is: How can we optimize ZFS on a more
hardware oriented level to both achieve good NFS speeds and still preserve
the NFS level of correctness?

One possibility might be to give the ZFS pool enough spindles so it can
comfortably handle many small IOs fast enough for them not to become
NFS commit bottlenecks. This may require some tweaking on the ZFS side so
it doesn't queue up write IOs for too long, so as not to delay commits more than
necessary.

Has anyone investigated this branch or am I too simplistic in my view of the
underlying root of the problem?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Newbie questions about drive problems

2006-08-31 Thread Constantin Gonzalez
Hi,

> I have 3 drives.
> The first one will be the primary/boot drive under UFS. The 2 others will 
> become a mirrored pool with ZFS.
> Now, I have problem with the boot drive (hardware or software), so all the 
> data on my mirrored pool are ok?
> How can I restore this pool? When I create the pool, do I need to save the 
> properties?

All metadata for the pool is stored inside the pool. If the boot disk fails in
any way, all pool data is safe.

Worst case might be that you have to reinstall everything on the boot disk.
After that, you just say "zpool import" to get your pool back and everything
will be ok.
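
For example (the pool name is hypothetical): running

  # zpool import

lists the pools found on the attached disks, and

  # zpool import tank

imports the pool named "tank".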

> What happend when a drive crash when ZFS write some data on a raidz pool?

If the crash occurs in the middle of a write operation, then the new data
blocks will not be valid. ZFS will then revert back to the state before
writing the new set of blocks. Therefore you'll have 100% data integrity
but of course the new blocks that were written to the pool will be lost.

> Do the pool go to the degraded state or faulted state?

No, the pool will come up as online. The degraded state is only for devices
that aren't accessible any more and the faulted state is for pools that do
not have enough valid devices to be complete.

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Encryption on ZFS / Disk Usage

2006-08-22 Thread Constantin Gonzalez
Hi,

Thomas Deutsch wrote:
> Hi
> 
> I'm thinking about to change from Linux/Softwareraid to
> OpenSolaris/ZFS. During this, I've got some (probably stupid)
> questions:

don't worry, there are no stupid questions :).

> 1. Is ZFS able to encrypt all the data? If yes: How safe is this
> encryption? I'm currently using dm-crypt on linux for doing this.

Encryption for ZFS is a planned feature, but not available now. See:

  http://www.opensolaris.org/os/project/zfs-crypto/

Another project is an encrypted loopback device called xlofi which can
be used on top of ZFS:

  http://www.opensolaris.org/os/community/security/projects/xlofi/

I understand that both approaches are independent of the encryption
mechanism, so one would be free to choose a suitably safe cypher that
is supported by the Solaris Cryptographic Framework.

> 2. How big is the usable diskspace? I know that a rai5 is using the
> space of one disk for parity informations. A raid5 with four disk of
> 300GB has 900GB Space. How is it with ZFS? How much space do I have in
> this case?

ZFS' RAIDZ1 uses one parity disk per RAIDZ set, similarly to RAID-5.
ZFS' RAIDZ2 uses two parity disks per RAIDZ set.

So, the amount of usable space is (the number of disks per RAIDZ set, minus 1
or 2 depending on the algorithm) times the minimum capacity per disk. It's the
same calculation as with traditional RAID.
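
To take the numbers from your question: with four 300 GB disks in a single
RAIDZ set, RAIDZ1 gives (4 - 1) x 300 GB = 900 GB of usable space, and RAIDZ2
gives (4 - 2) x 300 GB = 600 GB.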

But there are advantages for RAIDZ over traditional RAID-5:

- No RAID-5 write hole.
- Better performance through serialization of write requests.
- Better performance through eliminating the need for read-modify-write.
- Better data integrity through end-to-end checksums.
- Faster re-syncing of replaced disks since you only need to recreate
  used blocks.
- Compression can easily be switched on for some extra space
  depending on the nature of the data.

See also:

  http://blogs.sun.com/roller/page/bonwick?entry=raid_z

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Load-balancing over vdevs vs. real disks?

2006-08-22 Thread Constantin Gonzalez
Hi Eric,

>> This means that we have one pool with 3 vdevs that access up to 3
>> different
>> sliced on the same physical disk.

minor correction: 1 pool, 3 vdevs, 3 slices per disk on 4 disks.

>> Question: Does ZFS consider the underlying physical disks when
>> load-balancing
>> or does it only load-balance across vdevs thereby potentially overloading
>> physical disks with up to 3 parallel requests per physical disk at once?
> 
> ZFS only does dynamic striping across the (top-level) vdevs.
> 
> I understand why you setup your pool that way, but ZFS really likes
> whole disks instead of slices.

ok, understood. When I run out of storage, I'll try to get 4 cheap SATA
drives of equal size and migrate everything over.

> Trying to interpret that the devices are really slices and part of other
> vdevds seems overly complicated for the gain achieved.

So what data does ZFS base its dynamic striping on? Does it count IOPS per
vdev or does it try to sense the load on the vdevs by measuring, say, response
times, queue lengths etc.?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Load-balancing over vdevs vs. real disks?

2006-08-21 Thread Constantin Gonzalez
Hi,

my ZFS pool for my home server is a bit unusual:

pool: pelotillehue
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Aug 21 06:10:13 2006
config:

        NAME          STATE     READ WRITE CKSUM
        pelotillehue  ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c0d1s5    ONLINE       0     0     0
            c1d0s5    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0d0s3    ONLINE       0     0     0
            c0d1s3    ONLINE       0     0     0
            c1d0s3    ONLINE       0     0     0
            c1d1s3    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0d1s4    ONLINE       0     0     0
            c1d0s4    ONLINE       0     0     0
            c1d1s4    ONLINE       0     0     0

The reason is simple: I have 4 differently-sized disks (80, 80, 200, 250 GB.
It's a home server and so I crammed whatever I could find elsewhere into that box
:) ) and my goal was to create the biggest pool possible while retaining some
level of redundancy.

The above config therefore groups the biggest slices that can be created on all
four disks into the 4-disk RAID-Z vdev, then the biggest slices that can be
created on 3 disks into the 3-disk RAID-Z, then two large slices remain which
are mirrored. It's like playing Tetris with disk slices... But the pool can
tolerate 1 broken disk and it gave me maximum storage capacity, so be it.

This means that we have one pool with 3 vdevs that access up to 3 different
sliced on the same physical disk.

Question: Does ZFS consider the underlying physical disks when load-balancing
or does it only load-balance across vdevs thereby potentially overloading
physical disks with up to 3 parallel requests per physical disk at once?

I'm pretty sure ZFS is very intelligent and will do the right thing, but a
confirmation would be nice here.

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Home Server with ZFS

2006-08-18 Thread Constantin Gonzalez Schmitz
Hi,

>> What i dont know is what happens if the boot disk dies? can i replace
>> is, install solaris again and get it to see the zfs mirror?
> 
> As I understand it, this be possible, but I haven't tried it and I'm
> not an expert Solaris admin.  Some ZFS info is stored in a persistent
> file on your system disk, and you may have to do a little dance to get
> around that.  It's worth researching and practicing in advance :-).

IIRC, ZFS has all relevant information stored inside the pool. So you
should be able to install a new OS onto the replacement disk, then say
"zpool import" (possibly with -d and the devices where the mirror lives)
to re-import the pool.

But I haven't really tried it myself :).

All in all, ZFS is an excellent choice for a home server. I use ZFS as a video
storage for a digital set top box (quotas are really handy here), as a storage
for my music collection, as a backup storage for important data (including
photos), etc.

I'm currently juggling around 4 differently-sized disks into a new config
with the goal of getting as much storage as possible out of them at a minimum
level of redundancy. An interesting, Tetris-like calculation exercise that I'd be
happy to blog about when I'm done.

Feel free to visit my blog for how to set up your home server as a ZFS iTunes
streaming server :).

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposal: user-defined properties

2006-08-18 Thread Constantin Gonzalez Schmitz
Hi Eric,

this is a great proposal and I'm sure this is going to help administrators
a lot.

One small question below:

> Any property which contains a colon (':') is defined as a 'user
> property'.  The name can contain alphanumeric characters, plus the
> following special characters: ':', '-', '.', '_'.  User properties are
> always strings, and are always inherited.  No additional validation is
> done on the contents.  Properties are set and retrieved through the
> standard mechanisms: 'zfs set', 'zfs get', and 'zfs inherit'.

>   # zfs list -o name,local:department
>   NAME  LOCAL:DEPARTMENT
>   test  12345
>   test/foo  12345
>   # zfs set local:department=67890 test/foo
>   # zfs inherit local:department test
>   # zfs get -s local -r all test 
>   NAME  PROPERTY  VALUE  SOURCE
>   test/foo  local:department  12345  local
>   # zfs list -o name,local:department
>   NAME  LOCAL:DEPARTMENT
>   test  -
>   test/foo  12345

the example suggests that properties may be case-insensitive. Is that the case
(sorry for the pun)? If so, that should be noted in the user-defined property
definition just for clarity.

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Enabling compression/encryption on a populated filesystem

2006-07-18 Thread Constantin Gonzalez
Hi,

there might be value in a "zpool scrub -r" (as in "re-write blocks") beyond the
prior discussion on encryption and compression.

For instance, a bit that is just about to rot might not be detected with a
regular zpool scrub, but it would be rewritten with a re-writing scrub.

It would also exercise the writing "muscles" on disks that don't see a lot of
writing, such as archives or system disks, thereby detecting any degradation
that affects writing of data.

Of course the re-writing must be 100% safe, but that can be done with COW
quite easily.

Then, admins would for instance run a "zpool scrub" every week and maybe a
"zpool scrub -r" every month or so.

Just my 2 cents,
  Constantin


Luke Scharf wrote:
> Darren J Moffat wrote:
>> But the real thing is how do you tell the admin "it's done, now the
>> filesystem is safe".   With compression you don't generally care if
>> some old stuff didn't compress (and with the current implementation it
>> has to compress a certain amount or it gets written uncompressed
>> anyway).  With encryption the human admin really needs to be told. 
> As a sysadmin, I'd be happy with another scrub-type command.  Something
> with the following meaning:
> 
> "Reapply all block-level properties such as compression, encryption,
> and checksum to every block in the volume.  Have the admin come back
> tomorrow and run 'zpool status' to see if it's done."
> 
> Mad props if I can do this on a live filesystem (like the other ZFS
> commands, which also get mad props for being good tools).
> 
> A natural command for this would be something like "zfs blockscrub
> tank/volume".  Also, "zpool blockscrub tank" would make sense to me as
> well, even though it might touch more data.
> 
> Of course, it's easy for me to just say this, since I'm not thinking
> about the implementation very deeply...
> 
> -Luke
> 
> 

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [raidz] file not removed: No space left on device

2006-07-04 Thread Constantin Gonzalez
Hi Eric,

Eric Schrock wrote:
> You don't need to grow the pool.  You should always be able truncate the
> file without consuming more space, provided you don't have snapshots.
> Mark has a set of fixes in testing which do a much better job of
> estimating space, allowing us to always unlink files in full pools
> (provided there are no snapshots, of course).  This provides much more
> logical behavior by reserving some extra slop.

is this planned but not yet implemented functionality, or why did Tatjana
see the "not able to rm" behaviour?

Or should she use unlink(1M) in these cases?

Best regards,
   Constantin

> 
> - Eric
> 
> On Mon, Jul 03, 2006 at 02:23:06PM +0200, Constantin Gonzalez wrote:
>> Hi,
>>
>> of course, the reason for this is the copy-on-write approach: ZFS has
>> to write new blocks first before the modification of the FS structure
>> can reflect the state with the deleted blocks removed.
>>
>> The only way out of this is of course to grow the pool. Once ZFS learns
>> how to free up vdevs this may become a better solution because you can then
>> shrink the pool again after the rming.
>>
>> I expect many customers to run into similar problems and I've already gotten
>> a number of "what if the pool is full" questions. My answer has always been
>> "No file system should be used up more than 90% for a number of reasons", but
>> in practice this is hard to ensure.
>>
>> Perhaps this is a good opportunity for an RFE: ZFS should reserve enough
>> blocks in a pool in order to always be able to rm and destroy stuff.
>>
>> Best regards,
>>Constantin
>>
>> P.S.: Most US Sun employees are on vacation this week, so don't be alarmed
>> if the really good answers take some time :).
> 
> --
> Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [raidz] file not removed: No space left on device

2006-07-03 Thread Constantin Gonzalez
Hi,

of course, the reason for this is the copy-on-write approach: ZFS has
to write new blocks first before the modification of the FS structure
can reflect the state with the deleted blocks removed.

The only way out of this is of course to grow the pool. Once ZFS learns
how to free up vdevs this may become a better solution because you can then
shrink the pool again after the rming.

I expect many customers to run into similar problems and I've already gotten
a number of "what if the pool is full" questions. My answer has always been
"No file system should be used up more than 90% for a number of reasons", but
in practice this is hard to ensure.

Perhaps this is a good opportunity for an RFE: ZFS should reserve enough
blocks in a pool in order to always be able to rm and destroy stuff.

Best regards,
   Constantin

P.S.: Most US Sun employees are on vacation this week, so don't be alarmed
if the really good answers take some time :).

Tatjana S Heuser wrote:
> On a system still running nv_30, I've a small RaidZ filled to the brim:
> 
> 2 3 [EMAIL PROTECTED] pts/9 ~ 78# uname -a
> SunOS mir 5.11 snv_30 sun4u sparc SUNW,UltraAX-MP
> 
> 0 3 [EMAIL PROTECTED] pts/9 ~ 50# zfs list
> NAME   USED  AVAIL  REFER  MOUNTPOINT
> mirpool1  33.6G  0   137K  /mirpool1
> mirpool1/home 12.3G  0  12.3G  /export/home
> mirpool1/install  12.9G  0  12.9G  /export/install
> mirpool1/local1.86G  0  1.86G  /usr/local
> mirpool1/opt  4.76G  0  4.76G  /opt
> mirpool1/sfw   752M  0   752M  /usr/sfw
> 
> Trying to free some space is meeting a lot of reluctance, though:
> 0 3 [EMAIL PROTECTED] pts/9 ~ 51# rm debug.log 
> rm: debug.log not removed: No space left on device
> 0 3 [EMAIL PROTECTED] pts/9 ~ 55# rm -f debug.log
> 2 3 [EMAIL PROTECTED] pts/9 ~ 56# ls -l debug.log 
> -rw-r--r--   1 th12242027048 Jun 29 23:24 debug.log
> 0 3 [EMAIL PROTECTED] pts/9 ~ 58# :> debug.log 
> debug.log: No space left on device.
> 0 3 [EMAIL PROTECTED] pts/9 ~ 63# ls -l debug.log
> -rw-r--r--   1 th12242027048 Jun 29 23:24 debug.log
> 
> There are no snapshots, so removing/clearing the files /should/ 
> be a way to free some space there.
> 
> Of course this is the same filesystem where zdb dumps core 
> - see:
> 
> *Synopsis*: zdb dumps core - bad checksum
> http://bt2ws.central.sun.com/CrPrint?id=6437157
> *Change Request ID*: 6437157
> 
> (zpool reports the RaidZ pool as healthy while
> zdb crashes with a 'bad checksum' message.)
>  
>  

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] add_install_client and ZFS and SMF incompatibility

2006-06-23 Thread Constantin Gonzalez
Hi,

I just set up an install server on my notebook and of course all the installer
data is on a ZFS volume. I love the "zfs set compression=on" command!

It seems that the standard ./add_install_client script from the S10U2 Tools
directory creates an entry in /etc/vfstab for a loopback mount of the Solaris
miniroot into the /tftpboot directory.

Unfortunately, at boot time (I'm using Nevada build 39), the mount_all
script tries to mount the loopback mount from /etc/vfstab before ZFS gets its
filesystems mounted.

So the SMF filesystem/local method fails and I have to either mount all ZFS
filesystems by hand and then re-run mount_all, or replace the vfstab entry with
a simple symlink. Which only works until you say add_install_client the next
time.

Is this a known issue?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS and Flash archives

2006-06-20 Thread Constantin Gonzalez
Hi,

I'm currently setting up a demo machine. It would be nice to set up everything
the way I like it, including a number of ZFS filesystems, then create a flash
archive, then install from that archive.

Will there be any issues with webstart flash and ZFS? Does flar create need
to be ZFS aware and if so, is it ZFS aware in S10u2b09a?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] user undo

2006-05-30 Thread Constantin Gonzalez Schmitz

Hi,

so we have two questions:

1. Is it really ZFS' job to provide an undo functionality?

2. If it turns out to be a feature that needs to be implemented by
   ZFS, what is the better approach: Snapshot based or file-based?

My personal opinion on 1) is:

- The purpose of any Undo-like action is to provide a safety net to the user
  in case she commits an error that she wants to undo.

- So, it depends on how we define "user" here. If by user we mean your regular
  file system user with a GUI etc., then of course it's a matter of the
  application.

- But if user=sysadmin, I guess a more fundamental way of implementing "undo" is
  in order. We could restrict the undo functionality to some admin
  interface and force admins to use just that, but then it would still be a feature
  that the admin interface needs to implement.

  But in order to save all admins from shooting themselves in the knee, the
  best way would be to provide an admin-savvy safety net.

- Now, coming from the other side, ZFS provides a nice and elegant way of
  implementing snapshots. That's where I count 1+1: If ZFS knew how to do
  snapshots right before any significant administrator or user action and if
  ZFS had a way of managing those snapshots so admins and users could easily
  undo any action (including zfs destroy, zpool destroy, or just rm -rf /*),
  then the benefit/investment ratio for implementing such a feature should
  be extremely interesting.

One more step towards a truly foolproof filesystem.

But: If it turns out that providing an undo function via snapshots is not
possible/elegantly feasible/cheap or if there's any significant roadblock that
prevents ZFS from providing an undo feature in an elegant way, then it might not
be a good idea after all and we should just forget it.

So I guess it boils down to: Can the ZFS framework be used to implement an undo
feature much more elegantly than your classic file manager, while extending the
range of undo customers to even the CLI-based admin?


Best regards,
   Constantin

Erik Trimble wrote:
Once again, I hate to be a harpy on this one, but are we really 
convinced that having a "undo" (I'm going to call is RecycleBin from now 
on) function for file deletion built into ZFS is a good thing?


Since I've seen nothing to the contrary, I'm assuming that we're doing 
this by changing the actual effects of an "unlink(2)" sys lib call 
against a file in ZFS, and having some other library call added to take 
care of actual deletion.


Even with it being a ZFS option parameter, I can see so many places 
that it breaks assumptions and causes problems that I can't think it's a 
good thing to blindly turn on for everything.


And, I've still not seen a good rebuttal to the idea of moving this up 
to the Application level, and using a new library to implements the 
functionality (and requires Apps to specifically (and explicitly) 
support RecycleBin in the design).




You will notice that Windows does this.  The Recycle Bin is usable from 
within Windows Explorer, but if you use "del" from a command prompt, it 
actually deletes the file.  I see no reason why we shouldn't support the 
same functionality (i.e. RecycleBin from within Nautilus (as it already 
does), and true deletion via "rm").




-Erik



--
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Backup/Restore of ZFS Properties

2006-05-30 Thread Constantin Gonzalez Schmitz

Hi,


Yes, a trivial wrapper could:
1. Store all property values in a file in the fs
2. zfs send...
3. zfs receive...
4. Set all the properties stored in that file


IMHO 3. and 4. need to be swapped - otherwise e.g. files will
not be compressed when restored.


hmm, I assumed that the ZFS stream format would take the blocks as they are
(compressed) and then restore them in a 1:1 fashion (compressed) no matter
what the target fs' compression setting is.

Then, the missing compression attribute would only affect new files, while old
files are still compressed (just like ZFS doesn't unpack everything if you just
turn off compression).

Can anybody clarify?
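
In any case, a minimal sketch of such a wrapper (dataset names made up, only
locally-set properties handled) could look like:

  # 1. save the locally set properties of the source filesystem
  zfs get -H -s local -o property,value all tank/data > /tmp/props

  # 2./3. snapshot, send and receive the data
  zfs snapshot tank/data@backup
  zfs send tank/data@backup | zfs receive otherpool/data

  # 4. re-apply the saved properties on the target
  while read prop value; do
      zfs set "$prop=$value" otherpool/data
  done < /tmp/props

Whether 4. has to happen before or after the receive is exactly the open
question above.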

Best regards,
  Constantin

--
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] A .zfs/info file

2006-05-29 Thread Constantin Gonzalez

Hi,

Darren J Moffat wrote:

Over coffee with a colleague (cc'd) we were talking about the problem of
taking advantage of ZFS over NFS (or CIFS) from a non Solaris machine.

We already have the .zfs/snapshot dir and this is great.  One of the
other areas was knowing the settings on your data set are.  So enter
.zfs/info which would be an ascii representation of the information from
`zfs get all`.

I can see some problems with this, and it reminds me a little too much
of what happened to /proc on Linux and so a bit uncomfortable about
suggesting.


I share the discomfort with the /proc analogy.

But Wes' scripting approach seems to be just fine for me. The timestamping
would communicate the SLA of "just a script" versus the magically hacked
nature of pseudo-files.
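
(A minimal sketch of what I mean, with made-up dataset and file names: a cron
job running something like

  #!/bin/sh
  # dump the dataset's properties into a plain file for NFS/CIFS clients
  FS=tank/home
  OUT=/tank/home/ZFSINFO
  { date; zfs get -H all "$FS"; } > "$OUT"

every few minutes would give remote clients a timestamped, read-only view of
the settings.)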

But being able to poll data out of ZFS over NFS is probably just a minor
issue. In Germany, we say: "Give 'em the little finger and they'll want the
whole hand".

So, I assume the next thing a ZFS-over-NFS user would want is to change stuff
over NFS which would then become difficult. Resisting the pseudo-files-as-a-
broken-API-for-changing-stuff urge might then be even more appropriate.

Best regards,
  Constantin

--
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] user undo

2006-05-29 Thread Constantin Gonzalez

Hi,

the current discussion on how to implement "undo" seems to circulate around
concepts and tweaks for replacing any "rm" like action with "mv" and then
fix the problems associated with namespaces, ACLs etc.

Why not use snapshots?

A snapshot-oriented implementation of undo would:

- Create a snapshot of the FS whenever anything is attempted that someone
  might want to undo. This could be done even at the most fundamental level
  (i.e. before any "zpool" or "zfs" command, where the potential damage to
  be undone is biggest).

- The undo-feature would then exchange the live FS with the snapshot taken
  prior to the revoked action. Just tweak one or two pointers and the undo
  is done.

- This would transparently work with any app, user action, even admin action,
  depending on where the snapshotting code would be hooked up to.

- As an alternative to undo, the user can browse the .zfs hierarchy in search
  of that small file which got lost in an rm -rf orgy without having to restore
  the snapshot with all the other unwanted files.

- When ZFS wants to reclaim blocks, it would start deleting the oldest
  undo-snapshots.

- To separate undo-snapshots from user-triggered ones, the undo-code could
  place its snapshots in .zfs/snapshots/undo .
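
To illustrate the idea with a rough sketch (a hypothetical wrapper script,
not a proposal for the actual implementation):

  #!/bin/sh
  # hypothetical wrapper: snapshot a filesystem, then run the risky command
  FS=$1; shift
  zfs snapshot "$FS@undo-`date +%Y%m%d-%H%M%S`" && "$@"

An undo would then essentially be a zfs rollback to (or a clone of) the most
recent undo snapshot.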

Did I miss something why undo can't be implemented with snapshots?

Best regards,
   Constantin

--
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Snapshot management proposal idea

2006-05-09 Thread Constantin Gonzalez

Hi Tim,

thank you for your comments.

Bringing in SMF is an excellent idea and should make what admins like to
do much more elegant.

I guess the question here is to find out:

- What degree of canned functionality is needed to address 80% of every admin's
  needs.

- Who should provide the functionality: The ZFS core team, a community of
  people (inside OpenSolaris or outside) or some person/group who take this
  as their personal project and maybe publishes it.

We're probably more than halfway through the first. Maybe it's time to create
a project under the OpenSolaris community to nail down the second and start
thinking about what functionality really needs to be inside ZFS to make
integrating SMF support for managing snapshots easier.

I'm not sure user-defined properties in ZFS are the solution. Depending on the
apps, scripts and users using such properties, pools, filesystems and snapshots
can easily be polluted with lots of properties that someone thought might be
useful. Perhaps it would be more beneficial to go through the process of
discussing and deciding which set of properties would make snapshot management
much easier, then go with it.

Best regards,
   Constantin

Tim Foster wrote:

Hey Constantin,

On Mon, 2006-05-08 at 11:37 +0200, Constantin Gonzalez wrote: 

I took the liberty of renaming the thread to $SUBJECT because I think that what
we really are looking for is an ability for ZFS to automatically manage snapshot
after they have been created.


Wow, nice summary of the problem!

I thought I'd add a few ideas into the fray.

Here's what I was thinking could be fairly quick to implement, which
administrators could build upon if necessary.

I believe we certainly should provide a way for users to schedule
automatic snapshots, but probably not build it into the filesystem
itself (imho - ZFS is a filesystem dammit Jim, not a backup solution!)

This could be easily implemented via a set of SMF instances which
create/destroy cron jobs which would themselves call a simple script
responsible for taking the snapshots.

Of course, this isn't as flexible as an administrator writing their own
scripts, but it could be enough for most users, with those that want
more functionality being able to build on this functionality.

So, it's not as intelligent as the daemon Bill was suggesting, we
wouldn't poll the FS to reap snapshots when space is limited. For that
functionality, I'd hope for an as-yet-nonexistent ZFS FMA event to
report that some pools are getting short on space, which could be the
trigger for deleting these auto-snapshots if necessary  (I'd also
imagine lots of other things would be interested in keying off such an
event as well...)


The service that we could have for taking auto snapshots could be called

/system/filesystem/zfs/auto-snapshot

We'd have one instance per set of automatic snapshots taken. Which isn't
to say we need one instance per filesystem, as we could define instances
that snapshot all child filesystems contained this top level filesystem.

  /system/filesystem/zfs/auto-snapshot:[fs-name]

The properties we'd have for each instance are:

- interval = minutes | hours | days | months | years
- period = take snapshots every how many [interval]s
- keep = number of snapshots to keep before rolling over 
 (delete the oldest when we hit the limit)

- offset = # seconds into the start of the period
at which we take the snapshot ( < period * interval)
- snapshot-children = true | false

Here's some examples of SMF instances that would implement
auto-snapshots.

The following instance takes a snapshot every 4 days, at 01:00, keeping
30 snapshots into the past :

  /system/filesystem/zfs/auto-snapshot:tank
    interval          = days
    period            = 4
    keep              = 30
    offset            = (60 * 60)
    snapshot-children = false
  


This instance takes a weekly snapshot, keeping the last two, and will
snapshot all children of export/home/timf[1] :

  /system/filesystem/zfs/auto-snapshot:export/home/timf
    interval          = days
    period            = 7
    keep              = 2
    offset            = 0
    snapshot-children = true


Essentially, I'm really just suggesting a glorified interface to cron,
so why not just use cron? Well, I suspect having a service like this
would be easier to manage than a heap of cron jobs: at a glance, I can
tell which auto-snapshots are being taken, when and how. I also like the
idea of tying into SMF, since that means other options, like the GUI
interfaces in the Visual Panels project, may become available in the
future.

Anyway, that's what I was thinking of (and it wouldn't be too hard to
implement)  I've no doubt this could be refined - but does anyone think
this is the right direction to go in ?

cheers,
tim

[1] I'm not yet sure if SMF instanc

[zfs-discuss] ZFS Snapshot management proposal idea (was: Properties of ZFS snapshots I'd like to see...)

2006-05-08 Thread Constantin Gonzalez

Hi,

thank you for the excellent comments, thoughts and ideas on the
"Properties of ZFS snapshots I'd like to see..." thread.

I took the liberty of renaming the thread to $SUBJECT because I think that what
we really are looking for is an ability for ZFS to automatically manage snapshots
after they have been created.

To summarize:

- "Snapshot management in ZFS" can be defined as an automatic way of:

  - Using free space on disk for automatically or manually generated snapshots
but giving priority to new data on the disk at the expense of destroying
old snapshots that are considered less useful.

  - Implementing policies that decide which snapshot to keep even at the
cost of not having enough space for new data. Possible policies include:

- Prioritize recent snapshots over older ones.
  (Assuming that the older the snapshot, the less the user cares)

- Prioritize older snapshots over recent ones.
  (Assuming, the older the snapshots, the more errors can be corrected)

- Any combination of the above. (I.e. keep at least one yearly, one
  monthly, one weekly and one daily snapshot).

  - Giving users the possibility to decide what policies to apply to snapshots
when created.

  - Giving users the possibility to configure automatic snapshots at regular
intervals (similar to NetApps).

  - Automatically snapshotting a pool or a filesystem before any administrative
action in order to facilitate a "zfs undo" functionality.

- Much or all of the above can be implemented today with user- or admin-level
  scripts. The question therefore is whether this should be incorporated into
  ZFS or not. Here are pros and cons:

  Pros:
  - Make it easier for users and admins to enjoy the benefits of snapshots
without having to write scripts.

  - Make advanced functionality available to users and admins that would take
a lot of complex scripting and therefore can be implemented more elegantly
inside zfs than outside zfs. ("zfs undo" and free space management for
instance).

  - Reduce the risk of user and admin errors when scripting by providing a
single point of development for a critical functionality.
(Example: dealing with different time standards is non-trivial; scripts
may be less robust than OS-level code, etc.).

  Cons:
  - ZFS is a file system, not a backup management system. Leave that to the
application and 3rd party vendors.

  - Deleting snapshots is a difficult question and each user/admin/site may
have very different policies about when to delete them and when not. This
makes a one-size-fits-all approach either insufficient or not generic
enough for all users to be really useful.

Feel free to add to the lists so we can make up our minds. Maybe this can
evolve into something the ZFS team may be interested in.

Best regards,
   Constantin

--
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Properties of ZFS snapshots I'd like to see...

2006-05-05 Thread Constantin Gonzalez

Hi Wes,

Wes Williams wrote:

Interesting idea Constantin.

However, perhaps instead of or in addition to your idea, I'd like to have a
mechanism or script that would overwrite the older snapshots [u]only if[/u]
some more current snapshot were created.  Ideally this mechanism would
prevent your idea of expired snapshots being removed in some case where the
creation of new snapshots somehow failed.


yeah, that could be another pair of snapshot properties: number of snapshots to
minimally/maximally keep.


Additionally, by only removing the snapshots after the creation of their
replacement is successful, this should prevent the possibility of data loss
if there were a major system problem during the creation of a new snapshot as
well.


Yes, snapshot replacement should always be split up into creating a successor,
checking that it was successful, and only then deleting the old one.

Best regards,
   Constantin

--
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Properties of ZFS snapshots I'd like to see...

2006-05-05 Thread Constantin Gonzalez

Hi Al,


1) But is this something that belongs in ZFS or is this a backup/restore
type tool that is simply a "user" of zfs?


...


Again - this looks like an operational backup/restore policy.  Not a ZFS
function.


So the question is: Is advanced management of snapshots (aging, expiring,
etc.) something left to the domain of a ZFS user (backup/restore application,
administrator, script) or should these concepts be adopted by ZFS as a
filesystem (which BTW is already much more than that)?

IMHO, backup/restore is much more than playing with snapshots. The dividing
line starts when you copy your data to a different medium. As soon as the
data stays on the disk, I wouldn't say it's backup/restore related. As long
as it's just snapshots, it should definitely not be called "backup/restore".

But you're right in that my desired functionality can "easily" be implemented
with scripts. Then I would still argue for including this functionality as
part of the ZFS user interface, because of ease of use and minimization of
possible errors for the administrator. If it ain't simple to use, chances
are that people won't use it. Same goes for snapshots: If admins don't have a really
easy way to get rid of them, chances are they will use them less.

Another point of view might be ease of implementation. A few person-months spent
at Sun (or the OpenSolaris developer community) might come up with a more
robust, clean, efficient, bug-free, elegant way of achieving the task of
snapshot management than millions of person-months spent by many admins creating
scripts and re-inventing wheels that may be half-baked.

But yes, it is a matter of interpretation who should take care of managing
snapshots after they've been created, ZFS or some application/script/user
action.


Thinking further, ZFS could start doing automatic snapshots (invisible from
the user) by just keeping every uber-block at each interval. Then, when the
admin panics, ZFS could say "hmm, here's a couple of leftover snapshots
that happen to still exist because you had a lot space left on the disks
that you may find useful".


Now you're describing a form of filesystem snapshotting function that
might have to be closely integrated with zfs.  This is in addition to the
other data replication features that are already in the pipeline for zfs.


Yes, this is when the above discussed features definitively cross the line
towards ZFS' responsibilities.

Actually, it would be cool if ZFS took a hidden snapshot each time a zfs or
zpool command is issued. Then an admin could say "zfs undo" after she/he
discovers that she/he has just made a horrible mistake.


The basic idea behind this whole thinking is to maximize utilization of free
blocks. If your disk utilization is only 50%, why not use the other 50% for
snapshots by default, that could save your life?


IMHO the majority of the functionality you're describing belongs in a
backup/restore tool that is simply a consumer of zfs functionality.  And
this functionality could be easily scripted using your scripting tool of
choice.


yes and no, depending on the interpretation. The potential of having a
"zfs undo" subcommand and the automatic exploitation of free space on disk
for keeping snapshots as part of overall snapshot management are definitely
something that ZFS can do much better internally, as opposed to having to
implement it with some other app.

Best regards,
   Constantin

--
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Properties of ZFS snapshots I'd like to see...

2006-05-05 Thread Constantin Gonzalez

Hi,

(apologies if this has been discussed before, I hope not)

while setting up a script at home to do automatic snapshots, a number of
wishes popped into my mind:

The basic problem with regular snapshotting is that you end up managing
so many of them. Wouldn't it be nice if you could assign an expiration date
to a snapshot?

For instance:

  zfs snapshot -e 3d tank/[EMAIL PROTECTED]

would create a regular snapshot with an expiration date of 3 days from the
date it was created.

You could then change the expiration date with zfs set if you want to keep
it longer. "0" would mean no expiration date and so on.

Then, ZFS would be free to destroy the snapshot to free up space, but only if
it must: Just like the yogurt in your fridge, you may or may not be able to eat
it after the best before date, but you are guaranteed to be able to eat it
(or sue the yogurt company) if it's inside the best before date.

Another property could control the rigidity of this policy: Hard expiration
would destroy the snapshot as soon as the expiry time arrives; soft
expiration would work like the yogurt example above.

The benefit of this approach would be reduced complexity: Imagine you take a
snapshot every week; then you'll have 52 snapshots by the end of one year.
This means that sysadmins will start writing scripts to automatically
delete snapshots they don't need (I'm about to do just that) at the risk
of deleting the wrong snapshot. Or they won't, because it takes too much
thinking (you really want to make that script really robust).
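
(For the record, the kind of script I mean is a rough sketch like this one,
assuming snapshots are named tank/home@auto-YYYYMMDD and that only the 8
newest ones should survive:

  zfs list -H -o name -t snapshot | grep '^tank/home@auto-' | \
      sort -r | tail +9 | xargs -n 1 zfs destroy

which is exactly the kind of one-liner that is easy to get subtly and
destructively wrong.)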

Another set of expiration related properties could allow for more complex
snapshot management:

- Multiple layers of snapshots: Keep one Yearly, one monthly, one weekly
  and the snapshot from yesterday always available.

- Multiple priorities: Assign priorities to snapshots so less important
  ones get destroyed first.

- Specify date ranges to destroy/modify attributes on multiple snapshots at
  once.

Is this something we're already looking at or should we start looking at
this as an RFE?

Thinking further, ZFS could start doing automatic snapshots (invisible to
the user) by just keeping every uber-block at each interval. Then, when the
admin panics, ZFS could say "hmm, here's a couple of leftover snapshots
that happen to still exist because you had a lot space left on the disks
that you may find useful".

The basic idea behind this whole thinking is to maximize utilization of free
blocks. If your disk utilization is only 50%, why not use the other 50% for
snapshots by default, that could save your life?

Best regards,
   Constantin

--
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS: More information on ditto blocks?

2006-05-05 Thread Constantin Gonzalez

Hi,

(apologies if this was discussed before, I _did_ some research, but this
one may have slipped for me...)

Looking through the current Sun ZFS Technical presentation, I found a ZFS
feature that was new to me: Ditto Blocks.

In search of more information, I asked Google, but there seems to be no real
information on Ditto Blocks other than the source code.

From the Ditto Block slide, I conclude that:

- ZFS blocks can have multiple copies (up to 3), even on the same disk, but
  preferably on multiple disks, if possible.

- The uber-block has an additional 3 copies (we already knew that)

- The ZFS metadata structure has 2 or more copies (that was new to me)

- In the future, users will be able to ask for multiple copies of their
  data (wow, what a great feature for laptop users with big, but single
  disks!)

Can someone elaborate more on ditto blocks? Perhaps that would be a great
blog entry (Google didn't find anything for "site:blogs.sun.com zfs ditto
blocks").

In particular:

- Are regular data blocks multiplied by default if the disk isn't mirrored/
  raid-z'ed and there's enough space?

- What are the general rules on what blocks get multiplied how often?

Best regards,
   Constantin

--
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss