Re: [zfs-discuss] Deduplication Memory Requirements

2011-05-05 Thread Constantin Gonzalez

Hi,

On 05/ 5/11 03:02 PM, Edward Ned Harvey wrote:

From: Garrett D'Amore [mailto:garr...@nexenta.com]

We have customers using dedup with lots of vm images... in one extreme
case they are getting dedup ratios of over 200:1!


I assume you're talking about a situation where there is an initial VM image, 
and then to clone the machine, the customers copy the VM, correct?
If that is correct, have you considered ZFS cloning instead?

When I said dedup wasn't good for VM's, what I'm talking about is:  If there is data 
inside the VM which is cloned...  For example if somebody logs into the guest OS and then 
does a cp operation...  Then dedup of the host is unlikely to be able to 
recognize that data as cloned data inside the virtual disk.


ZFS cloning and ZFS dedup are solving two problems that are related, but
different:

- Through Cloning, a lot of space can be saved in situations where it is
  known beforehand that data is going to be used multiple times from multiple
  different views. Virtualization is a perfect example of this.

- Through Dedup, space can be saved in situations where the duplicate nature
  of data is not known, or not known beforehand. Again, in virtualization
  scenarios, this could be common modifications to VM images that are
  performed multiple times, but not anticipated, such as extra software,
  OS patches, or simply many users saving the same files to their local
  desktops.

To go back to the cp example: If someone logs into a VM that is backed by
ZFS with dedup enabled, then copies a file, the extra space that copy takes
will be minimal: the copy is written out as a series of blocks that dedup
recognizes as duplicates of blocks already in the pool.

This is completely independent of the clone nature of the underlying VM's
backing store.
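
To make the distinction concrete, here's a minimal sketch (the pool and dataset
names are made up, and dedup of course needs a pool version that supports it):

  # Cloning: provision each VM from a known golden image (sharing planned up front)
  zfs snapshot tank/vm/golden@v1
  zfs clone tank/vm/golden@v1 tank/vm/guest01
  zfs clone tank/vm/golden@v1 tank/vm/guest02

  # Dedup: catch the duplicate blocks nobody anticipated (patches, in-guest copies)
  zfs set dedup=on tank/vm

Both can be combined on the same backing store, which is exactly the point above.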

But I agree that the biggest savings are to be expected from cloning first,
as they typically translate into n GB (for the base image) x # of users,
which is a _lot_.

Dedup is still the icing on the cake for all those data blocks that were
unforeseen. And that can be a lot, too, as everyone who has seen cluttered
desktops full of downloaded files can probably confirm.


Cheers,
   Constantin


--

Constantin Gonzalez Schmitz, Sales Consultant,
Oracle Hardware Presales Germany
Phone: +49 89 460 08 25 91  | Mobile: +49 172 834 90 30
Blog: http://constantin.glez.de/| Twitter: zalez

ORACLE Deutschland B.V.  Co. KG, Sonnenallee 1, 85551 Kirchheim-Heimstetten

ORACLE Deutschland B.V.  Co. KG
Hauptverwaltung: Riesstraße 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Hertogswetering 163/167, 3543 AS Utrecht
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Jürgen Kunz, Marcel van de Molen, Alexander van der Ven

Oracle is committed to developing practices and products that help protect the
environment
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Deleting large amounts of files

2010-07-29 Thread Constantin Gonzalez

Hi,


Is there a way to see which files have been deduped, so I can copy them again
and un-dedupe them?


unfortunately, that's not easy (I've tried it :) ).

The issue is that the dedup table (which knows which blocks have been deduped)
doesn't know about files.

And if you pull block pointers for deduped blocks from the dedup table,
you'll need to backtrack from there through the filesystem structure
to figure out what files are associated with those blocks.

(remember: Deduplication happens at the block level, not the file level.)

So, in order to compile a list of deduped _files_, one would need to extract
the list of deduped _blocks_ from the dedup table, then chase the pointers
from the root of the zpool to the blocks in order to figure out what files
they're associated with.

Unless there's a different way that I'm not aware of (and I hope someone can
correct me here), the only way to do that is to run a scrub-like process and
build up a table of files and their blocks.
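
If the goal is just to get rid of the dedup references rather than to list the
affected files, one workaround (a rough sketch; the dataset and file names are
placeholders) is to switch dedup off and rewrite the data, since dedup only
happens at write time:

  zfs set dedup=off tank/data
  # rewrite a file so its blocks are stored again, this time without dedup
  cp /tank/data/somefile /tank/data/somefile.new
  mv /tank/data/somefile.new /tank/data/somefile

That still doesn't tell you which files were deduped in the first place, though.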

Cheers,
  Constantin

--

Constantin Gonzalez Schmitz | Principal Field Technologist
Phone: +49 89 460 08 25 91 || Mobile: +49 172 834 90 30
Oracle Hardware Presales Germany

ORACLE Deutschland B.V.  Co. KG | Sonnenallee 1 | 85551 Kirchheim-Heimstetten

ORACLE Deutschland B.V.  Co. KG
Hauptverwaltung: Riesstraße 25, D-80992 München
Registergericht: Amtsgericht München, HRA 95603

Komplementärin: ORACLE Deutschland Verwaltung B.V.
Rijnzathe 6, 3454PV De Meern, Niederlande
Handelsregister der Handelskammer Midden-Niederlande, Nr. 30143697
Geschäftsführer: Jürgen Kunz, Marcel van de Molen, Alexander van der Ven

Oracle is committed to developing practices and products that help protect the
environment
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication ratio on Server 2008 backup VHD files

2010-04-26 Thread Constantin Gonzalez

Hi Tim,

thanks for sharing your dedup experience. Especially for Virtualization, having
a good pool of experience will help a lot of people.

So you see a dedup ratio of 1.29 for two installations of Windows Server 2008 on
the same ZFS backing store, if I understand you correctly.

What dedup ratios do you see for the third, fourth and fifth server
installation?

Also, maybe dedup is not the only way to save space. What compression rate
do you get?

And: Have you tried setting up a Windows System, then setting up the next one
based on a ZFS clone of the first one?
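
For reference, both numbers can be read straight from the properties, e.g.
(hypothetical pool and dataset names):

  zpool get dedupratio tank          # pool-wide dedup ratio
  zfs get compressratio tank/backups # per-dataset compression ratio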


Hope this helps,
   Constantin

On 04/23/10 08:13 PM, tim Kries wrote:

Dedup is a key element for my purpose, because I am planning a central 
repository for about 150 Windows Server 2008 (R2) servers, which would take a lot 
less storage if they dedup properly.


--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist   Blog: constantin.glez.de
Tel.: +49 89/4 60 08-25 91  Twitter: @zalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Jürgen Kunz
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-22 Thread Constantin Gonzalez

Hi,

I agree 100% with Chris.

Notice the "on their own" part of the original post. Yes, nobody wants
to run zfs send or (s)tar by hand.

That's why Chris's script is so useful: you set it up, forget about it, and it
gets the job done for 80% of home users.

On another note, I was positively surprised by the availability of Crash Plan
for OpenSolaris:

  http://crashplan.com/

Their free service allows you to back up your stuff to a friend's system over the
net in an encrypted way; the paid-for service uses CrashPlan's data centers at
less than Amazon S3 pricing.

While this may not be everyone's solution, I find it significant that they
explicitly support OpenSolaris. This either means they're OpenSolaris fans
or that they see potential in OpenSolaris home server users.


Cheers,
  Constantin

On 03/20/10 01:31 PM, Chris Gerhard wrote:


I'll say it again: neither 'zfs send' nor (s)tar is an enterprise (or
even home) backup system on their own; one or both can be components of
the full solution.



Up to a point. zfs send | zfs receive does make a very good backup scheme for 
the home user with a moderate amount of storage, especially when the entire 
backup will fit on a single drive, which I think would cover the majority of 
home users.

Using external drives and incremental zfs streams allows for extremely quick 
backups of large amounts of data.

It certainly does for me. 
http://chrisgerhard.wordpress.com/2007/06/01/rolling-incremental-backups/
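
For readers who haven't tried this: a minimal version of such a rolling scheme
might look like the following sketch, where "backup" is a pool on the external
drive and the snapshot names are placeholders:

  # initial full copy of the whole pool hierarchy
  zfs snapshot -r tank@backup1
  zfs send -R tank@backup1 | zfs receive -Fd backup

  # later: send only what changed since the previous snapshot
  zfs snapshot -r tank@backup2
  zfs send -R -I tank@backup1 tank@backup2 | zfs receive -Fd backup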


--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologist   Blog: constantin.glez.de
Tel.: +49 89/4 60 08-25 91  Twitter: @zalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-01-20 Thread Constantin Gonzalez

Hi,

I'm using 2 x 1.5 TB drives from Samsung (EcoGreen, I believe) in my current
home server. One reported 14 Read errors a few weeks ago, roughly 6 months after
install, which went away during the next scrub/resilver.

This reminded me to order a 3rd drive, a 2.0 TB WD20EADS from Western Digital,
and I now have a 3-way mirror, which is effectively a 2-way mirror with its
hot spare already synced in.

The idea behind notching up the capacity is threefold:

- No "sorry, this disk happens to have 1 block too few" problems on attach.

- When the 1.5 TB disks _really_ break, I'll just order another 2 TB one and
  use the opportunity to upgrade pool capacity. Since at least one of the 1.5TB
  drives will still be attached, there won't be any slightly smaller drive
  problems either when attaching the second 2TB drive.

- After building in 2 bigger drives, it becomes easy to figure out which of the
  drives to phase out: just go for the smaller drives. This solves the headache
  of trying to figure out which physical drive to pull when you replace drives
  that aren't hot spares and don't have blinking lights.

Frankly, I don't care whether the Samsung or the WD drives are better or worse;
they're both consumer drives and they're both dirt cheap. Just assume that
they'll break soon (since you're probably using them more intensively than they
were designed for) and make sure their replacements are already there.
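
In zpool terms, the "hot spare already synced in" approach is just an attach
(device names here are placeholders):

  # attach the new 2 TB drive to the existing two-way mirror; it resilvers automatically
  zpool attach tank c1t0d0 c2t0d0

  # later, when one of the 1.5 TB drives is retired, simply detach it
  zpool detach tank c1t1d0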

It also helps to mix vendors, so a glitch that affects multiple disks from the
same batch won't affect your setup too much. (And yes, I broke that rule with
my initial 2 Samsung drives, but I'm now glad I have both vendors :)).

Hope this helps,
   Constantin


Simon Breden wrote:

I see also that Samsung have very recently released the HD203WI 2TB 4-platter 
model.

It seems to have good customer ratings so far at newegg.com, but currently 
there are only 13 reviews so it's a bit early to tell if it's reliable.

Has anyone tried this model with ZFS?

Cheers,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/


--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Setting default user/group quotas?

2009-11-19 Thread Constantin Gonzalez

Hi,

first of all, many thanks to those who made user/group quotas possible. This
is a huge improvement for many users of ZFS!

While I was presenting this new feature at the Munich OpenSolaris User Group meeting
yesterday, a question came up that I couldn't find an answer for: Can you set
a default user/group quota?

Apparently,

  zfs set userquota@user1=5G tank/home/user1

is the only way to set user quotas, and the @user1 part seems to be mandatory,
at least according to the snv_126 version of the ZFS man page. This is what ZFS
itself told me when I tried:

  The {user|group}{used|quota}@ properties must be appended with
  a user or group specifier of one of these forms:
  POSIX name  (eg: matt)
  POSIX id(eg: 126829)
  SMB name@domain (eg: matt@sun)
  SMB SID (eg: S-1-234-567-89)

Imagine a system that needs to handle thousands of users. Setting quotas
individually for all of these users would quickly become unwieldy, in much the
same way that having a filesystem for each user was unwieldy.

Which was the reason to introduce user/group quotas in the first place.

IMHO, it would be useful to have something like:

  zfs set userquota=5G tank/home

and that would mean that all users who don't have an individual user quota
assigned to them would see a default 5G quota.
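
Until something like that exists, the closest workaround seems to be looping over
the user list, e.g. (a sketch; the user and dataset names are made up):

  for user in alice bob carol; do
      zfs set userquota@$user=5G tank/home
  done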

I haven't found an RFE for this yet. Is this planned? Should I file an RFE?
Or did I overlook something?


Thanks,
   Constantin

--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting default user/group quotas?

2009-11-19 Thread Constantin Gonzalez

Hi,


IMHO, it would be useful to have something like:

  zfs set userquota=5G tank/home

...

I think that would be a great feature.


thanks. I just created CR 6902902 to track this. I hope it becomes viewable
soon here:

  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6902902

Cheers,
  Constantin

--
Sent from OpenSolaris, http://www.opensolaris.org/

Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS commands hang after several zfs receives

2009-09-15 Thread Constantin Gonzalez

Hi,

I think I've run into the same issue on OpenSolaris 2009.06.

Does anybody know when this issue will be solved in OpenSolaris?
What's the BugID?

Thanks,
   Constantin

Gary Mills wrote:

On Tue, Sep 15, 2009 at 08:48:20PM +1200, Ian Collins wrote:

Ian Collins wrote:

I have a case open for this problem on Solaris 10u7.

The case has been identified and I've just received an IDR, which I 
will test next week.  I've been told the issue is fixed in update 8, 
but I'm not sure if there is an nv fix target.


I'll post back once I've abused a test system for a while.

The IDR I was sent appears to have fixed the problem.  I have been 
abusing the box for a couple of weeks without any lockups.  Roll on 
update 8!


Was that IDR140221-17?  That one fixed a deadlock bug for us back
in May.



--
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Crypto Updates [PSARC/2009/443 FastTrack timeout 08/24/2009]

2009-08-18 Thread Constantin Gonzalez

Hi,

Brian Hechinger wrote:

On Tue, Aug 18, 2009 at 12:37:23AM +0100, Robert Milkowski wrote:

Hi Darren,

Thank you for the update.
Have you got any ETA (build number) for the crypto project?


Also, is there any word on if this will support the hardware crypto stuff
in the VIA CPUs natively?  That would be nice. :)


ZFS Crypto uses the Solaris Cryptographic Framework to do the actual
encryption work, so ZFS is agnostic to any hardware crypto acceleration.

The Cryptographic Framework project on OpenSolaris.org is looking for help
in implementing VIA Padlock support for the Solaris Cryptographic Framework:

  http://www.opensolaris.org/os/project/crypto/inprogress/padlock/

Cheers,
  Constantin

--
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Motherboard for home zfs/solaris file server

2009-07-29 Thread Constantin Gonzalez

Hi,

thank you so much for this post. This is exactly what I was looking for.
I've been eyeing the M3A76-CM board, but will now look at 78 and M4A as
well.

Actually, not that many Asus M3A, let alone M4A boards show up yet on the
OpenSolaris HCL, so I'd like to encourage everyone to share their hardware
experience by clicking on the "submit hardware" link on:

  http://www.sun.com/bigadmin/hcl/data/os/

I've done it a couple of times, and it's really just a matter of 5-10 minutes
in which you can help others know whether a certain component works and whether
a special driver or /etc/driver_aliases setting is required.

I'm also interested in getting the power consumption down. Right now, I have the
Athlon X2 5050e (45W TDP) on my list, but I'd also like to know more about
the possibilities of the Athlon II X2 250 and whether it has better potential
for power savings.

Neal, the M3A78 seems to have a RealTek RTL8111/8168B NIC chip. I pulled
this off a Gentoo Wiki, because strangely this information doesn't show up
on the Asus website.

Also, thanks for the CF-to-PATA hint for the root pool mirror. I will try to
find fast CF cards to boot from. The performance problems you see when writing
may be related to master/slave issues, but I'm not enough of a PC tweaker to
back that up.

Cheers,
   Constantin


F. Wessels wrote:

Hi,

I'm using Asus M3A78 boards (with the SB700) for OpenSolaris and M2A* boards
(with the SB600) for Linux, some of them with 4*1GB and others with 4*2GB ECC
memory. ECC faults will be detected and reported. I tested it with a small
tungsten light: by moving the light source slowly towards the memory banks
you'll heat them up in a controlled way, and at a certain point bit flips will
occur. I recommend you go for an M4A board, since they support up to 16 GB.

I don't know if you can run OpenSolaris without a video card after
installation; I think you can disable the "halt on no video card" setting in the
BIOS. But Simon Breden had some trouble with it, see his home server blog. You
can go for one of the three M4A boards with a 780G onboard; those will give
you 2 PCIe x16 connectors. I don't think the onboard NIC is supported. I
always put an Intel (e1000) in, just to prevent any trouble. I don't have any
trouble with the SB700 in AHCI mode. Hotplugging works like a charm.
Transferring a couple of GBs over eSATA takes considerably less time than via
USB.

I have a PATA to dual CF adapter and two industrial 16 GB CF cards as a
mirrored root pool. It takes forever to install Nevada, at least 14 hours; I
suspect the CF cards lack caches. But I don't update that regularly, still on
snv_104. I also have 2 mirrors and a hot spare. The sixth port is an eSATA port
I use to transfer large amounts of data. This system consumes about 73 watts
idle and 82 under I/O load (5 disks, a separate NIC, 8 GB RAM and a BE-2400,
all using just 73 watts!). Please note that frequency scaling is only
supported on the K10 architecture, but don't expect too much power saving from
it: a lower voltage yields far greater savings than a lower frequency. In
September I'll do a post about the aforementioned M4A boards and an LSI SAS
controller in one of the PCIe x16 slots.


--
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Wolf Frenkel
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

 - The ZIL exists on a per filesystem basis in ZFS. Is there an RFE already
   that asks for the ability to disable the ZIL on a per filesystem basis?

 Yes: 6280630 zil synchronicity

good, thanks for the pointer!

 Though personally I've been unhappy with the exposure that zil_disable 
 has got.
 It was originally meant for debug purposes only. So providing an official
 way to make synchronous behaviour asynchronous is to me dangerous.

IMHO, the need here is to give admins control over the way they want their
file servers to behave. In this particular case, the admin argues that he knows
what he's doing, that he doesn't want his NFS server to provide stronger
guarantees than a local filesystem would, and that he deserves control over
that behaviour.

Ideally, there would be an NFS option that lets customers choose whether they
want to honor COMMIT requests or not.

Disabling the ZIL on a per-filesystem basis is only the second-best solution, but
since that CR already exists, it seems to be the more realistic route.

Thanks,
Constantin


-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

Bob Friesenhahn wrote:
 On Wed, 22 Oct 2008, Neil Perrin wrote:
 On 10/22/08 10:26, Constantin Gonzalez wrote:
  3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
  that if a tar xvf were writing locally to a ZFS file system, the writes
  wouldn't be synchronous either, so there's no point in forcing NFS users
  to have a better availability experience at the expense of performance.
 
 The conclusion reached here is quite seriously wrong and no Sun 
 employee should suggest it to a customer.  If the system writing to a 

I'm not suggesting it to any customer. Actually, I argued quite a long time
with the customer, trying to convince him that slow but correct is better.

The conclusion above is a conscious decision by the customer. He says that he
does not want NFS to turn every write into a synchronous write; he's happy if
all writes are asynchronous, because in this case the NFS server is a
backup-to-disk device, and if power fails he simply restarts the backup
because he has the data in multiple copies anyway.

 local filesystem reboots then the applications which were running are 
 also lost and will see the new filesystem state when they are 
 restarted.  If an NFS server sponteneously reboots, the applications 
 on the many clients are still running and the client systems are using 
 cached data.  This means that clients could do very bad things if the 
 filesystem state (as seen by NFS) is suddenly not consistent.  One of 
 the joys of NFS is that the client continues unhindered once the 
 server returns.

Yes, we're both aware of this. In this particular situation, the customer
would restart his backup job (and thus the client application) in case the
server dies.

Thanks for pointing out the difference, this is indeed an important distinction.

Cheers,
   Constantin

-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

yes, using slogs is the best solution.

Meanwhile, using mirrored slogs from other servers' RAM disks running on UPSs
seems like an interesting idea, if UPS-backed RAM is deemed reliable enough for
the purposes of the NFS server.
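
As a sketch of the local-ramdisk variant Ross describes below (assuming a build
with separate log device support, and keeping in mind that a ramdisk's contents
are volatile, hence the UPS caveat):

  ramdiskadm -a zilrd 512m                # creates /dev/ramdisk/zilrd
  zpool add tank log /dev/ramdisk/zilrd   # use it as a separate intent log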

Thanks for suggesting this!

Cheers,
Constantin

Ross wrote:
 Well, it might be even more of a bodge than disabling the ZIL, but how about:
 
 - Create a 512MB ramdisk, use that for the ZIL
 - Buy a Micro Memory nvram PCI card for £100 or so.
 - Wait 3-6 months, hopefully buy a fully supported PCI-e SSD to replace the 
 Micro Memory card.
 
 The ramdisk isn't an ideal solution, but provided you don't export the pool 
 with it offline, it does work.  We used it as a stop gap solution for a 
 couple of weeks while waiting for a Micro Memory nvram card.
 
 Our reasoning was that our server's on a UPS and we figured if something 
 crashed badly enough to take out something like the UPS, the motherboard, 
 etc, we'd be losing data anyway.  We just made sure we had good backups in 
 case the pool got corrupted and crossed our fingers.
 
 The reason I say wait 3-6 months is that there's a huge amount of activity 
 with SSD's at the moment.  Sun said that they were planning to have flash 
 storage launched by Christmas, so I figure there's a fair chance that we'll 
 see some supported PCIe cards by next Spring.

-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-23 Thread Constantin Gonzalez
Hi,

Bob Friesenhahn wrote:
 On Thu, 23 Oct 2008, Constantin Gonzalez wrote:

 Yes, we're both aware of this. In this particular situation, the customer
 would restart his backup job (and thus the client application) in case 
 the
 server dies.
 
 So it is ok for this customer if their backup becomes silently corrupted 
 and the backup software continues running?  Consider that some of the 
 backup files may have missing or corrupted data in the middle.  Your 
 customer is quite dedicated in that he will monitor the situation very 
 well and remember to reboot the backup system, correct any corrupted 
 files, and restart the backup software whenever the server panics and 
 reboots.

This is what the customer told me. He uses rsync and he is ok with restarting
the rsync whenever the NFS server restarts.

 A properly built server should be able to handle NFS writes at gigabit 
 wire-speed.

I'm advocating for a properly built system, believe me :).

Cheers,
Constantin

-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Disabling COMMIT at NFS level, or disabling ZIL on a per-filesystem basis

2008-10-22 Thread Constantin Gonzalez
Hi,

On a busy NFS server, performance tends to be very modest for large numbers
of small files, due to the well-known effects of ZFS and the ZIL honoring the
NFS COMMIT operation[1].

For the mature sysadmin who knows what (s)he is doing, there are three
possibilities:

1. Live with it. Hard, if you see 10x less performance than you could get and
your users complain a lot.

2. Use a flash disk for a ZIL, a slog. Can add considerable extra cost,
especially if you're using an X4500/X4540 and can't swap out fast SAS
drives for cheap SATA drives to free the budget for flash ZIL drives.[2]

3. Disable ZIL[1]. This is of course evil, but one customer pointed out to me
that if a tar xvf were writing locally to a ZFS file system, the writes
wouldn't be synchronous either, so there's no point in forcing NFS users
to have a better availability experience at the expense of performance.


So, if the sysadmin draws the informed and conscious conclusion that (s)he
doesn't want to honor NFS COMMIT operations, what options are less disruptive
than disabling the ZIL completely?

- I checked the NFS tunables from:
   http://dlc.sun.com/osol/docs/content/SOLTUNEPARAMREF/chapter3-1.html
   But could not find a tunable that would disable COMMIT honoring.
   Is there already an RFE asking for a share option that disables the
   translation of COMMIT to synchronous writes?

- The ZIL exists on a per filesystem basis in ZFS. Is there an RFE already
   that asks for the ability to disable the ZIL on a per filesystem basis?

   Once Admins start to disable the ZIL for whole pools because the extra
   performance is too tempting, wouldn't it be the lesser evil to let them
   disable it on a per filesystem basis?
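
For reference, the "disable ZIL" option above currently means the global
zil_disable tunable, which affects every pool on the host and was meant as a
debug knob only:

  * /etc/system entry, takes effect after a reboot:
  set zfs:zil_disable = 1

(The per-filesystem RFE, CR 6280630 "zil synchronicity", was eventually
delivered as the per-dataset sync property, i.e. zfs set sync=disabled.)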

Comments?


Cheers,
Constantin

[1]: http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine
[2]: http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

-- 
Constantin Gonzalez  Sun Microsystems GmbH, Germany
Principal Field Technologisthttp://blogs.sun.com/constantin
Tel.: +49 89/4 60 08-25 91   http://google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Start with desired end state in mind...

2008-02-29 Thread Constantin Gonzalez
Hi,

great, thank you. So ZFS isn't picky about finding the target fs already
created, with its attributes set, when replicating data into it.

This is very cool!

Best regards,
Constantin

Darren J Moffat wrote:
 Constantin Gonzalez wrote:
 Hi Darren,

 thank you for the clarification, I didn't know that.

 See the man page for zfs(1) where the -R options for send is discussed.
 
 
 Back to Brad's RFE, what would one need to do to send a stream from a
 compressed filesystem to one with a different compression setting, if
 the source file system has the compression attribute set to a specific
 algorithm (i.e. not inherited)?
 
 $ zfs create -o compression=gzip-1 tank/gz1
 # put in your data
 $ zfs snapshot tank/[EMAIL PROTECTED]
 $ zfs create -o compression=gzip-9 tank/gz9
 $ zfs send tank/[EMAIL PROTECTED] | zfs recv -d tank/gz9
 
 Will leaving out -R just create a new, but plain unencrypted fs on the
 receiving side?
 
 Depends on inheritance.
 
 What if one wants to replicate a whole package of filesystems via
 -R, but change properties on the receiving side before it happens?
 
 If they are all getting the same properties, use inheritance; if they 
 aren't, then you (by the very nature of what you want to do) need to 
 precreate them with the appropriate options.
 

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91www.google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Start with desired end state in mind...

2008-02-29 Thread Constantin Gonzalez
Hi Darren,

thank you for the clarification, I didn't know that.

 See the man page for zfs(1) where the -R options for send is discussed.

oh, this is new. Thank you for bringing us -R.

Back to Brad's RFE, what would one need to do to send a stream from a
compressed filesystem to one with a different compression setting, if
the source file system has the compression attribute set to a specific
algorithm (i.e. not inherited)?

Will leaving out -R just create a new, but plain unencrypted fs on the
receiving side?

What if one wants to replicate a whole package of filesystems via
-R, but change properties on the receiving side before it happens?

Best regards,
Constantin

 
 But for the sake of implementing the RFE, one could extend the ZFS
 send/receive framework with a module that permits manipulation of the
 data on the fly, specifically in order to allow for things like
 recompression, en/decryption, change of attributes at the dataset level,
 etc.
 
 No need, this already works this way.
 

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91www.google.com/search?q=constantin+gonzalez

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Memory Sticks

2007-12-11 Thread Constantin Gonzalez
Hi Paul,

 # fdisk -E /dev/rdsk/c7t0d0s2

then

 # zpool create -f Radical-Vol /dev/dsk/c7t0d0

should work. The warnings you see are just there to double-check that you don't
overwrite any previously used pool, which you might regret. -f overrides that.

Hope this helps,
Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Memory Sticks

2007-12-06 Thread Constantin Gonzalez
Hi,

 # /usr/sbin/zpool import
   pool: Radical-Vol
 id: 3051993120652382125
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
see: http://www.sun.com/msg/ZFS-8000-5E
 config:
 
 Radical-Vol  UNAVAIL   insufficient replicas
   c7t0d0s0  UNAVAIL   corrupted data

ok, ZFS did recognize the disk, but the pool is corrupted. Did you remove
it without exporting the pool first?

 Following your command:
 
 $ /opt/sfw/bin/sudo /usr/sbin/zpool status
   pool: Rad_Disk_1
  state: ONLINE
 status: The pool is formatted using an older on-disk format.  The pool can
 still be used, but some features are unavailable.
 action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
 pool will no longer be accessible on older software versions.
  scrub: none requested
 config:
 
 NAMESTATE READ WRITE CKSUM
 Rad_Disk_1  ONLINE   0 0 0
   c0t1d0ONLINE   0 0 0
 
 errors: No known data errors

But this pool should be accessible, since you can zpool status it. Have
you checked zfs get all Rad_Disk_1? Does it show mount points and whether
it should be mounted?

 But this device works currently on my Solaris PC's, the W2100z and a 
 laptop of mine.

Strange. Maybe it's a USB issue. Have you checked:

   http://www.sun.com/io_technologies/usb/USB-Faq.html#Storage

Especially #19?

Best regards,
Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Memory Sticks

2007-12-05 Thread Constantin Gonzalez
Hi Paul,

yes, ZFS is platform agnostic and I know it works in SANs.

For the USB stick case, you may have run into labeling issues. Maybe
Solaris SPARC did not recognize the x64 type label on the disk (which
is strange, because it should...).

Did you try making sure that ZFS creates an EFI label on the disk?
You can check this by running zpool status: the devices should then
look like c6t0d0, without the s0 part.

If you want to force this, you can create an EFI label on the USB disk
by hand by running fdisk -E /dev/rdsk/cxtxdx.

Hope this helps,
Constantin


Paul Gress wrote:
 OK, I've been putting off this question for a while now, but it's eating 
 at me, so I can't hold off any more.  I have a nice 8 gig memory stick 
 I've formatted with the ZFS file system.  Works great on all my Solaris 
 PCs, but refuses to work on my SPARC processor.  So I've formatted it on 
 my SPARC machine (Blade 2500), works great there now, but not on my 
 PCs.  Re-formatted it on my PC, doesn't work on SPARC, and so on and so on.
 
 I thought it was a file system that could go back and forth between both 
 architectures.  So when will this compatibility be here, or if it's possible 
 now, what is the secret?
 
 Paul

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Best practice for moving FS between pool on same machine?

2007-06-21 Thread Constantin Gonzalez
Hi,

Chris Quenelle wrote:
 Thanks, Constantin!  That sounds like the right answer for me.
 Can I use send and/or snapshot at the pool level?  Or do I have
 to use it on one filesystem at a time?  I couldn't quite figure this
 out from the man pages.

the ZFS team is working on a zfs send -r (recursive) option to be able
to recursively send and receive hierarchies of ZFS filesystems in one go,
including pools.

So you'll need to do it one filesystem at a time.

This is not always trivial: if you send a full snapshot and then an incremental
one while the target filesystem is mounted, you'll likely get an error saying
that the target filesystem has been modified. Make sure the target filesystems
are unmounted, and ideally marked as unmountable, while performing the
send/receives. Also, you may want to use the -F option to zfs receive, which
forces a rollback of the target filesystem to the most recent snapshot.

I've written a script to do all of this, but it's only "works on my system"
certified.

I'd like to get some feedback and validation before I post it on my blog,
so anyone, let me know if you want to try it out.

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice for moving FS between pool on same machine?

2007-06-20 Thread Constantin Gonzalez
Hi Chris,

 What is the best (meaning fastest) way to move a large file system 
 from one pool to another pool on the same machine.  I have a machine
 with two pools.  One pool currently has all my data (4 filesystems), but it's
 misconfigured. Another pool is configured correctly, and I want to move the 
 file systems to the new pool.  Should I use 'rsync' or 'zfs send'?

zfs send/receive is the fastest and most efficient way.

I've used it multiple times on my home server until I had my configuration
right :).

 What happens is I forgot I couldn't incrementally add raid devices.  I want
 to end up with two raidz(x4) vdevs in the same pool.  Here's what I have now:

For this reason, I decided to go with mirrors. Yes, they use more raw storage
space, but they are also much more flexible to expand. Just add two disks when
the pool is full and you're done.

If you have a lot of disks or can afford to add 4-5 disks at a time, then
RAID-Z may be as easy to do, but remember that two-disk failures in RAID-5
variants can be quite common; you may want RAID-Z2 instead.

 1. move data to dbxpool2
 2. remount using dbxpool2
 3. destroy dbxpool1
 4. create new proper raidz vdev inside dbxpool2 using devices from dbxpool1

Add:

0. Snapshot data in dbxpool1 so you can use zfs send/receive

Then the above should work fine.

 I'm constrained by trying to minimize the downtime for the group
 of people using this as their file server.  So I ended up with
 an ad-hoc assignment of devices.  I'm not worried about
 optimizing my controller traffic at the moment.

Ok. If you want to really be thorough, I'd recommend:

0. Run a backup, just in case. It never hurts.
1. Do a snapshot of dbxpool1
2. zfs send/receive dbxpool1 -> dbxpool2
   (This happens while users are still using dbxpool1, so no downtime).
3. Unmount dbxpool1
4. Do a second snapshot of dbxpool1
5. Do an incremental zfs send/receive of dbxpool1 -> dbxpool2.
   (This should take only a small amount of time)
6. Mount dbxpool2 where dbxpool1 used to be.
7. Check everything is fine with the new mounted pool.
8. Destroy dbxpool1
9. Use disks from dbxpool1 to expand dbxpool2 (be careful :) ).

You might want to exercise the above steps on an extra spare disk with
two pools just to gain some confidence before doing it in production.
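
In command form, steps 1-6 might look roughly like this for a single filesystem
(the dbxpool names are from this thread; the filesystem name and mountpoint are
placeholders):

  zfs snapshot -r dbxpool1@mig1                                       # step 1
  zfs send dbxpool1/data@mig1 | zfs receive -F dbxpool2/data          # step 2
  zfs unmount dbxpool1/data                                           # step 3
  zfs snapshot -r dbxpool1@mig2                                       # step 4
  zfs send -i @mig1 dbxpool1/data@mig2 | zfs receive -F dbxpool2/data # step 5
  zfs set mountpoint=/export/data dbxpool2/data                       # step 6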

I have a script that automatically does 1-6 that is looking for beta
testers. If you're interested, let me know.

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Scalability/performance

2007-06-20 Thread Constantin Gonzalez
Hi,

 I'm quite interested in ZFS, like everybody else I suppose, and am about
 to install FBSD with ZFS.

welcome to ZFS!

 Anyway, back to business :)
 I have a whole bunch of different sized disks/speeds. E.g. 3 300GB disks
 @ 40mb, a 320GB disk @ 60mb/s, 3 120gb disks @ 50mb/s and so on.
 
 Raid-Z and ZFS claims to be uber scalable and all that, but would it
 'just work' with a setup like that too?

Yes. If you dump a set of variable-size disks into a mirror or RAID-Z
configuration, you'll get the same result as if all of them had the size of
the smallest one. The pool will then grow as you exchange smaller disks for
larger ones.

I used to run a ZFS pool on 1x250GB, 1x200GB, 1x85 GB and 1x80 GB the following
way:

- Set up an 80 GB slice on all 4 disks and make a 4 disk RAID-Z vdev
- Set up a 5 GB slice on the 250, 200 and 85 GB disks and make a 3 disk RAID-Z
- Set up a 115GB slice on the 200 and the 250 GB disk and make a 2 disk mirror.
- Concatenate all 3 vdevs into one pool. (You need zpool add -f for that).

Not something to be done on a professional production system, but it worked
for my home setup just fine. The remaining 50GB from the 250GB drive then
went into a scratch pool.
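
Expressed as commands, that layout looked roughly like this (the device and
slice names here are placeholders, not the ones I actually used):

  zpool create tank raidz c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0   # 4 x 80 GB slices
  zpool add -f tank raidz c1t0d0s1 c1t1d0s1 c1t2d0s1            # 3 x 5 GB slices
  zpool add -f tank mirror c1t0d0s3 c1t1d0s3                    # 2 x 115 GB slices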

Kinda like playing Tetris with RAID-Z...

Later, I decided that just using paired disks as mirrors is really more
flexible and easier to expand, since disk space is cheap.

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Scalability/performance

2007-06-20 Thread Constantin Gonzalez
Hi,

 How are paired mirrors more flexible?

well, I'm talking about a small home system. If the pool gets full, the
way to expand with RAID-Z would be to add 3+ disks (typically 4-5).

With mirror only, you just add two. So in my case it's just about
the granularity of expansion.

The reasoning is that of the three factors reliability, performance and
space, I value them in this order. Space comes last since disk space
is cheap.

If I had a bigger number of disks (12+), I'd be using them in RAID-Z2
sets (4+2 plus 4+2 etc.). Here, the speed is ok and the reliability is
ok and so I can use RAID-Z2 instead of mirroring to get some extra
space as well.

 Right now, I have a 3-disk RAID-5 running with the Linux DM driver. One
 of the most recent additions was RAID-5 expansion, so I could pop in a
 matching disk and expand my RAID-5 to 4 disks instead of 3 (which is
 always interesting, as you're cutting down on your relative parity loss). I
 think, though, that in RAID-5 you shouldn't put more than 6-8 disks AFAIK, so
 I wouldn't be expanding this endlessly.
 
 So how would this translate to ZFS? I have learned so far that ZFS

ZFS does not yet support rearranging the disk configuration. Right now,
you can expand a single disk to a mirror, or an n-way mirror to an n+1-way
mirror.

RAID-Z vdevs can't be changed right now. But you can add more disks
to a pool by adding more vdevs (you have a 1+1 mirror, add another 1+1
pair and get more space; you have a 3+2 RAID-Z2, add another 5+2 RAID-Z2, etc.)

 basically is raid + LVM. e.g. the mirrored raid-z pairs go into the
 pool, just like one would use LVM to bind all the raid pairs. The
 difference being I suppose, that you can't use a zfs mirror/raid-z
 without having a pool to use it from?

Here's the basic idea:

- You first construct vdevs from disks:

  One disk can be one vdev.
  A 1+1 mirror can be a vdev, too.
  An n+1 or n+2 RAID-Z (RAID-Z2) set can be a vdev, too.

- Then you concatenate vdevs to create a pool. Pools can be extended by
  adding more vdevs.

- Then you create ZFS file systems that draw their block usage from the
  resources supplied by the pool. Very flexible.

 Wondering now is if I can simply add a new disk to my raid-z and have it
 'just work', e.g. the raid-z would be expanded to use the new
 disk(partition of matching size)

If you have a RAID-Z based pool in ZFS, you can add another group of disks
that are organized in a RAID-Z manner (a vdev) to expand the storage capacity
of the pool.
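
Two quick examples of what that looks like in practice (pool and device names
are placeholders):

  # turn a single-disk vdev into a mirror (or an n-way mirror into an n+1-way one)
  zpool attach tank c1t0d0 c1t1d0

  # grow a RAID-Z based pool by adding another RAID-Z vdev
  zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0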

Hope this clarifies things a bit. And yes, please check out the admin guide and
the other collateral available on ZFS. It's full of new concepts, and it takes
some getting used to before you can explore all the possibilities.

Cheers,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Scalability/performance

2007-06-20 Thread Constantin Gonzalez
Hi Mike,

 If I were to plan for a 16-disk ZFS-based system, you would probably
 suggest that I configure it as something like 5+1, 4+1, 4+1, all RAID-Z
 (I don't need the double parity concept)
 
 I would prefer something like 15+1 :) I want ZFS to be able to detect
 and correct errors, but I do not need to squeeze all the performance
 out of it (I'll be using it as a home storage server for my DVDs and
 other audio/video stuff. So only a few clients at the most streaming
 off of it)

this is possible. In theory, ZFS does not significantly limit n, and 15+1
is indeed possible.

But for a number of reasons (among them performance), people generally
advise using no more than 10+1.

A lot of ZFS configuration wisdom can be found on the Solaris internals
ZFS Best Practices Guide Wiki at:

  http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Richard Elling has done a great job of thoroughly analyzing different
reliability concepts for ZFS in his blog. One good introduction is the
following entry:

  http://blogs.sun.com/relling/entry/zfs_raid_recommendations_space_performance

That may help you find the right tradeoff between space and reliability.

Hope this helps,
   Constantin


-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] New german white paper on ZFS

2007-06-19 Thread Constantin Gonzalez
Hi,

if you understand German or want to brush it up a little, I have a new ZFS
white paper in German for you:

  http://blogs.sun.com/constantin/entry/new_zfs_white_paper_in

Since there's already so much collateral on ZFS in English, I thought it was
time for some localized material for my country.

There are also some new ZFS slides that go with it, also in German.

Let me know if you have any suggestions.

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?

2007-05-25 Thread Constantin Gonzalez
Hi,

I'm a big fan of live upgrade. I'm also a big fan of ZFS boot. The latter is
more important for me. And yes, I'm looking forward to both being integrated
with each other.

Meanwhile, what is the best way to upgrade a post-b61 system that is booted
from ZFS?


I'm thinking:

1. Boot from ZFS
2. Use Tim's excellent multiple boot datasets script to create a new cloned ZFS
   boot environment:
   http://blogs.sun.com/timf/entry/an_easy_way_to_manage
3. Loopback mount the new OS ISO image
4. Run the installer from the loopbacked ISO image in upgrade mode on the clone
5. Mark the clone to be booted the next time
6. Reboot into the upgraded OS.
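
For step 3, the loopback mount itself is straightforward (the ISO path is just
an example):

  lofiadm -a /export/iso/osol.iso        # prints the lofi device, e.g. /dev/lofi/1
  mount -F hsfs -o ro /dev/lofi/1 /mnt

Step 4 is the part I'm unsure about, see below.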


Questions:

- How exactly do I do step 4? Before, luupgrade did everything for me, now
  what manpage do I need to do this?

- Did I forget something above? I'm ok with losing some logfiles and stuff that
  may have changed between the clone and the reboot, but is there anything else?

- Did someone already blog about this and I haven't noticed yet?


Cheers,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?

2007-05-25 Thread Constantin Gonzalez
Hi,

 Our upgrade story isn't great right now.  In the meantime,
 you might check out Tim Haley's blog entry on using
 bfu with zfs root.

thanks.

But doesn't live upgrade just start the installer from the new OS
DVD with the right options? Can't I just do that too?

Cheers,
   Constantin

 
 http://blogs.sun.com/timh/entry/friday_fun_with_bfu_and
 
 lori
 
 Constantin Gonzalez wrote:
 Hi,

 I'm a big fan of live upgrade. I'm also a big fan of ZFS boot. The
 latter is
 more important for me. And yes, I'm looking forward to both being
 integrated
 with each other.

 Meanwhile, what is the best way to upgrade a post-b61 system that is
 booted
 from ZFS?


 I'm thinking:

 1. Boot from ZFS
 2. Use Tim's excellent multiple boot datasets script to create a new
 cloned ZFS
boot environment:
http://blogs.sun.com/timf/entry/an_easy_way_to_manage
 3. Loopback mount the new OS ISO image
 4. Run the installer from the loopbacked ISO image in upgrade mode on
 the clone
 5. Mark the clone to be booted the next time
 6. Reboot into the upgraded OS.


 Questions:

 - How exactly do I do step 4? Before, luupgrade did everything for me,
 now
   what manpage do I need to do this?

 - Did I forget something above? I'm ok with losing some logfiles and
 stuff that
   maybe changed between the clone and the reboot, but is there
 anything else?

 - Did someone already blog about this and I haven't noticed yet?


 Cheers,
Constantin

   
 

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?

2007-05-25 Thread Constantin Gonzalez
Hi Malachi,

Malachi de Ælfweald wrote:
 I'm actually wondering the same thing because I have b62 w/ the ZFS
 bits; but need the snapshot's -r functionality.

you're lucky, it's already there. From my b62 machine's man zfs:

 zfs snapshot [-r] [EMAIL PROTECTED]|[EMAIL PROTECTED]

 Creates  a  snapshot  with  the  given  name.  See   the
 Snapshots section for details.

  -r    Recursively create  snapshots  of  all  descendant
   datasets.  Snapshots are taken atomically, so that
   all recursive snapshots  correspond  to  the  same
   moment in time.
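
So, for example (pool and snapshot names made up):

# zfs snapshot -r tank@2007-05-25

would atomically snapshot tank and every dataset below it.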

Or did you mean send -r?

Best regards,
   Constantin


-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] A big Thank You to the ZFS team!

2007-04-24 Thread Constantin Gonzalez
Hi,

I just got ZFS boot up and running on my laptop. This being a major milestone
in the history of ZFS, I thought I'd reflect a bit on what ZFS brought to my
life so far:


- I'm using ZFS at home and on my laptop since January 2006 for mission
  critical purposes:

  - Backups of my wife's and my own Macs.

  - Storing family photos (I have a baby now, so they _are_ mission critical
:) ).

  - Storing my ca. 400 CDs that were carefully ripped and metadata'ed, which
took a lot of work.

  - Providing fast and reliable storage for my PVR.

  - And of course all the rough stuff that happens to laptops on the road.

- ZFS has already saved me from bit rot once. I could see that it fixed a bad
  block during a weekly scrub. What a great feeling to know that your data is
  much safer than it was before and to be able to see how and when it is being
  protected!

  It is kinda weird to talk to customers about adopting ZFS while knowing that
  my family pictures at home are probably stored safer than their company
  data...

- ZFS enabled me to just take a bunch of differently sized drives that have
  been lying around somewhere and turn them into an easy to manage,
  consistent and redundant pool of storage that effortlessly handles very
  diverse workloads (File server, audio streaming, video streaming).

- During the frequent migrations (Couldn't make up my mind first on how to
  slice and dice my 4 disks), zfs send/receive has been my best friend. It
  enabled me to painlessly migrate whole filesystems between pools in
  minutes.
  I'm now writing a script to further automate recursive and updating
  zfs send/receive orgies for backups and other purposes.

- Disk storage is cheap, and thanks to ZFS it became reliable at zero cost.
  Therefore, I can snapshot a lot, not think about whether to delete stuff
  or not, or simply delete stuff I don't need now, while knowing it is
  still preserved in my snapshots.

- As a result of all of this, I learned a great deal about Solaris 10 and
  its other features, which is a big help in my day-to-day job.


I know there's still a lot to do and that we're still working on some bugs,
but I can safely say that ZFS is the best thing that happened to my data
so far.


So here's a big

  THANK YOU!

to the ZFS team for making all of this and more possible for my little home
system.


Down the road, I've now migrated my pools to external mirrored USB disks
(mirrored because it's fast and lowers complexity; USB, because it's
pluggable and host-independent) and I'm thinking of how to backup them
(I realize I still need a backup) onto other external disks or preferably
another system. Again, zfs send/receive will be my friend here.

ZFS boot on my home server is the other next big thing, enabling me to
mirror my root file system more reliably than SVM can while saving space
for live upgrade and enabling other cool stuff.

I'm also thinking of using iSCSI zvols as Mac OS X storage for audio/video
editing and whole-disk backups, but that requires some waiting until
the Mac OS X iSCSI support has matured a bit.

And then I can start to really archive stuff: Older backups that sit on CDs
and are threatened by CD-rot, old photo CDs that have been sitting there and
hopefully haven't begun to rot yet, maybe scan in some older photos,
migrating my CD collection to a lossless format, etc.


This sounds like I've been drinking too much koolaid, and I probably have,
but I guess all the above points remain valid even if I didn't work for Sun.
So please take this email as being written by a private ZFS user and not
a Sun employee.


So, again, thank you so much ZFS team and keep up the good work!


Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS boot: 3 smaller glitches with console, /etc/dfs/sharetab and /dev/random

2007-04-19 Thread Constantin Gonzalez Schmitz
Hi,

I've now gone through both the opensolaris instructions:

  http://www.opensolaris.org/os/community/zfs/boot/zfsboot-manual/

and Tim Foster's script:

  http://blogs.sun.com/timf/entry/zfs_bootable_datasets_happily_rumbling

for making my laptop ZFS bootable.


Both work well and here's a big THANK YOU to the ZFS boot team!


There seem to be 3 smaller glitches with these approaches:

1. The instructions on opensolaris.org assume that one wants console output
   to show up in /dev/tty. This may be true for a server, but it isn't for a
   laptop or workstation user. Therefore, I suggest marking those settings as
   optional, as not everybody knows that they can be left out.

2. After going through the zfs-bootification, Solaris complains on reboot that
   /etc/dfs/sharetab is missing. Somehow this seems to have fallen through
   the cracks of the find command. Well, touching /etc/dfs/sharetab just fixes
   the issue.

3. But here's a more serious one: While booting, Solaris complains:

   Apr 19 15:00:37 foeni kcf: [ID 415456 kern.warning] WARNING: No randomness
   provider enabled for /dev/random. Use cryptoadm(1M) to enable a provider.

   Somehow, /dev/random and/or its counterpart in /devices seems to have
   suffered from the migration procedure.

Does anybody know how to fix the /dev/random issue? I'm not very fluent in
cryptoadm(1M) and some superficial reading of its manpage did not enlighten
me too much (cryptoadm list -p claims all is well...).

Best regards and again, congratulations to the ZFS boot team!

   Constantin


-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Who modified my ZFS receive destination?

2007-04-12 Thread Constantin Gonzalez
Hi,

I'm currently migrating a filesystem from one pool to the other through
a series of zfs send/receive commands in order to preserve all snapshots.

But at some point, zfs receive says cannot receive: destination has been
modified since most recent snapshot. I am pretty sure nobody changed anything
at my destination filesystem and I also tried rolling back to an earlier
snapshot on the destination filesystem to make it clean again.

Here's an excerpt of the snapshots on my source filesystem:

# zfs list -rt snapshot pelotillehue/constant
NAME                                                          USED  AVAIL  REFER  MOUNTPOINT
pelotillehue/[EMAIL PROTECTED]                                 236K      -  33.6G  -
pelotillehue/[EMAIL PROTECTED]                                 747K      -  46.0G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06   3.07G      -   116G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00   18.9M      -   115G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03   10.9M      -   115G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00    606M      -   105G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01    167M      -   105G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00   5.31M      -   105G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01   1.90M      -   105G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01   1.26M      -   105G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00   15.2M      -   109G  -
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-15-00:00:00   17.5M      -   109G  -

... (further lines omitted)


On the destination filesystem, snapshots have been replicated through
zfs send/receive up to the 2007-01-01 snapshot, so I do the following:

# zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 | zfs receive
santiago/home/constant

This worked, but now, only seconds later:

# zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive
santiago/home/constant
cannot receive: destination has been modified since most recent snapshot

Fails. So I try rolling back to the 2007-01-08 snapshot on the destination
filesystem to be clean again, but:

# zfs rollback santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
# zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive
santiago/home/constant
cannot receive: destination has been modified since most recent snapshot

Hmm, why does ZFS think my destination has been modified, although I didn't
do anything?

Another peculiar thing: zfs list on the destination snapshots says:

# zfs list -rt snapshot santiago/home/constant
NAME                                                            USED  AVAIL  REFER  MOUNTPOINT
santiago/home/[EMAIL PROTECTED]                                 189K      -  33.6G  -
santiago/home/[EMAIL PROTECTED]                                 670K      -  46.0G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06   3.07G      -   116G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00   18.4M      -   115G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03   10.5M      -   115G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00    603M      -   105G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01    163M      -   105G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00   4.87M      -   105G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01   1.79M      -   106G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01   1.16M      -   106G  -
santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00     57K      -   109G  -

Note that the Used column for the 2007-01-08 snapshot says 57K on the
destination, but 15.2M on the source. Could it be that the reception of
the 2007-01-08 snapshot failed and ZFS didn't notice?

I've tried this multiple times, including destroying snapshots and rolling
back on the destination to the 2007-01-01 state, so what you see above is
already a second try of the same.

The other values vary too, but only slightly. Compression is turned on on
both pools. The source pool has been scrubbed on Monday with no known data
errors and the destination pool is brand new and I'm scrubbing it as we speak.

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland

Summary: [zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?

2007-04-12 Thread Constantin Gonzalez
Hi,

here's a quick summary of the answers I've seen so far:

- Splitting mirrors is a current practice with traditional volume
  management. The goal is to quickly and effortlessly create a clone of a
  storage volume/pool.

- Splitting mirrors with ZFS can be done, but it has to be done the
  hard way by resilvering, then unplugging the disk, then trying to
  import it somewhere else. zpool detach would render the detached disk
  unimportable.

- Another, cleaner way of splitting a mirror would be to export the
  pool, then disconnect one drive, then re-import again. After that,
  the disconnected drive needs to be zpool detach'ed from the mother,
  while the clone can then be imported and its missing mirrors
  detached as well. But this involves unmounting the pool so it can't
  be done without downtime.

- The supported alternative would be zfs snapshot, then zfs send/receive,
  but this introduces the complexity of snapshot management which
  makes it less simple, thus less appealing to the clone-addicted admin.

- There's an RFE for supporting splitting mirrors: 5097228
  http://bugs.opensolaris.org/view_bug.do?bug_id=5097228
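
As a rough, untested sketch of the export-based variant above, with made-up
device names (dev2 being the disk that becomes the clone):

# zpool export tank
  (physically remove dev2, then on the original box:)
# zpool import tank
# zpool detach tank dev2
  (on the second box, attach dev2 and:)
# zpool import tank
# zpool detach tank dev1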

IMHO, we should investigate if something like zpool clone would be useful.
It could be implemented as a script that recursively snapshots the source
pool, then zfs send/receives it to the destination pool, then copies all
properties, but the actual reason why people do mirror splitting in the
first place is because of its simplicity.

A zpool clone or a zpool send/receive command would be even simpler and less
error-prone than the tradition of splitting mirrors, plus it could be
implemented more efficiently and more reliably than a script, thus bringing
real additional value to administrators.

Maybe zpool clone or zpool send/receive would be the better way of implementing
5097228 in the first place?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
[EMAIL PROTECTED]
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Who modified my ZFS receive destination?

2007-04-12 Thread Constantin Gonzalez
Hi Trev,

Trevor Watson wrote:
 Hi Constantin,
 
 I had the same problem, and the solution was to make sure that the
 filesystem is not mounted on the destination system when you perform the
 zfs recv (zfs set mountpoint=none santiago/home).

thanks! This time it worked:

# zfs unmount santiago/home/constant
# zfs rollback santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
# zfs send -i pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs receive
santiago/home/constant
#

Still, this is kinda strange. This means that we'll need to zfs unmount, then
zfs rollback last snapshot a lot when doing send/receive on a regular basis
(as in weekly, daily, hourly, minutely cron-jobs) to be sure. Or keep any
replicated filesystems unmounted _all_ the time.
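
So a scripted replication step would look roughly like this (the snapshot
names are made up; I believe newer builds also have a -F flag on zfs receive
that rolls the destination back for you, but check whether your build has it):

# zfs unmount santiago/home/constant
# zfs rollback santiago/home/constant@last
# zfs send -i pelotillehue/constant@last pelotillehue/constant@new | \
    zfs receive santiago/home/constant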

Best regards,
   Constantin


 
 Trev
 
 Constantin Gonzalez wrote:
 Hi,

 I'm currently migrating a filesystem from one pool to the other through
 a series of zfs send/receive commands in order to preserve all snapshots.

 But at some point, zfs receive says cannot receive: destination has been
 modified since most recent snapshot. I am pretty sure nobody changed
 anything
 at my destination filesystem and I also tried rolling back to an earlier
 snapshot on the destination filesystem to make it clean again.

 Here's an excerpt of the snapshots on my source filesystem:

 # zfs list -rt snapshot pelotillehue/constant
 NAME                                                          USED  AVAIL  REFER  MOUNTPOINT
 pelotillehue/[EMAIL PROTECTED]                                 236K      -  33.6G  -
 pelotillehue/[EMAIL PROTECTED]                                 747K      -  46.0G  -
 pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06   3.07G      -   116G  -
 pelotillehue/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00   18.9M      -   115G  -
 pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03   10.9M      -   115G  -
 pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00    606M      -   105G  -
 pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01    167M      -   105G  -
 pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00   5.31M      -   105G  -
 pelotillehue/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01   1.90M      -   105G  -
 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01   1.26M      -   105G  -
 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00   15.2M      -   109G  -
 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-15-00:00:00   17.5M      -   109G  -

 ... (further lines omitted)


 On the destination filesystem, snapshots have been replicated through
 zfs send/receive up to the 2007-01-01 snapshot, so I do the following:

 # zfs send -i
 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01
 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00 | zfs
 receive
 santiago/home/constant

 This worked, but now, only seconds later:

 # zfs send -i
 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs
 receive
 santiago/home/constant
 cannot receive: destination has been modified since most recent snapshot

 Fails. So I try rolling back to the 2007-01-08 snapshot on the
 destination
 filesystem to be clean again, but:

 # zfs rollback
 santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
 # zfs send -i
 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00
 pelotillehue/[EMAIL PROTECTED]:nobackup-2007-02-15-00:00:01 | zfs
 receive
 santiago/home/constant
 cannot receive: destination has been modified since most recent snapshot

 Hmm, why does ZFS think my destination has been modified, although I
 didn't
 do anything?

 Another peculiar thing: zfs list on the destination snapshots says:

 # zfs list -rt snapshot santiago/home/constant
 NAME                                                            USED  AVAIL  REFER  MOUNTPOINT
 santiago/home/[EMAIL PROTECTED]                                 189K      -  33.6G  -
 santiago/home/[EMAIL PROTECTED]                                 670K      -  46.0G  -
 santiago/home/[EMAIL PROTECTED]:nobackup-2006-11-22-00:00:06   3.07G      -   116G  -
 santiago/home/[EMAIL PROTECTED]:nobackup-2006-11-29-00:00:00   18.4M      -   115G  -
 santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-01-00:00:03   10.5M      -   115G  -
 santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-08-00:00:00    603M      -   105G  -
 santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-15-00:00:01    163M      -   105G  -
 santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-22-00:00:00   4.87M      -   105G  -
 santiago/home/[EMAIL PROTECTED]:nobackup-2006-12-29-00:00:01   1.79M      -   106G  -
 santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-01-00:00:01   1.16M      -   106G  -
 santiago/home/[EMAIL PROTECTED]:nobackup-2007-01-08-00:00:00     57K      -   109G  -

 Note that the Used

Re: [zfs-discuss] ZFS vs. rmvolmgr

2007-04-11 Thread Constantin Gonzalez Schmitz
Hi,

sorry, I needed to be more clear:

Here's what I did:

1. Connect USB storage device (a disk) to machine
2. Find USB device through rmformat
3. Try zpool create on that device. It fails with:
   can't open /dev/rdsk/cNt0d0p0, device busy
4. svcadm disable rmvolmgr
5. Now zpool create works with that device and the pool gets created.
6. svcadm enable rmvolmgr
7. After that, everything works as expected, the device stays under control
   of the pool.

   can't open /dev/rdsk/cNt0d0p0, device busy
 
 Do you remember exactly what command/operation resulted in this error?

See above, it comes right after trying to create a zpool on that device.

 It is something that tries to open device exclusively.

So after ZFS opens the device exclusively, hald and rmvolmgr will ignore it?
What happens at boot time, is zfs then quicker in grabbing the device than
hald and rmvolmgr are?

 So far, I've just said svcadm disable -t rmvolmgr, did my thing, then
 said svcadm enable rmvolmgr.
 
 This can't possibly be true, because rmvolmgr does not open devices.

Hmm. I do remember doing the above. Actually, I had been pulling
some hair out trying to create zpools on external devices until I got the idea
of disabling rmvolmgr, and then it worked.

 You'd need to also disable the 'hal' service. Run fuser on your device
 and you'll see it's one of the hal addons that keeps it open:

Perhaps something depended on rmvolmgr which released the device after I
disabled the service?

 For instance, I'm now running several USB disks with ZFS pools on
 them, and
 even after restarting rmvolmgr or rebooting, ZFS, the disks and rmvolmgr
 get along with each other just fine.
 
 I'm confused here. In the beginning you said that something got in the
 way, but now you're saying they get along just fine. Could you clarify.

After creating the pool, the device now belongs to ZFS. Now, ZFS seems to
be able to grab the device before anybody else.

 One possible workaround would be to match against USB disk's serial
 number and tell hal to ignore it using fdi(4) file. For instance, find
 your USB disk in lshal(1M) output, it will look like this:
 
 udi = '/org/freedesktop/Hal/devices/pci_0_0/pci1028_12c_1d_7/storage_5_0'
   usb_device.serial = 'DEF1061F7B62'  (string)
   usb_device.product_id = 26672  (0x6830)  (int)
   usb_device.vendor_id = 1204  (0x4b4)  (int)
   usb_device.vendor = 'Cypress Semiconductor'  (string)
   usb_device.product = 'USB2.0 Storage Device'  (string)
   info.bus = 'usb_device'  (string)
   info.solaris.driver = 'scsa2usb'  (string)
   solaris.devfs_path = '/[EMAIL PROTECTED],0/pci1028,[EMAIL 
 PROTECTED],7/[EMAIL PROTECTED]'  (string)
 
 You want to match an object with this usb_device.serial property and set
 info.ignore property to true. The fdi(4) would look like this:

thanks, this sounds just like what I was looking for.

So the correct way of having a zpool out of external USB drives is to:

1. Attach the drives
2. Find their USB serial numbers with lshal
3. Set up an fdi file that matches the disks and tells hal to ignore them

The naming of the file

  /etc/hal/fdi/preprobe/30user/10-ignore-usb.fdi

sounds like init.d-style directory and file naming, is this correct?
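
For the archives, here's a rough sketch of what I understand such an fdi file
would look like, using the example serial number from the lshal output above
(I'd double-check the exact match key against fdi(4) and lshal before relying
on it):

# cat > /etc/hal/fdi/preprobe/30user/10-ignore-usb.fdi <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<deviceinfo version="0.2">
  <device>
    <match key="usb_device.serial" string="DEF1061F7B62">
      <merge key="info.ignore" type="bool">true</merge>
    </match>
  </device>
</deviceinfo>
EOF
# svcadm restart hal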

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?

2007-04-11 Thread Constantin Gonzalez Schmitz
Hi Mark,

Mark J Musante wrote:
 On Tue, 10 Apr 2007, Constantin Gonzalez wrote:
 
 Has anybody tried it yet with a striped mirror? What if the pool is
 composed out of two mirrors? Can I attach devices to both mirrors, let
 them resilver, then detach them and import the pool from those?
 
 You'd want to export them, not detach them.  Detaching will overwrite the
 vdev labels and make it un-importable.

thank you for the export/import idea, it does sound cleaner from a ZFS
perspective, but comes at the expense of temporarily unmounting the filesystems.

So, instead of detaching, would unplugging, then detaching work?

I'm thinking something like this:

 - zpool create tank mirror dev1 dev2 dev3
 - {physically move dev3 to new box}
 - zpool detach tank dev3

On the new box:
 - zpool import tank
 - zpool detach tank dev1
 - zpool detach tank dev2

This should work for one disk, and I assume this would also work for multiple
disks?

Thinking along similar lines, would it be a useful RFE to allow asynchronous
mirroring like this:

- dev1, dev2 are both 250GB, dev3 is 500GB
- zpool create tank mirror dev1,dev2 dev3

This means that half of dev3 would mirror dev1, the other half would mirror dev2
and dev1,dev2 is a regular stripe.

The utility of this would be for cases where customers have set up mirrors, then
need to replace disks or upgrade the mirror after a long time, when bigger disks
are easier to get than smaller ones and while reusing older disks.

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Poor man's backup by attaching/detaching mirror

2007-04-11 Thread Constantin Gonzalez Schmitz
Hi,

 How would you access the data on that device?
 
 Presumably, zpool import.

yes.

 This is basically what everyone does today with mirrors, isn't it? :-)

sure. This may not be pretty, but it's what customers are doing all the time
with regular mirrors, 'cause it's quick, easy and reliable.

Cheers,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS vs. rmvolmgr

2007-04-10 Thread Constantin Gonzalez
Hi,

while playing around with ZFS and USB memory sticks or USB harddisks,
rmvolmgr tends to get in the way, which results in a

  can't open /dev/rdsk/cNt0d0p0, device busy

error.

So far, I've just said svcadm disable -t rmvolmgr, did my thing, then
said svcadm enable rmvolmgr.

Is there a more elegant approach that tells rmvolmgr to leave certain
devices alone on a per disk basis?

For instance, I'm now running several USB disks with ZFS pools on them, and
even after restarting rmvolmgr or rebooting, ZFS, the disks and rmvolmgr
get along with each other just fine.

What and how does ZFS tell rmvolmgr that a particular set of disks belongs
to ZFS and should not be treated as removable?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up for zfsboot

2007-04-05 Thread Constantin Gonzalez
Hi,

 - RAID-Z is _very_ slow when one disk is broken.
 Do you have data on this? The reconstruction should be relatively cheap
 especially when compared with the initial disk access.
 
 Also, what is your definition of broken?  Does this mean the device
 appears as FAULTED in the pool status, or that the drive is present and
 not responding?  If it's the latter, this will be fixed by my upcoming
 FMA work.

sorry, the _very_ may be exaggerated and depends very much on the load of
the system and the config.

I'm referring to a couple of posts and anecdotal experience from colleagues.
This means that indeed "slow" or "very slow" may be a mixture of
reconstruction overhead and device timeout issue.

So, it's nice to see that the upcoming FMA code will fix some of the slowness
issues.

Did anybody measure how much CPU overhead RAID-Z and RAID-Z2 parity
computation induces, both for writes and for reads (assuming a data disk
is broken)? This data would be useful when arguing for a software RAID
scheme in front of hardware-RAID addicted customers.

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Constantin Gonzalez
Hi,

 Now that zfsboot is becoming available, I'm wondering how to put it to
 use. Imagine a system with 4 identical disks. Of course I'd like to use

you lucky one :).

 raidz, but zfsboot doesn't do raidz. What if I were to partition the
 drives, such that I have 4 small partitions that make up a zfsboot
 partition (4 way mirror), and the remainder of each drive becomes part
 of a raidz?

Sounds good. Performance will suffer a bit, as ZFS thinks it has two pools
with 4 spindles each, but it should still perform better than the same on
a UFS basis.

You may also want to have two 2-way mirrors and keep the second for other
purposes such as a scratch space for zfs migration or as spare disks for
other stuff.

 Do I still have the advantages of having the whole disk
 'owned' by zfs, even though it's split into two parts?

I'm pretty sure that this is not the case:

- ZFS has no guarantee that nothing else will ever use that other
  partition, so it can't assume the right to turn on the disk cache for the whole
  disk.

- Yes, it could be smart and realize that it does have the whole disk, only
  split up across two pools, but then I assume that this is not your typical
  enterprise class configuration and so it probably didn't get implemented
  that way.

I'd say that not being able to benefit from the disk drive's cache is not
as bad in the face of ZFS' other advantages, so you can probably live with
that.

 Swap would probably have to go on a zvol - would that be best placed on
 the n-way mirror, or on the raidz?

I'd place it onto the mirror for performance reasons. Also, it feels cleaner
to have all your OS stuff on one pool and all your user/app/data stuff on
another. This is also recommended by the ZFS Best Practices Wiki on
www.solarisinternals.com.

Now back to the 4-disk RAID-Z: Does it have to be RAID-Z? You might want
to reconsider using 2 2-way mirrors:

- RAID-Z is slow when writing, you basically get only one disk's bandwidth.
  (Yes, with variable block sizes this might be slightly better...)

- RAID-Z is _very_ slow when one disk is broken.

- Using mirrors is more convenient for growing the pool: You run out of space,
  you add two disks, and get better performance too. No need to buy 4 extra
  disks for another RAID-Z set.

- When using disks, you need to consider availability, performance and space.
  Of all the three, space is the cheapest. Therefore it's best to sacrifice
  space and you'll get better availability and better performance.
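
In commands, the two-mirror layout would look something like this (device and
slice names made up):

# zpool create tank mirror c1t0d0s7 c1t1d0s7 mirror c1t2d0s7 c1t3d0s7

and growing it later is a simple

# zpool add tank mirror c1t4d0s7 c1t5d0s7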

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up for zfsboot

2007-04-04 Thread Constantin Gonzalez
Hi,

Manoj Joseph wrote:

 Can write-cache not be turned on manually as the user is sure that it is
 only ZFS that is using the entire disk?

yes it can be turned on. But I don't know if ZFS would then know about it.

I'd still feel more comfortable with it being turned off unless ZFS itself
does it.

But maybe someone from the ZFS team can clarify this.

Cheers,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pathological ZFS performance

2007-03-30 Thread Constantin Gonzalez
/product.asp?item=N82E16812156010
 
 It feels kind of nuts, but I have to think this would perform
 better than what I have now.  This would cost me the one SATA
 drive I'm using now in a smaller pool.
 
 Rob T
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Migrating a pool

2007-03-29 Thread Constantin Gonzalez
Hi Matt,

cool, thank you for doing this!

I'll still write my script since today my two shiny new 320GB USB
disks will arrive :).

I'll add to that the feature to first send all current snapshots, then
bring down the services that depend on the filesystem, unmount the old
fs, send a final incremental snapshot then zfs set mountpoint=x to the new
filesystem, then bring up the services again.
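
Per filesystem, that would be roughly the following (pool, filesystem and
snapshot names made up, and the destination is best kept unmounted in between,
as discussed in the other thread):

# zfs snapshot oldpool/data@mig-1
# zfs send oldpool/data@mig-1 | zfs receive newpool/data
  (stop the services using oldpool/data and unmount it, then:)
# zfs snapshot oldpool/data@mig-2
# zfs send -i oldpool/data@mig-1 oldpool/data@mig-2 | zfs receive newpool/data
# zfs set mountpoint=/export/data newpool/data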

Hope this works as I imagine.

Cheers,
   Constantin

Matthew Ahrens wrote:
 Constantin Gonzalez wrote:
 What is the most elegant way of migrating all filesystems to the new
 pool,
 including snapshots?

 Can I do a master snapshot of the whole pool, including
 sub-filesystems and
 their snapshots, then send/receive them to the new pool?

 Or do I have to write a script that will individually snapshot all
 filesystems
 within my old pool, then run a send (-i) orgy?
 
 Unfortunately, you will need to make/find a script to do the various
 'zfs send -i' to send each snapshot of each filesystem.
 
 I am working on 'zfs send -r', which will make this a snap:
 
 # zfs snapshot -r [EMAIL PROTECTED]
 # zfs send -r [EMAIL PROTECTED] | zfs recv ...
 
 You'll also be able to do 'zfs send -r -i @yesterday [EMAIL PROTECTED]'.
 
 See RFE 6421958.
 
 --matt

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Migrating a pool

2007-03-27 Thread Constantin Gonzalez
Hi,

soon it'll be time to migrate my patchwork pool onto a real pair of
mirrored (albeit USB-based) external disks.

Today I have about half a dozen filesystems in the old pool plus dozens of
snapshots thanks to Tim Bray's excellent SMF snapshotting service.

What is the most elegant way of migrating all filesystems to the new pool,
including snapshots?

Can I do a master snapshot of the whole pool, including sub-filesystems and
their snapshots, then send/receive them to the new pool?

Or do I have to write a script that will individually snapshot all filesystems
within my old pool, then run a send (-i) orgy?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Migrating a pool

2007-03-27 Thread Constantin Gonzalez
Hi,

 Today I have about half a dozen filesystems in the old pool plus dozens of
 snapshots thanks to Tim Bray's excellent SMF snapshotting service.

I'm sorry I mixed up Tim's last name. The fine guy who wrote the SMF snapshot
service is Tim Foster. And here's the link:

  http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_8

There doesn't seem to be an easy answer to the original question of how to
migrate a complete pool. Writing a script with a snapshot send/receive
party seems to be the only approach.

I wish I could zfs snapshot pool then zfs send pool | zfs receive dest and
all blocks would be transferred as they are, including all embedded snapshots.

Is that already an RFE?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 2-way mirror or RAIDZ?

2007-02-27 Thread Constantin Gonzalez
Hi,

 I have a shiny new Ultra 40 running S10U3 with 2 x 250Gb disks.

congratulations, this is a great machine!

 I want to make best use of the available disk space and have some level
 of redundancy without impacting performance too much.
 
 What I am trying to figure out is: would it be better to have a simple
 mirror of an identical 200Gb slice from each disk or split each disk
 into 2 x 80Gb slices plus one extra 80Gb slice on one of the disks to
 make a 4 + 1 RAIDZ configuration?

you probably want to mirror the OS slice of the disk to protect your OS and
its configuration from the loss of a whole disk. Do it with SVM today and
upgrade to a bootable ZFS mirror in the future.

The OS slice needs only to be 5GB in size if you follow the standard
recommendation, but 10 GB is probably a safe and easy to remember bet, leaving
you some extra space for apps etc.

Plan to be able to live upgrade into new OS versions. You may break up the
mirror to do so, but this is kinda complicated and error-prone.
Disk space is cheap, so I'd rather recommend you save two slices per disk for
creating 2 mirrored boot environments where you can LU back and forth.

For swap, allocate an extra slice per disk and of course mirror swap too.
1GB swap should be sufficient.

Now, you can use the rest for ZFS. Having only two physical disks, there is
no good reason to do something other than mirroring. If you created 4+1
slices for RAID-Z, you would always lose the whole pool if one disk broke.
Not good. You could play Russian roulette by having 2+3 slices and RAID-Z2
and hope that the right disk fails, but that isn't a good practice either,
and it wouldn't buy you any redundant space anyway, just leave an extra
unprotected scratch slice.

So, go for the mirror, it gives you good performance and less headaches.
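
Creating it is then a one-liner (device and slice names made up; use whatever
slice you set aside for ZFS on each disk):

# zpool create tank mirror c1t0d0s7 c1t1d0s7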

If you can spare the money, try increasing the number of disks. You'd still
need to mirror boot and swap slices, but then you would be able to use a real
RAID-Z config for the rest, enabling to leverage more disk capacity at a good
redundancy/performance compromise.

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/

Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht Muenchen: HRB 161028
Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer
Vorsitzender des Aufsichtsrates: Martin Haering
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)

2007-02-05 Thread Constantin Gonzalez
Hi,

Artem: Thanks. And yes, Peter S. is a great actor!

Christian Mueller wrote:
 who is peter stormare? (sorry, i'm from old europe...)

as usual, Wikipedia knows it:

  http://en.wikipedia.org/wiki/Peter_Stormare

and he's european too :). Great actor, great movies. I particularly like
Constantine, not just because of the name, of course :)

Our budget is quite limited at the moment, but after the 1,000,000th view on
YouTube/Google Video we might want to reconsider our cast for the next
episode :).

But first, we need to get the english version finished...

Cheers,
   Constantin

 
 thx  bye
 christian
 
 Artem Kachitchkine schrieb:

 Brilliant video, guys.

 Totally agreed, great work.

 Boy, would I like to see Peter Stormare in that video %)

 -Artem.
 

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] FYI: ZFS on USB sticks (from Germany)

2007-02-02 Thread Constantin Gonzalez
Hi Richard,

Richard Elling wrote:
 FYI,
 here is an interesting blog on using ZFS with a dozen USB drives from
 Constantin.
 http://blogs.sun.com/solarium/entry/solaris_zfs_auf_12_usb

thank you for spotting it :).

We're working on translating the video (hope we get the lip-syncing right...)
and will then re-release it in an english version. BTW, we've now hosted
the video on YouTube so it can be embedded in the blog.

Of course, I'll then write an english version of the blog entry with the
tech details.

Please hang on for a week or two... :).

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Global Systems Engineering  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-31 Thread Constantin Gonzalez
Hi,

I need to be a little bit more precise in how I formulate comments:

1. Yes, zpool remove is a desirable feature, no doubt about that.

2. Most of the cases where customers ask for zpool remove can be solved
   with zfs send/receive or with zpool replace. Think Pareto's 80-20 rule.

   2a. The cost of doing 2., including extra scratch storage space or scheduling
   related work into planned downtimes, is smaller than the cost of not using
   ZFS at all.

   2b. Even in the remaining 20% of cases (figuratively speaking, YMMV) where
   zpool remove would be the only solution, I feel that the cost of
   sacrificing the extra storage space that would have become available
   through zpool remove is smaller than the cost of the project not
   benefitting from the rest of ZFS' features.

3. Bottom line: Everybody wants zpool remove as early as possible, but IMHO
   this is not an objective barrier to entry for ZFS.

Note my use of the word objective. I do feel that we have to implement
zpool remove for subjective reasons, but that is a non technical matter.

Is this an agreeable summary of the situation?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: can I use zfs on just a partition?

2007-01-26 Thread Constantin Gonzalez Schmitz
Hi,

 When you do the initial install, how do you do the slicing?
 
 Just create like:
 / 10G
 swap 2G
 /altroot 10G
 /zfs restofdisk

yes.

 Or do you just create the first three slices and leave the rest of the
 disk untouched?  I understand the concept at this point, just trying to
 explain to a third party exactly what they need to do to prep the system
 disk for me :)

No. You need to be able to tell ZFS what to use. Hence, if your pool is
created at the slice level, you need to create a slice for it.

So the above is the way to go.

And yes, you should only do this on laptops and other machines where you only
have 1 disk or are otherwise very disk-limited :).
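
Creating the pool on that last slice is then a one-liner (slice name made up,
check yours with format):

# zpool create mypool c0d0s7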

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: can I use zfs on just a partition?

2007-01-25 Thread Constantin Gonzalez Schmitz
Hi Tim,

 Essentially I'd like to have the / and swap on the first 60GB of the disk.  
 Then use the remaining 100GB as a zfs partition to setup zones on.  Obviously 
 the snapshots are extremely useful in such a setup :)
 
 Does my plan sound feasible from both a usability and performance standpoint?

yes, it works, I do it on my laptop all the time:

# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c0d0 DEFAULT cyl 48451 alt 2 hd 64 sec 63
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0
Specify disk (enter its number): 0
selecting c0d0
Controller working list found
[disk formatted, defect list found]
Warning: Current Disk has mounted partitions.
/dev/dsk/c0d0s0 is currently mounted on /. Please see umount(1M).
/dev/dsk/c0d0s1 is currently used by swap. Please see swap(1M).
/dev/dsk/c0d0s3 is part of active ZFS pool poolchen. Please see zpool(1M).
/dev/dsk/c0d0s4 is in use for live upgrade /. Please see ludelete(1M).

c0d0s5 is also free and can be used as a third live upgrade partition.

My recommendation: Use at least 2 slices for the OS so you can enjoy live
upgrade, one for swap and the rest for ZFS.

Performance-wise, this is of course not optimal, but perfectly feasible. I have
an Acer Ferrari 4000 which is known to have a slow disk, but it still works
great for what I do (email, web, Solaris demos, presentations, occasional
video).

More complicated things are possible as well. The following blog entry:

  http://blogs.sun.com/solarium/entry/tetris_spielen_mit_zfs

(sorry, it's German) illustrates how my 4 disks at home are sliced in order
to get OS partitions on multiple disks, Swap and as much ZFS space as
possible at acceptable redundancy despite differently-sized disks. Check out
the graphic in the above entry to see what I mean. Works great (but I had to
use -f to zpool create :) ) and gives me enough performance for all my
home-serving needs.

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] poor NFS/ZFS performance

2006-11-23 Thread Constantin Gonzalez
Hi,

I haven't followed all the details in this discussion, but it seems to me
that it all breaks down to:

- NFS on ZFS is slow due to NFS being very conservative when sending
  ACK to clients only after writes have definitely committed to disk.

- Therefore, the problem is not that much ZFS specific, it's just a
  conscious focus on data correctness vs. speed on ZFS/NFS' part.

- Currently known workarounds include:

  - Sacrifice correctness for speed by disabling ZIL or using a less
conservative network file system.

  - Optimize NFS/ZFS to get as much speed as possible within the constraints
of the NFS protocol.

But one aspect I haven't seen so far is: How can we optimize ZFS on a more
hardware oriented level to both achieve good NFS speeds and still preserve
the NFS level of correctness?

One possibility might be to give the ZFS pool enough spindles so it can
comfortably handle many small IOs fast enough for them not to become
NFS commit bottlenecks. This may require some tweaking on the ZFS side so
it doesn't queue up write IOs for so long that commits are delayed more than
necessary.

Has anyone investigated this branch or am I too simplistic in my view of the
underlying root of the problem?

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] poor NFS/ZFS performance

2006-11-23 Thread Constantin Gonzalez
Hi Roch,

thanks, now I better understand the issue :).

 Nope.  NFS  is slow   for single threaded  tar  extract. The
 conservative approach of NFS is needed with the NFS protocol
 in order to ensure client's side data integrity. Nothing ZFS 
 related.

...

 NFS is plenty fast in a throughput context (not that it does 
 not need work). The complaints we have here are about single 
 threaded code.

ok, then it's just a single-threaded client request latency issue, which
(as is increasingly often the case) software vendors need to realize. The proper
way to deal with this, then, is to multi-thread at the application layer.

Reminds me of many UltraSPARC T1 issues, which sit neither in the hardware nor the
OS, but in the way applications have been developed for years :).

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Newbie questions about drive problems

2006-08-31 Thread Constantin Gonzalez
Hi,

 I have 3 drives.
 The first one will be the primary/boot drive under UFS. The 2 others will 
 become a mirrored pool with ZFS.
 Now, I have problem with the boot drive (hardware or software), so all the 
 data on my mirrored pool are ok?
 How can I restore this pool? When I create the pool, do I need to save the 
 properties?

All metadata for the pool is stored inside the pool. If the boot disk fails in
any way, all pool data is safe.

Worst case might be that you have to reinstall everything on the boot disk.
After that, you just say zpool import to get your pool back and everything
will be ok.
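
For example (pool name made up):

# zpool import
  (lists the pools ZFS finds on the attached disks)
# zpool import tank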

 What happend when a drive crash when ZFS write some data on a raidz pool?

If the crash occurs in the middle of a write operation, then the new data
blocks will not be valid. ZFS will then revert back to the state before
writing the new set of blocks. Therefore you'll have 100% data integrity
but of course the new blocks that were written to the pool will be lost.

 Do the pool go to the degraded state or faulted state?

No, the pool will come up as online. The degraded state is only for devices
that aren't accessible any more and the faulted state is for pools that do
not have enough valid devices to be complete.

Hope this helps,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Load-balancing over vdevs vs. real disks?

2006-08-21 Thread Constantin Gonzalez
Hi,

my ZFS pool for my home server is a bit unusual:

pool: pelotillehue
 state: ONLINE
 scrub: scrub completed with 0 errors on Mon Aug 21 06:10:13 2006
config:

NAMESTATE READ WRITE CKSUM
pelotillehue  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c0d1s5  ONLINE   0 0 0
c1d0s5  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0d0s3  ONLINE   0 0 0
c0d1s3  ONLINE   0 0 0
c1d0s3  ONLINE   0 0 0
c1d1s3  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0d1s4  ONLINE   0 0 0
c1d0s4  ONLINE   0 0 0
c1d1s4  ONLINE   0 0 0

The reason is simple: I have 4 differently-sized disks (80, 80, 200, 250 GB.
It's a home server and so I crammed whatever I could find elsewhere into that box
:) ) and my goal was to create the biggest pool possible but retaining some
level of redundancy.

The above config therefore groups the biggest slices that can be created on all
four disks into the 4-disk RAID-Z vdev, then the biggest slices that can be
created on 3 disks into the 3-disk RAID-Z, then two large slices remain which
are mirrored. It's like playing Tetris with disk slices... But the pool can
tolerate 1 broken disk and it gave me maximum storage capacity, so be it.

This means that we have one pool with 3 vdevs that access up to 3 different
slices on the same physical disk.

Question: Does ZFS consider the underlying physical disks when load-balancing
or does it only load-balance across vdevs thereby potentially overloading
physical disks with up to 3 parallel requests per physical disk at once?

I'm pretty sure ZFS is very intelligent and will do the right thing, but a
confirmation would be nice here.

Best regards,
   Constantin

-- 
Constantin GonzalezSun Microsystems GmbH, Germany
Platform Technology Group, Client Solutionshttp://www.sun.de/
Tel.: +49 89/4 60 08-25 91   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Proposal: user-defined properties

2006-08-18 Thread Constantin Gonzalez Schmitz
Hi Eric,

this is a great proposal and I'm sure this is going to help administrators
a lot.

One small question below:

 Any property which contains a colon (':') is defined as a 'user
 property'.  The name can contain alphanumeric characters, plus the
 following special characters: ':', '-', '.', '_'.  User properties are
 always strings, and are always inherited.  No additional validation is
 done on the contents.  Properties are set and retrieved through the
 standard mechanisms: 'zfs set', 'zfs get', and 'zfs inherit'.

   # zfs list -o name,local:department
   NAME  LOCAL:DEPARTMENT
   test  12345
   test/foo  12345
   # zfs set local:department=67890 test/foo
   # zfs inherit local:department test
   # zfs get -s local -r all test 
   NAME  PROPERTY  VALUE  SOURCE
   test/foo  local:department  12345  local
   # zfs list -o name,local:department
   NAME  LOCAL:DEPARTMENT
   test  -
   test/foo  12345

The example suggests that property names may be case-insensitive. Is that the
case (sorry for the pun)? If so, it should be noted in the user-defined
property definition, just for clarity.
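
For example, it would be good to know whether these two commands end up
referring to the same property, to two different ones, or whether the second
is rejected outright (the dataset name is hypothetical):

   # zfs set local:department=12345 tank/test
   # zfs get LOCAL:DEPARTMENT tank/test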

Best regards,
   Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Home Server with ZFS

2006-08-18 Thread Constantin Gonzalez Schmitz
Hi,

 What I don't know is what happens if the boot disk dies? Can I replace
 it, install Solaris again and get it to see the ZFS mirror?
 
 As I understand it, this should be possible, but I haven't tried it and I'm
 not an expert Solaris admin.  Some ZFS info is stored in a persistent
 file on your system disk, and you may have to do a little dance to get
 around that.  It's worth researching and practicing in advance :-).

IIRC, ZFS stores all relevant information inside the pool itself. So you should
be able to install a new OS onto the replacement disk, then say
zpool import (possibly with -d pointing at the directory where the mirror's
devices live) to re-import the pool.

But I haven't really tried it myself :).
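
For reference, the recovery would look roughly like this (untested sketch; the
pool name is a placeholder, and /dev/dsk is the default search directory
anyway):

   # zpool import                    # scan for importable pools
   # zpool import -d /dev/dsk tank   # then import the pool by name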

All in all, ZFS is an excellent choice for a home server. I use ZFS as video
storage for a digital set-top box (quotas are really handy here), as storage
for my music collection, as backup storage for important data (including
photos), etc.

I'm currently juggling 4 differently-sized disks into a new config with the
goal of getting as much storage as possible out of them while keeping a minimum
level of redundancy. It's an interesting, Tetris-like calculation exercise that
I'd be happy to blog about when I'm done.

Feel free to visit my blog for how to set up your home server as a ZFS iTunes
streaming server :).

Best regards,
   Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Enabling compression/encryption on a populated filesystem

2006-07-18 Thread Constantin Gonzalez
Hi,

there might be value in a zpool scrub -r (as in: re-write blocks) beyond the
prior discussion on encryption and compression.

For instance, a bit that is just about to rot might not be detected by a
regular zpool scrub, but it would be rewritten by a re-writing scrub.

It would also exercise the writing muscles on disks that don't see a lot of
writing, such as archives or system disks, thereby detecting any degradation
that affects writing of data.

Of course the re-writing must be 100% safe, but that can be done with COW
quite easily.

Then, admins would for instance run a zpool scrub every week and maybe a
zpool scrub -r every month or so.
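
Until something like that exists, a manual approximation (sketch only; the
dataset names are made up, and it needs enough free space for a full second
copy) is to rewrite the data through send/receive after setting the new
property on the parent, so the received copy picks it up:

   # zfs set compression=on tank
   # zfs snapshot tank/data@rewrite
   # zfs send tank/data@rewrite | zfs receive tank/data.new

Once the copy is verified, zfs rename can swap the datasets and the old one
can be destroyed.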

Just my 2 cents,
  Constantin


Luke Scharf wrote:
 Darren J Moffat wrote:
 But the real issue is how you tell the admin it's done and the
 filesystem is now safe.   With compression you don't generally care if
 some old stuff didn't compress (and with the current implementation it
 has to compress a certain amount or it gets written uncompressed
 anyway).  With encryption the human admin really needs to be told. 
 As a sysadmin, I'd be happy with another scrub-type command.  Something
 with the following meaning:
 
 Reapply all block-level properties such as compression, encryption,
 and checksum to every block in the volume.  Have the admin come back
 tomorrow and run 'zpool status' to see if it's done. 
 
 Mad props if I can do this on a live filesystem (like the other ZFS
 commands, which also get mad props for being good tools).
 
 A natural command for this would be something like zfs blockscrub
 tank/volume.  Also, zpool blockscrub tank would make sense to me as
 well, even though it might touch more data.
 
 Of course, it's easy for me to just say this, since I'm not thinking
 about the implementation very deeply...
 
 -Luke
 
 
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [raidz] file not removed: No space left on device

2006-07-04 Thread Constantin Gonzalez
Hi Eric,

Eric Schrock wrote:
 You don't need to grow the pool.  You should always be able to truncate the
 file without consuming more space, provided you don't have snapshots.
 Mark has a set of fixes in testing which do a much better job of
 estimating space, allowing us to always unlink files in full pools
 (provided there are no snapshots, of course).  This provides much more
 logical behavior by reserving some extra slop.

is this planned but not yet implemented functionality, or why did Tatjana
see the "not able to rm" behaviour?

Or should she use unlink(1M) in these cases?
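
For completeness, the two things I would try in that situation (sketch; the
file name is taken from Tatjana's example):

   # cp /dev/null debug.log        # truncate in place, ideally without new space
   # /usr/sbin/unlink debug.log    # unlink(1M) calls unlink(2) directly,
                                   # without rm's extra checks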

Best regards,
   Constantin

 
 - Eric
 
 On Mon, Jul 03, 2006 at 02:23:06PM +0200, Constantin Gonzalez wrote:
 Hi,

 of course, the reason for this is the copy-on-write approach: ZFS has
 to write new blocks first before the modification of the FS structure
 can reflect the state with the deleted blocks removed.

 The only way out of this is of course to grow the pool. Once ZFS learns
 how to free up vdevs, this may become a better solution, because you can then
 shrink the pool again after removing the files.
 
 I expect many customers to run into similar problems, and I've already gotten
 a number of "what if the pool is full?" questions. My answer has always been
 "no file system should be filled to more than 90%", for a number of reasons,
 but in practice this is hard to ensure.

 Perhaps this is a good opportunity for an RFE: ZFS should reserve enough
 blocks in a pool in order to always be able to rm and destroy stuff.

 Best regards,
Constantin

 P.S.: Most US Sun employees are on vacation this week, so don't be alarmed
 if the really good answers take some time :).
 
 --
 Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [raidz] file not removed: No space left on device

2006-07-03 Thread Constantin Gonzalez
Hi,

of course, the reason for this is the copy-on-write approach: ZFS has
to write new blocks first before the modification of the FS structure
can reflect the state with the deleted blocks removed.

The only way out of this is of course to grow the pool. Once ZFS learns
how to free up vdevs, this may become a better solution, because you can then
shrink the pool again after removing the files.

I expect many customers to run into similar problems, and I've already gotten
a number of "what if the pool is full?" questions. My answer has always been
"no file system should be filled to more than 90%", for a number of reasons,
but in practice this is hard to ensure.

Perhaps this is a good opportunity for an RFE: ZFS should reserve enough
blocks in a pool in order to always be able to rm and destroy stuff.
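
Until such a built-in reserve exists, a do-it-yourself version (sketch; the
dataset name is made up) is to park a small emergency reservation in the pool
and release it only when you really need the space back:

   # zfs create tank/slop
   # zfs set reservation=1G tank/slop      # holds 1 GB in reserve, unused
   # zfs set reservation=none tank/slop    # release it when the pool fills up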

Best regards,
   Constantin

P.S.: Most US Sun employees are on vacation this week, so don't be alarmed
if the really good answers take some time :).

Tatjana S Heuser wrote:
 On a system still running nv_30, I've a small RaidZ filled to the brim:
 
 2 3 [EMAIL PROTECTED] pts/9 ~ 78# uname -a
 SunOS mir 5.11 snv_30 sun4u sparc SUNW,UltraAX-MP
 
 0 3 [EMAIL PROTECTED] pts/9 ~ 50# zfs list
 NAME   USED  AVAIL  REFER  MOUNTPOINT
 mirpool1  33.6G  0   137K  /mirpool1
 mirpool1/home 12.3G  0  12.3G  /export/home
 mirpool1/install  12.9G  0  12.9G  /export/install
 mirpool1/local1.86G  0  1.86G  /usr/local
 mirpool1/opt  4.76G  0  4.76G  /opt
 mirpool1/sfw   752M  0   752M  /usr/sfw
 
 Trying to free some space is meeting a lot of reluctance, though:
 0 3 [EMAIL PROTECTED] pts/9 ~ 51# rm debug.log 
 rm: debug.log not removed: No space left on device
 0 3 [EMAIL PROTECTED] pts/9 ~ 55# rm -f debug.log
 2 3 [EMAIL PROTECTED] pts/9 ~ 56# ls -l debug.log 
 -rw-r--r--   1 th12242027048 Jun 29 23:24 debug.log
 0 3 [EMAIL PROTECTED] pts/9 ~ 58# : debug.log 
 debug.log: No space left on device.
 0 3 [EMAIL PROTECTED] pts/9 ~ 63# ls -l debug.log
 -rw-r--r--   1 th12242027048 Jun 29 23:24 debug.log
 
 There are no snapshots, so removing/clearing the files /should/ 
 be a way to free some space there.
 
 Of course this is the same filesystem where zdb dumps core 
 - see:
 
 *Synopsis*: zdb dumps core - bad checksum
 http://bt2ws.central.sun.com/CrPrint?id=6437157
 *Change Request ID*: 6437157
 
 (zpool reports the RaidZ pool as healthy while
 zdb crashes with a 'bad checksum' message.)
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] add_install_client and ZFS and SMF incompatibility

2006-06-23 Thread Constantin Gonzalez
Hi,

I just set up an install server on my notebook and of course all the installer
data is on ZFS. I love zfs set compression=on!

It seems that the standard ./add_install_client script from the S10U2 Tools
directory creates an entry in /etc/vfstab for a loopback mount of the Solaris
miniroot into the /tftpboot directory.

Unfortunately, at boot time (I'm using Nevada build 39), the mount_all
script tries to perform the loopback mount from /etc/vfstab before ZFS has
mounted its filesystems.

So the SMF filesystem/local method fails, and I have to either mount all ZFS
filesystems by hand and re-run mount_all, or replace the vfstab entry with a
simple symlink, which only works until the next add_install_client run.
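
For the record, the manual recovery looks roughly like this (a sketch of what I
currently do; the service FMRI is from memory):

   # zfs mount -a       # get the ZFS filesystems in place first
   # mountall -l        # retry the local vfstab entries, including the lofs mount
   # svcadm clear svc:/system/filesystem/local:default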

Is this a known issue?

Best regards,
   Constantin

-- 
Constantin Gonzalez                          Sun Microsystems GmbH, Germany
Platform Technology Group, Client Solutions  http://www.sun.de/
Tel.: +49 89/4 60 08-25 91                   http://blogs.sun.com/constantin/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss