Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-22 Thread Tomas Forsman
On 22 January, 2013 - Darren J Moffat sent me these 0,6K bytes:

> On 01/21/13 17:03, Sašo Kiselkov wrote:
>> Again, what significant features did they add besides encryption? I'm
>> not saying they didn't, I'm just not aware of that many.
>
> Just a few examples:
>
> Solaris ZFS already has support for 1MB block size.
>
> Support for SCSI UNMAP - both issuing it and honoring it when it is the  
> backing store of an iSCSI target.

Would this apply to say a SATA SSD used as ZIL? (which we have, a
vertex2ex with supercap)

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Odd snapshots exposed in Solaris 11.1

2013-01-22 Thread Tomas Forsman
On 22 January, 2013 - Ian Collins sent me these 0,9K bytes:

> Since upgrading to Solaris 11.1, I've started seeing snapshots like
>
> tank/vbox/shares%VMs
>
> appearing with zfs list -t snapshot.
>
> I thought snapshots with a % in their name were private objects created
> during a send/receive operation.  These snapshots don't have many  
> properties:
>
> zfs get all tank/vbox/shares%VMs
> NAME                  PROPERTY    VALUE                  SOURCE
> tank/vbox/shares%VMs  creation    Tue Jan 15  9:15 2013  -
> tank/vbox/shares%VMs  mountpoint  /vbox/shares           -
> tank/vbox/shares%VMs  share.*     ...                    local
> tank/vbox/shares%VMs  zoned       off                    default
>
> Which is causing one of my scripts grief.

Ignore snapshots that have %, only use those with @ ?
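
A minimal filter along those lines, if the script just needs the snapshot names (a sketch):

zfs list -H -t snapshot -o name | grep '@'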

> Does anyone know why these are showing up?

They're used for new style sharing in s11.1 as well.

zfs list -t share

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-20 Thread Tomas Forsman
On 19 January, 2013 - Jim Klimov sent me these 2,0K bytes:

> Hello all,
>
>   While revising my home NAS which had dedup enabled before I gathered
> that its RAM capacity was too puny for the task, I found that there is
> some deduplication among the data bits I uploaded there (makes sense,
> since it holds backups of many of the computers I've worked on - some
> of my homedirs' contents were bound to intersect). However, a lot of
> the blocks are in fact "unique" - have entries in the DDT with count=1
> and the blkptr_t bit set. In fact they are not deduped, and with my
> pouring of backups complete - they are unlikely to ever become deduped.

Another RFE would be 'zfs dedup mypool/somefs' and basically go through
and do a one-shot dedup. Would be useful in various scenarios. Possibly
go through the entire pool at once, so blocks dedup across datasets (like
"the real thing").

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-14 Thread Tomas Forsman
On 14 January, 2013 - Cindy Swearingen sent me these 1,0K bytes:

> I believe the bug.oraclecorp.com URL is accessible with a support
> contract, but its difficult for me to test.

Support contract or not, the domain is not exposed to the internet.

> I should have mentioned it. I apologize.
>
> cs
>
> On 01/14/13 14:02, Nico Williams wrote:
>> On Mon, Jan 14, 2013 at 1:48 PM, Tomas Forsman  wrote:
>>>> https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599
>>>
>>> Host oraclecorp.com not found: 3(NXDOMAIN)
>>>
>>> Would oracle.internal be a better domain name?
>>
>> Things like that cannot be changed easily.  They (Oracle) are stuck
>> with that domain name for the foreseeable future.  Also, whoever thought
>> it up probably didn't consider leakage of internal URIs to the
>> outside.  *shrug*
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-14 Thread Tomas Forsman
On 14 January, 2013 - Cindy Swearingen sent me these 2,3K bytes:

> Hi Jamie,
>
> Yes, that is correct.
>
> The S11u1 version of this bug is:
>
> https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599

Host oraclecorp.com not found: 3(NXDOMAIN)

Would oracle.internal be a better domain name?

> and has this notation which means Solaris 11.1 SRU 3.4:
>
> Changeset pushed to build 0.175.1.3.0.4.0
>
> Thanks,
>
> Cindy
>
> On 01/11/13 19:10, Jamie Krier wrote:
>> It appears this bug has been fixed in Solaris 11.1 SRU 3.4
>>
>> 7191375  15809921  SUNBT7191375 metadata rewrites should coordinate with l2arc
>>
>>
>> Cindy can you confirm?
>>
>> Thanks
>>
>>
>> On Fri, Jan 4, 2013 at 3:55 PM, Richard Elling <richard.ell...@gmail.com> wrote:
>>
>> On Jan 4, 2013, at 11:12 AM, Robert Milkowski <rmilkow...@task.gda.pl> wrote:
>>
>>>>
>>>> Illumos is not so good at dealing with huge memory systems but
>>>> perhaps
>>>> it is also more stable as well.
>>>
>>> Well, I guess that it depends on your environment, but generally I
>>> would
>>> expect S11 to be more stable if only because the sheer amount of bugs
>>> reported by paid customers and bug fixes by Oracle that Illumos is not
>>> getting (lack of resource, limited usage, etc.).
>>
>> There is a two-edged sword. Software reliability analysis shows that
>> the
>> most reliable software is the software that is oldest and unchanged.
>> But
>> people also want new functionality. So while Oracle has more changes
>> being implemented in Solaris, it is destabilizing while simultaneously
>> improving reliability. Unfortunately, it is hard to get both wins.
>> What is more
>> likely is that new features are being driven into Solaris 11 that are
>> destabilizing. By contrast, the number of new features being added to
>> illumos-gate (not to be confused with illumos-based distros) is
>> relatively
>> modest and in all cases are not gratuitous.
>>   -- richard
>>
>> --
>>
>> richard.ell...@richardelling.com
>> <mailto:richard.ell...@richardelling.com>
>> +1-760-896-4422 
>>
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] S11 vs illumos zfs compatiblity

2012-12-14 Thread Tomas Forsman
On 13 December, 2012 - Jan Owoc sent me these 1,0K bytes:

> Hi,
> 
> On Thu, Dec 13, 2012 at 9:14 AM, sol  wrote:
> > Hi
> >
> > I've just tried to use illumos (151a5) to import a pool created on solaris
> > (11.1) but it failed with an error about the pool being incompatible.
> >
> > Are we now at the stage where the two prongs of the zfs fork are pointing in
> > incompatible directions?
> 
> Yes, that is correct. The last version of Solaris with source code
> used zpool version 28. This is the last version that is readable by
> non-Solaris operating systems FreeBSD, GNU/Linux, but also
> OpenIndiana. The filesystem, "zfs", is technically at the same
> version, but you can't access it if you can't access the pool :-).

zfs version is bumped to 6 too in s11.1:
The following filesystem versions are supported:

VER  DESCRIPTION
---  -------------------------------------------
 1   Initial ZFS filesystem version
 2   Enhanced directory entries
 3   Case insensitive and SMB credentials support
 4   userquota, groupquota properties
 5   System attributes
 6   Multilevel file system support

Pool version is upped as well:
 29  RAID-Z/mirror hybrid allocator
 30  Encryption
 31  Improved 'zfs list' performance
 32  One MB blocksize
 33  Improved share support
 34  Sharing with inheritance
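
For reference, both listings above are what the stock tools print:

zfs upgrade -v
zpool upgrade -v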

> If you want to access the data now, your only option is to use Solaris
> to read it, and copy it over (eg. with zfs send | recv) onto a pool
> created with version 28.
> 
> Jan
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-12 Thread Tomas Forsman
On 12 December, 2012 - Thomas Nau sent me these 7,3K bytes:

> Jamie
> We ran Into the same and had to migrate the pool while imported
> read-only. On top we were adviced to NOT use an L2ARC. Maybe you
> should consider that as well

We also ran into something similar, imported read-only and created a new
pool. A few months later, we ran into an L2ARC bug (15809921) to which
we've received an IDR that we have not applied yet.

This bug caused the following:
errors: Permanent errors have been detected in the following files:

:<0x132c1f>

on a 3x3 mirrored pool (triple-mirroring), all 9 disks had checksum
errors.

> Thomas
> 
> 
> Am 12.12.2012 um 19:21 schrieb Jamie Krier :
> 
> > I've hit this bug on four of my Solaris 11 servers. Looking for anyone else 
> > who has seen it, as well as comments/speculation on cause.  
> > 
> > This bug is pretty bad.  If you are lucky you can import the pool read-only 
> > and migrate it elsewhere.  
> > 
> > I've also tried setting zfs:zfs_recover=1,aok=1 with varying results.
> > 
> > 
> > 
> > http://docs.oracle.com/cd/E26502_01/html/E28978/gmkgj.html#scrolltoc
> > 
> > 
> > 
> > Hardware platform:
> > 
> > Supermicro X8DAH
> > 
> > 144GB ram
> > 
> > Supermicro sas2 jbods
> > 
> > LSI 9200-8e controllers (Phase 13 fw)
> > 
> > Zuesram log
> > 
> > ZuesIops sas l2arc
> > 
> > Seagate ST33000650SS sas drives
> > 
> > 
> > 
> > All four servers are running the same hardware, so at first I suspected a 
> > problem there.  I opened a ticket with Oracle which ended with this email:
> > 
> > -
> > 
> > We strongly expect that this is a software issue because this problem does 
> > not happen
> > 
> > on Solaris 10.   On Solaris 11, it happens with both the SPARC and the X64 
> > versions of
> > 
> > Solaris.
> > 
> > 
> > 
>> > We have quite a few customers who have seen this issue and we are in the 
> > process of
> > 
> > working on a fix.  Because we do not know the source of the problem yet, I 
> > cannot speculate
> > 
> > on the time to fix.  This particular portion of Solaris 11 (the virtual 
> > memory sub-system) is quite
> > 
> > different than in Solaris 10.  We re-wrote the memory management in order 
> > to get ready for
> > 
> > systems with much more memory than Solaris 10 was designed to handle.
> > 
> > 
> > 
> > Because this is the memory management system, there is not expected to be 
> > any
> > 
> > work-around.
> > 
> > 
> > 
> > Depending on your company's requirements, one possibility is to use Solaris 
> > 10 until this
> > 
> > issue is resolved.
> > 
> > 
> > 
> > I apologize for any inconvenience that  this bug may cause.  We are working 
> > on it as a Sev 1 Priority1 in sustaining engineering.
> > 
> > -
> > 
> > 
> > 
> > I am thinking about switching to an Illumos distro, but wondering if this 
> > problem may be present there as well. 
> > 
> > 
> > 
> > Thanks
> > 
> > 
> > 
> > - Jamie
> > 
> > ___
> > zfs-discuss mailing list
> > zfs-discuss@opensolaris.org
> > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Remove disk

2012-11-30 Thread Tomas Forsman
On 30 November, 2012 - Jim Klimov sent me these 2,3K bytes:

> On 2012-11-30 15:52, Tomas Forsman wrote:
>> On 30 November, 2012 - Albert Shih sent me these 0,8K bytes:
>>
>>> Hi all,
>>>
>>> I would like to know if with ZFS it's possible to do something like this:
>>>
>>> http://tldp.org/HOWTO/LVM-HOWTO/removeadisk.html
>
> Removing a disk - no, one still can not reduce the amount of devices
> in a zfs pool nor change raidzN redundancy levels (you can change
> single disks to mirrors and back), nor reduce disk size.
>
> As Tomas wrote, you can increase the disk size by replacing smaller
> ones with bigger ones.

.. unless you're hit by 512b/4k sector crap. I don't have it readily at
hand how to check the ashift value on a vdev, anyone
else/archives/google?
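
One way that usually works (zdb's output format varies a bit between
releases; pool name made up):

zdb -C mypool | grep ashift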

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Remove disk

2012-11-30 Thread Tomas Forsman
On 30 November, 2012 - Albert Shih sent me these 0,8K bytes:

> Hi all,
> 
> I would like to know if with ZFS it's possible to do something like this:
> 
>   http://tldp.org/HOWTO/LVM-HOWTO/removeadisk.html
> 
> meaning : 
> 
> I have a zpool with 48 disks in 4 raidz2 vdevs (12 disks each). Of those 48
> disks, 36 are 3 TB and 12 are 2 TB.
> Can I buy 12 new 4 TB disks, put them in the server, add them to the zpool,
> ask zpool to migrate all data from the 12 old disks onto the new ones, and
> then remove the old disks?

You pull out one 2T, put in a 4T, wait for resilver (possibly tell it to
replace, if you don't have autoreplace on)
Repeat until done.
If you have the physical space, you can first put in a new disk, tell it
to replace and then remove the old.
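
The per-disk step is roughly (device names made up; the extra capacity only
shows up once every disk in that raidz2 vdev has been replaced, and with
autoexpand=off you would have to grow it manually afterwards):

zpool set autoexpand=on tank
zpool replace tank c1t5d0            # after swapping the 2T for a 4T in the same slot
zpool replace tank c1t5d0 c2t0d0     # or: new disk in a spare slot first, pull the old one later
zpool status tank                    # wait for resilver before touching the next disk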

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Repairing corrupted ZFS pool

2012-11-19 Thread Tomas Forsman
On 19 November, 2012 - Jim Klimov sent me these 1,1K bytes:

> Oh, and one more thing: rsync is only good if your filesystems don't
> really rely on ZFS/NFSv4-style ACLs. If you need those, you are stuck
> with Solaris tar or Solaris cpio to carry the files over, or you have
> to script up replication of ACLs after rsync somehow.

Ugly hack that seems to do the trick for us is to first rsync, then:

#!/usr/local/bin/perl -w

for my $oldfile (@ARGV) {
    my $newfile = $oldfile;
    $newfile =~ s{/export}{/newdir/export};

    next if -l $oldfile;                # skip symlinks

    # "ls -V" prints the ACL entries indented below the normal long listing
    open(F, "-|", "/bin/ls", "-ladV", "--", $oldfile);
    my @a = <F>;
    close(F);
    my $crap = shift @a;                # drop the filename/mode line
    chomp(@a);
    for (@a) {
        $_ =~ s/ //g;                   # strip the leading spaces from each ACL entry
    }
    my $acl = join(",", @a);
    system("/bin/chmod", "A=" . $acl, $newfile);
}


/bin/find /export -acl -print0 | xargs -0 /blah/aclcopy.pl


/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD ZIL/L2ARC partitioning

2012-11-14 Thread Tomas Forsman
On 14 November, 2012 - Michel Jansens sent me these 1,0K bytes:

> Hi,
>
> I've ordered a new server with:
> - 4x600GB Toshiba 10K SAS2 Disks
> - 2x100GB OCZ DENEVA 2R SYNC eMLC SATA (no expander so I hope no SAS/ 
> SATA problems). Specs: 
> http://www.oczenterprise.com/ssd-products/deneva-2-r-sata-6g-2.5-emlc.html
>
> I want to use the 2 OCZ SSDs as mirrored intent log devices, but as the 
> intent log needs quite a small amount of the disks (10GB?), I was  
> wondering if I can use the rest of the disks as L2ARC?
>
> I have a few questions about this:
>
> -Is 10GB enough for a log device?

Our log device for the department file server, serving roughly 100
workstations etc, seems to hover around 2MB used. Only sync writes
go there, and it's emptied at the next transaction group commit.

So check how much sync write traffic you have per flush (normally every
5 seconds nowadays, used to be 30 I think?). If you are pushing more than
2GB of sync writes per second, then I think you should get something
beefier ;)

> -Can I partition the disks (10GB + 90 GB) and use the unused (90GB)  
> space as L2ARC?
> -If I use the rest of the disks as L2ARC, do I have to mirror the L2ARC 
> or can I just add 2 partitions (eg: 2 x 90GB = 180GB)
> -If I used non mirrored L2ARC, Could something go wrong if one L2ARC  
> device failed (pool unavailable,lock in the kernel, panic,...)?

It's checksummed and verified, shouldn't be a problem even if it fails
(could be a problem if it's half-failing and just being slow, if so -
get rid of it).
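
If you do go the partitioning route, the add itself is simple enough (a
sketch; slice names made up, s0 being the ~10GB log slice and s1 the rest;
cache devices can't be mirrored, you just add both):

zpool add tank log mirror c2t0d0s0 c2t1d0s0
zpool add tank cache c2t0d0s1 c2t1d0s1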

>
>
> --
> Michel Jansens
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] (Meta)Data corrupted on multiple triple mirrors

2012-11-10 Thread Tomas Forsman
Hello.

3 month old pool hosted across two SAS JBODs w/ SATA disks on an x4170m2
running Solaris 11 SRU6.6, ran a scrub and I got this:

  pool: zdata
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 19h51m with 1 errors on Fri Nov  9 18:54:22 2012
config:

NAME   STATE READ WRITE CKSUM
zdata  ONLINE   0 0 1
  mirror-0 ONLINE   0 0 2
c3t1d0 ONLINE   0 0 2
c3t11d0ONLINE   0 0 2
c3t0d0 ONLINE   0 0 2
  mirror-1 ONLINE   0 0 2
c3t4d0 ONLINE   0 0 2
c3t14d0ONLINE   0 0 2
c3t3d0 ONLINE   0 0 2
  mirror-2 ONLINE   0 0 2
c3t7d0 ONLINE   0 0 2
c3t17d0ONLINE   0 0 2
c3t6d0 ONLINE   0 0 2
logs
  c0t5E83A97F1471E0A4d0s0  ONLINE   0 0 0
cache
  c0t01507A51E36B3C08d0s0  ONLINE   0 0 0
spares
  c3t10d0  AVAIL   

errors: Permanent errors have been detected in the following files:

:<0x132c1f>

How did this happen and how screwed are we? I've sent a bug report to
Oracle too..

Log is an OCZ Vertex2EX (slc&supercap), cache is an Intel 510.

Host has been rebooted 3 times after the pool was created, all the same
day as creation.

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scrub and checksum permutations

2012-10-29 Thread Tomas Forsman
On 28 October, 2012 - Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) 
sent me these 1,0K bytes:

> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > boun...@opensolaris.org] On Behalf Of Jim Klimov
> > 
> > I tend to agree that parity calculations likely
> > are faster (even if not all parities are simple XORs - that would
> > be silly for double- or triple-parity sets which may use different
> > algos just to be sure).
> 
> Even though parity calculation is faster than fletcher, which is
> faster than sha256, it's all irrelevant, except in the hugest of file
> servers.  Go write to disk or read from disk as fast as you can, and
> see how much CPU you use.  Even on moderate fileservers that I've done
> this on (a dozen disks in parallel) the cpu load is negligible.  
> 
> If you ever get up to a scale where the cpu load becomes significant,
> you solve it by adding more cpu's.  There is a limit somewhere, but
> it's huge.

For just the parity thing, this is an older linux on a quite old cpu
(first dual core athlon64's):

[961655.168961] xor: automatically using best checksumming function: generic_sse
[961655.188007]    generic_sse:  6128.000 MB/sec
[961655.188010] xor: using function: generic_sse (6128.000 MB/sec)
[961655.256025] raid6: int64x1   1867 MB/s
[961655.324020] raid6: int64x2   2372 MB/s
[961655.392027] raid6: int64x4   1854 MB/s
[961655.460019] raid6: int64x8   1672 MB/s
[961655.528062] raid6: sse2x1     834 MB/s
[961655.596047] raid6: sse2x2    1273 MB/s
[961655.664028] raid6: sse2x4    2116 MB/s
[961655.664030] raid6: using algorithm sse2x4 (2116 MB/s)

So raid6 at 2Gbyte/s and raid5 at 6Gbyte/s should be enough on a 6+ year
old low-end desktop machine..

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ARC de-allocation with large ram

2012-10-22 Thread Tomas Forsman
On 22 October, 2012 - Robert Milkowski sent me these 3,6K bytes:

> Hi,
> 
> If after it decreases in size it stays there it might be similar to:
> 
>   7111576 arc shrinks in the absence of memory pressure
> 
> Also, see document:
> 
>   ZFS ARC can shrink down without memory pressure result in slow
> performance [ID 1404581.1]
> 
> Specifically, check if arc_no_grow is set to 1 after the cache size is
> decreased, and if it stays that way.
> 
> The fix is in one of the SRUs and I think it should be in 11.1
> I don't know if it was fixed in Illumos or even if Illumos was affected by
> this at all.

The code that affects bug 7111576 was introduced between s10 and s11.

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-08-01 Thread Tomas Forsman
On 01 August, 2012 - opensolarisisdeadlongliveopensolaris sent me these 1,8K 
bytes:

> > From: Sašo Kiselkov [mailto:skiselkov...@gmail.com]
> > Sent: Wednesday, August 01, 2012 9:56 AM
> > 
> > On 08/01/2012 03:35 PM, opensolarisisdeadlongliveopensolaris wrote:
> > >> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > >> boun...@opensolaris.org] On Behalf Of Jim Klimov
> > >>
> > >> Availability of the DDT is IMHO crucial to a deduped pool, so
> > >> I won't be surprised to see it forced to triple copies.
> > >
> > > IMHO, the more important thing for dedup moving forward is to create an
> > option to dedicate a fast device (SSD or whatever) to the DDT.  So all those
> > little random IO operations never hit the rusty side of the pool.
> > 
> > That's something you can already do with an L2ARC. In the future I plan
> > on investigating implementing a set of more fine-grained ARC and L2ARC
> > policy tuning parameters that would give more control into the hands of
> > admins over how the ARC/L2ARC cache is used.
> 
> L2ARC is a read cache.  Hence the "R" and "C" in "L2ARC."

"Adaptive Replacement Cache", right.

> This means two major things:
> #1  Writes don't benefit, 
> and
> #2  There's no way to load the whole DDT into the cache anyway.  So you're 
> guaranteed to have performance degradation with the dedup.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] New fast hash algorithm - is it needed?

2012-07-11 Thread Tomas Forsman
On 11 July, 2012 - Sašo Kiselkov sent me these 1,4K bytes:

> On 07/11/2012 10:50 AM, Ferenc-Levente Juhos wrote:
> > Actually although as you pointed out that the chances to have an sha256
> > collision is minimal, but still it can happen, that would mean
> > that the dedup algorithm discards a block that he thinks is a duplicate.
> > Probably it's anyway better to do a byte to byte comparison
> > if the hashes match to be sure that the blocks are really identical.
> > 
> > The funny thing here is that ZFS tries to solve all sorts of data integrity
> > issues with checksumming and healing, etc.,
> > and on the other hand a hash collision in the dedup algorithm can cause
> > loss of data if wrongly configured.
> > 
> > Anyway thanks that you have brought up the subject, now I know if I will
> > enable the dedup feature I must set it to sha256,verify.
> 
> Oh jeez, I can't remember how many times this flame war has been going
> on on this list. Here's the gist: SHA-256 (or any good hash) produces a
> near uniform random distribution of output. Thus, the chances of getting
> a random hash collision are around 2^-256 or around 10^-77. If I asked
> you to pick two atoms at random *from the entire observable universe*,
> your chances of hitting on the same atom are higher than the chances of
> that hash collision. So leave dedup=on with sha256 and move on.

So in ZFS, which normally uses 128kB blocks, you can instead store them
100% uniquely into 32 bytes.. A nice 4096x compression rate..
decompression is a bit slower though..
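
For reference, the property form mentioned earlier in the thread is just
(pool name made up):

zfs set dedup=sha256,verify tank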

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there an actual newsgroup for zfs-discuss?

2012-06-11 Thread Tomas Forsman
On 11 June, 2012 - David Combs sent me these 2,4K bytes:

> Advantages to true newsgroup?
> 
> THREADS!  If you wait until a thread is finished, you can then see the entire 
> thread, and if you want, you can download the whole thing, or just parts 
> thereof.
> 
> Great for learning, if you're mostly a lurker, not posting answers.
> 
> That's vastly simpler than getting a thousand emails, saving them all in a 
> file, and then having to sort them by subject-line (modified so "RE:", etc 
> are ignored in sort key), to get them grouped by zfs-topic.  What a pain.

.. or use a mail reader that doesn't suck.

> When via a powerful newsreader (I use trn4), all of that is done for you 
> automatically.

mutt is probably pretty close.

> Furthermore, at least one newsreader, trn4, via its "t" (tree) command, will 
> draw a tree (rooted at the right side of the page, but the text oriented so 
> you can read right-side-up, without turning your head sideways) so you can 
> pursue topics, can see how they lay out, or better yet, delete, say, flame 
> subthreads (surely not a problem with zfs-discuss).
> 
> Besides that, you don't have posts hitting your regular email list, diluting 
> the urgent stuff you really do have to respond to, making it easier to miss 
> that stuff.
> 
> Lets you keep newsgroup-TYPE stuff (eg zfs-discuss email) separate from 
> true-email stuff.

It's called "mail filtering".

> --
> 
> QUESTION: how much effort (for someone who already knows how, has done it 
> before) to create a newsgroup that mirrors a listserv?
> 
> Once done, all anyone has to do is request from his friendly isp (mine is 
> panix.com, *very* friendly) to support that new newsgroup.
> 
> Name it something like comp.unix.solaris.zfs.
> 
> Or just have it all go to comp.unix.solaris, which of course already exists.
> 
> 
> Anyway, I hope I've answered your question.
> 
> Cheers!
> 
> David
> 
> 
> -Original Message-
> From: James C. McPherson [mailto:j...@opensolaris.org] 
> Sent: Monday, June 11, 2012 4:55 PM
> To: David Combs
> Cc: zfs-discuss@opensolaris.org
> Subject: Re: [zfs-discuss] Is there an actual newsgroup for zfs-discuss?
> 
> On 12/06/12 06:40 AM, David Combs wrote:
> > Actual newsgroup for zfs-discuss?
> 
> Actually, no. Where's the value in having a newsgroup as well as a mailing 
> list?
> 
> 
> James C. McPherson
> --
> Oracle
> http://www.jmcp.homeunix.com/blog
> -
> No virus found in this message.

Good to know, I better trust this info - just like spam that says it's
not spam :P

> Checked by AVG - www.avg.com
> Version: 2012.0.2178 / Virus Database: 2433/5062 - Release Date: 06/11/12
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scrub works in parallel?

2012-06-10 Thread Tomas Forsman
On 10 June, 2012 - Kalle Anka sent me these 1,5K bytes:

> Assume we have 100 disks in one zpool. Assume it takes 5 hours to
> scrub one disk. If I scrub the zpool, how long time will it take? 
> 
> 
> Will it scrub one disk at a time, so it will take 500 hours, i.e. in
> sequence, just serial? Or is it possible to run the scrub in parallel,
> so it takes 5h no matter how many disks?

It walks the filesystem/pool trees, so it's not just reading the disk
from track 0 to track 12345, but validates all possible copies.
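
So the wall-clock time depends on how much data is in the pool and how it's
laid out, not 5h times the number of disks; all vdevs get I/O concurrently
as the tree is walked. Progress and an estimate show up in:

zpool status -v mypool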

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-26 Thread Tomas Forsman
On 26 April, 2012 - Jim Klimov sent me these 1,6K bytes:

> Which reminds me: older Solarises used to have a nifty-looking
> (via descriptions) cachefs, apparently to speed up NFS clients
> and reduce traffic, which we did not get to really use in real
> life. AFAIK Oracle EOLed it for Solaris 11, and I am not sure
> it is in illumos either.
>
> Does caching in current Solaris/illumos NFS client replace those
> benefits, or did the project have some merits of its own (like
> caching into local storage of client, so that the cache was not
> empty after reboot)?

It had its share of merits and bugs.

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 11/ZFS historical reporting

2012-04-16 Thread Tomas Forsman
On 16 April, 2012 - Anh Quach sent me these 0,4K bytes:

> Are there any tools that ship w/ Solaris 11 for historical reporting on 
> things like network activity, zpool iops/bandwidth, etc., or is it pretty 
> much roll-your-own scripts and whatnot? 

zpool iostat 5  is the closest built-in..
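
If you want something to graph later, a crude roll-your-own sketch (log path
and interval arbitrary; the stats are cumulative since boot, so diff
successive samples to get rates):

#!/bin/sh
# append timestamped cumulative pool stats every 5 minutes
while true; do
    date -u '+%Y-%m-%d %H:%M:%S' >> /var/tmp/zpool-iostat.log
    zpool iostat -v >> /var/tmp/zpool-iostat.log
    sleep 300
done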

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Advice for migrating ZFS configuration

2012-03-07 Thread Tomas Forsman
On 08 March, 2012 - Fajar A. Nugraha sent me these 1,9K bytes:

> > Can somebody guide me? What's the easiest way out of this mess, so that I
> > can move from what is now a simple two-disk zpool (less than 50% full) to a
> > three-disk raidz configuration, starting with one unused disk?
> 
> - use the one new disk to create a temporary pool
> - copy the data ("zfs snapshot -r" + "zfs send -R | zfs receive")
> - destroy old pool
> - create a three-disk raidz pool using two disks and a fake device,
> something like http://www.dev-eth0.de/creating-raidz-with-missing-device/

.. copy data from temp to new pool, quite important step ;)

> - destroy the temporary pool
> - replace the fake device with now-free disk
> - export the new pool
> - import the new pool and rename it in the process: "zpool import
> temp_pool_name old_pool_name"
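
For the fake-device step in the list above, a minimal sketch (sizes and
device names made up; mkfile -n makes a sparse file, and it must claim at
least the size of the real disks):

mkfile -n 2048g /var/tmp/fakedisk
zpool create -f newpool raidz c1t1d0 c1t2d0 /var/tmp/fakedisk
zpool offline newpool /var/tmp/fakedisk     # so nothing ever gets written to it
# ...copy the data over from the temporary pool...
zpool replace newpool /var/tmp/fakedisk c1t3d0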

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] arc_no_grow is set to 1 and never set back to 0

2012-01-19 Thread Tomas Forsman
On 19 January, 2012 - Clemens Kalb sent me these 1,1K bytes:

> Peter,
> 
> I experience the same issue w/ 2 systems upgraded from Solaris 11
> Express to Solaris 11 11/11. Has your SR with Oracle made any
> progress since your last post?

Target           solaris_11
Customer Status  3-Accepted
Severity         2-High
Last Updated     2012-01-07 00:00:00 GMT+00:00

I've filed an SR pointing at the same bug too, to get momentum.

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] arc_no_grow is set to 1 and never set back to 0

2012-01-05 Thread Tomas Forsman
On 04 January, 2012 - Steve Gonczi sent me these 2,5K bytes:

> The interesting bit is what happens inside arc_reclaim_needed(), 
> that is, how it arrives at the conclusion that there is memory pressure. 
> 
> Maybe we could trace arg0, which gives the location where 
> we have left the function. This would finger which return path 
> arc_reclaim_needed() took. 

It's new code, basically comparing the inuse/total/free from
kstat -n zfs_file_data, which seems buggered.
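
Quick way to eyeball those counters in parseable form:

kstat -p -n zfs_file_data | egrep 'mem_inuse|mem_total|free'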

> Steve 
> 
> 
> - Original Message -
> 
> 
> 
> Well it looks like the only place this get's changed is in the 
> arc_reclaim_thread for opensolaris. I suppose you could dtrace it to see what 
> is going on and investigate what is happening to the return code of the 
> arc_reclaim_needed is. 
> 
> 
> 
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#2089
>  
> 
> 
> maybe 
> 
> 
> dtrace -n 'fbt:zfs:arc_reclaim_needed:return { trace(arg1) }' 
> 
> 
> Dave 


/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] arc_no_grow is set to 1 and never set back to 0

2012-01-03 Thread Tomas Forsman
On 03 January, 2012 - Peter Radig sent me these 3,5K bytes:

> Hello.
> 
> I have a Solaris 11/11 x86 box (which I migrated from SolEx 11/10 a couple of 
> weeks ago).
> 
> Without no obvious reason (at least for me), after an uptime of 1 to 2 days 
> (observed 3 times now) Solaris sets arc_no_grow to 1 and then never sets it 
> back to 0. ARC is being shrunk to less than 1 GB -- needless to say that 
> performance is terrible. There is not much load on this system.
> 
> Memory seems to be not an issue (see below).
> 
> I looked at the old Nevada code base of onnv_147 and can't find a reason for 
> this happening.
> 
> How can I find out what's causing this?

New code that seems to be counting wrong.. I was planning on filing a
bug, but am currently struggling to convince oracle that we bought
support..

Try this:
kstat -n zfs_file_data

In my case, I get:
free          15322
mem_inuse     24324866048
mem_total     25753026560
.. where ::memstat says:
Kernel            2638984     10308    42%
ZFS File Data       39260       153     1%
Anon               873549      3412    14%
Exec and libs        5199        20     0%
Page cache          20019        78     0%
Free (cachelist)     6608        25     0%
Free (freelist)   2703509     10560    43%

On another reboot, it refused to go over 130MB on a 24GB system..


> BTW: I was not seeing this on SolEx 11/10.

Dito.

> Thanks,
> Peter
> 
> 
> 
> *** ::memstat ***
> Page Summary             Pages        MB   %Tot
> ------------------  ----------  --------  -----
> Kernel                  860254      3360    21%
> ZFS File Data             3047        11     0%
> Anon                     38246       149     1%
> Exec and libs             3765        14     0%
> Page cache                8517        33     0%
> Free (cachelist)          5866        22     0%
> Free (freelist)        3272317     12782    78%
> Total                  4192012     16375
> Physical               4192011     16375
> 
> mem_inuse   4145901568
> mem_total   1077466365952
> 
> *** ::arc ***
> hits  = 186279921
> misses=  14366462
> demand_data_hits  =   4648464
> demand_data_misses=   8605873
> demand_metadata_hits  = 171803126
> demand_metadata_misses=   3805675
> prefetch_data_hits=772678
> prefetch_data_misses  =   1464457
> prefetch_metadata_hits=   9055653
> prefetch_metadata_misses  =490457
> mru_hits  =  12295087
> mru_ghost_hits= 0
> mfu_hits  = 175281066
> mfu_ghost_hits= 0
> deleted   =  14462192
> mutex_miss=30
> hash_elements =   3752768
> hash_elements_max =   3752770
> hash_collisions   =  11409790
> hash_chains   =  8256
> hash_chain_max=20
> p =48 MB
> c =   781 MB
> c_min =64 MB
> c_max = 15351 MB
> size  =   788 MB
> buf_size  =   185 MB
> data_size =   289 MB
> other_size=   313 MB
> l2_hits   = 0
> l2_misses =  14366462
> l2_feeds  = 0
> l2_rw_clash   = 0
> l2_read_bytes = 0 MB
> l2_write_bytes= 0 MB
> l2_writes_sent= 0
> l2_writes_done= 0
> l2_writes_error   = 0
> l2_writes_hdr_miss= 0
> l2_evict_lock_retry   = 0
> l2_evict_reading  = 0
> l2_abort_lowmem   =   490
> l2_cksum_bad  = 0
> l2_io_error   = 0
> l2_hdr_size   = 0 MB
> memory_throttle_count = 0
> meta_used =   499 MB
> meta_max  =  1154 MB
> meta_limit=     0 MB
> arc_no_grow   = 1
> arc_tempreserve   = 0 MB
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Download illumos build image

2011-12-20 Thread Tomas Forsman
On 20 December, 2011 - Henry Lau sent me these 1,6K bytes:

> Hi
>  
> When I run the following script from my illumos build 151,
>  
> ./usr/src/tools/scripts/onu -t nightly -d ./packages/i386/nightly
>  
> It also downloads nvidia, java ... How do I avoid these packages being downloaded
> so that the image can be copied to the system faster?

You should probably ask on some forum/list that's related to illumos..
This is about the ZFS filesystem..

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS smb/cifs shares in Solaris 11 (some observations)

2011-11-29 Thread Tomas Forsman
On 29 November, 2011 - sol sent me these 4,9K bytes:

> Hi
> 
> Several observations with zfs cifs/smb shares in the new Solaris 11.
> 
> 1) It seems that the previously documented way to set the smb share name no 
> longer works
>  zfs set sharesmb=name=my_share_name
> You have to use the long-winded
> zfs set share=name=my_share_name,path=/my/share/path,prot=smb
> This is fine but not really obvious if moving scripts from Solaris10 to 
> Solaris11.

Same with nfs, all changed.

> 2) If you use "zfs rename" to rename a zfs filesystem it doesn't rename the 
> smb share name.
> 
> 3) Also you might end up with two shares having the same name.
> 
> 4) So how do you rename the smb share? There doesn't appear to be a "zfs 
> unset" and if you issue the command twice with different names then both are 
> listed when you use "zfs get share".

man zfs_share

 zfs set -c share=name=sharename filesystem

 Removes a file system share. The -c option distinguishes
 this subcommand from the zfs set share command described
 above.
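
So, concretely, something like (filesystem name made up, share name from the
example above):

zfs set -c share=name=my_share_name tank/myfs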

> 
> 5) The "share" value act like a property but does not show up if you use "zfs 
> get" so that's not really consistent
> 
> 6) zfs filesystems created with Solaris 10 and shared with smb cannot be 
> mounted from Windows when the server is upgraded to Solaris 11.
> The client just gets "permission denied" but in the server log you might see 
> "access denied: share ACL".
> If you create a brand new zfs filesystem then it works fine. So what is the 
> difference?
> The ACLs have never been set or changed so it's not that, and the two 
> filesystems appear to have identical ACLs.
> But if you look at the extended attributes the successful filesystem has 
> xattr {A--m} and the unsuccessful has {}.
> However that xattr cannot be set on the share to see if it allows it to be 
> mounted.
> "chmod S+cA share" gives "chmod: ERROR: extended system attributes not 
> supported for share" (even though it has the xattr=on property).
> What is the problem here, why cannot a Solaris 10 filesystem be shared via 
> smb?
> And how can extended attributes be set on a zfs filesystem?
> 
> Thanks folks

> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sync=disabled property

2011-11-10 Thread Tomas Forsman
On 10 November, 2011 - Will Murnane sent me these 1,5K bytes:

> On Thu, Nov 10, 2011 at 14:12, Tomas Forsman  wrote:
> > On 10 November, 2011 - Bob Friesenhahn sent me these 1,6K bytes:
> >> On Wed, 9 Nov 2011, Tomas Forsman wrote:
> >>>>
> >>>> At all times, if there's a server crash, ZFS will come back along at next
> >>>> boot or mount, and the filesystem will be in a consistent state, that was
> >>>> indeed a valid state which the filesystem actually passed through at some
> >>>> moment in time.  So as long as all the applications you're running can
> >>>> accept the possibility of "going back in time" as much as 30 sec, 
> >>>> following
> >>>> an ungraceful ZFS crash, then it's safe to disable ZIL (set 
> >>>> sync=disabled).
> >>>
> >>> Client writes block 0, server says OK and writes it to disk.
> >>> Client writes block 1, server says OK and crashes before it's on disk.
> >>> Client writes block 2.. waaiits.. waiits.. server comes up and, server
> >>> says OK and writes it to disk.
> > When a client writes something, and something else ends up on disk - I
> > call that corruption. Doesn't matter whose fault it is and technical
> > details, the wrong data was stored despite the client being careful when
> > writing.
> If the hardware is behaving itself (actually doing a cache flush when
> ZFS asks it to, for example) the server won't say OK for block 1 until
> it's actually on disk.  This behavior is what makes NFS over ZFS slow
> without a slog: NFS does everything O_SYNC by default, so ZFS runs
> around syncing all the disks all the time.  Therefore, you won't lose
> data in this circumstance.

Which is exactly what this thread is about, consequences from
-disabling- sync.

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sync=disabled property

2011-11-10 Thread Tomas Forsman
On 10 November, 2011 - Bob Friesenhahn sent me these 1,6K bytes:

> On Wed, 9 Nov 2011, Tomas Forsman wrote:
>>>
>>> At all times, if there's a server crash, ZFS will come back along at next
>>> boot or mount, and the filesystem will be in a consistent state, that was
>>> indeed a valid state which the filesystem actually passed through at some
>>> moment in time.  So as long as all the applications you're running can
>>> accept the possibility of "going back in time" as much as 30 sec, following
>>> an ungraceful ZFS crash, then it's safe to disable ZIL (set sync=disabled).
>>
>> Client writes block 0, server says OK and writes it to disk.
>> Client writes block 1, server says OK and crashes before it's on disk.
>> Client writes block 2.. waaiits.. waiits.. server comes up and, server
>> says OK and writes it to disk.
>>
>> Now, from the view of the clients, block 0-2 are all OK'd by the server
>> and no visible errors.
>> On the server, block 1 never arrived on disk and you've got silent
>> corruption.
>
> The silent corruption (of zfs) does not occur due to simple reason that 
> flushing all of the block writes are acknowledged by the disks and then a 
> new transaction occurs to start the next transaction group. The previous 
> transaction is not closed until the next transaction has been 
> successfully started by writing the previous TXG group record to disk.  
> Given properly working hardware, the worst case scenario is losing the 
> whole transaction group and no "corruption" occurs.
>
> Loss of data as seen by the client can definitely occur.

When a client writes something, and something else ends up on disk - I
call that corruption. Doesn't matter whose fault it is and technical
details, the wrong data was stored despite the client being careful when
writing.

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs sync=disabled property

2011-11-09 Thread Tomas Forsman
On 08 November, 2011 - Edward Ned Harvey sent me these 2,9K bytes:

> > From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> > boun...@opensolaris.org] On Behalf Of Evaldas Auryla
> > 
> > I'm trying to evaluate what are the risks of running NFS share of zfs
> > dataset with sync=disabled property. The clients are vmware hosts in our
> > environment and server is SunFire X4540 "Thor" system. Though general
> > recommendation tells not to do this, but after testing performance with
> > default setting and sync=disabled - it's night and day, so it's really
> > tempting to do sync=disabled ! Thanks for any suggestion.
> 
> I know a lot of people will say "don't do it," but that's only partial
> truth.  The real truth is:
> 
> At all times, if there's a server crash, ZFS will come back along at next
> boot or mount, and the filesystem will be in a consistent state, that was
> indeed a valid state which the filesystem actually passed through at some
> moment in time.  So as long as all the applications you're running can
> accept the possibility of "going back in time" as much as 30 sec, following
> an ungraceful ZFS crash, then it's safe to disable ZIL (set sync=disabled).

Client writes block 0, server says OK and writes it to disk.
Client writes block 1, server says OK and crashes before it's on disk.
Client writes block 2.. waaiits.. waiits.. server comes up and, server
says OK and writes it to disk.

Now, from the view of the clients, block 0-2 are all OK'd by the server
and no visible errors.
On the server, block 1 never arrived on disk and you've got silent
corruption.

Too bad NFS is resilient against servers coming and going..

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Remove corrupt files from snapshot

2011-11-03 Thread Tomas Forsman
On 03 November, 2011 - Paul Kraus sent me these 1,3K bytes:

> On Thu, Nov 3, 2011 at 8:35 AM,   wrote:
> 
> > I have got a bunch of corrupted files in various snapshots on my
> > ZFS file backing store. I was not able to recover them so decided
> >  to remove all, otherwise the continuously make trouble for my
> >  incremental backup (rsync, diff etc. fails).
> 
> Why are you backing up the snapshots ? Or perhaps a better question is
> why are you backing them up more than once, as they can't change ?
> 
> What are you trying to accomplish with the snapshots ?
> 
> You can set the snapdir property on the dataset to hidden and it will
> not show up with an ls, even an ls -a, you have to know that the
> ".zfs" directory is there and cd into it blind. This will keep tools
> that walk the directory tree from finding it.
> 
> > zfs get snapdir xxx
> NAME  PROPERTY  VALUESOURCE
> xxx  snapdir   hidden   default
> 
> You would use "zfs set snapdir=hidden " to set the parameter.

.. which is default.

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] "zfs diff" performance disappointing

2011-09-26 Thread Tomas Forsman
On 27 September, 2011 - Ian Collins sent me these 0,8K bytes:

>  On 09/27/11 07:55 AM, Jesus Cea wrote:
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>>
>> I just upgraded to Solaris 10 Update 10, and one of the improvements
>> is "zfs diff".
>>
>> Using the "birthtime" of the sectors, I would expect very high
>> performance. The actual performance doesn't seems better that an
>> standard "rdiff", though. Quite disappointing...
>>
>> Should I disable "atime" to improve "zfs diff" performance? (most data
>> doesn't change, but "atime" of most files would change).
>>
> I tend to disable atime in the root filesystem and only enable it on a  
> filesystem if required.  So far, it has never been required on any of  
> the systems I look after!

I've found it useful time after time.. do things and then check atime
to see whatever files it looked at..
(yes, I know about truss and dtrace)

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] remove wrongly added device from zpool

2011-09-19 Thread Tomas Forsman
On 19 September, 2011 - Fred Liu sent me these 0,9K bytes:

> > 
> > That's a huge bummer, and it's the main reason why device removal has
> > been a
> > priority request for such a long time...  There is no solution.  You
> > can
> > only destroy & recreate your pool, or learn to live with it that way.
> > 
> > Sorry...
> > 
> 
> Yeah, I also realized this when I send out this message. In NetApp, it is so
> easy to change raid group size. There is still a long way for zfs to go.
> Hope I can see that in the future.
> 
> I also did another huge "mistake" which really brings me into the deep pain.
> I physically removed these two added devices for I thought raidz2 could afford 
> it.
> But now the whole pool corrupts. I don't know where I can go ...
> Any help will be tremendously appreciated.

You can add mirrors to those lonely disks.
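
i.e. one zpool attach per stray top-level disk, roughly (device names made up):

zpool attach tank c3t5d0 c4t5d0    # mirror the wrongly-added c3t5d0 with c4t5d0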

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs scripts

2011-09-09 Thread Tomas Forsman
On 09 September, 2011 - cephas maposah sent me these 0,4K bytes:

> i am trying to come up with a script that incorporates other scripts.
> 
> eg
> zfs send pool/filesystem1@100911 > /backup/filesystem1.snap
> zfs send pool/filesystem2@100911 > /backup/filesystem2.snap

#!/bin/sh
zfs send pool/filesystem1@100911 > /backup/filesystem1.snap &
zfs send pool/filesystem2@100911 > /backup/filesystem2.snap &
wait    # both sends run in the background; don't exit until both finish

..?

> i need to incorporate these 2 into a single script with both commands
> running concurrently.

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] BAD WD drives - defective by design?

2011-09-07 Thread Tomas Forsman
On 07 September, 2011 - Roy Sigurd Karlsbakk sent me these 2,0K bytes:

> > http://wdc.custhelp.com/app/answers/detail/a_id/1397/~/difference-between-desktop-edition-and-raid-%28enterprise%29-edition-drives
> 
> "When an error is found on a desktop edition hard drive, the drive will enter 
> into a deep recovery cycle to attempt to repair the error, recover the data 
> from the problematic area, and then reallocate a dedicated area to replace 
> the problematic area. This process can take up to 2 minutes depending on the 
> severity of the issue"
> 
> Or in other words: "When an error occurs on a desktop drive, the drive will 
> refuse to realize the sector is bad, and retry forever "

The common use for desktop drives is having a single disk without
redundancy.. If a sector is feeling bad, it's better if it tries a bit
harder to recover it than just say "blah, there was a bit of dirt in the
corner.. I don't feel like looking at it, so I'll just say your data is
screwed instead".. In a raid setup, that data is sitting safe(?) on some
other disk as well, so it might as well give up early.

So don't use desktop drives in raid and don't use raid disks in a
desktop setup. Ofcourse, this is just a config setting - but it's still
reality.
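
On drives that do expose the knob, it's the SCT Error Recovery Control
timeout; with smartmontools it can be inspected/set roughly like this
(values are in tenths of a second, device path made up, and plenty of
desktop drives refuse it or forget it across a power cycle):

smartctl -l scterc /dev/rdsk/c0t1d0          # show current read/write ERC timeouts
smartctl -l scterc,70,70 /dev/rdsk/c0t1d0    # set both to 7.0 seconds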

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool recovery import from dd images

2011-08-24 Thread Tomas Forsman
On 24 August, 2011 - Kelsey Damas sent me these 1,8K bytes:

> On Wed, Aug 24, 2011 at 1:23 PM, Cindy Swearingen
>  wrote:
> 
> > I wonder if you need to make links from the original device
> > name to the new device names.
> >
> > You can see from the zdb -l output below that the device path
> > is pointing to the original device names (really long device
> > names).
> 
> Thank you for this suggestion.  I've created symlinks from the
> /dev/dsk directory pointing to the dd image, but zpool import says the
> same thing.   However, look at the error message below.   zpool can
> see the hostname of the last system to use the pool, but the date
> looks like the epoch (Dec 31 1969).   Is this meaningful?
> 
> # ln -s /jbod1-diskbackup/restore/deep_Lun0.dd
> /dev/dsk/c4t526169645765622E436F6D202020202030303330383933323030303130363120d0s0
> # ln -s /jbod1-diskbackup/restore/deep_san01_lun.dd
> /dev/dsk/c4t526169645765622E436F6D202020202030303131383933323030303130354220d0s0
> 
> # zpool import -d .

Just for fun, try an absolute path.
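
e.g.:

zpool import -d /jbod1-diskbackup/restore

(either with the symlinks in place, or with -d pointing straight at the
directory holding the dd images.)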

/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss