Re: [zfs-discuss] tuning zfs_arc_min

2011-10-12 Thread Frank Van Damme

Op 12-10-11 02:27, Richard Elling schreef:

On Oct 11, 2011, at 2:03 PM, Frank Van Damme wrote:

Honestly? I don't remember. It might be a leftover setting from a year
ago. By now, I have figured out that I need to update the boot archive
for the new setting to take effect at boot time, which apparently
involves booting in safe mode.


The archive should be updated when you reboot. Or you can run
bootadm update-archive
anytime.

At boot, the zfs_arc_min is copied into arc_c_min overriding the default
setting. You can see the current value via kstat:
kstat -p zfs:0:arcstats:c_min
zfs:0:arcstats:c_min    389202432

This is the smallest size that the ARC will shrink to, when asked to shrink
because other applications need memory.


The root of the problem seems to be that that process never completes.

  9     /lib/svc/bin/svc.startd
    332   /sbin/sh /lib/svc/method/boot-archive-update
      347   /sbin/bootadm update-archive

I can't kill it and run it from the command line either; it simply
ignores SIGKILL (which shouldn't even be possible).




Re: [zfs-discuss] tuning zfs_arc_min

2011-10-11 Thread Frank Van Damme
2011/10/11 Richard Elling richard.ell...@gmail.com:
 ZFS Tunables (/etc/system):
         set zfs:zfs_arc_min = 0x20
         set zfs:zfs_arc_meta_limit=0x1

 It is not uncommon to tune arc meta limit. But I've not seen a case
 where tuning arc min is justified, especially for a storage server. Can
 you explain your reasoning?


Honestly? I don't remember. It might be a leftover setting from a year
ago. By now, I have figured out that I need to update the boot archive
for the new setting to take effect at boot time, which apparently
involves booting in safe mode.



Re: [zfs-discuss] tuning zfs_arc_min

2011-10-10 Thread Frank Van Damme
2011/10/8 James Litchfield jim.litchfi...@oracle.com:
 The value of zfs_arc_min specified in /etc/system must be over 64 MB
 (0x400); otherwise the setting is ignored. The value is in bytes, not pages.


Well, I've now set it to 0x800 and it stubbornly stays at 2048 MB...
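
(For completeness, the whole round trip I'm attempting looks roughly like
the sketch below -- 0x10000000 (256 MB) is only a placeholder value above
the 64 MB floor, and rebuilding the boot archive seems to be needed before
the new line is honoured:)

  # /etc/system -- value is in bytes and must be larger than 64 MB
  set zfs:zfs_arc_min = 0x10000000

  # rebuild the boot archive so the new line is seen at boot, then reboot
  bootadm update-archive

  # verify the live value afterwards
  kstat -p zfs:0:arcstats:c_min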




[zfs-discuss] tuning zfs_arc_min

2011-10-06 Thread Frank Van Damme
Hello,

Quick and stupid question: I'm racking my brain over how to tune
zfs_arc_min on a running system. There must be some magic word to pipe
into mdb -kw, but I forgot it. I tried /etc/system, but it's still at the
old value after reboot:

ZFS Tunables (/etc/system):
    set zfs:zfs_arc_min = 0x20
    set zfs:zfs_arc_meta_limit=0x1

ARC Size:
    Current Size:             1314 MB (arcsize)
    Target Size (Adaptive):   5102 MB (c)
    Min Size (Hard Limit):    2048 MB (zfs_arc_min)
    Max Size (Hard Limit):    5102 MB (zfs_arc_max)


I could use the memory now since I'm running out of it, trying to delete
a large snapshot :-/
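
(If I remember the mdb incantation at all, it's the same /Z write pattern
used for other ARC variables; a sketch, assuming the live kernel symbol is
arc_c_min -- I'm not certain of the exact name:)

  # read the current value (as a 64-bit hex quantity)
  echo "arc_c_min/J" | mdb -k

  # write a new value on the running system, e.g. 256 MB
  echo "arc_c_min/Z 0x10000000" | mdb -kw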



Re: [zfs-discuss] ZFS

2011-10-06 Thread Frank Van Damme
2011/9/13 Paul Kraus p...@kraus-haus.org:
 The only tools I have found that work with zfs ACLs are the native zfs
 tools (zfs send / recv), the native Solaris tools (cp, mv, etc.), and
 Symantec NetBackup. I have not tried other commercial backup systems
 as we already have NBU in house.

cpio, possibly?



Re: [zfs-discuss] zfs send/receive and ashift

2011-07-26 Thread Frank Van Damme
Op 26-07-11 12:56, Fred Liu schreef:
 Any alternatives, if you don't mind? ;-)

VPNs, OpenSSL piped over netcat, a password-protected zip file... ;)

ssh would probably be the most practical.



Re: [zfs-discuss] Zil on multiple usb keys

2011-07-18 Thread Frank Van Damme
2011/7/15 Eugen Leitl eu...@leitl.org:
 Speaking of which, is there a point in using an eSATA flash stick?
 If yes, which?

It depends on the drive, of course - you'll have to look up benchmark
results - but there are eSATA sticks out there that are more or less
built to perform (as opposed to providing cheap storage).




Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-15 Thread Frank Van Damme
Op 15-07-11 04:27, Edward Ned Harvey schreef:
 Is anyone from Oracle reading this?  I understand if you can't say what
 you're working on and stuff like that.  But I am merely hopeful this work
 isn't going into a black hole...  
 
 Anyway.  Thanks for listening (I hope.)   ttyl

If they aren't, maybe someone from an open source Solaris version is :)



Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Frank Van Damme
Op 12-07-11 13:40, Jim Klimov schreef:
 Even if I batch background RM's so a hundred processes hang
 and then they all at once complete in a minute or two.

Hmmm. I only run one rm process at a time. You think running more
processes at the same time would be faster?



Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-14 Thread Frank Van Damme
Op 14-07-11 12:28, Jim Klimov schreef:

 Yes, quite often it seems so.
 Whenever my slow dcpool decides to accept a write,
 it processes a hundred pending deletions instead of one ;)
 
 Even so, it took quite a few pool or iSCSI hangs and then
 reboots of both server and client, and about a week overall,
 to remove a 50 GB dir with 400k small files from a deduped
 pool served over iSCSI from a volume in a physical pool.
 
 It just completed last night ;)

It seems counter-intuitive - you'd think concurrent disk access would
only make things slower - but it turns out to be true. I'm deleting a
dozen times faster than before. Completely ridiculous.

Thank you :-)



Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Frank Van Damme
Op 15-06-11 05:56, Richard Elling schreef:
 You can even have applications like databases make snapshots when
 they want.

Makes me think of a backup utility called mylvmbackup, which is written
with Linux in mind - basically it locks the MySQL tables, takes an LVM
snapshot, and releases the lock (and then you back up the database files
from the snapshot). It should work at least as well with ZFS.
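
(A minimal manual equivalent on ZFS might look like the sketch below, run
from a mysql prompt so the read lock stays held while the snapshot is
taken; the dataset name tank/mysql and the snapshot name are assumptions:)

  mysql> FLUSH TABLES WITH READ LOCK;
  mysql> \! zfs snapshot tank/mysql@backup-20110616
  mysql> UNLOCK TABLES;

  # afterwards, copy the files out of the snapshot at leisure
  rsync -a /tank/mysql/.zfs/snapshot/backup-20110616/ /backup/mysql/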



Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Frank Van Damme
Op 15-06-11 14:30, Simon Walter schreef:
 Anyone know how Google Docs does it?

Anyone from Google on the list? :-)

Seriously, this is the kind of feature to be found in Serious CMS
applications, like, as already mentioned, Alfresco.



Re: [zfs-discuss] zpool import hangs any zfs-related programs, eats all RAM and dies in swapping hell

2011-06-14 Thread Frank Van Damme
2011/6/10 Tim Cook t...@cook.ms:
 While your memory may be sufficient, that CPU is sorely lacking.  Is it even
 64-bit?  There's a reason Intel couldn't give those things away in the early
 2000s while AMD was eating their lunch.

A Pentium 4 is 32-bit.



Re: [zfs-discuss] DDT sync?

2011-06-01 Thread Frank Van Damme
2011/6/1 Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com:
 (2)  The above is pretty much the best you can do, if your server is going
 to be a normal server, handling both reads and writes.  Because the data and
 the meta_data are both stored in the ARC, the data has a tendency to push
 the meta_data out.  But in a special use case - Suppose you only care about
 write performance and saving disk space.  For example, suppose you're the
 destination server of a backup policy.  You only do writes, so you don't
 care about keeping data in cache.  You want to enable dedup to save cost on
 backup disks.  You only care about keeping meta_data in ARC.  If you set
 primarycache=metadata ...  I'll go test this now.  The hypothesis is that
 my arc_meta_used should actually climb up to the arc_meta_limit before I
 start hitting any disk reads, so my write performance with/without dedup
 should be pretty much equal up to that point.  I'm sacrificing the potential
 read benefit of caching data in ARC, in order to hopefully gain write
 performance - So write performance can be just as good with dedup enabled or
 disabled.  In fact, if there's much duplicate data, the dedup write
 performance in this case should be significantly better than without dedup.

I guess this is pretty much why I have primarycache=metadata and
set zfs:zfs_arc_meta_limit=0x1
set zfs:zfs_arc_min=0xC000
in /etc/system.

And the ARC size on this box tends to drop far below arc_min after a
few days, notwithstanding the fact that it's supposed to be a hard limit.

I call for an arc_data_max setting :)



Re: [zfs-discuss] DDT sync?

2011-05-27 Thread Frank Van Damme
Op 26-05-11 13:38, Edward Ned Harvey schreef:
 Perhaps a property could be
 set, which would store the DDT exclusively on that device.

Oh yes please, let me put my DDT on an SSD.

But what if you lose it (the vdev) - would there be a way to reconstruct
the DDT (which you need in order to delete old, deduplicated files)?
Let me guess: this requires tracing down all blocks and depends on an
infamous feature called BPR? ;)

 Both the necessity to read  write the primary storage pool...  That's
 very hurtful.  And even with infinite ram, it's going to be
 unavoidable for things like destroying snapshots, or anything at all
 you ever want to do after a reboot.  

Indeed. But then again, ZFS also doesn't (yet?) keep its L2ARC cache
between reboots. Once it does, you could flush out the entire ARC to
L2ARC before reboot.



Re: [zfs-discuss] optimal layout for 8x 1 TByte SATA (consumer)

2011-05-27 Thread Frank Van Damme
2011/5/26 Eugen Leitl eu...@leitl.org:
 How bad would raidz2 do on mostly sequential writes and reads
 (Athlon64 single-core, 4 GByte RAM, FreeBSD 8.2)?

 The best way to go is striping mirrored pools, right?
 I'm worried about losing the two wrong drives out of 8.
 These are all 7200.11 Seagates, refurbished. I'd scrub
 once a week, that'd probably suck on raidz2, too?

 Thanks.

Sequential? Let's assume no spares.

4 mirrors of 2 = sustained bandwidth of 4 disks (every write has to hit both disks in each pair)
raidz2 with 8 disks = sustained bandwidth of 6 disks (two of the eight disks' worth goes to parity)

So :)



Re: [zfs-discuss] offline dedup

2011-05-27 Thread Frank Van Damme
2011/5/27 Edward Ned Harvey
opensolarisisdeadlongliveopensola...@nedharvey.com:
 I don't think this is true.  The reason you need arc+l2arc to store your DDT
 is because when you perform a write, the system will need to check and see
 if that block is a duplicate of an already existing block.  If you dedup
 once, and later disable dedup, the system won't bother checking to see if
 there are duplicate blocks anymore.  So the DDT won't need to be in
 arc+l2arc.  I should say shouldn't.

Except when deleting deduped blocks.



Re: [zfs-discuss] ZFS, Oracle and Nexenta

2011-05-25 Thread Frank Van Damme
Op 24-05-11 22:58, LaoTsao schreef:
 With the various forks of the open-source project -
 e.g. ZFS, OpenSolaris, OpenIndiana, etc. - they are all different;
 there is no guarantee they will be compatible.

I hope at least they'll try - just in case I want to import/export zpools
between Nexenta and OpenIndiana.



Re: [zfs-discuss] ZFS, Oracle and Nexenta

2011-05-25 Thread Frank Van Damme
Op 25-05-11 14:27, joerg.moellenk...@sun.com schreef:
 Well, for a start, ZFS development has no standards body, and in the end
 everything has to be measured by compatibility with the Oracle ZFS
 implementation.

Why? Given that ZFS is Solaris ZFS just as much as it is Nexenta ZFS or
illumos ZFS, by what reasoning is Oracle ZFS declared the standard or
reference? Because they wrote the first so-many lines, or because they
make the biggest sales on it (kind of hard to sell licenses to an
open-source product)?




Re: [zfs-discuss] Solaris vs FreeBSD question

2011-05-20 Thread Frank Van Damme
Op 20-05-11 01:17, Chris Forgeron schreef:
 I ended up switching back to FreeBSD after using Solaris for some time 
 because I was getting tired of weird pool corruptions and the like.

Did you ever manage to recover the data you blogged about on Sunday,
February 6, 2011?



Re: [zfs-discuss] Faster copy from UFS to ZFS

2011-05-19 Thread Frank Van Damme
Op 03-05-11 17:55, Brandon High schreef:
 -H: Hard links

If you're going to do this for 2 TB of data, remember to expand your swap
space first (or have tons of memory): rsync will need it to store every
inode number in the tree it copies.
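
(For reference, the invocation I have in mind is roughly the following;
the paths are placeholders, and the extra swap zvol is only one example of
how to add swap on Solaris:)

  # -a for the usual attributes, -H to preserve hard links
  # (the hard-link table is the memory-hungry part)
  rsync -aH /ufs-src/ /tank/dest/

  # if needed, add some swap first, e.g. a 4 GB zvol
  zfs create -V 4G rpool/swap2
  swap -a /dev/zvol/dsk/rpool/swap2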



Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-11 Thread Frank Van Damme
Op 10-05-11 06:56, Edward Ned Harvey schreef:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Edward Ned Harvey

 BTW, here's how to tune it:

 echo arc_meta_limit/Z 0x3000 | sudo mdb -kw

 echo ::arc | sudo mdb -k | grep meta_limit
 arc_meta_limit=   768 MB
 
 Well ... I don't know what to think yet.  I've been reading these numbers
 for like an hour, finding interesting things here and there, but nothing to
 really solidly point my finger at.
 
 The one thing I know for sure...  The free mem drops at an unnatural rate.
 Initially the free mem disappears at a rate approx 2x faster than the sum of
 file size and metadata combined.  Meaning the system could be caching the
 entire file and all the metadata, and that would only explain half of the
 memory disappearance.

I'm seeing similar things. Yesterday I first rebooted with set
zfs:zfs_arc_meta_limit=0x1 (that's 4 GiB) set in /etc/system and
monitored while the box was doing its regular job (taking backups).
zfs_arc_min is also set to 4 GiB. What I noticed is that shortly after
the reboot, the arc started filling up rapidly, mostly with metadata. It
shot up to:

arc_meta_max  =  3130 MB

afterwards, the number for arc_meta_used steadily dropped. Some 12 hours
ago I started deleting files; it has deleted about 600 files since
then. At the moment the ARC size stays right at the minimum of 2 GiB, of
which metadata fluctuates around 1650 MB.

This is the output of the getmemstats.sh script you posted.

Memory: 6135M phys mem, 539M free mem, 6144M total swap, 6144M free swap
zfs:0:arcstats:c           2147483648   = 2 GiB target size
zfs:0:arcstats:c_max       5350862848   = 5 GiB
zfs:0:arcstats:c_min       2147483648   = 2 GiB
zfs:0:arcstats:data_size   829660160    = 791 MiB
zfs:0:arcstats:hdr_size    93396336     = 89 MiB
zfs:0:arcstats:other_size  411215168    = 392 MiB
zfs:0:arcstats:size        1741492896   = 1661 MiB
arc_meta_used  = 1626 MB
arc_meta_limit = 4096 MB
arc_meta_max   = 3130 MB

I get way more cache misses than I'd like:

Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz  c
10:01:133K   380 10   1667   214   15   2597 1G   2G
10:02:132K   340 16372   302   46   323   16 1G   2G
10:03:132K   368 18473   321   46   347   17 1G   2G
10:04:131K   348 25444   303   63   335   24 1G   2G
10:05:132K   420 15874   332   36   383   14 1G   2G
10:06:133K   489 16   1326   357   35   427   14 1G   2G
10:07:132K   405 15492   355   39   401   15 1G   2G
10:08:132K   366 13402   326   37   366   13 1G   2G
10:09:131K   364 20181   345   58   363   20 1G   2G
10:10:134K   370  8592   311   21   3698 1G   2G
10:11:134K   351  8572   294   21   3508 1G   2G
10:12:133K   378 10592   319   26   372   10 1G   2G
10:13:133K   393 11532   339   28   393   11 1G   2G
10:14:132K   403 13402   363   35   402   13 1G   2G
10:15:133K   365 11482   317   30   365   11 1G   2G
10:16:132K   374 15402   334   40   374   15 1G   2G
10:17:133K   385 12432   341   28   383   12 1G   2G
10:18:134K   343  8642   279   19   3438 1G   2G
10:19:133K   391 10592   332   23   391   10 1G   2G


So, one explanation I can think of is that the rest of the memory is
L2ARC pointers, supposing they are not actually counted in the ARC
memory usage totals (AFAIK L2ARC pointers are considered part of the
ARC). Then again, my L2ARC is still growing (slowly) and I'm only caching
metadata at the moment, so you'd think it would shrink if there were no
more room for L2ARC pointers. Besides, I'm getting very few reads from
the SSD:

              capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
backups       5.49T  1.57T    415    121  3.13M  1.58M
  raidz1      5.49T  1.57T    415    121  3.13M  1.58M
    c0t0d0s1      -      -    170     16  2.47M   551K
    c0t1d0s1      -      -    171     16  2.46M   550K
    c0t2d0s1      -      -    170     16  2.53M   552K
    c0t3d0s1      -      -    170     16  2.44M   550K
cache             -      -      -      -      -      -
  c1t5d0      63.4G  48.4G     20      0  2.45M      0
------------  -----  -----  -----  -----  -----  -----

(typical statistic over 1 minute)


I might try the Windows solution: reboot the machine to free up
memory, let it fill the cache all over again, and see if I get more
cache hits... hmmm...

 I set the 

Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-09 Thread Frank Van Damme
Op 09-05-11 14:36, Edward Ned Harvey schreef:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Edward Ned Harvey

 So now I'll change meta_max and
 see if it helps...
 
 Oh, know what?  Nevermind.
 I just looked at the source, and it seems arc_meta_max is just a gauge for
 you to use, so you can know what's the highest arc_meta_used has ever
 reached.  So the most useful thing for you to do would be to set this to 0
 to reset the counter.  And then you can start watching it over time. 

OK, good to know - but that confuses me even more, since in my previous
post my arc_meta_used was bigger than my arc_meta_limit (by about 50%),
and now since I doubled _limit, _used only shrank by a couple of megs.

I'd really like to find some way to tell this machine CACHE MORE
METADATA, DAMNIT! :-)
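
(For the record, I assume the counter reset Edward describes would be the
same /Z write pattern as his arc_meta_limit command, i.e. something like:)

  echo "arc_meta_max/Z 0" | mdb -kw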



Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-09 Thread Frank Van Damme
Op 09-05-11 15:42, Edward Ned Harvey schreef:
  in my previous
  post my arc_meta_used was bigger than my arc_meta_limit (by about 50%)
 I have the same thing.  But as I sit here and run more and more extensive
 tests on it ... it seems like arc_meta_limit is sort of a soft limit.  Or it
 only checks periodically or something like that.  Because although I
 sometimes see size > limit, and I definitely see max > limit ...  When I do
 bigger and bigger more intensive stuff, the size never grows much more than
 limit.  It always gets knocked back down within a few seconds...


I found a script called arc_summary.pl and look what it says.


ARC Size:
    Current Size:             1734 MB (arcsize)
    Target Size (Adaptive):   1387 MB (c)
    Min Size (Hard Limit):     637 MB (zfs_arc_min)
    Max Size (Hard Limit):    5102 MB (zfs_arc_max)

c              =  1512 MB
c_min          =   637 MB
c_max          =  5102 MB
size           =  1736 MB
...
arc_meta_used  =  1735 MB
arc_meta_limit =  2550 MB
arc_meta_max   =  1832 MB

There are a few seconds between running the script and ::arc | mdb -k,
but it seems that it just doesn't use more ARC than 1734 or so MB, and
that nearly all of it is used for metadata. (I set primarycache=metadata
on my data fs, so I deem that logical.) So the goal now seems to shift to
enlarging the ARC size. (What is it doing with the other memory???
I have close to no processes running.)
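
(Next step is probably to ask mdb where the rest of the RAM is sitting,
e.g.:)

  echo ::memstat | mdb -k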




Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements

2011-05-06 Thread Frank Van Damme
Op 06-05-11 05:44, Richard Elling schreef:
 As the size of the data grows, the need to have the whole DDT in RAM or L2ARC
 decreases. With one notable exception, destroying a dataset or snapshot 
 requires
 the DDT entries for the destroyed blocks to be updated. This is why people can
 go for months or years and not see a problem, until they try to destroy a 
 dataset.

So what you are saying is: with my RAM-starved system, I shouldn't even
try to start using snapshots on it. Right?



[zfs-discuss] gaining speed with l2arc

2011-05-03 Thread Frank Van Damme
Hi, hello,

another dedup question. I just installed an SSD as L2ARC. This
is a backup server with 6 GB RAM (i.e. I don't often read the same data
again); basically it holds a large number of old backups and they
need to be deleted. Deletion speed seems to have improved, although the
majority of reads are still coming from disk.

              capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
backups       5.49T  1.58T  1.03K      6  3.13M  91.1K
  raidz1      5.49T  1.58T  1.03K      6  3.13M  91.1K
    c0t0d0s1      -      -    200      2  4.35M  20.8K
    c0t1d0s1      -      -    202      1  4.28M  24.7K
    c0t2d0s1      -      -    202      1  4.28M  24.9K
    c0t3d0s1      -      -    197      1  4.27M  13.1K
cache             -      -      -      -      -      -
  c1t5d0       112G  7.96M     63      2   337K  66.6K

The above output is while the machine is only deleting files (so I
guess the goal is to have *all* metadata reads from the cache). So the
first riddle: how to explain the low number of writes to l2arc
compared to the reads from disk.

Because reading bits of the DDT is supposed to be the biggest
bottleneck, I reckoned it would be a good idea to try not to expire
any part of my DDT from the L2ARC. The L2ARC is tracked in main memory,
so they say, so perhaps there is also a method to reserve as much memory
for that as possible.
Could one attain this by setting zfs_arc_meta_limit to a higher value?
I don't need much process memory on this machine (I use rsync and not
much else).

I was also wondering if setting secondarycache=metadata for that zpool
would be a good idea (to make sure the L2ARC stays reserved for metadata,
since the DDT is considered metadata).
Bad idea, or would it even help to set primarycache=metadata too, so that
RAM doesn't fill up with file data?
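
(Concretely, what I have in mind boils down to something like the sketch
below; the pool name is backups as above and the meta_limit value is only
an example:)

  # cache only metadata -- and thus the DDT -- on the SSD and in RAM
  zfs set secondarycache=metadata backups
  zfs set primarycache=metadata backups

  # and in /etc/system, allow more of the ARC to hold metadata (4 GB here)
  set zfs:zfs_arc_meta_limit = 0x100000000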

P.S. the system is: NexentaOS_134f (I'm looking into newer OpenSolaris
variants with bugs fixed/better performance, too).



Re: [zfs-discuss] ZFS ... open source moving forward?

2010-12-11 Thread Frank Van Damme
2010/12/10 Freddie Cash fjwc...@gmail.com:
 On Fri, Dec 10, 2010 at 5:31 AM, Edward Ned Harvey
 opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
 It's been a while since I last heard anybody say anything about this.
 What's the latest version of publicly released ZFS?  Has oracle made it
 closed-source moving forward?

 Nexenta ... openindiana ... etc ... Are they all screwed?

 ZFSv28 is available for FreeBSD 9-CURRENT.

 We won't know until after Oracle releases Solaris 11 whether or not
 they'll live up to their promise to open the source to ZFSv31.  Until
 Solaris 11 is released, there's really not much point in debating it.

And if they don't, it will be Sad, both in terms of useful code not
being available to a wide community to review and amend, and in terms
of Oracle not really getting the point of open-source development.




Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-09 Thread Frank Van Damme
2010/12/8 taemun tae...@gmail.com:
 Dedup? Taking a long time to boot after a hard reboot after a lockup?

 I'll bet that it hard locked whilst deleting some files or a dataset that
 was dedup'd. After the delete is started, it spends *ages* cleaning up the
 DDT (the table containing a list of dedup'd blocks). If you hard lock in the
 middle of this clean up, then the DDT isn't valid, to anything. The next
 mount attempt on that pool will do this operation for you. Which will take
 an inordinate amount of time. My pool spent eight days (iirc) in limbo,
 waiting for the DDT cleanup to finish. Once it did, it wrote out a shedload
 of blocks and then everything was fine. This was for a zfs destroy of a
 900GB, 64KiB block dataset, over 2x 8-wide raidz vdevs.

Eight days is just... scary.
OK, so basically it seems you can't have all the advantages of ZFS at
once. No more fsck, but if you have a deduplicated pool the kernel
will still consider it unclean after a crash or unclean shutdown?

I am indeed nearly continuously deleting older files, because each day a
mass of files gets written to the machine (and backups rotated). Is it
in some way possible to do the cleanup in smaller increments, so the
amount of cleanup work to do when you (hard-)reboot is smaller?

 Unfortunately, raidz is of course slower for random reads than a set of
 mirrors. The raidz/mirror hybrid allocator available in snv_148+ is somewhat
 of a workaround for this, although I've not seen comprehensive figures for
 the gain it gives
 - http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6977913





Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-09 Thread Frank Van Damme
2010/12/8  gon...@comcast.net:
 To explain further the  slow delete problem:

 It is absolutely critical for zfs to manage the incoming data rate.
 This is done reasonably well for write transactions.

 Delete transactions, prior to dedup, were very light-weight, nearly free,
 so these are not managed.

 Because of dedup, deletes become rather expensive, because they introduce a
 substantial seek penalty - mostly because of the need to update the dedup
 metadata (reference counts and such).

 The mechanism of the problem:
 1) Too many delete transactions are accepted into the
 open transaction group.

 2) When this txg comes up to be synced to disk, the sync takes a very long
 time (instead of a healthy 1-2 seconds: minutes, hours, or days).

OK, I had to look that one up, but the fog is starting to clear.
I reckon that in ZFS land a command like sync has no effect at all?

 3) Because the open txg can not be closed while the sync of a previous txg
 is in progress, eventually we run out of buffer space in the open txg, and all
 input is severely throttled.

 4) Because of (3) other bad things happen, like the arc tries to shrink,
 memory shortage, making things worse.

Yes... I see... speaking of which: the ARC size on my system would be
1685483656 bytes - that's 1.6 GB in a system with 6 GB, with 3942 MB
allocated to the kernel (dixit mdb's ::memstat dcmd). So can I
assume that the better part of the rest is allocated in buffers that
needlessly fill up over time? I'd much rather have the memory used for
the ARC :)

 5) Because deletes persist across reboots, you are unable to mount your
 pool.

 One solution is booting into maintenance mode and renaming the zfs cache
 file (look in /etc/zfs, I forget the name at the moment).
 You can then boot up and import your pool. The import will take a long time,
 but meanwhile you are up and can do other things.
 At that point you have the option of getting rid of the pool and starting
 over
 (possibly installing a better kernel and starting over).
 After the update and import, update your pool to the current pool version
 and life will be much better.

By now, the system has booted up. It took quite a few hours, though.
This system is actually running Nexenta, but I'll see if I can upgrade
the kernel.

 I hope this helps, good luck

It clarified a few things. Thank you very much. There are one or two
things I still have to change on this system it seems...

 In addition, there was a virtual-memory-related bug (allocating one of the zfs
 memory caches with the wrong object size) that would cause other
 components to hang, waiting for memory allocations.

 This was so bad in earlier kernels that systems would become unresponsive for
 a potentially very long time (a phenomenon known as bricking).

 As I recall, a lot of fixes came in in the 140-series kernels to fix this.

 Anything 145 and above should be OK.

I'm on 134f. No wonder.



[zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-08 Thread Frank Van Damme
Hello list,

I'm having trouble with a server holding a lot of data. After a few
months of uptime, it is currently rebooting from a lockup (reason
unknown so far), but it is taking hours to boot up again. The boot
process is stuck at the stage where it says:
mounting zfs filesystems (1/5)
The machine responds to pings and keystrokes, and I can see disk activity;
the disk LEDs blink one after another.

The file system layout is a 40 GB mirror for the syspool, and a raidz
volume over four 2 TB disks which I use for taking backups (= the purpose
of this machine). I have deduplication enabled on the backups pool
(which turned out to be pretty slow for file deletes, since there are a
lot of files on the backups pool and I haven't installed an L2ARC
yet). The main memory is 6 GB; it's an HP server running the Nexenta Core
Platform (kernel version 134f).

I assume sooner or later the machine will boot up, but I'm in a bit of
a panic about how to solve this permanently - after all the last thing
I want is not being able to restore data one day because it takes days
to boot the machine.

Does anyone have an idea how much longer it may take and if the
problem may have anything to do with dedup?



[zfs-discuss] deduplication: l2arc size

2010-08-23 Thread Frank Van Damme
Hi,

this has already been the source of a lot of interesting discussions; so
far I haven't found the ultimate conclusion. From some discussion on
this list in February, I learned that an entry in ZFS' deduplication
table takes (in practice) half a KiB of memory. At the moment my data
looks like this (output of zdb -D)...


DDT-sha256-zap-duplicate: 3299796 entries, size 350 on disk, 163 in core
DDT-sha256-zap-unique: 9727611 entries, size 333 on disk, 151 in core

dedup = 1.73, compress = 1.20, copies = 1.00, dedup * compress / copies
= 2.07

So that means the DDT contains a total of 13,027,407 entries, meaning
it takes about 6,670,032,384 bytes of memory. Suppose our data grows by
a factor of 12; it will then take about 80 GB. So it would be best to buy
a 128 GB SSD as L2ARC cache. Correct?
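
(The back-of-the-envelope arithmetic, using the zdb numbers above and the
~512 bytes/entry rule of thumb from that earlier discussion:)

  entries=$((3299796 + 9727611))   # 13,027,407 DDT entries
  echo $((entries * 512))          # 6,670,032,384 bytes (~6.2 GiB) today
  echo $((entries * 512 * 12))     # ~80 billion bytes (~75 GiB) at 12x growth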


Thanks for enlightening me,

