Re: [zfs-discuss] tuning zfs_arc_min
Op 12-10-11 02:27, Richard Elling schreef:
>> On Oct 11, 2011, at 2:03 PM, Frank Van Damme wrote:
>> Honestly? I don't remember. Might be a leftover setting from a year ago. By now, I figured out I need to update the boot archive in order for the new setting to take effect at boot time, which apparently involves booting in safe mode.
> The archive should be updated when you reboot. Or you can run bootadm update-archive anytime. At boot, zfs_arc_min is copied into arc_c_min, overriding the default setting. You can see the current value via kstat:
>   kstat -p zfs:0:arcstats:c_min
>   zfs:0:arcstats:c_min    389202432
> This is the smallest size that the ARC will shrink to, when asked to shrink because other applications need memory.

The root of the problem seems to be that that process never completes:

  9    /lib/svc/bin/svc.startd
    332  /sbin/sh /lib/svc/method/boot-archive-update
      347  /sbin/bootadm update-archive

I can't kill it and run it from the command line either; it simply ignores SIGKILL (which shouldn't even be possible).

--
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
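For reference, a minimal sketch of the checks discussed above, using only commands that appear elsewhere in this thread (run as root on a stock Solaris/Nexenta install):

  # what the kernel actually applied at boot (bytes)
  kstat -p zfs:0:arcstats:c_min
  # rebuild the boot archive by hand, then verify it is current (-n checks only)
  bootadm update-archive
  bootadm update-archive -n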
Re: [zfs-discuss] tuning zfs_arc_min
2011/10/11 Richard Elling richard.ell...@gmail.com:
>> ZFS Tunables (/etc/system):
>>   set zfs:zfs_arc_min = 0x20
>>   set zfs:zfs_arc_meta_limit=0x1
> It is not uncommon to tune arc meta limit. But I've not seen a case where tuning arc min is justified, especially for a storage server. Can you explain your reasoning?

Honestly? I don't remember. Might be a leftover setting from a year ago. By now, I figured out I need to update the boot archive in order for the new setting to take effect at boot time, which apparently involves booting in safe mode.

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] tuning zfs_arc_min
2011/10/8 James Litchfield jim.litchfi...@oracle.com:
> The value of zfs_arc_min specified in /etc/system must be over 64 MB (0x4000000). Otherwise the setting is ignored. The value is in bytes, not pages.

Well, I've now set it to 0x800 and it stubbornly stays at 2048 MB...

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] tuning zfs_arc_min
Hello, quick and stupid question: I'm breaking my head over how to tune zfs_arc_min on a running system. There must be some magic word to pipe into mdb -kw but I forgot it. I tried /etc/system but it's still at the old value after reboot:

  ZFS Tunables (/etc/system):
    set zfs:zfs_arc_min = 0x20
    set zfs:zfs_arc_meta_limit=0x1

  ARC Size:
    Current Size:            1314 MB (arcsize)
    Target Size (Adaptive):  5102 MB (c)
    Min Size (Hard Limit):   2048 MB (zfs_arc_min)
    Max Size (Hard Limit):   5102 MB (zfs_arc_max)

I could use the memory now, since I'm running out of it trying to delete a large snapshot :-/

--
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
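For the record, a sketch of the usual live-tuning recipe. The read-only checks are safe on any build; the write only works on builds where arc_c_min is exported as a plain kernel variable (on some releases it is a macro into the arc_stats structure instead), and 0x20000000 (512 MiB) is just an example value:

  # read-only: inspect the live ARC parameters
  echo ::arc | mdb -k
  kstat -p zfs:0:arcstats:c_min
  # live write, where the symbol exists (example value, adjust to taste)
  echo 'arc_c_min/Z 0x20000000' | mdb -kw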
Re: [zfs-discuss] ZFS
2011/9/13 Paul Kraus p...@kraus-haus.org: The only tools I have found that work with zfs ACLs are the native zfs tools (zfs send / recv), the native Solaris tools (cp, mv, etc.), and Symantec NetBackup. I have not tried other commercial backup systems as we already have NBU in house. cpio, possibly? -- Frank Van Damme No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs send/receive and ashift
Op 26-07-11 12:56, Fred Liu schreef: Any alternatives, if you don't mind? ;-) vpn's, openssl piped over netcat, a password-protected zip file,... ;) ssh would be the most practical, probably. -- No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
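For the common ssh case, a minimal sketch (pool, dataset and host names here are made up):

  # one-off full send of a snapshot to a remote pool
  zfs snapshot tank/data@migrate1
  zfs send tank/data@migrate1 | ssh backuphost "zfs receive -F backup/data"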
Re: [zfs-discuss] Zil on multiple usb keys
2011/7/15 Eugen Leitl eu...@leitl.org:
> Speaking of which, is there a point in using an eSATA flash stick? If yes, which?

It depends on the drive of course; you'll have to look up benchmark results - but there are eSATA sticks out there that are more or less built to Perform (as opposed to providing cheap storage).

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
Op 15-07-11 04:27, Edward Ned Harvey schreef: Is anyone from Oracle reading this? I understand if you can't say what you're working on and stuff like that. But I am merely hopeful this work isn't going into a black hole... Anyway. Thanks for listening (I hope.) ttyl If they aren't, maybe someone from an open source Solaris version is :) -- No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
Op 12-07-11 13:40, Jim Klimov schreef: Even if I batch background RM's so a hundred processes hang and then they all at once complete in a minute or two. Hmmm. I only run one rm process at a time. You think running more processes at the same time would be faster? -- No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)
Op 14-07-11 12:28, Jim Klimov schreef:
> Yes, quite often it seems so. Whenever my slow dcpool decides to accept a write, it processes a hundred pending deletions instead of one ;) Even so, it took quite a few pool or iscsi hangs and then reboots of both server and client, and about a week overall, to remove a 50Gb dir with 400k small files from a deduped pool served over iscsi from a volume in a physical pool. Just completed this night ;)

It seems counter-intuitive - you'd think concurrent disk access would only make things slower - but it turns out to be true. I'm deleting a dozen times faster than before. How completely ridiculous. Thank you :-)

--
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
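For anyone wanting to reproduce this, a minimal sketch of batched background deletes. The directory layout and the concurrency of 10 are made-up examples, and -P assumes GNU xargs (as shipped on Nexenta):

  # delete per-day backup directories ten at a time instead of one by one
  ls -d /backups/old/2010-* | xargs -n 1 -P 10 rm -rf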
Re: [zfs-discuss] question about COW and snapshots
Op 15-06-11 05:56, Richard Elling schreef: You can even have applications like databases make snapshots when they want. Makes me think of a backup utility called mylvmbackup, which is written with Linux in mind - basically it locks mysql tables, takes an LVM snapshot and releases the lock (and then you backup the database files from the snapshot). Should work at least as well with ZFS. -- No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
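A sketch of the ZFS variant of that recipe, assuming the mysql client's "system" command (which runs a shell command while the connection - and therefore the global read lock - stays open); the dataset name and credentials are made up:

mysql -u root -p <<'EOF'
FLUSH TABLES WITH READ LOCK;
system zfs snapshot tank/mysql@daily-backup
UNLOCK TABLES;
EOF

The lock is only held for the instant the snapshot takes; the backup is then made from the snapshot at leisure.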
Re: [zfs-discuss] question about COW and snapshots
Op 15-06-11 14:30, Simon Walter schreef: Anyone know how Google Docs does it? Anyone from Google on the list? :-) Seriously, this is the kind of feature to be found in Serious CMS applications, like, as already mentioned, Alfresco. -- No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hangs any zfs-related programs, eats all RAM and dies in swapping hell
2011/6/10 Tim Cook t...@cook.ms:
> While your memory may be sufficient, that cpu is sorely lacking. Is it even 64bit? There's a reason intel couldn't give those things away in the early 2000s and amd was eating their lunch.

A Pentium 4 is 32-bit (only the late Prescott-based models gained 64-bit EM64T).

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] DDT sync?
2011/6/1 Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com:
> (2) The above is pretty much the best you can do, if your server is going to be a normal server, handling both reads and writes. Because the data and the meta_data are both stored in the ARC, the data has a tendency to push the meta_data out. But in a special use case - Suppose you only care about write performance and saving disk space. For example, suppose you're the destination server of a backup policy. You only do writes, so you don't care about keeping data in cache. You want to enable dedup to save cost on backup disks. You only care about keeping meta_data in ARC. If you set primarycache=metadata ...
> I'll go test this now. The hypothesis is that my arc_meta_used should actually climb up to the arc_meta_limit before I start hitting any disk reads, so my write performance with/without dedup should be pretty much equal up to that point. I'm sacrificing the potential read benefit of caching data in ARC, in order to hopefully gain write performance - so write performance can be just as good with dedup enabled or disabled. In fact, if there's much duplicate data, the dedup write performance in this case should be significantly better than without dedup.

I guess this is pretty much why I have primarycache=metadata and

  set zfs:zfs_arc_meta_limit=0x1
  set zfs:zfs_arc_min=0xC000

in /etc/system. And the ARC size on this box tends to drop far below arc_min after a few days, notwithstanding the fact it's supposed to be a hard limit. I call for an arc_data_max setting :)

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
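For reference, a minimal sketch of that setup (pool name as used on this box; the mdb check mirrors the one used elsewhere in the thread):

  zfs set primarycache=metadata backups   # keep only metadata (the DDT counts as metadata) in ARC
  echo ::arc | mdb -k | grep meta         # watch arc_meta_used against arc_meta_limit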
Re: [zfs-discuss] DDT sync?
Op 26-05-11 13:38, Edward Ned Harvey schreef:
> Perhaps a property could be set, which would store the DDT exclusively on that device.

Oh yes please, let me put my DDT on an SSD. But what if you lose it (the vdev) - would there be a way to reconstruct the DDT (which you need to be able to delete old, deduplicated files)? Let me guess - this requires tracing down all blocks and depends on an infamous feature called BPR? ;)

> Both the necessity to read and write the primary storage pool... That's very hurtful. And even with infinite ram, it's going to be unavoidable for things like destroying snapshots, or anything at all you ever want to do after a reboot.

Indeed. But then again, zfs also doesn't (yet?) keep its l2arc cache between reboots. Once it does, you could flush out the entire ARC to l2arc before reboot.

--
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] optimal layout for 8x 1 TByte SATA (consumer)
2011/5/26 Eugen Leitl eu...@leitl.org:
> How bad would raidz2 do on mostly sequential writes and reads (Athlon64 single-core, 4 GByte RAM, FreeBSD 8.2)? The best way to go is striping mirrored pools, right? I'm worried about losing the two wrong drives out of 8. These are all 7200.11 Seagates, refurbished. I'd scrub once a week, that'd probably suck on raidz2, too? Thanks.

Sequential? Let's suppose no spares.

  4 mirrors of 2       = sustained bandwidth of 4 disks
  raidz2 with 8 disks  = sustained bandwidth of 6 disks

So :)

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] offline dedup
2011/5/27 Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com: I don't think this is true. The reason you need arc+l2arc to store your DDT is because when you perform a write, the system will need to check and see if that block is a duplicate of an already existing block. If you dedup once, and later disable dedup, the system won't bother checking to see if there are duplicate blocks anymore. So the DDT won't need to be in arc+l2arc. I should say shouldn't. Except when deleting deduped blocks. -- Frank Van Damme No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS, Oracle and Nexenta
Op 24-05-11 22:58, LaoTsao schreef:
> With the various forks of the open source project, e.g. ZFS, OpenSolaris, OpenIndiana etc., they are all different. There is no guarantee they will be compatible.

I hope at least they'll try. Just in case I want to import/export zpools between Nexenta and OpenIndiana?

--
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
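In practice the thing to compare before moving a pool is the pool version on each side; a sketch (pool name is an example):

  zpool upgrade -v           # highest pool version this build supports
  zpool get version backups  # on-disk version of the pool you want to move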
Re: [zfs-discuss] ZFS, Oracle and Nexenta
Op 25-05-11 14:27, joerg.moellenk...@sun.com schreef:
> Well, at first ZFS development is no standard body, and at the end everything has to be measured in compatibility to the Oracle ZFS implementation.

Why? Given that ZFS is Solaris ZFS just as well as Nexenta ZFS just as well as illumos ZFS, by what reason is Oracle ZFS being declared the standard or reference? Because they wrote the first so-many lines, or because they make the biggest sales on it (kinda hard to sell licenses to an open source product)?

--
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Solaris vs FreeBSD question
Op 20-05-11 01:17, Chris Forgeron schreef: I ended up switching back to FreeBSD after using Solaris for some time because I was getting tired of weird pool corruptions and the like. Did you ever manage to recover the data you blogged about on Sunday, February 6, 2011? -- No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Faster copy from UFS to ZFS
Op 03-05-11 17:55, Brandon High schreef:
> -H: Hard links

If you're going to do this for 2 TB of data, remember to expand your swap space first (or have tons of memory). Rsync will need it to store every inode number in the tree it copies.

--
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
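A sketch of the kind of invocation being discussed (the paths are made up):

  # -a keeps the usual attributes, -H additionally preserves hard links (at a memory cost)
  rsync -aH /ufs-mount/ /tank/newfs/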
Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements
Op 10-05-11 06:56, Edward Ned Harvey schreef:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>> BTW, here's how to tune it:
>>   echo arc_meta_limit/Z 0x30000000 | sudo mdb -kw
>>   echo ::arc | sudo mdb -k | grep meta_limit
>>   arc_meta_limit = 768 MB
> Well ... I don't know what to think yet. I've been reading these numbers for like an hour, finding interesting things here and there, but nothing to really solidly point my finger at. The one thing I know for sure... The free mem drops at an unnatural rate. Initially the free mem disappears at a rate approx 2x faster than the sum of file size and metadata combined. Meaning the system could be caching the entire file and all the metadata, and that would only explain half of the memory disappearance.

I'm seeing similar things. Yesterday I first rebooted with

  set zfs:zfs_arc_meta_limit=0x100000000

(that's 4 GiB) set in /etc/system, and monitored while the box was doing its regular job (taking backups). zfs_arc_min is also set to 4 GiB. What I noticed is that shortly after the reboot, the ARC started filling up rapidly, mostly with metadata. It shot up to:

  arc_meta_max = 3130 MB

Afterwards, the number for arc_meta_used steadily dropped. Some 12 hours ago I started deleting files; it has deleted about 600 files since then. At the moment the ARC size stays right at the minimum of 2 GiB, of which metadata fluctuates around 1650 MB. This is the output of the getmemstats.sh script you posted:

  Memory: 6135M phys mem, 539M free mem, 6144M total swap, 6144M free swap
  zfs:0:arcstats:c           2147483648  = 2 GiB (target size)
  zfs:0:arcstats:c_max       5350862848  = 5 GiB
  zfs:0:arcstats:c_min       2147483648  = 2 GiB
  zfs:0:arcstats:data_size    829660160  = 791 MiB
  zfs:0:arcstats:hdr_size      93396336  = 89 MiB
  zfs:0:arcstats:other_size   411215168  = 392 MiB
  zfs:0:arcstats:size        1741492896  = 1661 MiB
  arc_meta_used  = 1626 MB
  arc_meta_limit = 4096 MB
  arc_meta_max   = 3130 MB

I get way more cache misses than I'd like:

  Time      read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz  c
  10:01:13  3K 380 10 1667 214 15 2597 1G 2G
  10:02:13  2K 340 16372 302 46 323 16 1G 2G
  10:03:13  2K 368 18473 321 46 347 17 1G 2G
  10:04:13  1K 348 25444 303 63 335 24 1G 2G
  10:05:13  2K 420 15874 332 36 383 14 1G 2G
  10:06:13  3K 489 16 1326 357 35 427 14 1G 2G
  10:07:13  2K 405 15492 355 39 401 15 1G 2G
  10:08:13  2K 366 13402 326 37 366 13 1G 2G
  10:09:13  1K 364 20181 345 58 363 20 1G 2G
  10:10:13  4K 370 8592 311 21 3698 1G 2G
  10:11:13  4K 351 8572 294 21 3508 1G 2G
  10:12:13  3K 378 10592 319 26 372 10 1G 2G
  10:13:13  3K 393 11532 339 28 393 11 1G 2G
  10:14:13  2K 403 13402 363 35 402 13 1G 2G
  10:15:13  3K 365 11482 317 30 365 11 1G 2G
  10:16:13  2K 374 15402 334 40 374 15 1G 2G
  10:17:13  3K 385 12432 341 28 383 12 1G 2G
  10:18:13  4K 343 8642 279 19 3438 1G 2G
  10:19:13  3K 391 10592 332 23 391 10 1G 2G

So, one explanation I can think of is that the rest of the memory is l2arc pointers, supposing they are not actually counted in the ARC memory usage totals (AFAIK l2arc pointers are considered to be part of the ARC). Then again my l2arc is still growing (slowly) and I'm only caching metadata at the moment, so you'd think it'd shrink if there's no more room for l2arc pointers.
Besides I'm getting very little reads from ssd:

                 capacity     operations    bandwidth
  pool         alloc  free   read  write   read  write
  -----------  -----  -----  -----  -----  -----  -----
  backups      5.49T  1.57T   415    121   3.13M  1.58M
    raidz1     5.49T  1.57T   415    121   3.13M  1.58M
      c0t0d0s1     -      -   170     16   2.47M   551K
      c0t1d0s1     -      -   171     16   2.46M   550K
      c0t2d0s1     -      -   170     16   2.53M   552K
      c0t3d0s1     -      -   170     16   2.44M   550K
  cache            -      -      -      -      -      -
    c1t5d0     63.4G  48.4G    20      0   2.45M      0
  -----------  -----  -----  -----  -----  -----  -----

(typical statistic over 1 minute)

I might try the windows solution and reboot the machine to free up memory and let it fill the cache all over again and see if I get more cache hits... hmmm... I set the
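One way to quantify "very little reads from ssd" is the cumulative L2ARC counters, a sketch (these arcstats exist on L2ARC-capable builds such as this one):

  kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses zfs:0:arcstats:l2_size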
Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements
Op 09-05-11 14:36, Edward Ned Harvey schreef:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Edward Ned Harvey
>> So now I'll change meta_max and see if it helps...
> Oh, know what? Nevermind. I just looked at the source, and it seems arc_meta_max is just a gauge for you to use, so you can know what's the highest arc_meta_used has ever reached. So the most useful thing for you to do would be to set this to 0 to reset the counter. And then you can start watching it over time.

Ok, good to know - but that confuses me even more, since in my previous post my arc_meta_used was bigger than my arc_meta_limit (by about 50%), and now since I doubled _limit, _used only shrunk by a couple of megs. I'd really like to find some way to tell this machine CACHE MORE METADATA, DAMNIT! :-)

--
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements
Op 09-05-11 15:42, Edward Ned Harvey schreef:
>> in my previous post my arc_meta_used was bigger than my arc_meta_limit (by about 50%)
> I have the same thing. But as I sit here and run more and more extensive tests on it ... it seems like arc_meta_limit is sort of a soft limit. Or it only checks periodically or something like that. Because although I sometimes see size > limit, and I definitely see max > limit ... when I do bigger and bigger, more intensive stuff, the size never grows much more than limit. It always gets knocked back down within a few seconds...

I found a script called arc_summary.pl and look what it says:

  ARC Size:
    Current Size:            1734 MB (arcsize)
    Target Size (Adaptive):  1387 MB (c)
    Min Size (Hard Limit):    637 MB (zfs_arc_min)
    Max Size (Hard Limit):   5102 MB (zfs_arc_max)

  c              = 1512 MB
  c_min          =  637 MB
  c_max          = 5102 MB
  size           = 1736 MB
  ...
  arc_meta_used  = 1735 MB
  arc_meta_limit = 2550 MB
  arc_meta_max   = 1832 MB

There are a few seconds between running the script and ::arc | mdb -k, but it seems that it just doesn't use more ARC than 1734 or so MB, and that nearly all of it is used for metadata. (I set primarycache=metadata on my data fs, so I deem it logical.) So the goal seems shifted to trying to enlarge the ARC size (what's it doing with the other memory??? I have close to no processes running.)

--
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
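The quickest way to answer that last question is the mdb dcmd that a later post in this thread also mentions; a sketch:

  # per-category page usage: kernel, anon, exec and libs, page cache, free
  echo ::memstat | mdb -k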
Re: [zfs-discuss] Summary: Dedup and L2ARC memory requirements
Op 06-05-11 05:44, Richard Elling schreef:
> As the size of the data grows, the need to have the whole DDT in RAM or L2ARC decreases. With one notable exception: destroying a dataset or snapshot requires the DDT entries for the destroyed blocks to be updated. This is why people can go for months or years and not see a problem, until they try to destroy a dataset.

So what you are saying is: you, with your RAM-starved system, don't even try to start using snapshots on that system. Right?

--
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] gaining speed with l2arc
Hi, hello, another dedup question. I just installed an ssd disk as l2arc. This is a backup server with 6 GB RAM (i.e. I don't often read the same data again); basically it has a large number of old backups on it and they need to be deleted. Deletion speed seems to have improved, although the majority of reads are still coming from disk:

                 capacity     operations    bandwidth
  pool         alloc  free   read  write   read  write
  -----------  -----  -----  -----  -----  -----  -----
  backups      5.49T  1.58T  1.03K      6   3.13M  91.1K
    raidz1     5.49T  1.58T  1.03K      6   3.13M  91.1K
      c0t0d0s1     -      -   200      2   4.35M  20.8K
      c0t1d0s1     -      -   202      1   4.28M  24.7K
      c0t2d0s1     -      -   202      1   4.28M  24.9K
      c0t3d0s1     -      -   197      1   4.27M  13.1K
  cache            -      -      -      -      -      -
    c1t5d0      112G  7.96M    63      2    337K  66.6K

The above output is while the machine is only deleting files (so I guess the goal is to have *all* metadata reads come from the cache). So the first riddle: how to explain the low number of writes to l2arc compared to the reads from disk?

Because reading bits of the DDT is supposed to be the biggest bottleneck, I reckoned it would be a good idea to try not to expire any part of my DDT from l2arc. The l2arc is memory mapped, so they say, so perhaps there is a method to reserve as much memory for this as possible, too. Could one attain this by setting zfs_arc_meta_limit to a higher value? I don't need much process memory on this machine (I use rsync and not much else).

I was also wondering if setting secondarycache=metadata for that zpool would be a good idea (to make sure l2arc stays reserved for metadata, since the DDT is considered metadata). Bad idea? Or would it even help to set primarycache=metadata too, to not let RAM fill up with file data?

P.S. the system is NexentaOS_134f (I'm looking into newer OpenSolaris variants with bugs fixed/better performance, too).

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
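For reference, the properties being discussed are ordinary per-dataset properties (inheritable from the pool root), so the experiment itself is a one-liner each; a sketch using the pool name from above:

  zfs set secondarycache=metadata backups   # L2ARC holds only metadata (the DDT counts as metadata)
  zfs set primarycache=metadata backups     # ARC likewise; file data is then never cached
  zfs get primarycache,secondarycache backups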
Re: [zfs-discuss] ZFS ... open source moving forward?
2010/12/10 Freddie Cash fjwc...@gmail.com:
>> On Fri, Dec 10, 2010 at 5:31 AM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
>> It's been a while since I last heard anybody say anything about this. What's the latest version of publicly released ZFS? Has oracle made it closed-source moving forward? Nexenta ... openindiana ... etc ... Are they all screwed?
> ZFSv28 is available for FreeBSD 9-CURRENT. We won't know until after Oracle releases Solaris 11 whether or not they'll live up to their promise to open the source to ZFSv31. Until Solaris 11 is released, there's really not much point in debating it.

And if they don't, it will be Sad, both in terms of useful code not being available to a wide community to review and amend, and in terms of Oracle not really getting the point about open source development.

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems
2010/12/8 taemun tae...@gmail.com:
> Dedup? Taking a long time to boot after a hard reboot after lockup? I'll bet that it hard locked whilst deleting some files or a dataset that was dedup'd. After the delete is started, it spends *ages* cleaning up the DDT (the table containing a list of dedup'd blocks). If you hard lock in the middle of this clean up, then the DDT isn't valid, to anything. The next mount attempt on that pool will do this operation for you. Which will take an inordinate amount of time. My pool spent eight days (iirc) in limbo, waiting for the DDT cleanup to finish. Once it did, it wrote out a shedload of blocks and then everything was fine. This was for a zfs destroy of a 900GB, 64KiB block dataset, over 2x 8-wide raidz vdevs.

Eight days is just... scary. Ok, so basically it seems you can't have all the advantages of zfs at once. No more fsck, but if you have a deduplicated pool, the kernel will still consider it unclean after a crash or unclean shutdown? I am indeed nearly continuously deleting older files, because each day a mass of files gets written to the machine (and backups rotated). Is it in some way possible to do the cleanup in smaller increments, so the amount of cleanup work to do when you (hard)reboot is smaller?

Unfortunately, raidz is of course slower for random reads than a set of mirrors. The raidz/mirror hybrid allocator available in snv_148+ is somewhat of a workaround for this, although I've not seen comprehensive figures for the gain it gives - http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6977913

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems
2010/12/8 gon...@comcast.net:
> To explain further the slow delete problem: It is absolutely critical for zfs to manage the incoming data rate. This is done reasonably well for write transactions. Delete transactions, prior to dedup, were very light-weight, nearly free, so these are not managed. Because of dedup, deletes become rather expensive, because they introduce a substantial seek penalty, mostly because of the need to update the dedup metadata (reference counts and such). The mechanism of the problem:
> 1) Too many delete transactions are accepted into the open transaction group.
> 2) When this txg comes up to be synced to disk, the sync takes a very long time (instead of a healthy 1-2 seconds: minutes, hours or days).

Ok, had to look that one up, but the fog starts clearing up. I reckon in zfs land, a command like sync has no effect at all?

> 3) Because the open txg can not be closed while the sync of a previous txg is in progress, eventually we run out of buffer space in the open txg, and all input is severely throttled.
> 4) Because of (3) other bad things happen, like the arc tries to shrink, memory shortage, making things worse.

Yes... I see... speaking of which: the ARC size on my system would be 1685483656 bytes - that's 1.6 GB in a system with 6 GB, with 3942 MB allocated to the kernel (dixit mdb's ::memstat module). So can I assume that the better part of the rest is allocated in buffers that needlessly fill up over time? I'd much rather have the memory used for ARC :)

> 5) Because deletes persist across reboots, you are unable to mount your pool. One solution is booting into maintenance mode and renaming the zfs cache file (look in /etc/zfs, I forget the name at the moment). You can then boot up and import your pool. The import will take a long time, but meanwhile you are up and can do other things. At that point you have the option of getting rid of the pool and starting over (possibly installing a better kernel and starting over). After update, and import, update your pool to the current pool version and life will be much better.

By now, the system booted up. It has taken quite a few hours though. This system is actually running Nexenta, but I'll see if I can upgrade the kernel.

> I hope this helps, good luck

It clarified a few things. Thank you very much. There are one or two things I still have to change on this system, it seems...

> In addition, there was a virtual memory related bug (allocating one of the zfs memory caches with the wrong object size) that would cause other components to hang, waiting for memory allocations. This was so bad in earlier kernels that systems would become unresponsive for a potentially very long time (a phenomenon known as bricking). As I recall, a lot of fixes came in in the 140 series kernels to fix this. Anything 145 and above should be OK.

I'm on 134f. No wonder.

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
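For anyone following the recipe in (5): the cache file being referred to is /etc/zfs/zpool.cache, so the maintenance-mode step is roughly this sketch (do it from single-user/maintenance mode only; pool name is this box's):

  mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.bak   # pools are then no longer auto-imported at boot
  # after booting normally, re-import by name; this is the long-running step
  zpool import backups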
[zfs-discuss] very slow boot: stuck at mounting zfs filesystems
Hello list, I'm having trouble with a server holding a lot of data. After a few months of uptime, it is currently rebooting from a lockup (reason unknown so far), but it is taking hours to boot up again. The boot process is stuck at the stage where it says "mounting zfs filesystems (1/5)". The machine responds to pings and keystrokes, and I can see disk activity; the disk leds blink one after another.

The file system layout is: a 40 GB mirror for the syspool, and a raidz volume over 4 x 2 TB disks which I use for taking backups (= the purpose of this machine). I have deduplication enabled on the backups pool (which turned out to be pretty slow for file deletes, since there are a lot of files on the backups pool and I haven't installed an l2arc yet). The main memory is 6 GB; it's an HP server running Nexenta core platform (kernel version 134f).

I assume sooner or later the machine will boot up, but I'm in a bit of a panic about how to solve this permanently - after all, the last thing I want is not being able to restore data one day because it takes days to boot the machine. Does anyone have an idea how much longer it may take, and whether the problem may have anything to do with dedup?

-- Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy without the benevolence of the author.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] deduplication: l2arc size
Hi, this has already been the source of a lot of interesting discussions, but so far I haven't found the ultimate conclusion. From some discussion on this list in February, I learned that an entry in ZFS' deduplication table takes (in practice) half a KiB of memory. At the moment my data looks like this (output of zdb -D):

  DDT-sha256-zap-duplicate: 3299796 entries, size 350 on disk, 163 in core
  DDT-sha256-zap-unique: 9727611 entries, size 333 on disk, 151 in core
  dedup = 1.73, compress = 1.20, copies = 1.00, dedup * compress / copies = 2.07

So that means the DDT contains a total of 13,027,407 entries, meaning it's 6,670,032,384 bytes big. So suppose our data grows by a factor of 12, it will take 80 GB. So, it would be best to buy a 128 GB SSD as L2ARC cache. Correct?

Thanks for enlightening me,

-- Frank Van Damme
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
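For completeness, the commands that produce these figures; a sketch (the pool name is an example):

  zdb -D backups    # entry counts, per-entry on-disk/in-core sizes, dedup ratio
  zdb -DD backups   # adds a DDT histogram with more detail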