[zfs-discuss] Dedupe asynchronous mode?
I'm a bit unclear on how to use or try de-duplication in asynchronous mode. Can someone kindly clarify? Is it as simple as enabling dedup and then disabling it after something completes? Thanks
Re: [zfs-discuss] Dedupe asynchronous mode?
As I understand it from reading Jeff Bonwick's blog, async dedup is not supported. The rationale is that async dedup only helps if you are constrained on CPU and RAM: it lets you dedup when you have spare clock cycles to burn (at night, say). But today's CPUs are powerful enough to dedup in real time, so the async mode isn't needed.
Re: [zfs-discuss] dedupe question
> Isn't dedupe in some ways the antithesis of setting copies > 1? We go to a lot of trouble to create redundancy (n-way mirroring, raidz-n, copies=n, etc.) to make things as robust as possible, and then we reduce redundancy with dedupe and compression.

But are we reducing redundancy? I don't know the details of how dedupe is implemented, but I'd have thought that if copies=2, you get 2 copies of each deduped block. So your data is just as safe, since you haven't actually changed the redundancy; it's just that, like you say, you're risking more data being lost in the event of a problem.

However, the flip side is that dedupe in many circumstances will free up a lot of space, possibly enough to justify copies=3, or even 4. So if you were to use dedupe and compression, you could probably add more redundancy without losing capacity. And with the speed benefits associated with dedupe to boot. More reliable and faster, at the same price. Sounds good to me :D
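For anyone who wants to experiment with that trade-off, a minimal sketch (assuming a dataset named tank/data; both properties affect only data written after they are set, so existing blocks keep their old layout):

  # zfs set dedup=on tank/data
  # zfs set copies=2 tank/data
  # zfs get dedup,copies tank/data

Note that copies tops out at 3 (see Victor's correction later in the thread).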
Re: [zfs-discuss] dedupe question
On Fri, 13 Nov 2009, Ross wrote:

> But are we reducing redundancy? I don't know the details of how dedupe is implemented, but I'd have thought that if copies=2, you get 2 copies of each deduped block. So your data is just as safe, since you haven't actually changed the redundancy; it's just that, like you say, you're risking more data being lost in the event of a problem.

Another point is that the degree of risk is related to the degree of total exposure. The more disk space consumed, the greater the chance that there will be data loss. Assuming that the algorithm and implementation are quite solid, it seems that dedupe should increase data reliability.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] dedupe question
On Fri, Nov 13, 2009 at 7:09 AM, Ross myxi...@googlemail.com wrote:

> Isn't dedupe in some ways the antithesis of setting copies > 1? We go to a lot of trouble to create redundancy (n-way mirroring, raidz-n, copies=n, etc.) to make things as robust as possible, and then we reduce redundancy with dedupe and compression.
> [snip]
> More reliable and faster, at the same price. Sounds good to me :D

I believe in a previous thread Adam said that it automatically keeps more copies of a block based on how many references there are to that block. I.e.: if there are 20 references it would keep 2 copies, whereas if there are 20,000 it would keep 5. I'll have to see if I can dig up the old thread.

--Tim
Re: [zfs-discuss] dedupe question
On 13.11.09 16:09, Ross wrote:

> However, the flip side of that is that dedupe in many circumstances will free up a lot of space, possibly enough to justify copies=3, or even 4.

It is not possible to set copies to 4; there's space for only 3 addresses in the block pointer. There's also the dedupditto property, which specifies a threshold: if the reference count for a deduped block goes above the threshold, another ditto copy of it is stored automatically.

victor
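For anyone who wants to play with that knob, a sketch (assuming a pool named tank and a threshold of 100):

  # zpool set dedupditto=100 tank

After that, any deduped block whose reference count climbs past 100 should get an additional ditto copy written automatically.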
Re: [zfs-discuss] dedupe question
Got some out-of-curiosity questions for the gurus if they have time to answer:

Isn't dedupe in some ways the antithesis of setting copies > 1? We go to a lot of trouble to create redundancy (n-way mirroring, raidz-n, copies=n, etc.) to make things as robust as possible, and then we reduce redundancy with dedupe and compression :-).

What would be the difference in MTTDL between a scenario where the dedupe ratio is exactly two and you've set copies=2, vs. no dedupe and copies=1? Intuitively MTTDL would be better because of the copies=2, but you'd lose twice the data when DL eventually happens.

Similarly, if hypothetically the dedupe ratio = 1.5 and you have a two-way mirror, vs. no dedupe and a 3-disk raidz1, which would be more reliable? Again intuition says the mirror because there's one less device to fail, but device failure isn't the only consideration.

In both cases it sounds like you might gain a bit in performance, especially if the dedupe ratio is high, because you don't have to write the actual duplicated blocks on a write, and on a read you are more likely to have the data blocks in cache. Does this make sense? Maybe there are too many variables, but it would be so interesting to hear of possible decision-making algorithms. A similar discussion applies to compression, although that seems to defeat redundancy more directly. This analysis requires good statistical maths skills!

Thanks -- Frank
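As a rough back-of-envelope on the first question (an illustration only, assuming block losses are independent events with probability p, which real disk failures are not): with copies=1 a logical block is lost with probability p; with copies=2 both ditto copies must go, so roughly p^2. At dedup ratio 2 with copies=2 you consume about the same physical space as copies=1 without dedup, so for equal space each logical block becomes far less likely to be lost (p^2 instead of p), but each block that is lost now takes every file referencing it with it.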
Re: [zfs-discuss] dedupe question
On Nov 12, 2009, at 1:36 PM, Frank Middleton wrote:

> What would be the difference in MTTDL between a scenario where dedupe ratio is exactly two and you've set copies=2 vs. no dedupe and copies=1? Intuitively MTTDL would be better because of the copies=2, but you'd lose twice the data when DL eventually happens.

The MTTDL models I've used consider any loss a complete loss. But there are some interesting wrinkles to explore here... :-)

> Similarly, if hypothetically dedupe ratio = 1.5 and you have a two-way mirror, vs. no dedupe and a 3-disk raidz1, which would be more reliable?
> [snip]
> This analysis requires good statistical maths skills!

There are several dimensions here. But I'm not yet convinced there is a configuration decision point to consume a more detailed analysis. In other words, if you could decide between two or more possible configurations, what would you wish to consider to improve the outcome? Thoughts?

-- richard
Re: [zfs-discuss] dedupe question
Dennis Clarke wrote:
> [test setup and results snipped; quoted in full in Dennis's message below]
> The dedupe ratio has climbed to 1.95x with all those unique files that are less than %recordsize% bytes.

You can get more dedup information by running 'zdb -DD zp_dd'. This should show you how we break things down. Add more 'D' options and get even more detail.

- George
Re: [zfs-discuss] dedupe question
On Sat, 7 Nov 2009, Dennis Clarke wrote:

> Now the first test I did was to write 26^2 files [a-z][a-z].dat in 26^2 directories named [a-z][a-z], where each file is 64K of random non-compressible data and then some english text.

What method did you use to produce this random data?

> The dedupe ratio has climbed to 1.95x with all those unique files that are less than %recordsize% bytes.

Perhaps there are other types of blocks besides user data blocks (e.g. metadata blocks) which become subject to deduplication? Presumably 'dedupratio' is based on a count of blocks rather than a percentage of total data.

Bob
-- 
Bob Friesenhahn
Re: [zfs-discuss] dedupe question
On Sat, 7 Nov 2009, Dennis Clarke wrote:
>> Now the first test I did was to write 26^2 files [a-z][a-z].dat in 26^2 directories named [a-z][a-z] where each file is 64K of random non-compressible data and then some english text.
>
> What method did you use to produce this random data?

I'm using the tt800 method from Makoto Matsumoto, described at http://random.mat.sbg.ac.at/generators/ and used like so:

    /*
     * Generate the random text before we need it and also
     * outside of the area that measures the IO time.
     * We could have just read bytes from /dev/urandom but
     * you would be *amazed* how slow that is.
     */
    random_buffer_start_hrt = gethrtime();
    if ( random_buffer_start_hrt == -1 ) {
        perror( "Could not get random_buffer high res start time" );
        exit( EXIT_FAILURE );
    }

    /* fill the buffer with chars from the 62-char set [A-Za-z0-9] */
    for ( char_count = 0; char_count < 65535; ++char_count ) {
        k_index = (int) ( genrand() * (double) 62 );
        buffer_64k_rand_text[char_count] = alph[k_index];
    }

    /* would be nice to break this into 0x40 char lines */
    for ( p = 0x03fu; p < 65535; p = p + 0x040u )
        buffer_64k_rand_text[p] = '\n';

    buffer_64k_rand_text[65535] = '\n';
    buffer_64k_rand_text[65536] = '\0';

    random_buffer_end_hrt = gethrtime();

That works well. You know what ... I'm a schmuck. I didn't grab a time-based seed first. All those files with random text have identical twins on the filesystem somewhere. :-P damn. I'll go fix that.

>> The dedupe ratio has climbed to 1.95x with all those unique files that are less than %recordsize% bytes.
>
> Perhaps there are other types of blocks besides user data blocks (e.g. metadata blocks) which become subject to deduplication? Presumably 'dedupratio' is based on a count of blocks rather than percentage of total data.

I have no idea .. yet. I figure I'll try a few more experiments to see what it does and maybe, dare I say it, look at the source :-)

-- 
Dennis Clarke
dcla...@opensolaris.ca - Email related to the open source Solaris
dcla...@blastwave.org - Email related to open source for Solaris
Re: [zfs-discuss] dedupe question
> You can get more dedup information by running 'zdb -DD zp_dd'. This should show you how we break things down. Add more 'D' options and get even more detail.
>
> - George

Okay .. thank you. Looks like I have piles of numbers here:

# zdb -DDD zp_dd
DDT-sha256-zap-duplicate: 37317 entries, size 342 on disk, 210 in core

bucket             allocated                      referenced
______   _____________________________   _____________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2    18.4K    763M    355M    355M    37.9K   1.52G    727M    727M
     4    18.0K   1.16G   1.15G   1.15G    72.4K   4.67G   4.61G   4.61G
     8       70   1.47M    849K    849K      657   12.0M   6.78M   6.78M
    16       27   39.5K   31.5K   31.5K      535    747K    598K    598K
    32        6      4K      4K      4K      276    180K    180K    180K
    64        4   9.00K   6.50K   6.50K      340    680K    481K    481K
   128        1      2K   1.50K   1.50K      170    340K    255K    255K
   256        1      1K      1K      1K      313    313K    313K    313K
   512        1     512     512     512      522    261K    261K    261K
 Total    36.4K   1.91G   1.50G   1.50G     113K   6.21G   5.33G   5.33G

DDT-sha256-zap-unique: 154826 entries, size 335 on disk, 196 in core

bucket             allocated                      referenced
______   _____________________________   _____________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     151K   5.61G   2.52G   2.52G     151K   5.61G   2.52G   2.52G
 Total     151K   5.61G   2.52G   2.52G     151K   5.61G   2.52G   2.52G

DDT histogram (aggregated over all DDTs):

bucket             allocated                      referenced
______   _____________________________   _____________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     151K   5.61G   2.52G   2.52G     151K   5.61G   2.52G   2.52G
     2    18.4K    763M    355M    355M    37.9K   1.52G    727M    727M
     4    18.0K   1.16G   1.15G   1.15G    72.4K   4.67G   4.61G   4.61G
     8       70   1.47M    849K    849K      657   12.0M   6.78M   6.78M
    16       27   39.5K   31.5K   31.5K      535    747K    598K    598K
    32        6      4K      4K      4K      276    180K    180K    180K
    64        4   9.00K   6.50K   6.50K      340    680K    481K    481K
   128        1      2K   1.50K   1.50K      170    340K    255K    255K
   256        1      1K      1K      1K      313    313K    313K    313K
   512        1     512     512     512      522    261K    261K    261K
 Total     188K   7.52G   4.01G   4.01G     264K   11.8G   7.85G   7.85G

dedup = 1.96, compress = 1.51, copies = 1.00, dedup * compress / copies = 2.95
#

I have no idea what any of that means, yet :-)

-- Dennis Clarke
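Reading the aggregated totals against the last line of output, the dedup figure appears to be just referenced size over allocated size: 7.85G / 4.01G is roughly 1.96. So the ratio seems to be computed over the sizes recorded in the DDT rather than a raw block count.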
Re: [zfs-discuss] dedupe question
On Sun, 8 Nov 2009, Dennis Clarke wrote:

> That works well. You know what ... I'm a schmuck. I didn't grab a time-based seed first. All those files with random text have identical twins on the filesystem somewhere. :-P damn

That is one reason why I asked. Failure to get a good seed is the most common problem. Using the time() system call is no longer good enough if multiple processes are somehow involved. It is useful to include additional information such as the PID and microseconds. Reading a few characters from /dev/random to create the seed is even better.

Bob
-- 
Bob Friesenhahn
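To illustrate that last point, one quick way to pull a 32-bit seed out of /dev/random from a shell (a sketch; assumes a POSIX-style od(1)):

  seed=`od -An -N4 -tu4 /dev/random`
  echo $seed

In C the equivalent is just an open() and a read() of 4 bytes from /dev/random before calling the generator's seed routine.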
[zfs-discuss] dedupe question
Does the dedupe functionality happen at the file level or a lower block level?

I am writing a large number of files that have the following structure:

-- file begins
1024 lines of random ASCII chars, 64 chars long
some tilde chars .. about 1000 of them
some text (english) for 2K
more text (english) for 700 bytes or so
--

Each file has the same tilde chars and then english text at the end of 64K of random character data. Before writing the data I see:

# zpool get size,capacity,version,dedupratio,free,allocated zp_dd
NAME   PROPERTY    VALUE  SOURCE
zp_dd  size        67.5G  -
zp_dd  capacity    6%     -
zp_dd  version     21     default
zp_dd  dedupratio  1.16x  -
zp_dd  free        63.3G  -
zp_dd  allocated   4.19G  -

Afterwards I see this:

# zpool get size,capacity,version,dedupratio,free,allocated zp_dd
NAME   PROPERTY    VALUE  SOURCE
zp_dd  size        67.5G  -
zp_dd  capacity    6%     -
zp_dd  version     21     default
zp_dd  dedupratio  1.11x  -
zp_dd  free        63.1G  -
zp_dd  allocated   4.36G  -

Note the drop in dedup ratio from 1.16x to 1.11x, which seems to indicate that dedupe does not detect that the english text is identical in every file.

-- Dennis
Re: [zfs-discuss] dedupe question
Dennis Clarke wrote:

> Does the dedupe functionality happen at the file level or a lower block level?

Block level, but remember that block size may vary from file to file.

> [file structure and zpool get output snipped]
> Note the drop in dedup ratio from 1.16x to 1.11x which seems to indicate that dedupe does not detect the english text is identical in every file.

Theory: your files may end up being in one large 128K block, or maybe a couple of 64K blocks, where there isn't much redundancy to de-dup.

-tim
Re: [zfs-discuss] dedupe question
On Sat, 2009-11-07 at 17:41 -0500, Dennis Clarke wrote:

> Does the dedupe functionality happen at the file level or a lower block level?

It occurs at the block allocation level.

> [description of the file structure snipped]

ZFS's default block size is 128K and is controlled by the recordsize filesystem property. Unless you changed recordsize, each of the files above would be a single block, distinct from the others. You may or may not get better dedup ratios with a smaller recordsize, depending on how the common parts of the file line up with block boundaries; the cost of additional indirect blocks might overwhelm the savings from deduping a small common piece of the file.

- Bill
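For the curious, the knob to turn for such an experiment is just the dataset property (a sketch, using the zp_dd/tester dataset from this thread; recordsize affects only files created after it is set):

  # zfs set recordsize=8K zp_dd/tester
  # zfs get recordsize zp_dd/tester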
Re: [zfs-discuss] dedupe question
On Sat, 2009-11-07 at 17:41 -0500, Bill wrote:

> [Bill's explanation of recordsize and dedup block boundaries snipped; see his message above]

Well, I was curious about these sorts of things and figured that a simple test would show me the behavior.

Now the first test I did was to write 26^2 files [a-z][a-z].dat in 26^2 directories named [a-z][a-z], where each file is 64K of random non-compressible data and then some english text. I guess I was wrong about the 64K random text chunk also .. because I wrote out that data as chars from the set { [A-Z][a-z][0-9] }, and thus compressible ASCII data as opposed to random binary data.

So ... after doing that a few times I now see something fascinating:

$ ls -lo /tester/foo/*/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:38 /tester/foo/1/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:45 /tester/foo/2/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:43 /tester/foo/3/aa/aa.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:43 /tester/foo/4/aa/aa.dat
$ ls -lo /tester/foo/*/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:39 /tester/foo/1/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:47 /tester/foo/2/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:45 /tester/foo/3/zz/az.dat
-rw-r--r--   1 dclarke    68330 Nov  7 22:47 /tester/foo/4/zz/az.dat
$ find /tester/foo -type f | wc -l
   70304

Those files, all 70,000+ of them, are unique and smaller than the filesystem blocksize. However:

$ zfs get used,available,referenced,compressratio,recordsize,compression,dedup zp_dd/tester
NAME          PROPERTY       VALUE  SOURCE
zp_dd/tester  used           4.51G  -
zp_dd/tester  available      3.49G  -
zp_dd/tester  referenced     4.51G  -
zp_dd/tester  compressratio  1.00x  -
zp_dd/tester  recordsize     128K   default
zp_dd/tester  compression    off    local
zp_dd/tester  dedup          on     local

Compression factors don't interest me at the moment .. but see this:

$ zpool get all zp_dd
NAME   PROPERTY       VALUE                 SOURCE
zp_dd  size           67.5G                 -
zp_dd  capacity       6%                    -
zp_dd  altroot        -                     default
zp_dd  health         ONLINE                -
zp_dd  guid           14649016030066358451  default
zp_dd  version        21                    default
zp_dd  bootfs         -                     default
zp_dd  delegation     on                    default
zp_dd  autoreplace    off                   default
zp_dd  cachefile      -                     default
zp_dd  failmode       wait                  default
zp_dd  listsnapshots  off                   default
zp_dd  autoexpand     off                   default
zp_dd  dedupratio     1.95x                 -
zp_dd  free           63.3G                 -
zp_dd  allocated      4.22G                 -

The dedupe ratio has climbed to 1.95x with all those unique files that are less than %recordsize% bytes.

-- 
Dennis Clarke
Re: [zfs-discuss] dedupe is in
I was under the impression that you can create a new zfs dataset, turn on the dedup functionality, and copy your data to it. Or am I wrong?
Re: [zfs-discuss] dedupe is in
Orvar Korvar wrote:

> I was under the impression that you can create a new zfs dataset and turn on the dedup functionality, and copy your data to it. Or am I wrong?

You don't even have to create a new dataset; just do:

  # zfs set dedup=on dataset

-- Darren J Moffat
Re: [zfs-discuss] dedupe is in
Trevor Pretty wrote:

> Darren J Moffat wrote:
>> You don't even have to create a new dataset; just do:
>>   # zfs set dedup=on dataset
>
> But like all ZFS functions, will that not only get applied when you (re)write data, like compression=on?

Correct, but if you are creating a new dataset you are writing new data anyway.

> Which leads to the question: would a scrub activate dedupe?

Not at this time, no.

-- Darren J Moffat
Re: [zfs-discuss] dedupe is in
Darren J Moffat wrote:

> Orvar Korvar wrote:
>> I was under the impression that you can create a new zfs dataset and turn on the dedup functionality, and copy your data to it. Or am I wrong?
>
> You don't even have to create a new dataset; just do:
>   # zfs set dedup=on dataset

But like all ZFS functions, will that not only get applied when you (re)write data, like compression=on? Which leads to the question: would a scrub activate dedupe?
[zfs-discuss] dedupe is in
Deduplication was committed last night by Mr. Bonwick:

Log message:
PSARC 2009/571 ZFS Deduplication Properties
6677093 zfs should have dedup capability

http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010683.html

Via c0t0d0s0.org.
Re: [zfs-discuss] dedupe is in
Terrific! Can't wait to read the man pages / blogs about how to use it...

Alex.

On Mon, Nov 2, 2009 at 12:21 PM, David Magda dma...@ee.ryerson.ca wrote:
> Deduplication was committed last night by Mr. Bonwick:
> [snip]
Re: [zfs-discuss] dedupe is in
Why didn't one of the developers from green-bytes do the commit? :P /sarcasm
Re: [zfs-discuss] dedupe is in
On Mon, Nov 2, 2009 at 2:25 PM, Alex Lam S.L. alexla...@gmail.com wrote:
> Terrific! Can't wait to read the man pages / blogs about how to use it...

Alex, you may wish to check the PSARC 2009/571 materials [1] for a sneak preview :)

[1] http://arc.opensolaris.org/caselog/PSARC/2009/571/

-- 
Regards, Cyril
Re: [zfs-discuss] dedupe is in
David Magda wrote:

> Deduplication was committed last night by Mr. Bonwick:
> [snip]

And PSARC 2009/479 zpool recovery support is in as well:

http://mail.opensolaris.org/pipermail/onnv-notify/2009-October/010682.html
Re: [zfs-discuss] dedupe is in
> Terrific! Can't wait to read the man pages / blogs about how to use it...

Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup

Enjoy, and let me know if you have any questions or suggestions for follow-on posts.

Jeff
Re: [zfs-discuss] dedupe is in
On Mon, Nov 2, 2009 at 7:20 AM, Jeff Bonwick jeff.bonw...@sun.com wrote:

> Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup

On systems with crypto accelerators (particularly Niagara 2), does the hash calculation code use the crypto accelerators, so long as a supported hash is used? Assuming the answer is yes, have performance comparisons been done between weaker hash algorithms implemented in software and sha256 implemented in hardware?

I've been waiting very patiently to see this code go in. Thank you for all your hard work (and the work of those that helped too!).

-- Mike Gerdts http://mgerdts.blogspot.com/
Re: [zfs-discuss] dedupe is in
This is truly awesome news! What's the best way to dedup existing datasets? Will send/recv work, or do we just cp things around?

Regards,
Tristan

Jeff Bonwick wrote:
> Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup
> [snip]
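Either route should work, since dedup happens as blocks are written: a plain cp into a dedup-enabled dataset rewrites the data, and so does send/recv. A sketch of the send/recv route (assuming a dataset tank/data on a pool with dedup already enabled, and tank/data-dedup as a hypothetical target name):

  # zfs snapshot tank/data@prededup
  # zfs send tank/data@prededup | zfs recv tank/data-dedup

The receive rewrites every block, so it all passes through the dedup code on the way back in.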
Re: [zfs-discuss] dedupe is in
Double WOHOO! Thanks Victor!
Re: [zfs-discuss] dedupe is in
On 02.11.09 18:38, Ross wrote:

> Double WOHOO! Thanks Victor!

Thanks should go to Tim Haley, Jeff Bonwick and George Wilson ;-)
Re: [zfs-discuss] dedupe is in
Ok, thanks everyone then (but still thanks to Victor for the heads-up) :-)

On Mon, Nov 2, 2009 at 4:03 PM, Victor Latushkin victor.latush...@sun.com wrote:
> Thanks should go to Tim Haley, Jeff Bonwick and George Wilson ;-)
Re: [zfs-discuss] dedupe is in
>> Terrific! Can't wait to read the man pages / blogs about how to use it...
>
> Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup

Looking at FIPS-180-3, sections 4.1.2 and 4.1.3, I was thinking that the major leap from SHA256 to SHA512 is a 32-bit to 64-bit step. If the implementation of the SHA256 (or possibly SHA512 at some point) algorithm is well threaded, then one would be able to leverage those massively multi-core Niagara T2 servers. The SHA256 hash is based on six 32-bit functions, whereas SHA512 is based on six 64-bit functions. The CMT Niagara T2 can easily process those 64-bit hash functions, and the multi-core CMT trend is well established. So long as context-switch times are very low, one would think that IO with a SHA512-based de-dupe implementation would be possible and even realistic. That would solve the hash collision concern, I would think.

Merely thinking out loud here ...

-- 
Dennis Clarke
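A crude way to compare the two hashes on any given box is the digest(1) utility against a large scratch file (a sketch; the file name is arbitrary):

  $ time digest -a sha256 /var/tmp/bigfile
  $ time digest -a sha512 /var/tmp/bigfile

On a 64-bit CPU it would not be surprising to see sha512 come out faster per byte, since it chews through 128-byte blocks with 64-bit operations.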
Re: [zfs-discuss] dedupe is in
On Mon, Nov 2, 2009 at 11:58 AM, Dennis Clarke dcla...@blastwave.org wrote:

> If the implementation of the SHA256 (or possibly SHA512 at some point) algorithm is well threaded then one would be able to leverage those massively multi-core Niagara T2 servers.
> [snip]
> Merely thinking out loud here ...

And my out-loud thinking on this says that the crypto accelerator on a T2 system does hardware acceleration of SHA256:

NAME
     n2cp - Ultra-SPARC T2 crypto provider device driver

DESCRIPTION
     The n2cp device driver is a multi-threaded, loadable hardware
     driver supporting hardware-assisted acceleration of the following
     cryptographic operations, which are built into the Ultra-SPARC T2
     CMT processor:

     DES:     CKM_DES_CBC, CKM_DES_ECB
     DES3:    CKM_DES3_CBC, CKM_DES3_ECB
     AES:     CKM_AES_CBC, CKM_AES_ECB, CKM_AES_CTR
     RC4:     CKM_RC4
     MD5:     CKM_MD5, CKM_MD5_HMAC, CKM_MD5_HMAC_GENERAL, CKM_SSL3_MD5_MAC
     SHA-1:   CKM_SHA_1, CKM_SHA_1_HMAC, CKM_SHA_1_HMAC_GENERAL, CKM_SSL3_SHA1_MAC
     SHA-256: CKM_SHA256, CKM_SHA256_HMAC, CKM_SHA256_HMAC_GENERAL

According to page 35 of http://www.slideshare.net/ramesh_r_nagappan/wirespeed-cryptographic-acceleration-for-soa-and-java-ee-security, a T2 CPU can do 41 Gb/s of SHA256. The implication here is that this keeps the MAUs busy but the rest of the core is still idle for things like compression, TCP, etc.

-- Mike Gerdts http://mgerdts.blogspot.com/
Re: [zfs-discuss] dedupe is in
On Mon, Nov 02, 2009 at 12:58:32PM -0500, Dennis Clarke wrote:

> Looking at FIPS-180-3 in sections 4.1.2 and 4.1.3 I was thinking that the major leap from SHA256 to SHA512 was a 32-bit to 64-bit step.

ZFS doesn't have enough room in blkptr_t for 512-bit hashes.

Nico
Re: [zfs-discuss] dedupe is in
Okay, nice to hear ZFS can now use dedup. But how can I update my current OpenSolaris (2009.06) or Solaris 10 (5/09) to use this? Or do I have to wait for a new stable release of Solaris 10 / OpenSolaris?

-- Daniel
Re: [zfs-discuss] dedupe is in
On 03/11/2009, at 7:32 AM, Daniel Streicher wrote:

> But how can I update my current OpenSolaris (2009.06) or Solaris 10 (5/09) to use this? Or do I have to wait for a new stable release of Solaris 10 / OpenSolaris?

For OpenSolaris, you change your repository and switch to the development branches; the dedup bits should be available to the public in about 3-3.5 weeks' time. There are plenty of instructions on how to do this on the net and in this list. For Solaris, you need to wait for the next update release.

cheers,
James
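For the archives, the usual recipe for switching a 2009.06 image to the dev repository looks something like this (a sketch; run with root privileges, and the repository URL is the one current at the time of writing):

  # pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
  # pkg image-update

Then boot into the newly created boot environment.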
Re: [zfs-discuss] dedupe is in
Looks great, and by the time an OpenSolaris build has it, I will have a brand new laptop to put it on ;-)

One question though: I have a file server at home with 4x750GB on raidz1. When I upgrade to the latest build and set dedup=on, given that it does not have an offline mode, is there no way to operate on the existing dataset? As a workaround I can move files in and out of the pool through an external 500GB HDD, and with ZFS snapshots I don't really risk losing much data if anything goes (not too horribly, anyway) wrong.

Thanks to you guys again for the great work!

Alex.

On Mon, Nov 2, 2009 at 1:20 PM, Jeff Bonwick jeff.bonw...@sun.com wrote:
> Just posted one: http://blogs.sun.com/bonwick/en_US/entry/zfs_dedup
> [snip]
Re: [zfs-discuss] dedupe is in
ZFS dedup will be in snv_128, but putbacks to snv_128 will likely not close till the end of this week. The OpenSolaris dev repository was updated to snv_126 last Thursday:

http://mail.opensolaris.org/pipermail/opensolaris-announce/2009-October/001317.html

So it looks like about 5 weeks before the dev repository will be updated to snv_128. Then we see if any bugs emerge as we all rush to test it out...

Regards,
Nigel Smith
Re: [zfs-discuss] dedupe is in
James Lever wrote:

> For OpenSolaris, you change your repository and switch to the development branches ... For Solaris, you need to wait for the next update release.

At which stage a patch (a kernel patch) will be released that can be applied to pre-Update 9 releases to get the latest zpool version; existing pools would require a 'zpool upgrade'.

Enda
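For reference, checking and upgrading pool versions once the new bits are installed (a sketch, assuming a pool named tank):

  # zpool upgrade -v     (lists the versions this kernel supports)
  # zpool upgrade tank   (upgrades one pool)

Keep in mind an upgraded pool can no longer be imported on older kernels.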
Re: [zfs-discuss] dedupe is in
Mike Gerdts wrote:

> On systems with crypto accelerators (particularly Niagara 2), does the hash calculation code use the crypto accelerators, so long as a supported hash is used?

Not yet; it is coming. Currently ZFS has a private copy of SHA256 (for legacy reasons). I have an RTI pending to switch it to the same copy that the crypto framework uses. That is an optimised software implementation (SPARC, Intel and AMD64), but it won't yet use the Niagara 2 on-chip crypto. There is an issue with very early boot and the crypto framework I have to resolve, so that will come later.

> Assuming the answer is yes, have performance comparisons been done between weaker hash algorithms implemented in software and sha256 implemented in hardware?

I've done some comparisons on that.
Re: [zfs-discuss] dedupe is in
Great stuff, Jeff and company. You all rock. =-)

A potential topic for the follow-up posts: auto-ditto, and the philosophy behind choosing a default threshold for creating a second copy.