Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-29 Thread Brad Diggs
Reducing the record size would negatively impact performance. For the rationale, see the section titled "Match Average I/O Block Sizes" in my blog post on filesystem caching:
http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html

Brad
Brad Diggs | Principal Sales Consultant | 972.814.3698
eMail: brad.di...@oracle.com
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs

On Dec 29, 2011, at 8:08 AM, Robert Milkowski wrote:

Try reducing recordsize to 8K or even less *before* you put any data. This can potentially improve your dedup ratio and keep it higher after you start modifying data.

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Brad Diggs
Sent: 28 December 2011 21:15
To: zfs-discuss discussion list
Subject: Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

As promised, here are the findings from my testing. I created 6 directory server instances where the first instance has roughly 8.5GB of data. Then I initialized the remaining 5 instances from a binary backup of the first instance. Then, I rebooted the server to start off with an empty ZFS cache. The following table shows the increased L1ARC size, increased search rate performance, and increased CPU % busy when starting and applying load to each successive directory server instance. The L1ARC cache grew a little with each additional instance but largely stayed the same size. Likewise, the ZFS dedup ratio remained the same because no data on the directory server instances was changing.

[image001.png]

However, once I started modifying the data of the replicated directory server topology, the caching efficiency quickly diminished. The following table shows that the delta for each instance increased by roughly 2GB after only 300k changes.

[image002.png]

I suspect the divergence in data as seen by ZFS deduplication most likely occurs because deduplication occurs at the block level rather than at the byte level. When a write is sent to one directory server instance, the exact same write is propagated to the other 5 instances and therefore should be considered a duplicate. However, this was not the case. There could be other reasons for the divergence as well.

The two key takeaways from this exercise were as follows. There is tremendous caching potential through the use of ZFS deduplication. However, the current block-level deduplication does not benefit the directory server as much as it perhaps could if deduplication occurred at the byte level rather than the block level. It could well be that byte-level deduplication doesn't work much better either; until that option is available, we won't know for sure.

Regards,
Brad

Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
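For readers who want to reproduce this kind of measurement, the numbers behind the tables above can be sampled with stock Solaris tooling. The snippet below is a hedged sketch, not taken from the original post; the pool name "tank" is a placeholder.

# Sample the L1 ARC size and the pool-wide dedup ratio after starting and
# loading each directory server instance.
kstat -p zfs:0:arcstats:size        # current ARC size in bytes
kstat -p zfs:0:arcstats:hits        # ARC hit counter
kstat -p zfs:0:arcstats:misses      # ARC miss counter
zpool get dedupratio tank           # pool-wide deduplication ratio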

Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-29 Thread Brad Diggs
Jim,

You are spot on. I was hoping that the writes would be close enough to identical that there would be a high ratio of duplicate data since I use the same record size, page size, compression algorithm, … etc. However, that was not the case. The main thing that I wanted to prove though was that if the data was the same, the L1 ARC only caches the data that was actually written to storage. That is a really cool thing! I am sure there will be future study on this topic as it applies to other scenarios.

With regards to directory engineering investing any energy into optimizing ODSEE DS to more effectively leverage this caching potential, that won't happen. OUD far outperforms ODSEE. That said, OUD may get some focus in this area. However, time will tell on that one.

For now, I hope everyone benefits from the little that I did validate.

Have a great day!
Brad
Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs


On Dec 29, 2011, at 4:45 AM, Jim Klimov wrote:

Thanks for running and publishing the tests :)
A comment on your testing technique follows, though.

2011-12-29 1:14, Brad Diggs wrote:

As promised, here are the findings from my testing. I created 6 directory server instances ...
However, once I started modifying the data of the replicated directory server topology, the caching efficiency quickly diminished. The following table shows that the delta for each instance increased by roughly 2GB after only 300k changes.
I suspect the divergence in data as seen by ZFS deduplication most likely occurs because deduplication occurs at the block level rather than at the byte level. When a write is sent to one directory server instance, the exact same write is propagated to the other 5 instances and therefore should be considered a duplicate. However, this was not the case. There could be other reasons for the divergence as well.

Hello, Brad,

If you tested with Sun DSEE (and I have no reason to believe other descendants of the iPlanet Directory Server would work differently under the hood), then there are two factors hindering your block-dedup gains:

1) The data is stored in the backend BerkeleyDB binary file. In Sun DSEE7 and/or in ZFS this could also be compressed data. Since ZFS dedups unique blocks, including same data at same offsets, it is quite unlikely you'd get the same data often enough. For example, each database might position the same userdata blocks at different offsets due to garbage collection or whatever other optimisation the DB might think of, making on-disk blocks different and undedupable.

You might look at whether it is possible to tune the database to write in sector-sized to min.block-sized (512b/4096b) records and consistently use the same DSEE compression (or lack thereof) - in this case you might get more identical blocks and win with dedup. But you'll likely lose with compression, especially of the empty sparse structure which a database initially is.

2) During replication each database actually becomes unique. There are hidden records with an "ns" prefix which mark when the record was created and replicated, who initiated it, etc. Timestamps in the data already warrant uniqueness ;)

This might be an RFE for the DSEE team though - to keep such volatile metadata separately from userdata. Then your DS instances would more likely dedup well after replication, and unique metadata would be stored separately and stay unique. You might even keep it in a different dataset with no dedup, then... :)

---

So, at the moment, this expectation does not hold true: "When a write is sent to one directory server instance, the exact same write is propagated to the other five instances and therefore should be considered a duplicate." These writes are not exact.

HTH,
//Jim Klimov
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
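Robert's recordsize suggestion and Jim's point about matching the database page size translate into a small amount of dataset setup done before any data is loaded. The sketch below is illustrative only and not from the thread; the pool and dataset names are placeholders.

# Create and tune the dataset before importing any DSEE data; recordsize
# does not retroactively change blocks that were already written.
zfs create tank/dsee
zfs set recordsize=8k tank/dsee      # or 4k, to match the BDB page size
zfs set dedup=on tank/dsee
zfs get recordsize,dedup tank/dsee   # verify the settings before loading data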


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-29 Thread Brad Diggs
S11 FCS.

Brad
Brad Diggs | Principal Sales Consultant | 972.814.3698
eMail: brad.di...@oracle.com
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs

On Dec 29, 2011, at 8:11 AM, Robert Milkowski wrote:

And these results are from S11 FCS I assume. On older builds or Illumos-based distros I would expect the L1 ARC to grow much bigger.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-12 Thread Brad Diggs
Thanks everyone for your input on this thread. It sounds like there is sufficient weight behind the affirmative that I will include this methodology in my performance analysis test plan. If the performance goes well, I will share some of the results when we conclude in the January/February timeframe.

Regarding the great dd use case provided earlier in this thread, the L1 and L2 ARC detect and prevent streaming reads such as those from dd from populating the cache. See my previous blog post at the link below for a way around this protective caching control of ZFS.

http://www.thezonemanager.com/2010/02/directory-data-priming-strategies.html

Thanks again!
Brad
Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs


On Dec 8, 2011, at 4:22 PM, Mark Musante wrote:

You can see the original ARC case here:
http://arc.opensolaris.org/caselog/PSARC/2009/557/20091013_lori.alt

On 8 Dec 2011, at 16:41, Ian Collins wrote:

On 12/9/11 12:39 AM, Darren J Moffat wrote:

On 12/07/11 20:48, Mertol Ozyoney wrote:

Unfortunately the answer is no. Neither the L1 nor the L2 cache is dedup aware. The only vendor I know of that can do this is NetApp. In fact, most of our functions, like replication, are not dedup aware. For example, technically it's possible to optimize our replication so that it does not send data chunks if a data chunk with the same checksum exists on the target, without enabling dedup on target and source.

We already do that with 'zfs send -D':

  -D    Perform dedup processing on the stream. Deduplicated
        streams cannot be received on systems that do not
        support the stream deduplication feature.

Is there any more published information on how this feature works?

-- Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-07 Thread Brad Diggs
Hello,

I have a hypothetical question regarding ZFS deduplication. Does the L1ARC cache benefit from deduplication in the sense that the L1ARC will only need to cache one copy of the deduplicated data versus many copies? Here is an example:

Imagine that I have a server with 2TB of RAM and a PB of disk storage. On this server I create a single 1TB data file that is full of unique data. Then I make 9 copies of that file, giving each file a unique name and location within the same ZFS zpool. If I start up 10 application instances where each application reads all of its own unique copy of the data, will the L1ARC contain only the deduplicated data, or will it cache separate copies of the data from each file? In simpler terms, will the L1ARC require 10TB of RAM or just 1TB of RAM to cache all 10 1TB files' worth of data?

My hope is that since the data only physically occupies 1TB of storage via deduplication, the L1ARC will also only require 1TB of RAM for the data.

Note that I know the deduplication table will use the L1ARC as well. However, the focus of my question is on how the L1ARC would benefit from a data caching standpoint.

Thanks in advance!
Brad
Brad Diggs | Principal Sales Consultant
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs
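As a side note for anyone wanting to test the question empirically on a small scale, the following is a rough sketch and not part of the original post; the pool, dataset, and file names are made up, and note that, as discussed later in this thread, ZFS may detect purely streaming reads and limit how much of them it caches.

# Make two identical copies of a seed file on a dedup-enabled dataset, read
# them both, and watch whether the ARC grows once or twice.
zfs create -o dedup=on tank/dduptest
cp /var/tmp/seed.dat /tank/dduptest/copy1
cp /var/tmp/seed.dat /tank/dduptest/copy2
zpool get dedupratio tank                 # should approach 2.00x on disk

kstat -p zfs:0:arcstats:size              # ARC size before reading
cat /tank/dduptest/copy1 > /dev/null
cat /tank/dduptest/copy2 > /dev/null
kstat -p zfs:0:arcstats:size              # ARC size after reading both copies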


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS on solid state as disk rather than L2ARC...

2010-09-15 Thread Brad Diggs
Has anyone done much testing of just using solid state devices (F20 or F5100) as devices for ZFS pools? Are there any concerns with running in this mode versus using solid state devices for L2ARC cache?

Second, has anyone done this sort of testing with MLC-based solid state drives? What has your experience been?

Thanks in advance!
Brad

Brad Diggs | Principal Security Sales Consultant
Oracle North America Technology Organization
16000 Dallas Parkway, Dallas, TX 75248
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
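For reference, the two configurations being compared look roughly like this; a hedged sketch only, with placeholder pool and device names.

# (a) Solid state devices as the pool's primary storage:
zpool create flashpool mirror c1t1d0 c1t2d0

# (b) Solid state devices as L2ARC cache for an existing disk-based pool:
zpool add tank cache c1t1d0 c1t2d0
zpool status tank        # the devices appear under a "cache" section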


Re: [zfs-discuss] hybrid drive: flash and platters

2010-05-25 Thread Brad Diggs
Hello,

As an avid fan of the application of flash technologies to the storage stratum, I researched the DmCache project (maintained here). It appears that the DmCache project is quite a bit behind L2ARC but headed in the right direction. I found the lwn article very interesting as it is effectively a Linux application of L2ARC to improve MySQL performance. I had proposed the same idea in my blog post titled "Filesystem Cache Optimization Strategies." The net there is that if you can cache the data in the filesystem cache, you can improve overall performance by reducing the I/O to disk. I had hoped to have someone do some benchmarking of MySQL on a cache-optimized server with F20 PCIe flash cards but never got around to it.

So, if you want to get all of the caching benefits of DmCache, just run your app on Solaris 10 today. ;-)

Have a great day!
Brad

Brad Diggs | Principal Security Sales Consultant | +1.972.814.3698
Oracle North America Technology Organization
16000 Dallas Parkway, Dallas, TX 75248
eMail: brad.di...@oracle.com
Tech Blog: http://TheZoneManager.com
LinkedIn: http://www.linkedin.com/in/braddiggs

On May 21, 2010, at 8:00 PM, David Magda wrote:

Seagate is planning on releasing a disk that's part spinning rust and part flash:

http://www.theregister.co.uk/2010/05/21/seagate_momentus_xt/

The design will have the flash be transparent to the operating system, but I wish they would have some way to access the two components separately. ZFS could certainly make use of it, and Linux is also working on a capability:

http://kernelnewbies.org/KernelProjects/DmCache
http://lwn.net/Articles/385442/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-22 Thread Brad Diggs
Have you considered running your script with ZFS pre-fetching disabled
altogether to see if the results are consistent between runs?

Brad
Brad Diggs
Senior Directory Architect
Virtualization Architect
xVM Technology Lead


Sun Microsystems, Inc.
Phone x52957/+1 972-992-0002
Mail bradley.di...@sun.com
Blog http://TheZoneManager.com
Blog http://BradDiggs.com
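For anyone wanting to try the suggestion above, disabling ZFS file-level prefetch on Solaris 10 of that era was typically done with the tunable below; treat this as a sketch rather than a recommendation.

# Persistently, in /etc/system (takes effect after a reboot):
set zfs:zfs_prefetch_disable = 1

# Or on a live system via mdb (reverts at the next boot):
echo "zfs_prefetch_disable/W0t1" | mdb -kw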

On Jul 15, 2009, at 9:59 AM, Bob Friesenhahn wrote:


On Wed, 15 Jul 2009, Ross wrote:

Yes, that makes sense.  For the first run, the pool has only just  
been mounted, so the ARC will be empty, with plenty of space for  
prefetching.


I don't think that this hypothesis is quite correct.  If you use  
'zpool iostat' to monitor the read rate while reading a large  
collection of files with total size far larger than the ARC, you  
will see that there is no fall-off in read performance once the ARC  
becomes full.  The performance problem occurs when there is still  
metadata cached for a file but the file data has since been expunged  
from the cache.  The implication here is that zfs speculates that  
the file data will be in the cache if the metadata is cached, and  
this results in a cache miss as well as disabling the file read-ahead
algorithm.  You would not want to do read-ahead on data that
you already have in a cache.


Recent OpenSolaris seems to take a 2X performance hit rather than  
the 4X hit that Solaris 10 takes.  This may be due to improvement of  
existing algorithm function performance (optimizations) rather than  
a related design improvement.


I wonder if there is any tuning that can be done to counteract  
this? Is there any way to tell ZFS to bias towards prefetching  
rather than preserving data in the ARC?  That may provide better  
performance for scripts like this, or for random access workloads.


Recent zfs development focus has been on how to keep prefetch from  
damaging applications like database where prefetch causes more data  
to be read than is needed.  Since OpenSolaris now apparently  
includes an option setting which blocks file data caching and  
prefetch, this seems to open the door for use of more aggressive  
prefetch in the normal mode.
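The "option setting which blocks file data caching" presumably refers to the per-dataset cache policy properties; if so, the usage is roughly as follows. This is an illustrative sketch and the dataset name is a placeholder.

# Keep only metadata for this dataset in the ARC, and keep it out of the
# L2ARC entirely, so large scans cannot evict more useful cached data.
zfs set primarycache=metadata tank/db
zfs set secondarycache=none tank/db
zfs get primarycache,secondarycache tank/db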


In summary, I agree with Richard Elling's hypothesis (which is the  
same as my own).


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-13 Thread Brad Diggs
You might want to have a look at my blog on filesystem cache tuning...
It will probably help you to avoid memory contention between the ARC and
your apps.

http://www.thezonemanager.com/2009/03/filesystem-cache-optimization.html

Brad
Brad Diggs
Senior Directory Architect
Virtualization Architect
xVM Technology Lead


Sun Microsystems, Inc.
Phone x52957/+1 972-992-0002
Mail bradley.di...@sun.com
Blog http://TheZoneManager.com
Blog http://BradDiggs.com

On Jul 4, 2009, at 2:48 AM, Phil Harman wrote:

ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC  
instead of the Solaris page cache. But mmap() uses the latter. So if  
anyone maps a file, ZFS has to keep the two caches in sync.


cp(1) uses mmap(2). When you use cp(1) it brings pages of the files  
it copies into the Solaris page cache. As long as they remain there  
ZFS will be slow for those files, even if you subsequently use  
read(2) to access them.


If you reboot, your cpio(1) tests will probably go fast again, until  
someone uses mmap(2) on the files again. I think tar(1) uses  
read(2), but from my iPod I can't be sure. It would be interesting  
to see how tar(1) performs if you run that test before cp(1) on a  
freshly rebooted system.


I have done some work with the ZFS team towards a fix, but it is  
only currently in OpenSolaris.


The other thing that slows you down is that ZFS only flushes to disk  
every 5 seconds if there are no synchronous writes. It would be  
interesting to see iostat -xnz 1 while you are running your tests.  
You may find the disks are writing very efficiently for one second  
in every five.


Hope this helps,
Phil

blogs.sun.com/pgdh


Sent from my iPod

On 4 Jul 2009, at 05:26, Bob Friesenhahn  
bfrie...@simple.dallas.tx.us wrote:



On Fri, 3 Jul 2009, Bob Friesenhahn wrote:


Copy Method                              Data Rate
==================================================
cpio -pdum                               75 MB/s
cp -r                                    32 MB/s
tar -cf - . | (cd dest && tar -xf -)     26 MB/s


It seems that the above should be amended.  Running the cpio based
copy again results in zpool iostat only reporting a read bandwidth  
of 33 MB/second.  The system seems to get slower and slower as it  
runs.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can't rm file when No space left on device...

2008-06-10 Thread Brad Diggs
Great point.  Hadn't thought of it in that way.
I haven't tried truncating a file prior to trying
to remove it.  Either way though, I think it is a
bug if once the filesystem fills up, you can't remove
a file.

Brad
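For completeness, the truncate-before-remove idea being discussed would look roughly like this against the reproduction shown in the original post later in this digest. It is untested here, and blocks still referenced by a snapshot are not freed by truncating the live file.

: > /zFullPool/f1        # truncate the live copy of the file to zero length
rm -f /zFullPool/f1      # then retry the remove
df -k /zFullPool         # check whether any space was actually released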

On Thu, 2008-06-05 at 21:13 -0600, Keith Bierman wrote:
 On Jun 5, 2008, at 8:58 PM, Brad Diggs wrote:
 
  Hi Keith,
 
  Sure you can truncate some files but that effectively corrupts
  the files in our case and would cause more harm than good. The
  only files in our volume are data files.
 
 
 
 
 So an rm is ok, but a truncation is not?
 
 Seems odd to me, but if that's your constraint so be it.
 
-- 
-
  _/_/_/  _/_/  _/ _/   Brad Diggs
 _/  _/_/  _/_/   _/Communications Area Market
_/_/_/  _/_/  _/  _/ _/ Senior Directory Architect
   _/  _/_/  _/   _/_/
  _/_/_/   _/_/_/   _/ _/   Office:  972-992-0002
E-Mail:  [EMAIL PROTECTED]
 M  I  C  R  O  S  Y  S  T  E  M  S

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Can't rm file when No space left on device...

2008-06-04 Thread Brad Diggs
Hello,

A customer recently brought to my attention that ZFS can get
into a situation where the filesystem is full but no files 
can be removed.  The workaround is to remove a snapshot and
then you should have enough free space to remove a file.  
Here is a sample series of commands to reproduce the 
problem.

# mkfile 1g /tmp/disk.raw
# zpool create -f zFullPool /tmp/disk.raw
# sz=`df -k /zFullPool | awk '{ print $2 }' | tail -1`
# mkfile $((${sz}-1024))k /zFullPool/f1
# zfs snapshot [EMAIL PROTECTED]
# sz=`df -k /zFullPool | awk '{ print $2 }' | tail -1`
# mkfile ${sz}k /zFullPool/f2
/zFullPool/f2: initialized 401408 of 1031798784 bytes: No space left on
device
# df -k /zFullPool
Filesystem            kbytes    used   avail capacity  Mounted on
zFullPool            1007659 1007403       0   100%    /zFullPool
# rm -f /zFullPool/f1
# ls -al /zFullPool
total 2014797
drwxr-xr-x   2 root     sys            4 Jun  4 12:15 .
drwxr-xr-x  31 root     root       18432 Jun  4 12:14 ..
-rw------T   1 root     root  1030750208 Jun  4 12:15 f1
-rw-------   1 root     root  1031798784 Jun  4 12:15 f2
# rm -f /zFullPool/f2
# ls -al /zFullPool
total 2014797
drwxr-xr-x   2 root     sys            4 Jun  4 12:15 .
drwxr-xr-x  31 root     root       18432 Jun  4 12:14 ..
-rw------T   1 root     root  1030750208 Jun  4 12:15 f1
-rw-------   1 root     root  1031798784 Jun  4 12:15 f2

At this point, the only way in which I can free up sufficient
space to remove either file is to first remove the snapshot.

# zfs destroy [EMAIL PROTECTED]
# rm -f /zFullPool/f1
# ls -al /zFullPool
total 1332
drwxr-xr-x   2 root     sys            3 Jun  4 12:17 .
drwxr-xr-x  31 root     root       18432 Jun  4 12:14 ..
-rw-------   1 root     root  1031798784 Jun  4 12:15 f2

Is there an existing bug on this that is going to address
enabling the removal of a file without the pre-requisite 
removal of a snapshot?

Thanks in advance,
Brad
-- 
-
  _/_/_/  _/_/  _/ _/   Brad Diggs
 _/  _/_/  _/_/   _/Communications Area Market
_/_/_/  _/_/  _/  _/ _/ Senior Directory Architect
   _/  _/_/  _/   _/_/
  _/_/_/   _/_/_/   _/ _/   Office:  972-992-0002
E-Mail:  [EMAIL PROTECTED]
 M  I  C  R  O  S  Y  S  T  E  M  S

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How do you determine the zfs_vdev_cache_size current value?

2008-04-29 Thread Brad Diggs
How do you ascertain the current zfs vdev cache size (e.g. 
zfs_vdev_cache_size) via mdb or kstat or any other cmd?

Thanks in advance,
Brad
-- 
The Zone Manager
http://TheZoneManager.COM
http://opensolaris.org/os/project/zonemgr
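One common way to read a ZFS module tunable like this on a live system is shown below; a hedged sketch, and the mdb format character may need adjusting to the variable's actual width.

# Print the current value of the tunable from the running kernel:
echo "zfs_vdev_cache_size/D" | mdb -k     # try /E if it is a 64-bit value

# The vdev cache also exposes activity counters via kstat:
kstat -p zfs:0:vdev_cache_stats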

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is gzip planned to be in S10U5?

2008-02-13 Thread Brad Diggs
Hello,

Is the gzip compression algorithm planned to be in Solaris 10 Update 5?

Thanks in advance,
Brad
-- 
The Zone Manager
http://TheZoneManager.COM
http://opensolaris.org/os/project/zonemgr

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] UFS on zvol Cache Questions...

2008-02-08 Thread Brad Diggs
Hello Darren,

Please find responses in line below...

On Fri, 2008-02-08 at 10:52 +, Darren J Moffat wrote:
 Brad Diggs wrote:
  I would like to use ZFS but with ZFS I cannot prime the cache
  and I don't have the ability to control what is in the cache 
  (e.g. like with the directio UFS option).
 
 Why do you believe you need that at all ?  

My application is directory server.  The #1 resource that 
directory needs to make maximum utilization of is RAM.  In 
order to do that, I want to control every aspect of RAM
utilization both to safely use as much RAM as possible AND
avoid contention among things trying to use RAM.

Lets consider the following example.  A customer has a 
50M entry directory.  The sum of the data (db3 files) is
approximately 60GB.  However, there is another 2GB for the
root filesystem, 30GB for the changelog, 1GB for the 
transaction logs, and 10GB for the informational logs.

The system on which directory server will run has only 
64GB of RAM.  The system is configured with the following
partitions:

  FS  Used(GB)  Description
   /  2 root
   /db60directory data
   /logs  41changelog, txn logs, and info logs
   swap   10system swap

I prefer to keep the directory db cache and entry caches
relatively small.  So the db cache is 2GB and the entry 
cache is 100M.  This leaves roughly 63GB of RAM for my 60GB
of directory data and Solaris. The only way to ensure that
the directory data (/db) is the only thing in the filesystem
cache is to set directio on / (root) and (/logs).
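For concreteness, "setting directio" on a UFS filesystem such as /logs is done with the forcedirectio mount option; the device names below are placeholders.

# Live remount of an already-mounted UFS filesystem:
mount -F ufs -o remount,forcedirectio /logs

# Or persistently in /etc/vfstab (mount options column):
#   /dev/dsk/c0t0d0s5  /dev/rdsk/c0t0d0s5  /logs  ufs  2  yes  forcedirectio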

 What do you do to prime the cache with UFS 

cd ds_instance_dir/db
for i in `find . -name '*.db3'`
do
  dd if=${i} of=/dev/null
done

 and what benefit do you think it is giving you ?

Priming the directory server data into filesystem cache 
reduces ldap response time for directory data in the
filesystem cache.  This could mean the difference between
a sub ms response time and a response time on the order of
tens or hundreds of ms depending on the underlying storage
speed.  For telcos in particular, minimal response time is 
paramount.

Another common scenario is when we do benchmark bakeoffs
with another vendor's product.  If the data isn't pre-
primed, then ldap response time and throughput will be
artificially degraded until the data is primed into either
the filesystem or directory (db or entry) cache.  Priming
via ldap operations can take many hours or even days 
depending on the number of entries in the directory server.
However, priming the same data via dd takes minutes to hours
depending on the size of the files.  

As you know in benchmarking scenarios, time is the most limited
resource that we typically have.  Thus, priming via dd is much
preferred.

Lastly, in order to achieve optimal use of available RAM, we
use directio for the root (/) and other non-data filesystems.
This makes certain that the only data in the filesystem cache
is the directory data.

 Have you tried just using ZFS and found it doesn't perform as you need 
 or are you assuming it won't because it doesn't have directio ?

We have done extensive testing with ZFS and love it.  The three 
areas lacking for our use cases are as follows:
 * No ability to control what is in cache. e.g. no directio
 * No absolute ability to apply an upper boundary to the amount
   of RAM consumed by ZFS.  I know that the arc cache has a 
   control that seems to work well. However, the arc cache is
   only part of ZFS ram consumption.
 * No ability to rapidly prime the ZFS cache with the data that 
   I want in the cache.

I hope that helps give understanding to where I am coming from!

Brad

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] UFS on zvol Cache Questions...

2008-02-07 Thread Brad Diggs
Hello,

I have a unique deployment scenario where the marriage
of ZFS zvol and UFS seem like a perfect match.  Here are
the list of feature requirements for my use case:

* snapshots
* rollback
* copy-on-write
* ZFS level redundancy (mirroring, raidz, ...)
* compression
* filesystem cache control (control what's in and out)
* priming the filesystem cache (dd if=file of=/dev/null)
* control the upper boundary of RAM consumed by the
  filesystem.  This helps me to avoid contention between
  the filesystem cache and my application.

Before zfs came along, I could achieve all but rollback,
copy-on-write and compression through UFS+some volume manager.

I would like to use ZFS but with ZFS I cannot prime the cache
and I don't have the ability to control what is in the cache 
(e.g. like with the directio UFS option).

If I create a ZFS zvol and format it as a UFS filesystem, it
seems like I get the best of both worlds.  Can anyone poke 
holes in this strategy?
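For concreteness, the zvol-plus-UFS layering described above would be set up roughly as follows; the pool, volume, size, and mount point are placeholders.

zfs create -V 60g tank/dbvol             # a 60 GB zvol backed by the pool
zfs set compression=on tank/dbvol        # ZFS-level features still apply underneath
newfs /dev/zvol/rdsk/tank/dbvol          # lay UFS on top of the zvol
mount -F ufs -o forcedirectio /dev/zvol/dsk/tank/dbvol /db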

I think the biggest possible risk factor is if the ZFS zvol
still uses the arc cache.  If this is the case, I may be 
double-dipping on the filesystem cache.  e.g. The UFS filesystem
uses some RAM and ZFS zvol uses some RAM for filesystem cache.
Is this a true statement or does the zvol use a minimal amount
of system RAM?

Lastly, if I were to try this scenario, does anyone know how to
monitor the RAM consumed by the zvol and UFS?  e.g. Is there a 
dtrace script for monitoring ZFS or UFS memory consumption?

Thanks in advance,
Brad
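Short of a DTrace script, a rough picture of where the RAM is going can be had from the commands below; a sketch only, and depending on the Solaris release ::memstat breaks out ZFS file data as its own row.

echo ::memstat | mdb -k            # kernel, ZFS file data, anon, page cache, free
kstat -p zfs:0:arcstats:size       # current ARC size in bytes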

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Tunable parameter to zfs memory use

2006-12-24 Thread Brad Diggs
 What would you want to observe if your system hit the upper
 limit in zfs_max_phys_mem?

I would want zfs to behave well and safely like every other app on which you 
apply boundary conditions.  It is the responsibility of zfs to know its 
boundaries and stay within them.  Otherwise, your system may exist only for zfs 
and not for the application or services you wish to run on that system of which 
zfs is just one part.
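For reference, the cap most commonly applied for this purpose is the ARC size limit set in /etc/system; the value below is illustrative only, and as noted above it bounds only the ARC rather than all ZFS memory use.

* /etc/system: limit the ZFS ARC to 8 GB (value is illustrative)
set zfs:zfs_arc_max = 0x200000000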
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss