Re: [zfs-discuss] slog writing patterns vs SSD tech. (was SSD's and ZFS...)

2009-07-24 Thread Kyle McDonald

Bob Friesenhahn wrote:


Of course, it is my understanding that the zfs slog is written 
sequentially so perhaps this applies instead:


Actually, reading up on these drives, I've started to wonder about the 
slog writing pattern. While they do seem to do a great job at random 
writes, most of the promise shows up at sequential writes. So does the 
slog attempt to write sequentially through the space given to it?


Also, there is all sorts of analysis out there about how the drives 
always attempt to write new data to the pages and blocks they know are 
empty, since they can't overwrite one page (usually 4k) without erasing 
the whole (512k) block the page is in. This leads to a drop in write 
performance after all the space (both the space you paid for, and any 
extra space the vendor put in to work around this issue) has been used 
once. This shows up under regular filesystems because when a file is 
deleted, the drive only sees a new (over)write of some metadata so the 
OS can record that the file is gone, but the drive is never told that 
the blocks the file was occupying are now free and can be pre-erased at 
the drive's convenience.
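
To make the 4k-page / 512k-block arithmetic above concrete, here is a toy 
sketch in Python. The geometry comes from the paragraph above; the cost 
model is purely illustrative (and the simple "erase the whole block on 
every overwrite" picture is questioned later in the thread):

# Toy arithmetic for the worst case described above: to change one 4k page
# when no pre-erased pages remain, the drive has to rewrite a whole 512k block.
PAGE_KB = 4
BLOCK_KB = 512
pages_per_block = BLOCK_KB // PAGE_KB            # 128

def pages_written(logical_kb, have_clean_pages):
    """Physical pages written for a small logical overwrite (illustrative)."""
    if have_clean_pages:
        return logical_kb // PAGE_KB             # just land in an empty page
    return pages_per_block                       # read-erase-rewrite the whole block

print("%d vs %d physical pages written -> up to %dx write amplification"
      % (pages_written(4, True), pages_written(4, False), pages_per_block))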


The drive vendors have come up with a new TRIM command, which some OSes 
(Win7) are talking about supporting in their filesystems. Obviously ZFS 
itself doesn't need to know how to use TRIM for a device used only as a 
slog (until people start using SSD's as regular pool devices), but I 
would think that the slog code would need to use it in order to keep 
write speeds up and latencies down. No?


If so, what's the current consensus (thoughts, plans, etc.) on if and 
when TRIM will be usable in Solaris/ZFS?


-Kyle




Re: [zfs-discuss] slog writing patterns vs SSD tech.

2009-07-24 Thread Miles Nordin
 km == Kyle McDonald kmcdon...@egenera.com writes:

km these drives do seem to do a great job at random writes, most
km of the promise shows up at sequential writes, so does the slog
km attempt to write sequentially through the space given to it?

when writing to the slog, some user-visible application has to wait
for the slog to return that the write was committed to disk.  so
whether it's random or not, you're waiting for it, and io/s translates
closely into latency because the writes cannot be batched with the
normal level of aggressiveness (or they should just go to the main
pool, which will eventually have to handle the entire workload
anyway).  io/s is the number that matters.

``but but but but!''

thwack NO!  Everyone who is using the code, writing the code, and
building the systems says, io/s is the number that matters.  If you've
got some experience otherwise, fine, odd things turn up all the time.
but AFAICT the consensus is clear right now.
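
fwiw, a rough way to put a number on that is to time synchronous writes
yourself.  a sketch in python (the path is hypothetical, and it assumes a
platform that exposes O_DSYNC):

# rough sketch: time O_DSYNC 4k writes to see per-commit latency directly.
# the path below is hypothetical -- point it at a file on the pool you care about.
import os, time

fd = os.open("/tank/sync-probe", os.O_WRONLY | os.O_CREAT | os.O_DSYNC, 0o600)
buf = b"\0" * 4096
samples = []
for _ in range(1000):
    t0 = time.time()
    os.write(fd, buf)                    # returns only once the write is stable
    samples.append(time.time() - t0)
os.close(fd)

samples.sort()
avg = sum(samples) / len(samples)
print("median %.2f ms  p99 %.2f ms  ~%.0f sync writes/s"
      % (samples[len(samples) // 2] * 1e3,
         samples[int(len(samples) * 0.99)] * 1e3,
         1.0 / avg))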

km they can't overwrite one page (usually 4k) without erasing the
km whole (512k) block the page is in.

don't presume to get into the business of their black box so far.
That's almost certainly not what they do.  They probably do COW like
ZFS (and yaffs and jffs2 and ubifs), so they will do the 4k writes to
partly-empty erase blocks until the block is full.  In the background
a gc thread will evacuate and rewrite blocks that have become
spattered with unreferenced sectors.  They will write to the flash
filesystem to keep track of things about itself, like half-erased
cells, toasted cells, per-cell erase counts.  Then there is probably a
defragmenter thread, or else the gc is itself data-reorganizing.  And
there is some lookup state kept in DRAM during operation, and
reconstructed from post-mortem observation of what's in the FLASH at
boot, like with any filesystem.
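
a very rough sketch of that kind of copy-on-write FTL, purely illustrative
(python; it assumes the logical space is smaller than the physical space,
i.e. some over-provisioning, and ignores wear levelling entirely):

# Toy COW flash translation layer: writes always land in clean page slots,
# a map tracks logical page -> physical location, and gc repacks the block
# with the most stale slots when the clean pool runs dry.
class ToyFTL:
    PAGES = 128                                  # page slots per erase block

    def __init__(self, nblocks):
        self.map = {}                            # logical page -> (block, slot)
        self.used = [0] * nblocks                # slots written in each block
        self.stale = [0] * nblocks               # slots holding unreferenced data
        self.free = list(range(nblocks))         # pre-erased blocks
        self.cur = self.free.pop()               # block currently being filled
        self.gc_copies = 0                       # extra writes caused by gc

    def write(self, lpage):
        if self.used[self.cur] == self.PAGES:    # current block full, get another
            if not self.free:
                self._gc()                       # foreground gc = the latency cliff
            self.cur = self.free.pop()
        old = self.map.get(lpage)
        if old:                                  # COW: the old copy just goes stale
            self.stale[old[0]] += 1
        self.map[lpage] = (self.cur, self.used[self.cur])
        self.used[self.cur] += 1

    def _gc(self):
        # erase the block with the most stale slots and repack its live data
        victim = max((b for b in range(len(self.used)) if b != self.cur),
                     key=lambda b: self.stale[b])
        live = [l for l, (b, _) in self.map.items() if b == victim]
        self.used[victim] = self.stale[victim] = 0
        for slot, lpage in enumerate(live):      # copy-forward: write amplification
            self.map[lpage] = (victim, slot)
            self.used[victim] += 1
            self.gc_copies += 1
        self.free.append(victim)

ftl = ToyFTL(nblocks=8)                          # 8 * 128 = 1024 physical page slots
for i in range(5000):
    ftl.write(i % 600)                           # rewrite 600 logical pages repeatedly
print("host writes: 5000  gc copy-forward writes: %d" % ftl.gc_copies)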

Just look at the observed performance on microbenchmarks or in actual
use rather than trying to reverse-reason about these fancy and
otherwise-unobtainable closed-source filesystems, which is what they
are really selling, in disk/``game cartridge'' form factor.

km The Drive vendors have come up with a new TRIM command, which
km some OS's (Win7) are talking about supporting in their
km Filesystems.

this would be useful for VM's with thin-provisioned disks, too.

km I would think that the slog code would need to use it in order
km to keep write speeds up and latencies down. No?

read the goofy gamer site review please.  No, not with the latest
intel firmware, it's not needed.




Re: [zfs-discuss] slog writing patterns vs SSD tech. (was SSD's and ZFS...)

2009-07-24 Thread Richard Elling


On Jul 24, 2009, at 10:46 AM, Kyle McDonald wrote:


Bob Friesenhahn wrote:


Of course, it is my understanding that the zfs slog is written  
sequentially so perhaps this applies instead:


Actually, reading up on these drives, I've started to wonder about
the slog writing pattern. While they do seem to do a great job at
random writes, most of the promise shows up at sequential writes. So
does the slog attempt to write sequentially through the space given
to it?


Short answer is yes. But you can measure it with iopattern.
http://www.richardelling.com/Home/scripts-and-programs-1/iopattern
use the -d option to look at your slog device.
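
(Not iopattern itself, which is the DTrace script at the URL above, but a
rough sketch in Python of the headline number it reports: the share of
I/Os that start where the previous I/O on the same device left off.)

def pct_sequential(events):
    """events: (device, starting_block, block_count) tuples in completion order."""
    last_end = {}
    seq = total = 0
    for dev, start, nblk in events:
        if last_end.get(dev) == start:   # picks up exactly where the last I/O ended
            seq += 1
        total += 1
        last_end[dev] = start + nblk
    return 100.0 * seq / total if total else 0.0

# two back-to-back writes, then a seek: 1 of 3 counted as sequential
print(pct_sequential([("slog", 0, 8), ("slog", 8, 8), ("slog", 512, 8)]))
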
 -- richard



Re: [zfs-discuss] slog writing patterns vs SSD tech.

2009-07-24 Thread Kyle McDonald

Miles Nordin wrote:

km == Kyle McDonald kmcdon...@egenera.com writes:



km these drives do seem to do a great job at random writes, most
km of the promise shows up at sequential writes, so does the slog
km attempt to write sequentially through the space given to it?


thwack NO!  Everyone who is using the code, writing the code, and
building the systems says, io/s is the number that matters.  If you've
got some experience otherwise, fine, odd things turn up all the time.
but AFAICT the consensus is clear right now.

  
Yeah, I know. I get it. I screwed up and used the wrong term. OK? I 
agree with you.


Still, when all the previously erased pages are gone, write latencies go 
up (drastically; in some cases worse than a spinning HD) and io/s goes 
down. So what I really wanted to get into was the question below.

km they can't overwrite one page (usually 4k) without erasing the
km whole (512k) block the page is in.

don't presume to get into the business of their black box so far.
  

I'm not.

Guys like this are:

http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=8

That's almost certainly not what they do.  They probably do COW like
ZFS (and yaffs and jffs2 and ubifs), so they will do the 4k writes to
partly-empty erase blocks until the block is full.  In the background
a gc thread will evacuate and rewrite blocks that have become
spattered with unreferenced sectors.

That's where the problem comes in. They have no knowledge of the upper 
filesystem, and don't know which previously written blocks are still 
referenced. When the OS's filesystem rewrites a directory to remove a 
pointer to the string of blocks the file used to use, and updates its 
list of which LBA sectors are now free vs. in use, it probably happens 
pretty much exactly like you say.


But that doesn't let the SSD mark the sectors the file used as 
unreferenced, so the gc thread can't evacuate them ahead of time and 
add them to the empty page pool.
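
In terms of the toy FTL sketched earlier in the thread, TRIM is roughly
just this (illustrative only):

# roughly what TRIM adds: the host can declare a logical page dead, so its
# flash slot counts as stale (reclaimable by gc) immediately, instead of only
# when that logical page happens to be rewritten later.
def trim(ftl, lpage):
    old = ftl.map.pop(lpage, None)
    if old is not None:
        ftl.stale[old[0]] += 1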

km The Drive vendors have come up with a new TRIM command, which
km some OS's (Win7) are talking about supporting in their
km Filesystems.

this would be useful for VM's with thin-provisioned disks, too.
  
True. Keeping or putting the 'holes' back in the 'holey' disk files when 
the VM frees up space would be very useful.

km I would think that the slog code would need to use it in order
km to keep write speeds up and latencies down. No?

read the goofy gamer site review please.  No, not with the latest
intel firmware, it's not needed.
  
I did read at least one review that compared old and new firmware on the 
Intel M model. In that one, I'm pretty sure they still saw a performance 
hit (in latency) once the entire drive had been written to. It may have 
taken longer to hit, and it may not have been as drastic, but it was 
still there.


Which review are you talking about?

So what if Intel has fixed it? Not everyone is going to use the Intel 
drives. If the TRIM command (assuming it can help at all) can keep the 
other brands and models performing close to how they performed when new, 
then I'd say it's useful for the ZFS slog too. Just because one vendor 
might have made it unnecessary doesn't mean it is for everyone.


Does it?

 -Kyle








Re: [zfs-discuss] slog writing patterns vs SSD tech.

2009-07-24 Thread Bob Friesenhahn

On Fri, 24 Jul 2009, Kyle McDonald wrote:


http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=8


This is an interesting test report.  Something quite interesting for zfs 
is that if the write rate is continually high, then the write performance 
will be limited by the FLASH erase performance, regardless of the use 
of something like TRIM.  TRIM only improves write latency in the case 
that the FLASH erase is able to keep ahead of the write rate.  If the 
writes are bottlenecked, then using TRIM is likely to decrease the 
write performance.  If data is written at an almost constant rate, 
then a time may come when the drive suddenly "hits the wall" and is 
no longer able to erase the data as fast as it comes in.
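
A back-of-the-envelope version of that point (all numbers made up, just
to show the shape of the problem):

# TRIM only buys latency headroom while background erase can keep pace with
# the incoming write rate; once it can't, the clean-page pool drains and the
# drive hits the wall regardless of TRIM.
write_rate_mb_s = 120.0        # sustained incoming write rate (assumed)
erase_rate_mb_s = 80.0         # rate at which the drive can reclaim/erase (assumed)
clean_pool_mb = 6 * 1024       # pre-erased pool: over-provisioning + TRIMmed space

drain = write_rate_mb_s - erase_rate_mb_s
if drain <= 0:
    print("erase keeps pace; TRIM keeps the clean pool topped up")
else:
    print("clean pool drains in about %.0f seconds, then writes become erase-bound"
          % (clean_pool_mb / drain))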


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] slog writing patterns vs SSD tech.

2009-07-24 Thread Richard Elling

On Jul 24, 2009, at 2:33 PM, Bob Friesenhahn wrote:


On Fri, 24 Jul 2009, Kyle McDonald wrote:


http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=8


This is an interesting test report.  Something quite interesting for
zfs is that if the write rate is continually high, then the write
performance will be limited by the FLASH erase performance,
regardless of the use of something like TRIM.  TRIM only improves
write latency in the case that the FLASH erase is able to keep ahead
of the write rate.  If the writes are bottlenecked, then using TRIM
is likely to decrease the write performance.  If data is written at
an almost constant rate, then a time may come when the drive
suddenly "hits the wall" and is no longer able to erase the data as
fast as it comes in.


Yep. Good thing we can zpool add a log to spread the load.

NB: zpool add log != zpool attach log
 -- richard
