Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-25 Thread Orvar Korvar
How long have you been using a SSD? Do you see any performance decrease? I 
mean, ZFS does not support TRIM, so I wonder about long term effects...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-25 Thread Erik Trimble

On 7/25/2011 3:32 AM, Orvar Korvar wrote:

How long have you been using a SSD? Do you see any performance decrease? I 
mean, ZFS does not support TRIM, so I wonder about long term effects...


Frankly, for the kind of use that ZFS puts on a SSD, TRIM makes no 
impact whatsoever.


TRIM is primarily useful for low-volume changes - that is, for a 
filesystem that generally has few deletes over time (i.e. rate of change 
is low).


Using a SSD as a ZIL or L2ARC device puts a very high write load on the 
device (even as an L2ARC, there is a considerably higher write load than 
a typical filesystem use).   SSDs in such a configuration can't really 
make use of TRIM, and depend on the internal SSD controller block 
re-allocation algorithms to improve block layout.


Now, if you're using the SSD as primary media (i.e. in place of a Hard 
Drive), there is a possibility that TRIM could help.  I honestly can't 
be sure that it would help, however, as ZFS's Copy-on-Write nature means 
that it tends to write entire pages of blocks, rather than just small 
blocks. Which is fine from the SSD's standpoint.



On a related note:  I've been using an OCZ Vertex 2 as my primary drive 
in a laptop, which runs Windows XP (no TRIM support). I haven't noticed 
any dropoff in performance in the year it's been in service.  I'm doing 
typical productivity laptop-ish things (no compiling, etc.), so it 
appears that the internal SSD controller is more than smart enough to 
compensate even without TRIM.



Honestly, I think TRIM isn't really useful for anyone.  It took too long 
to get pushed out to the OSes, and the SSD vendors seem to have just 
compensated by making a smarter controller able to do better 
reallocation.  Which, to me, is the better ideal, in any case.


--
Erik Trimble
Java Platform Group Infrastructure
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (UTC-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zil on multiple usb keys

2011-07-25 Thread Darren J Moffat

On 07/23/11 04:57, Michael DeMan wrote:

Generally performance is going to be pretty bad as well - USB sticks are
not made to be written to rapidly. They are entirely different animals
than SSDs. I would not be surprised (but would be curious to know if you
still move forward on this) if you find performance even worse
trying to do this.


Back in the snv_120-ish era I tried this experiment on both my pool and 
on a friend's.  In both cases we were serving NFS (he was also doing 
CIFS), which was mostly read but also had periods where 1-2 GB of data was 
rapidly added (uploading photos or videos) over the network.


In both the case of the USB flash drive and that of a SanDisk Extreme IV 
CF card in a CF-IDE enclosure, the performance did not improve; in fact, 
in the case of the CF card the enclosure was buggy, such that the changes 
we had to make to the ata config actually made it slower.


I removed the separate log device from both of those pools (by manual 
hacking with specially built zfs kernel modules, because slog removal 
didn't exist back then).


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-25 Thread Tomas Ögren
On 25 July, 2011 - Erik Trimble sent me these 2,0K bytes:

 On 7/25/2011 3:32 AM, Orvar Korvar wrote:
 How long have you been using a SSD? Do you see any performance decrease? I 
 mean, ZFS does not support TRIM, so I wonder about long term effects...

 Frankly, for the kind of use that ZFS puts on a SSD, TRIM makes no  
 impact whatsoever.

 TRIM is primarily useful for low-volume changes - that is, for a  
 filesystem that generally has few deletes over time (i.e. rate of change  
 is low).

 Using a SSD as a ZIL or L2ARC device puts a very high write load on the  
 device (even as an L2ARC, there is a considerably higher write load than  
 a typical filesystem use).   SSDs in such a configuration can't really  
 make use of TRIM, and depend on the internal SSD controller block  
 re-allocation algorithms to improve block layout.

 Now, if you're using the SSD as primary media (i.e. in place of a Hard  
 Drive), there is a possibility that TRIM could help.  I honestly can't  
 be sure that it would help, however, as ZFS's Copy-on-Write nature means  
 that it tends to write entire pages of blocks, rather than just small  
 blocks. Which is fine from the SSD's standpoint.

You still need the flash erase cycle.

 On a related note:  I've been using an OCZ Vertex 2 as my primary drive  
 in a laptop, which runs Windows XP (no TRIM support). I haven't noticed  
 any dropoff in performance in the year it's been in service.  I'm doing  
 typical productivity laptop-ish things (no compiling, etc.), so it  
 appears that the internal SSD controller is more than smart enough to  
 compensate even without TRIM.


 Honestly, I think TRIM isn't really useful for anyone.  It took too long  
 to get pushed out to the OSes, and the SSD vendors seem to have just  
 compensated by making a smarter controller able to do better  
 reallocation.  Which, to me, is the better ideal, in any case.

Bullshit. I just got an OCZ Vertex 3, and the first fill was 450-500MB/s.
Second and subsequent fills are at half that speed. I'm quite confident
that it's due to the flash erase cycle that's needed, and if stuff can
be TRIMmed (and thus flash erased as well), speed would be regained.
Overwriting a previously used block requires a flash erase, and if that
can be done in the background when the timing is not critical, instead of
just before you can actually write the block you want, performance will
increase.
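
For reference, one simple way to reproduce that kind of measurement is just to
fill the raw device twice and time each pass (the device name is a placeholder,
this destroys any data on the disk, and note that compressing controllers such
as SandForce will make a /dev/zero fill look unrealistically fast, so
incompressible data gives a fairer picture):

# first pass: mostly pre-erased flash
ptime dd if=/dev/zero of=/dev/rdsk/c2t1d0p0 bs=1024k
# second pass: every write now needs an erase first
ptime dd if=/dev/zero of=/dev/rdsk/c2t1d0p0 bs=1024k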

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-25 Thread Joerg Schilling
Erik Trimble erik.trim...@oracle.com wrote:

 On 7/25/2011 3:32 AM, Orvar Korvar wrote:
  How long have you been using a SSD? Do you see any performance decrease? I 
  mean, ZFS does not support TRIM, so I wonder about long term effects...

 Frankly, for the kind of use that ZFS puts on a SSD, TRIM makes no 
 impact whatsoever.

 TRIM is primarily useful for low-volume changes - that is, for a 
 filesystem that generally has few deletes over time (i.e. rate of change 
 is low).

 Using a SSD as a ZIL or L2ARC device puts a very high write load on the 
 device (even as an L2ARC, there is a considerably higher write load than 
 a typical filesystem use).   SSDs in such a configuration can't really 
 make use of TRIM, and depend on the internal SSD controller block 
 re-allocation algorithms to improve block layout.

 Now, if you're using the SSD as primary media (i.e. in place of a Hard 
 Drive), there is a possibility that TRIM could help.  I honestly can't 
 be sure that it would help, however, as ZFS's Copy-on-Write nature means 
 that it tends to write entire pages of blocks, rather than just small 
 blocks. Which is fine from the SSD's standpoint.

Writing to an SSD is: clear + write + verify

As the SSD cannot know that the rewritten blocks have been unused for a while, 
it cannot handle the clear operation at a time when there is no interest 
in the block; the TRIM command is needed to give it this knowledge.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Summary: Dedup memory and performance (again, again)

2011-07-25 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Ian Collins
 
 Add to that: if running dedup, get plenty of RAM and cache.

Add plenty of RAM.  And tweak your arc_meta_limit.  You can at least get dedup
performance that's on the same order of magnitude as performance without
dedup.

Cache devices don't really help dedup very much, because each DDT entry stored in
ARC/L2ARC takes 376 bytes, and each reference to an L2ARC entry requires 176
bytes of ARC.  So in order to prevent an individual DDT entry from being
evicted to disk, you must either keep the 376 bytes in ARC, or evict it to
L2ARC and keep 176 bytes.  This is a very small payload.  A good payload
would be to evict a 128k block from ARC into L2ARC, keeping the 176 bytes
only in ARC.
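
In case it helps with the sizing, here is roughly how you can look at the DDT
footprint and raise arc_meta_limit (the pool name "tank" and the 8GB value
0x200000000 below are just examples):

# per-pool DDT statistics and histogram
zdb -DD tank
# simulate dedup on existing, not-yet-deduped data to estimate the table size
zdb -S tank

and in /etc/system (takes effect after a reboot):

set zfs:zfs_arc_meta_limit = 0x200000000

or poke it live with mdb (does not persist across reboots):

echo "arc_meta_limit/Z 0x200000000" | mdb -kw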

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-25 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Erik Trimble
 
 Honestly, I think TRIM isn't really useful for anyone.  

I'm going to have to disagree.

There are only two times when TRIM isn't useful:
1)  Your demand of the system is consistently so low that it never adds up
to anything meaningful... Basically you always have free unused blocks so
adding more unused blocks to the pile doesn't matter at all, or you never
bother to delete anything...  Or it's just a lightweight server processing
requests where network latency greatly outweighs any disk latency, etc.  AKA
your demand is very low.
or
2)  Your demand of the system is consistently so high that even with TRIM,
the device would never be able to find any idle time to perform an erase
cycle on blocks marked for TRIM.  

In case #2, it is at least theoretically possible for devices to become
smart enough to process the TRIM block erasures in parallel even while there
are other operations taking place simultaneously.  I don't know if device
mfgrs implement things that way today.  There is at least a common
perception (misperception?) that devices cannot process TRIM requests while
they are 100% busy processing other tasks.

Or your disk is always 100% full.  I guess that makes 3 cases, but the 3rd
one is esoteric.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Large scale performance query

2011-07-25 Thread Phil Harrison
Hi All,

Hoping to gain some insight from some people who have done large-scale
systems before. I'm hoping to get some performance estimates, suggestions
and/or general discussion/feedback. I cannot discuss the exact specifics of
the purpose, but will go into as much detail as I can.

Technical Specs:
216x 3TB 7k3000 HDDs
24x 9 drive RAIDZ3
4x JBOD Chassis (45 bay)
1x server (36 bay)
2x AMD 12 Core CPU
128GB ECC RAM
2x 480GB SSD Cache
10Gbit NIC

Workloads:

Mainly streaming compressed data. That is, pulling compressed data in a
sequential manner; however, there could be multiple streams happening at once,
making it somewhat random. We are hoping to have 5 clients pull 500Mbit
sustained.

Considerations:

The main reason RAIDZ3 was chosen is so we can distribute the parity across
the JBOD enclosures. With this method, even if an entire JBOD enclosure is
taken offline, the data is still accessible.
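
Just as a sketch of what that layout might look like on the command line (pool
and device names are placeholders), the idea being that no raidz3 vdev gets more
than a few members behind any single chassis:

# two example vdevs out of 24; losing an entire chassis costs at most
# two disks in each vdev, which raidz3 tolerates
zpool create bigpool \
  raidz3 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0 c1t1d0 c2t1d0 c3t1d0 c4t1d0 \
  raidz3 c1t2d0 c2t2d0 c3t2d0 c4t2d0 c5t1d0 c1t3d0 c2t3d0 c3t3d0 c4t3d0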

Questions:

How do you manage the physical locations of such a vast number of drives? I have
read this (
http://blogs.oracle.com/eschrock/entry/external_storage_enclosures_in_solaris)
and am hoping someone can shed some light on whether the SES2 enclosure
identification has worked for them? (enclosures are SES2)

What kind of performance would you expect from this setup? I know we can
multiply the base IOPS by 24, but what about max sequential read/write?

Thanks,

Phil
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-25 Thread Orvar Korvar
There is at least a common perception (misperception?) that devices cannot 
process TRIM requests while they are 100% busy processing other tasks.

Just to confirm: SSD disks can do TRIM while processing other tasks? 

I heard that Illumos is working on TRIM support for ZFS and will release 
something soon. Does anyone know more?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-07-25 Thread Orvar Korvar
Wow. If you ever finish this monster, I would really like to hear more about 
the performance and how you connected everything. Could be useful as a 
reference for anyone else building big stuff.

*drool*
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-07-25 Thread Roberto Waltman

Phil Harrison wrote:
 Hi All,

 Hoping to gain some insight from some people who have done large scale
 systems before? I'm hoping to get some performance estimates, suggestions
 and/or general discussion/feedback.

No personal experience, but you may find this useful:
Petabytes on a budget

http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

-- 

Roberto Waltman

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Mount Options

2011-07-25 Thread Tony MacDoodle
I have a zfs pool called logs (about 200G).
I would like to create 2 volumes using this chunk of storage.
However, they would have different mount points.
i.e. 50G would be mounted as /oracle/logs
100G would be mounted as /session/logs

Is this possible? Do I have to use the legacy mount options?

Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Mount Options

2011-07-25 Thread Andrew Gabriel

Tony MacDoodle wrote:

I have a zfs pool called logs (about 200G).
I would like to create 2 volumes using this chunk of storage.
However, they would have different mount points.
i.e. 50G would be mounted as /oracle/logs
100G would be mounted as /session/logs
 
is this possible?


Yes...

zfs create -o mountpoint=/oracle/logs logs/oracle
zfs create -o mountpoint=/session/logs logs/session

If you don't otherwise specify, the two filesystems will share the pool 
without any constraints.


If you wish to limit their max space...

zfs set quota=50g logs/oracle
zfs set quota=100g logs/session

and/or if you wish to reserve a minimum space...

zfs set reservation=50g logs/oracle
zfs set reservation=100g logs/session
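
You can then check what you've ended up with using something like:

zfs list -o name,quota,reservation,mountpoint -r logs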


Do I have to use the legacy mount options?


You don't have to.

--
Andrew Gabriel
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-07-25 Thread Tiernan OToole
They don't go into too much detail on their setup, and they are not running
Solaris, but they do mention how their SATA cards see different drives
based on where they are placed. They also have a second revision at
http://blog.backblaze.com/2011/07/20/petabytes-on-a-budget-v2-0revealing-more-secrets/
which talks about building their system with 135TB in a single 45-bay 4U box...

I am also interested in this kind of scale... Looking at the Backblaze box,
I am thinking of building something like this, but not in one go... so,
anything you do find out in your build, keep us informed! :)

--Tiernan

On Mon, Jul 25, 2011 at 4:25 PM, Roberto Waltman li...@rwaltman.com wrote:


 Phil Harrison wrote:
  Hi All,
 
  Hoping to gain some insight from some people who have done large scale
  systems before? I'm hoping to get some performance estimates, suggestions
  and/or general discussion/feedback.

 No personal experience, but you may find this useful:
 Petabytes on a budget


 http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/

 --

 Roberto Waltman

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
Tiernan O'Toole
blog.lotas-smartman.net
www.geekphotographer.com
www.tiernanotoole.ie
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-07-25 Thread Brandon High
On Sun, Jul 24, 2011 at 11:34 PM, Phil Harrison philha...@gmail.com wrote:

 What kind of performance would you expect from this setup? I know we can
 multiply the base IOPS by 24 but what about max sequential read/write?


You should have a theoretical max close to 144x single-disk throughput. Each
raidz3 has 6 data drives which can be read from simultaneously, multiplied
by your 24 vdevs. Of course, you'll hit your controllers' limits well before
that.

Even with a controller per JBOD, you'll be limited by the SAS connection.
The 7k3000 has throughput from 115 - 150 MB/s, meaning each of your JBODs
will be capable of 5.2 GB/sec - 6.8 GB/sec, roughly 10 times the bandwidth
of a single SAS 6g connection. Use multipathing if you can to increase the
bandwidth to each JBOD.
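
(Rough arithmetic behind those figures, for anyone checking:

 45 drives x 115 MB/s ~ 5.2 GB/s    45 drives x 150 MB/s ~ 6.8 GB/s
 1 SAS 6Gb/s lane     ~ 600 MB/s    4-lane wide port     ~ 2.4 GB/s

so a fully streaming JBOD is roughly ten times a single 6G lane.)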

Depending on the types of access that clients are performing, your cache
devices may not be any help. If the data is read multiple times by multiple
clients, then you'll see some benefit. If it's only being read infrequently
or by one client, it probably won't help much at all. That said, if your
access is mostly sequential then random access latency shouldn't affect you
too much, and you will still have more bandwidth from your main storage
pools than from the cache devices.

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-07-25 Thread Roy Sigurd Karlsbakk
 Workloads:
 
 Mainly streaming compressed data. That is, pulling compressed data in
 a sequential manner however could have multiple streams happening at
 once making it somewhat random. We are hoping to have 5 clients pull
 500Mbit sustained.

That shouldn't be much of a problem with that number of drives. I have a couple 
of smaller setups with 11x7-drive raidz2, about 100TiB each, and even they can 
handle a 2.5Gbps load.

 Considerations:
 
 The main reason RAIDZ3 was chosen was so we can distribute the parity
 across the JBOD enclosures. With this method even if an entire JBOD
 enclosure is taken offline the data is still accessible.

Sounds like a good idea to me.

 How to manage the physical locations of such a vast number of drives?
 I have read this (
 http://blogs.oracle.com/eschrock/entry/external_storage_enclosures_in_solaris
 ) and am hoping some can shed some light if the SES2 enclosure
 identification has worked for them? (enclosures are SES2)

Which enclosures will you be using? From the data you've posted, it looks like 
SuperMicro, and AFAIK the ones we have don't support SES2.

 What kind of performance would you expect from this setup? I know we
 can multiply the base IOPS by 24 but what about max sequential
 read/write?

Parallel read/write from several clients will look like random I/O on the 
server. If bandwidth is crucial, use RAID1+0.

Also, it looks to me you're planning to fill up all external bays with data 
drives - where do you plan to put the root? If you're looking at the SuperMicro 
SC847 line, there's indeed room for a couple of 2.5" drives inside, but the 
chassis is screwed tightly together and doesn't allow for opening during 
runtime. Also, those drives are placed in a rather awkward slot.

If you're planning to use RAIDz, a couple of SSDs for the SLOG will help write 
performance a lot, especially during scrub/resilver. For streaming, L2ARC won't 
be of much use, though.

Finally, a few spares won't hurt even with redundancy levels as high as RAIDz3. 
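
For what it's worth, the additions themselves are one-liners (pool and device
names below are placeholders):

# mirror the two SSDs as a separate log device
zpool add bigpool log mirror c6t0d0 c6t1d0
# and a couple of hot spares
zpool add bigpool spare c5t34d0 c5t35d0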

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant 
synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-07-25 Thread Roy Sigurd Karlsbakk
 Even with a controller per JBOD, you'll be limited by the SAS
 connection. The 7k3000 has throughput from 115 - 150 MB/s, meaning
 each of your JBODs will be capable of 5.2 GB/sec - 6.8 GB/sec, roughly
 10 times the bandwidth of a single SAS 6g connection. Use multipathing
 if you can to increase the bandwidth to each JBOD.

With (something like) an LSI 9211 and those SuperMicro babies I guess he's 
planning on using, you'll have one quad-port SAS2 cable to each backplane/SAS 
expander, one in front and one in the back, meaning a theoretical 24Gbps (or 
2.4GB/s) to each backplane. With a maximum of 24 drives per backplane, this should 
probably suffice, since you'll never get 150MB/s sustained from all drives.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is 
an elementary imperative for all pedagogues to avoid excessive use of 
idioms of foreign origin. In most cases, adequate and relevant 
synonyms exist in Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-25 Thread Erik Trimble

On 7/25/2011 6:43 AM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Erik Trimble

Honestly, I think TRIM isn't really useful for anyone.

I'm going to have to disagree.

There are only two times when TRIM isn't useful:
1)  Your demand of the system is consistently so low that it never adds up
to anything meaningful... Basically you always have free unused blocks so
adding more unused blocks to the pile doesn't matter at all, or you never
bother to delete anything...  Or it's just a lightweight server processing
requests where network latency greatly outweighs any disk latency, etc.  AKA
your demand is very low.
or
2)  Your demand of the system is consistently so high that even with TRIM,
the device would never be able to find any idle time to perform an erase
cycle on blocks marked for TRIM.

In case #2, it is at least theoretically possible for devices to become
smart enough to process the TRIM block erasures in parallel even while there
are other operations taking place simultaneously.  I don't know if device
mfgrs implement things that way today.  There is at least a common
perception (misperception?) that devices cannot process TRIM requests while
they are 100% busy processing other tasks.

Or your disk is always 100% full.  I guess that makes 3 cases, but the 3rd
one is esoteric.



What I'm saying is that #2 occurs all the time with ZFS, at least when the SSD 
is used as a ZIL or L2ARC.  TRIM is really only useful when the SSD has some 
downtime to work.  As a ZIL or L2ARC, the SSD *has* no pauses, and 
can't usefully do GC in the background (which is what TRIM helps with).


Instead, what I've seen is that the increased smarts of the new 
generation SSD controllers do a better job of on-the-fly reallocation.



--
Erik Trimble
Java Platform Group Infrastructure
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (UTC-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-25 Thread Erik Trimble

On 7/25/2011 4:28 AM, Tomas Ögren wrote:

On 25 July, 2011 - Erik Trimble sent me these 2,0K bytes:


On 7/25/2011 3:32 AM, Orvar Korvar wrote:

How long have you been using a SSD? Do you see any performance decrease? I 
mean, ZFS does not support TRIM, so I wonder about long term effects...

Frankly, for the kind of use that ZFS puts on a SSD, TRIM makes no
impact whatsoever.

TRIM is primarily useful for low-volume changes - that is, for a
filesystem that generally has few deletes over time (i.e. rate of change
is low).

Using a SSD as a ZIL or L2ARC device puts a very high write load on the
device (even as an L2ARC, there is a considerably higher write load than
a typical filesystem use).   SSDs in such a configuration can't really
make use of TRIM, and depend on the internal SSD controller block
re-allocation algorithms to improve block layout.

Now, if you're using the SSD as primary media (i.e. in place of a Hard
Drive), there is a possibility that TRIM could help.  I honestly can't
be sure that it would help, however, as ZFS's Copy-on-Write nature means
that it tends to write entire pages of blocks, rather than just small
blocks. Which is fine from the SSD's standpoint.

You still need the flash erase cycle.


On a related note:  I've been using an OCZ Vertex 2 as my primary drive
in a laptop, which runs Windows XP (no TRIM support). I haven't noticed
any dropoff in performance in the year it's been in service.  I'm doing
typical productivity laptop-ish things (no compiling, etc.), so it
appears that the internal SSD controller is more than smart enough to
compensate even without TRIM.


Honestly, I think TRIM isn't really useful for anyone.  It took too long
to get pushed out to the OSes, and the SSD vendors seem to have just
compensated by making a smarter controller able to do better
reallocation.  Which, to me, is the better ideal, in any case.

Bullshit. I just got an OCZ Vertex 3, and the first fill was 450-500MB/s.
Second and subsequent fills are at half that speed. I'm quite confident
that it's due to the flash erase cycle that's needed, and if stuff can
be TRIMmed (and thus flash erased as well), speed would be regained.
Overwriting a previously used block requires a flash erase, and if that
can be done in the background when the timing is not critical, instead of
just before you can actually write the block you want, performance will
increase.

/Tomas


I should have been clearer:  I consider the native speed of an SSD 
to be what you get AFTER you've filled the entire drive once. 
That is, once you've blown through the extra reserve NAND, and are now 
into the full read/erase/write cycle.   IMHO, that's really what the 
sustained performance of an SSD is, not the bogus numbers reported by 
vendors.


TRIM is really only useful for drives which have a low enough load 
factor to do background GC on unused blocks.  For ZFS, that *might* be 
the case when the SSD is used as primary backing store, but certainly 
isn't the case when it's used as ZIL or L2ARC.


Even with TRIM, performance after a complete fill of the SSD will drop 
noticeably, as the SSD has to do GC sometime.  You might not notice it 
right away given your usage pattern, but with or without TRIM, a used 
SSD under load will perform the same.


--
Erik Trimble
Java Platform Group Infrastructure
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (UTC-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-25 Thread Erik Trimble

On 7/25/2011 4:49 AM, joerg.schill...@fokus.fraunhofer.de wrote:

Erik Trimbleerik.trim...@oracle.com  wrote:


On 7/25/2011 3:32 AM, Orvar Korvar wrote:

How long have you been using a SSD? Do you see any performance decrease? I 
mean, ZFS does not support TRIM, so I wonder about long term effects...

Frankly, for the kind of use that ZFS puts on a SSD, TRIM makes no
impact whatsoever.

TRIM is primarily useful for low-volume changes - that is, for a
filesystem that generally has few deletes over time (i.e. rate of change
is low).

Using a SSD as a ZIL or L2ARC device puts a very high write load on the
device (even as an L2ARC, there is a considerably higher write load than
a typical filesystem use).   SSDs in such a configuration can't really
make use of TRIM, and depend on the internal SSD controller block
re-allocation algorithms to improve block layout.

Now, if you're using the SSD as primary media (i.e. in place of a Hard
Drive), there is a possibility that TRIM could help.  I honestly can't
be sure that it would help, however, as ZFS's Copy-on-Write nature means
that it tends to write entire pages of blocks, rather than just small
blocks. Which is fine from the SSD's standpoint.

Writing to an SSD is: clear + write + verify

As the SSD cannot know that the rewritten blocks have been unused for a while,
it cannot handle the clear operation at a time when there is no interest
in the block; the TRIM command is needed to give it this knowledge.

Jörg


Except in many cases with ZFS, that data is irrelevant by the time it 
can be used, or is much less useful than with other filesystems.  
Copy-on-Write tends to end up with whole SSD pages of blocks being 
rendered unused, rather than individual blocks inside pages.  So, the 
SSD often can avoid the read-erase-modify-write cycle, and just do 
erase-write instead.  TRIM *might* help somewhat when you have a 
relatively quiet ZFS filesystem, but I'm not really convinced of how 
much of a benefit it would be.


As I've mentioned in other posts, ZIL and L2ARC are too hot for TRIM 
to have any noticeable impact - the SSD is constantly being used, and 
has no time for GC.  It's stuck in the read-erase-modify-write cycle 
even with TRIM.


--
Erik Trimble
Java Platform Group Infrastructure
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (UTC-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD vs hybrid drive - any advice?

2011-07-25 Thread Erik Trimble

On 7/25/2011 8:03 AM, Orvar Korvar wrote:

There is at least a common perception (misperception?) that devices cannot process 
TRIM requests while they are 100% busy processing other tasks.

Just to confirm: SSD disks can do TRIM while processing other tasks?

I heard that Illumos is working on TRIM support for ZFS and will release 
something soon. Does anyone know more?

SSDs do Garbage Collection when the controller has spare cycles. I'm not 
certain if there is a time factor (i.e. is it periodic, or just when 
there's time in the controller's queue).   So, theoretically, TRIM helps 
GC when the drive is at low utilization, but not when the SSD is under 
significant load.  Under high load, the SSD doesn't have the luxury of 
searching the NAND for unused blocks, aggregating them, writing them 
to a new page, and then erasing the old location.  It has to allocate 
stuff NOW, so it goes right to the dreaded read-modify-erase-write 
cycle.  Even under high load, the SSD can process the TRIM request 
(i.e. it will mark a block as unused), but that's not useful until a GC 
is performed (unless you are so lucky as to mark an *entire* page as 
unused), so, it doesn't really matter.  The GC run is what fixes the 
NAND allocation, not the TRIM command itself.


I can't speak for the ZFS developers as to TRIM support.  I *believe* 
this would have to happen both at the device level and the filesystem 
level. But I certainly could be wrong. (Illumos currently supports TRIM 
in the SATA framework - not sure about the SAS framework.)


--
Erik Trimble
Java Platform Group Infrastructure
Mailstop:  usca22-317
Phone:  x67195
Santa Clara, CA
Timezone: US/Pacific (UTC-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss