Re: SSD optimizations

2010-12-13 Thread Gordan Bobic

On 12/13/2010 05:11 AM, Sander wrote:

Gordan Bobic wrote (ao):

On 12/12/2010 17:24, Paddy Steed wrote:

In a few weeks parts for my new computer will be arriving. The storage
will be a 128GB SSD. A few weeks after that I will order three large
disks for a RAID array. I understand that BTRFS RAID 5 support will be
available shortly. What is the best possible way for me to get the
highest performance out of this setup. I know of the option to optimize
for SSD's


BTRFS is hardly the best option for SSDs. I typically use ext4
without a journal on SSDs, or ext2 if that is not available.
Journalling causes more writes to hit the disk, which wears out
flash faster. Plus, SSDs typically have much slower writes than
reads, so avoiding writes is a good thing.


Gordan, this you wrote is so wrong I don't even know where to begin.

You'd better google a bit on the subject (ssd, and btrfs on ssd) as much
is written about it already.


I suggest you back your opinion up with some hard data before making 
such statements. Here's a quick test - make an ext2 fs and a btrfs on 
two similar disk partitions (any disk, for the sake of the experiment it 
doesn't have to be an ssd), then check vmstat -d to get a baseline. 
Then put the kernel sources on each of them, do a full build, then make clean, 
and check the vmstat -d output again. See how 
many writes (sectors) hit the disk with ext2 and how many with btrfs. 
You'll find that there were many more writes with BTRFS. You can't go 
faster when doing more. Journaling is expensive.
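
For anyone who wants to reproduce it, the whole test is roughly this (device
names, mount points and kernel version are just placeholders - adjust to
whatever you have lying around):

    # two similar partitions, one per filesystem
    mkfs.ext2 /dev/sdb1
    mkfs.btrfs /dev/sdb2
    mkdir -p /mnt/ext2 /mnt/btrfs
    mount /dev/sdb1 /mnt/ext2
    mount /dev/sdb2 /mnt/btrfs

    vmstat -d > vmstat.before            # baseline sector counters

    # repeat this block once per filesystem, then compare the counters
    tar xjf linux-2.6.36.tar.bz2 -C /mnt/ext2
    make -C /mnt/ext2/linux-2.6.36 defconfig
    make -C /mnt/ext2/linux-2.6.36 -j4
    make -C /mnt/ext2/linux-2.6.36 clean
    sync

    vmstat -d > vmstat.after             # look at the writes/sectors columns
    diff vmstat.before vmstat.after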


Gordan


Re: SSD optimizations

2010-12-13 Thread Peter Harris
On Mon, Dec 13, 2010 at 4:25 AM, Gordan Bobic wrote:
 I suggest you back your opinion up with some hard data before making such
 statements. Here's a quick test - make an ext2 fs and a btrfs on two similar
 disk partitions (any disk, for the sake of the experiment it doesn't have to
 be an ssd)

Okay, here's some hard data.

Acer Aspire One ZG5 with an SSDPAMM0008G1 (cheap/slow) SSD, Fedora 13.

Doing a standard yum update, measuring the yum cleanup phase while
browsing with Firefox:

Default extN: machine becomes completely unusable for minutes.
btrfs with ssd_spread: machine functions normally, cleanup finishes in
(often much) under 15 seconds.

Regardless of what vmstat says, btrfs is clearly faster on this hardware.

Peter Harris


Re: SSD optimizations

2010-12-13 Thread Gordan Bobic

On 13/12/2010 14:33, Peter Harris wrote:

On Mon, Dec 13, 2010 at 4:25 AM, Gordan Bobic wrote:

I suggest you back your opinion up with some hard data before making such
statements. Here's a quick test - make an ext2 fs and a btrfs on two similar
disk partitions (any disk, for the sake of the experiment it doesn't have to
be an ssd)


Okay, here's some hard data.

Acer Aspire One ZG5 with an SSDPAMM0008G1 (cheap/slow) SSD, Fedora 13.

Doing a standard yum update, measuring the yum cleanup phase while
browsing with Firefox:

Default extN: machine becomes completely unusable for minutes.
btrfs with ssd_spread: machine functions normally, cleanup finishes in
(often much) under 15 seconds.

Regardless of what vmstat says, btrfs is clearly faster on this hardware.


extN is too broad. ext2, ext3, or ext4? If ext4, with journal or 
without? I am talking specifically about extN _without_ a journal. I use 
ext2 and ext4-without-a-journal on all my cheap flash (mostly SD/CF 
cards and USB sticks) with a deadline scheduler and I have not observed 
any massive slowdown like you describe.
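
For reference, this is roughly what that setup looks like (the device name and
mount point are only examples, and noatime is just an extra write saver on top):

    # ext4 without a journal (needs a reasonably recent e2fsprogs)
    mkfs.ext4 -O ^has_journal /dev/sdb1

    # or drop the journal from an existing, unmounted ext4
    tune2fs -O ^has_journal /dev/sdb1

    # deadline elevator for the device, noatime to avoid gratuitous writes
    echo deadline > /sys/block/sdb/queue/scheduler
    mount -o noatime /dev/sdb1 /mnt/flash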


Either way, there is also the longevity of the flash to be considered, 
and vmstat's write reading is very indicative of that.


Gordan


Re: SSD optimizations

2010-12-13 Thread cwillu
On Mon, Dec 13, 2010 at 3:25 AM, Gordan Bobic gor...@bobich.net wrote:
 On 12/13/2010 05:11 AM, Sander wrote:

 Gordan Bobic wrote (ao):

 On 12/12/2010 17:24, Paddy Steed wrote:

 In a few weeks parts for my new computer will be arriving. The storage
 will be a 128GB SSD. A few weeks after that I will order three large
 disks for a RAID array. I understand that BTRFS RAID 5 support will be
 available shortly. What is the best possible way for me to get the
 highest performance out of this setup. I know of the option to optimize
 for SSD's

 BTRFS is hardly the best option for SSDs. I typically use ext4
 without a journal on SSDs, or ext2 if that is not available.
 Journalling causes more writes to hit the disk, which wears out
 flash faster. Plus, SSDs typically have much slower writes than
 reads, so avoiding writes is a good thing.

 Gordan, this you wrote is so wrong I don't even know where to begin.

 You'd better google a bit on the subject (ssd, and btrfs on ssd) as much
 is written about it already.

 I suggest you back your opinion up with some hard data before making such
 statements. Here's a quick test - make an ext2 fs and a btrfs on two similar
 disk partitions (any disk, for the sake of the experiment it doesn't have to
 be an ssd), then check vmstat -d to get a baseline. Then put the kernel
 sources on each of them, do a full build, then make clean and check the
 vmstat -d output again. See how many writes (sectors) hit
 the disk with ext2 and how many with btrfs. You'll find that there were many
 more writes with BTRFS. You can't go faster when doing more. Journaling is
 expensive.

Of course.  But that applies to rotating media as well (where the
seeks involved hurt much more), and has little if anything to do with
why you would use btrfs instead of ext2.

Good ssd drives (by which I mean anything but consumer flash as it
exists on sd cards and usb sticks) have very good wear leveling, good
enough that you could overwrite the same logical sector billions of
times before you'd experience any failure due to wear.  The issues
with cheaper ssd drives (which I distinguish from things like sd
cards) are uniformly performance degradation due to crappy garbage
collection and lack of trim support to compensate.  A journal is _not_
a problem here.

On crappy flash, yes, you want to avoid a journal, mainly because the
wear leveling for a given sector only occurs over a fixed small
number of erase blocks, resulting in a filesystem that you can burn
out quite easily — I have a small pile of sd cards on my desk that I
sent to such a fate.  Even here there is reason to use btrfs.  The
journaling performed is much less strenuous than that of ext3/4:  it's
basically just a version stamp, as opposed to actually journaling the
metadata involved.  The actual metadata writes, being copy-on-write,
provide pretty much the best case for crappy flash, as cow inherently
wear-levels over the entire device (ssd_spread).  To say nothing of
checksums and duplicated metadata, allowing you to actually determine
if you're running into corrupted metadata, and often recover from it
transparently.  Ext2's behavior in this respect is less than ideal.
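
For anyone wanting to try exactly that, a minimal sketch (the device and mount
point are placeholders):

    # duplicated metadata, so a failed checksum can be repaired from the copy
    mkfs.btrfs -m dup -d single /dev/sdb1

    # spread allocation for cheap flash, noatime to avoid extra metadata writes
    mount -o ssd_spread,noatime /dev/sdb1 /mnt/flash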


Re: SSD optimizations

2010-12-13 Thread Gordan Bobic

On 13/12/2010 15:17, cwillu wrote:


In a few weeks parts for my new computer will be arriving. The storage
will be a 128GB SSD. A few weeks after that I will order three large
disks for a RAID array. I understand that BTRFS RAID 5 support will be
available shortly. What is the best possible way for me to get the
highest performance out of this setup. I know of the option to optimize
for SSD's


BTRFS is hardly the best option for SSDs. I typically use ext4
without a journal on SSDs, or ext2 if that is not available.
Journalling causes more writes to hit the disk, which wears out
flash faster. Plus, SSDs typically have much slower writes than
reads, so avoiding writes is a good thing.


Gordan, this you wrote is so wrong I don't even know where to begin.

You'd better google a bit on the subject (ssd, and btrfs on ssd) as much
is written about it already.


I suggest you back your opinion up with some hard data before making such
statements. Here's a quick test - make an ext2 fs and a btrfs on two similar
disk partitions (any disk, for the sake of the experiment it doesn't have to
be an ssd), then check vmstat -d to get a baseline. Then put the kernel
sources on each of them, do a full build, then make clean and check the
vmstat -d output again. See how many writes (sectors) hit
the disk with ext2 and how many with btrfs. You'll find that there were many
more writes with BTRFS. You can't go faster when doing more. Journaling is
expensive.


Of course.  But that applies to rotating media as well (where the
seeks involved hurt much more), and has little if anything to do with
why you would use btrfs instead of ext2.


Indeed - btrfs is about features, most specifically the checksumming that 
allows smart recovery from disk media failure. But on flash, write 
volumes are something that shouldn't be ignored.



Good ssd drives (by which I mean anything but consumer flash as it
exists on sd cards and usb sticks) have very good wear leveling, good
enough that you could overwrite the same logical sector billions of
times before you'd experience any failure due to wear.


It comes down to volumes even in the best case scenario. A _very_ good 
SSD (e.g. Intel) might get write amplification down to about 1.2:1, but 
more typical figures are in the region of 10-20:1. Every write that can 
be avoided, should be avoided.
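
To put rough numbers on that (every figure below is an assumption for
illustration only - capacity, cycle count and write volume vary wildly between
drives and workloads):

    # endurance ~= capacity_GB * P/E_cycles / (host_writes_GB_per_day * WA)
    awk 'BEGIN {
        capacity = 128;  cycles = 3000;  daily = 10;   # GB, cycles, GB/day
        n = split("1.2 5 20", wa, " ");
        for (i = 1; i <= n; i++)
            printf "WA %4.1f:1 -> ~%d days of writes\n", wa[i],
                   capacity * cycles / (daily * wa[i]);
    }'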



The issues
with cheaper ssd drives (which I distinguish from things like sd
cards) are uniformly performance degradation due to crappy garbage
collection and lack of trim support to compensate.  A journal is _not_
a problem here.


The journal doesn't help. It can cause more than a 50% overhead on 
metadata-heavy operations.



On crappy flash, yes, you want to avoid a journal, mainly because the
wear leveling for a given sector only occurs over a fixed small
number of erase blocks, resulting in a filesystem that you can burn
out quite easily — I have a small pile of sd cards on my desk that I
sent to such a fate.  Even here there is reason to use btrfs.  The
journaling performed is much less strenuous than that of ext3/4:  it's
basically just a version stamp, as opposed to actually journaling the
metadata involved.  The actual metadata writes, being copy-on-write,
provide pretty much the best case for crappy flash, as cow inherently
wear-levels over the entire device (ssd_spread).  To say nothing of
checksums and duplicated metadata, allowing you to actually determine
if you're running into corrupted metadata, and often recover from it
transparently.  Ext2's behavior in this respect is less than ideal.


I'm not disputing that, but the OP was talking about using the SSD as a 
cache for a slower disk subsystem. That is likely to wear out the SSD 
pretty quickly purely by volume of writes, regardless of how good the 
wear leveling is. That may be fine on a setup where the SSD is treated 
as a disposable, throw-away cache item that doesn't lose you any data when it 
goes wrong, but what was being discussed isn't an expensive enterprise-grade 
setup that behaves that way.


Gordan


Re: SSD optimizations

2010-12-13 Thread Paddy Steed

Thank you for all your replies.
On Mon, 2010-12-13 at 00:04 +, Gordan Bobic wrote:
 On 12/12/2010 17:24, Paddy Steed wrote:
  In a few weeks parts for my new computer will be arriving. The storage
  will be a 128GB SSD. A few weeks after that I will order three large
  disks for a RAID array. I understand that BTRFS RAID 5 support will be
  available shortly. What is the best possible way for me to get the
  highest performance out of this setup. I know of the option to optimize
  for SSD's
 
 BTRFS is hardly the best option for SSDs. I typically use ext4 without a 
 journal on SSDs, or ext2 if that is not available. Journalling causes 
 more writes to hit the disk, which wears out flash faster. Plus, SSDs 
 typically have much slower writes than reads, so avoiding writes is a 
 good thing. AFAIK there is no way to disable journaling on BTRFS.

My write speed is similar to the read speed (OCZ Vertex 128GB), and it
also comes with a warranty that won't run out until after the drive is
obsolete. Using up flash cycles is not an issue for me.

  but wont that affect all the drives in the array, not to
  mention having the SSD in the raid array will make the usable size much
  smaller as RAID 5 goes by the smallest disk.
 
 If you are talking about BTRFS' parity RAID implementation, it is hard 
 to comment in any way on it before it has actually been implemented. 
 Especially if you are looking for something stable for production use, 
 you should probably avoid features that immature.

I would take images every day until I felt it was stable. I spoke to
`cmason', who has now finished fsck and is working fully on RAID 5.

  Is there a way to use it as
  a cache the works even on power down.
 
 You want to use the SSD as a _write_ cache? That doesn't sound too 
 sensible at all.

As previously stated, wear is not an issue.

 What you are looking for is hierarchical/tiered storage. I am not aware 
 of the existence of such a thing for Linux. BTRFS has no feature for it. You 
 might be able to cobble up a solution that uses aufs or mhddfs (both 
 fuse based) with some cron jobs to shift most recently used files to 
 your SSD, but the fuse overheads will probably limit the usefulness of 
 this approach.
 
  My current plan is to have
  the /tmp directory in RAM on tmpfs
 
 Ideally, quite a lot should really be on tmpfs, in addition to /tmp and 
 /var/tmp.
 Have a look at my patches here:
 https://bugzilla.redhat.com/show_bug.cgi?id=223722
 
 My motivation for this was mainly to improve performance on slow flash 
 (when running off a USB stick or an SD card), but it also removes the 
 most write-heavy things off the flash and into RAM. Less flash wear and 
 more speed.
 
 If you are putting a lot onto tmpfs, you may also want to look at the 
 compcache project which provides a compressed swap RAM disk. Much faster 
 than actual swap - to the point where it actually makes swapping feasible.
 
  the /boot directory on a dedicated
  partition on the SSD along with a 12GB swap partition also on the SSD
  with the rest of the space (on the SSD) available as a cache.
 
 Swap on SSD is generally a bad idea. If your machine starts swapping 
 it'll grind to a halt anyway, regardless of whether it's swapping to 
 SSD, and heavy swapping to SSD will just kill the flash prematurely.
 
  The three
  mechanical hard drives will be on a RAID 5 array using BTRFS. Can anyone
  suggest any improvements to my plan and also how to implement the cache?
 
 A very soft solution using aufs and cron jobs for moving things with 
 the most recent atime to the SSD is probably as good as it's going to 
 get at the moment, but bear in mind that fuse overheads will probably 
 offset any performance benefit you gain from the SSD. You could get 
 clever and instead of just using atime set up inotify logging and put 
 the most frequently (as opposed to most recently) accessed files onto 
 your SSD. This would, in theory, give you more benefit. You also have to 
 bear in mind that the most frequently accessed files will be cached in 
 RAM anyway, so your pre-caching onto SSD is only really going to be 
 relevant when your working set size is considerably bigger than your RAM 
 - at which point your performance is going to take a significant 
 nosedive anyway (especially if you then hit a fuse file system).
 
 In either case, you should not put the frequently written files onto 
 flash (recent mtime).
 
 Also note that RAID5 is potentially very slow on writes, especially 
 small writes. It is also unsuitable for arrays over about 4TB (usable) 
 in size for disk reliability reasons.
 
 Gordan

So, no-one has any ideas on how to implement the cache. Would making it
all swap work - does the OS cache files in swap?




Re: SSD optimizations

2010-12-13 Thread Gordan Bobic

On 13/12/2010 17:17, Paddy Steed wrote:


So, no-one has any ideas on how to implement the cache. Would making it
all swap work - does the OS cache files in swap?


No, it doesn't. I don't believe there are any plans to implement 
hierarchical storage in BTRFS, but perhaps one of the developers can 
confirm or deny that. As for how to do it - I don't have any ideas other 
than what I mentioned earlier (aufs to overlay file systems and cron 
jobs to rotate things in and out of cache).


Gordan


Re: SSD optimizations

2010-12-13 Thread Tomasz Torcz
On Mon, Dec 13, 2010 at 05:17:51PM +, Paddy Steed wrote:
 So, no-one has any ideas on how to implement the cache. Would making it
 all swap work - does the OS cache files in swap?

  Quite the opposite.  Too many people have ideas for SSD-as-cache in Linux,
in no particular order:
— bcache
— cleancache 
— btrfs temperature tracking
— dm-hstore
— dm-cache / flashcache

  Patches are in various states of implementation, some with explicit btrfs
support.  There's no clear winner at this time, but some of the above solutions
are shipped in distro kernels.

-- 
Tomasz Torcz God, root, what's the difference?
xmpp: zdzich...@chrome.pl God is more forgiving.



Re: SSD optimizations

2010-12-13 Thread Ric Wheeler

On 12/13/2010 01:20 PM, Tomasz Torcz wrote:

On Mon, Dec 13, 2010 at 05:17:51PM +, Paddy Steed wrote:

So, no-one has any ideas on how to implement the cache. Would making it
all swap work - does the OS cache files in swap?

   Quite the opposite.  Too many people have ideas for SSD-as-cache in Linux,
in no particular order:
— bcache
— cleancache
— btrfs temperature tracking
— dm-hstore
— dm-cache / flashcache

   Patches are in various states of implementation, some with explicit btrfs
support.  There's no clear winner at this time, but some of the above solutions
are shipped in distro kernels.



People are working on quite a few ways that btrfs can leverage SSD devices. One 
technique would be to use the SSD as a block level cache, another would be to 
steer all metadata to the SSD (leaving your bulk data on normal drives).


ric




SSD optimizations

2010-12-12 Thread Paddy Steed
In a few weeks parts for my new computer will be arriving. The storage
will be a 128GB SSD. A few weeks after that I will order three large
disks for a RAID array. I understand that BTRFS RAID 5 support will be
available shortly. What is the best possible way for me to get the
highest performance out of this setup? I know of the option to optimize
for SSDs, but won't that affect all the drives in the array, not to
mention that having the SSD in the RAID array will make the usable size much
smaller, as RAID 5 goes by the smallest disk. Is there a way to use it as
a cache that works even on power down? My current plan is to have
the /tmp directory in RAM on tmpfs, the /boot directory on a dedicated
partition on the SSD along with a 12GB swap partition also on the SSD
with the rest of the space (on the SSD) available as a cache. The three
mechanical hard drives will be on a RAID 5 array using BTRFS. Can anyone
suggest any improvements to my plan and also how to implement the cache?




Re: SSD optimizations

2010-12-12 Thread Gordan Bobic

On 12/12/2010 17:24, Paddy Steed wrote:

In a few weeks parts for my new computer will be arriving. The storage
will be a 128GB SSD. A few weeks after that I will order three large
disks for a RAID array. I understand that BTRFS RAID 5 support will be
available shortly. What is the best possible way for me to get the
highest performance out of this setup. I know of the option to optimize
for SSD's


BTRFS is hardly the best option for SSDs. I typically use ext4 without a 
journal on SSDs, or ext2 if that is not available. Journalling causes 
more writes to hit the disk, which wears out flash faster. Plus, SSDs 
typically have much slower writes than reads, so avoiding writes is a 
good thing. AFAIK there is no way to disable journaling on BTRFS.



but wont that affect all the drives in the array, not to
mention having the SSD in the raid array will make the usable size much
smaller as RAID 5 goes by the smallest disk.


If you are talking about BTRFS' parity RAID implementation, it is hard 
to comment in any way on it before it has actually been implemented. 
Especially if you are looking for something stable for production use, 
you should probably avoid features that are still that immature.



Is there a way to use it as
a cache the works even on power down.


You want to use the SSD as a _write_ cache? That doesn't sound too 
sensible at all.


What you are looking for is hierarchical/tiered storage. I am not aware 
of the existence of such a thing for Linux. BTRFS has no feature for it. You 
might be able to cobble up a solution that uses aufs or mhddfs (both 
fuse based) with some cron jobs to shift most recently used files to 
your SSD, but the fuse overheads will probably limit the usefulness of 
this approach.



My current plan is to have
the /tmp directory in RAM on tmpfs


Ideally, quite a lot should really be on tmpfs, in addition to /tmp and 
/var/tmp.

Have a look at my patches here:
https://bugzilla.redhat.com/show_bug.cgi?id=223722

My motivation for this was mainly to improve performance on slow flash 
(when running off a USB stick or an SD card), but it also removes the 
most write-heavy things off the flash and into RAM. Less flash wear and 
more speed.
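
For the record, the tmpfs part of this is nothing more exotic than a couple of
fstab entries (size and mode to taste):

    # /etc/fstab - keep the most write-heavy scratch areas in RAM
    tmpfs   /tmp       tmpfs   defaults,noatime,mode=1777   0 0
    tmpfs   /var/tmp   tmpfs   defaults,noatime,mode=1777   0 0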


If you are putting a lot onto tmpfs, you may also want to look at the 
compcache project which provides a compressed swap RAM disk. Much faster 
than actual swap - to the point where it actually makes swapping feasible.



the /boot directory on a dedicated
partition on the SSD along with a 12GB swap partition also on the SSD
with the rest of the space (on the SSD) available as a cache.


Swap on SSD is generally a bad idea. If your machine starts swapping 
it'll grind to a halt anyway, regardless of whether it's swapping to 
SSD, and heavy swapping to SSD will just kill the flash prematurely.



The three
mechanical hard drives will be on a RAID 5 array using BTRFS. Can anyone
suggest any improvements to my plan and also how to implement the cache?


A very soft solution using aufs and cron jobs for moving things with 
the most recent atime to the SSD is probably as good as it's going to 
get at the moment, but bear in mind that fuse overheads will probably 
offset any performance benefit you gain from the SSD. You could get 
clever and instead of just using atime set up inotify logging and put 
the most frequently (as opposed to most recently) accessed files onto 
your SSD. This would, in theory, give you more benefit. You also have to 
bear in mind that the most frequently accessed files will be cached in 
RAM anyway, so your pre-caching onto SSD is only really going to be 
relevant when your working set size is considerably bigger than your RAM 
- at which point your performance is going to take a significant 
nosedive anyway (especially if you then hit a fuse file system).


In either case, you should not put the frequently written files onto 
flash (recent mtime).
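
As a very crude illustration of the cron-job half of that idea (the paths are
placeholders, and /mnt/ssd-cache is assumed to be the SSD branch of the union):

    # nightly: copy anything read in the last 7 days onto the SSD branch
    cd /mnt/raid && find . -type f -atime -7 -print0 | \
        rsync -a --files-from=- --from0 . /mnt/ssd-cache/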


Also note that RAID5 is potentially very slow on writes, especially 
small writes. It is also unsuitable for arrays over about 4TB (usable) 
in size for disk reliability reasons.


Gordan


Re: SSD optimizations

2010-12-12 Thread Sander
Gordan Bobic wrote (ao):
 On 12/12/2010 17:24, Paddy Steed wrote:
 In a few weeks parts for my new computer will be arriving. The storage
 will be a 128GB SSD. A few weeks after that I will order three large
 disks for a RAID array. I understand that BTRFS RAID 5 support will be
 available shortly. What is the best possible way for me to get the
 highest performance out of this setup. I know of the option to optimize
 for SSD's
 
 BTRFS is hardly the best option for SSDs. I typically use ext4
 without a journal on SSDs, or ext2 if that is not available.
 Journalling causes more writes to hit the disk, which wears out
 flash faster. Plus, SSDs typically have much slower writes than
 reads, so avoiding writes is a good thing.

Gordan, this you wrote is so wrong I don't even know where to begin.

You'd better google a bit on the subject (ssd, and btrfs on ssd), as much
has been written about it already.

Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net


Re: SSD Optimizations

2010-03-13 Thread Stephan von Krawczynski
On Thu, 11 Mar 2010 13:00:17 -0500
Chris Mason chris.ma...@oracle.com wrote:

 On Thu, Mar 11, 2010 at 06:35:06PM +0100, Stephan von Krawczynski wrote:
  On Thu, 11 Mar 2010 15:39:05 +0100
  Sander san...@humilis.net wrote:
  
   Stephan von Krawczynski wrote (ao):
Honestly I would just drop the idea of an SSD option simply because the
vendors implement all kinds of neat strategies in their devices. So in the end
you cannot really tell if the option does something constructive and not
destructive in combination with a SSD controller.
   
    My understanding of the ssd mount option is also that the fs doesn't try
    to do all kinds of smart (and potentially expensive) things which make
   sense for rotating media to reduce seeks and the like.
   
 Sander
  
  Such an optimization sounds valid on first sight. But re-think closely: how
  does the fs really know about seeks needed during some operation?
 
 Well the FS makes a few assumptions (in the nonssd case).  First it
 assumes the storage is not a memory device.  If things would fit in
 memory we wouldn't need filesystems in the first place.

Ok, here is the bad news. This assumption can be anything from right to completely
wrong, and you cannot really tell what the mainstream answer is.
Two examples from opposite ends of the technology world:
- History: way back in the 80's there was third-party hardware for the C=1541
(the floppy drive for the C=64) that read in the complete floppy and served all
incoming requests from the RAM buffer. So your assumption can already be wrong
for a trivial floppy drive from ancient times.
- Nowadays: on a Linux installation today, chances are that the matrix has
you. Quite a lot of installations are virtualized, so your storage is a virtual
one as well, which means it is likely a fs buffer on the host system,
i.e. RAM.
And sorry to say: even if things did fit in memory you would probably still need a fs,
simply because there is no actual way to organize data (be it executable or
not) in RAM without a fs layer. You can't save data without an abstract file
data type, and to have one accessible you need a fs.
Btw, the other way round is just as interesting: there is currently no fs for Linux
that knows how to execute in place. Meaning that if you really had only RAM and you
had a fs to organize your data, it would be only logical to have ways to _not_
load the data (into other parts of RAM), but to use it in its original storage
(RAM) space.

 Then it assumes that adjacent blocks are cheap to read and blocks that
 are far away are expensive to read.  Given expensive raid controllers,
 cache, and everything else, you're correct that sometimes this
 assumption is wrong.

As already mentioned, this assumption may be completely wrong even without a
raid controller, e.g. within a virtual environment. Even far-away blocks can
be one byte away in the next fs buffer of the underlying host fs (assuming
your device is in fact a file on the host;-).

  But, on average seeking hurts.  Really a lot.

Yes, seeking hurts. But there is no way to know if there is seeking at all.
On the other hand, if your storage is a netblock device seeking on the server
is probably your smallest problem, compared to the network latency in between.
 
 We try to organize files such that files that are likely to be read
 together are found together on disk.  Btrfs is fairly good at this
 during file creation and not as good as ext*/xfs as files are
 overwritten and modified again and again (due to cow).

You are basically saying that btrfs perfectly organizes write-once devices ;-)

 If you turn mount -o ssd on for your drive and do a test, you might not
 notice much difference right away.  ssds tend to be pretty good right
 out of the box.  Over time it tends to help, but it is a very hard thing
 to benchmark in general.

Honestly, this sounds like "I give up" to me ;-)
You just said that it is generally very hard to benchmark. Which, in
non-tech language, means nobody can see or feel it in the real world.

Please understand that I am the last one criticizing your and others' brilliant
work and the time you spend on btrfs. It's just that I do believe that if you spent one
hour on some fs like glusterfs for every 10 hours you spend on btrfs, you would
be both king and queen of the linux HA community :-)
(but probably unemployed, so I can't really beat you for it)
 
 -chris

-- 
Regards,
Stephan



Re: SSD Optimizations

2010-03-13 Thread Hubert Kario
On Saturday 13 March 2010 18:02:10 Stephan von Krawczynski wrote:
 On Fri, 12 Mar 2010 17:00:08 +0100
 Hubert Kario h...@qbs.com.pl wrote:
   Even on true
   spinning disks your assumption is wrong for relocated sectors.
  
  Which we don't have to worry about because if the drive has less than 5
  of 'em, the impact of hitting them is marginal and if there are more,
  the user has much more pressing problem than the performance of the
  drive or FS.
 
 Are you really sure that a drive firmware tells you about the true number
 of relocated sectors? I mean if it makes the product look better in
 comparison to another product, are you really sure that the firmware will
 not tell you what you expect to see only to make you content and happy
 with your drive?

It's not as if Joe Sixpack reads SMART values, and even if he does, he will be much 
more angry when a drive that reports no or few relocations fails than when a 
drive that reports it is failing fails.

If the drive arrives with bad sectors, it goes back where it came from the same day 
if it meets an IT guy worth his salt. Any IT guy knows that some HDDs develop 
bad sectors no matter the make and model, but if they do, you replace them.

And as the Google disk survey showed, SMART has a very high percentage of 
Type I errors, but very few Type II errors.

But we're off-topic here

   Which
   basically means that every disk controller firmware fiddles around with
   the physical layout since decades. Please accept that you cannot do a
   disks' job in FS. The more advanced technology gets the more disks
   become black boxes with a defined software interface. Use this
   interface and drop the idea of having inside knowledge of such a
   device. That's other peoples' work. If you want to design smart SSD
   controllers hire at a company that builds those.
  
  And I don't think that doing disks' job in the FS is good idea, but I
  think that we should be able to minimise the impact of the translation
  layer.
  
  The way to do this is to treat the device as a block device with
  sectors the size of erase-blocks. That's nothing too fancy, don't you
  think?
 
 I don't believe anyone is able to tell the size of erase-blocks of some
 device - current and future - for sure.

Well, if the engineer who designed it doesn't know this, I don't know how he 
got his degree.

Just because it isn't publicised now, doesn't mean it won't be in near future.

Besides that, detecting how big the erase-blocks are is easy if they 
have any impact on performance; if they don't have any impact (whatever 
the reason), tuning for their size is pointless anyway.
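
A crude way to probe for it, for what it's worth (completely unscientific, and
/dev/sdX is a placeholder for a scratch device whose contents you don't care
about, because this overwrites it):

    # time direct, synchronous writes at power-of-two sizes; throughput
    # usually jumps once the write size reaches the erase-block size
    for bs in 4k 16k 64k 256k 1M 4M; do
        echo -n "$bs: "
        dd if=/dev/zero of=/dev/sdX bs=$bs count=64 oflag=direct,sync 2>&1 | tail -n1
    done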

 I do believe that making this
 guess only reduces the future design options for new devices - if its
 creators care at all about your guess.

Did I, or anyone else, say that we want to hardwire a specific erase-block 
size to the design of the FS?! That would be utter stupidity!

 Why not let the fs designer take his creative options in fs layer and let
 the device designer use his brain on the device level and all meet at the
 predefined software interface in between - and nowhere _else_.

We (well, at least Gordan and I) just want a stripe_width option added to 
mkfs.btrfs, just like the one there is for ext2/3/4, reiserfs, xfs and jfs, to 
name a few. It would need very few additional tweaks to make it SSD friendly, 
hardly any considering how -o ssd or -o ssd_spread already work.
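
To make it concrete, this is the sort of knob we mean - what ext4 and xfs
already expose today (the 512 KiB erase-block size is just an assumed example):

    # ext4: stride/stripe-width are given in fs blocks (4 KiB here),
    # so a 512 KiB erase block is 128 blocks
    mkfs.ext4 -b 4096 -E stride=128,stripe-width=128 /dev/sdb1

    # xfs: stripe unit and stripe width are given directly
    mkfs.xfs -d su=512k,sw=1 /dev/sdb1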

You're forgetting there's an elephant in the room that won't talk to 
devices that don't have sectors 512B in size. If not for it, there wouldn't 
even _be_ SSDs with 512B sectors.

It's not the way Flash memory works.

The 512B abstraction is there to be compatible, to work with one current OS; 
it's not there because it better describes the way Flash memory works or is 
the best way to address the data on the device itself.

There are already consumer HDDs with 4kiB sector size, so the situation is 
getting better. We can only hope that in a few years' time SSDs will have 
sectors the size of erase-blocks. But in the meantime, stripe_width would be 
enough.


Besides, the stripe_width option will be useful not only for SSDs but also 
in environments where btrfs is on a device that is a RAID5/6 array 
(reconfiguring a server with many virtual machines is far from easy and 
sometimes just can't be done because of heterogeneous virtualised OSs that 
need the data protection provided by lower layers).

-- 
Hubert Kario
QBS - Quality Business Software
ul. Ksawerów 30/85
02-656 Warszawa
POLAND
tel. +48 (22) 646-61-51, 646-74-24
fax +48 (22) 646-61-50


Re: SSD Optimizations

2010-03-13 Thread Chris Mason
On Sat, Mar 13, 2010 at 05:43:59PM +0100, Stephan von Krawczynski wrote:
 On Thu, 11 Mar 2010 13:00:17 -0500
 Chris Mason chris.ma...@oracle.com wrote:
 
  On Thu, Mar 11, 2010 at 06:35:06PM +0100, Stephan von Krawczynski wrote:
   On Thu, 11 Mar 2010 15:39:05 +0100
   Sander san...@humilis.net wrote:
   
Stephan von Krawczynski wrote (ao):
  Honestly I would just drop the idea of an SSD option simply because the
  vendors implement all kinds of neat strategies in their devices. So in the end
  you cannot really tell if the option does something constructive and not
  destructive in combination with a SSD controller.

 My understanding of the ssd mount option is also that the fs doesn't try
 to do all kinds of smart (and potentially expensive) things which make
sense for rotating media to reduce seeks and the like.

Sander
   
   Such an optimization sounds valid on first sight. But re-think closely: how
   does the fs really know about seeks needed during some operation?
  
  Well the FS makes a few assumptions (in the nonssd case).  First it
  assumes the storage is not a memory device.  If things would fit in
  memory we wouldn't need filesystems in the first place.
 
 Ok, here is the bad news. This assumption can be anything from right to completely
 wrong, and you cannot really tell what the mainstream answer is.
 Two examples from opposite parts of the technology world:
 - History: way back in the 80's there was a 3rd party hardware for C=1541
 (floppy drive for C=64) that read in the complete floppy and served all
 incoming requests from the ram buffer. So your assumption can already be wrong
 for a trivial floppy drive from ancient times.

Agreed, I'll try my best not to tune btrfs for trivial floppies from
ancient times ;)

  Then it assumes that adjacent blocks are cheap to read and blocks that
  are far away are expensive to read.  Given expensive raid controllers,
  cache, and everything else, you're correct that sometimes this
  assumption is wrong.
 
 As already mentioned this assumption may be completely wrong even without a
 raid controller, being within a virtual environment. Even far away blocks can
 be one byte away in the next fs buffer of the underlying host fs (assuming
 your device is in fact a file on the host;-).

Ok, there are roughly three environments at play here.

1) Seeking hurts, and you have no idea if adjacent block numbers are
close together on the device.

2) Seeking doesn't hurt and you have no idea if adjacent block numbers
are close together on the device. (SSD).

3) Seeking hurts and you can assume adjacent block numbers are close
together on the device (disks).

Type one is impossible to tune, and so it isn't interesting in this
discussion.  There are an infinite number of ways to actually store data
you care about, and just because one of those ways can't be tuned
doesn't mean we should stop trying to tune for the ones that most people
actually use.

 
   But, on average seeking hurts.  Really a lot.
 
 Yes, seeking hurts. But there is no way to know if there is seeking at all.
 On the other hand, if your storage is a netblock device seeking on the server
 is probably your smallest problem, compared to the network latency in between.
  

Very true, and if I were using such a setup in performance critical
applications, I would:

1) Tune the network so that seeks mattered again
2) Tune the seeks.

  We try to organize files such that files that are likely to be read
  together are found together on disk.  Btrfs is fairly good at this
  during file creation and not as good as ext*/xfs as files are
  overwritten and modified again and again (due to cow).
 
 You are basically saying that btrfs perfectly organizes write-once devices ;-)

Storage is all about trade offs, and optimizing read access for write
once vs write many is a very different thing.  It's surprising how many
of your files are written once and never read, let alone written and
then never changed.

 
  If you turn mount -o ssd on for your drive and do a test, you might not
  notice much difference right away.  ssds tend to be pretty good right
  out of the box.  Over time it tends to help, but it is a very hard thing
  to benchmark in general.
 
 Honestly, this sounds like I give up to me ;-)
 You just said that generally it is very hard to benchmark. Which means
 nobody can see or feel it in real world in non-tech language.

No, it just means it is hard to benchmark.   SSDs, even really good
ssds, are not deterministic.  Sometimes they are faster than others and
the history of how you've abused it in the past factors into how well it
performs in the future.

A simple graph that talks about the performance of one drive in one
workload needs a lot of explanation.

 
 Please understand that I am the last one criticizing your and others' brilliant
 work and the time you spend for btrfs. Only I do believe that if you spent one
 hour on some fs like glusterfs 

Re: SSD Optimizations

2010-03-13 Thread Jeremy Fitzhardinge

On 03/13/2010 08:43 AM, Stephan von Krawczynski wrote:

- Nowadays: being a linux installation today chances are that the matrix has
you. Quite a lot of installations are virtualized. So your storage is a virtual
one either, which means it is likely being a fs buffer from the host system,
i.e. RAM.
   


That would be a strictly amateur-hour implementation.  It is very 
important for data integrity that at least all writes are synchronous, 
and ideally all IO should be uncached in the host.  In that case the 
performance of the guest's virtual IO device will be broadly similar to 
a real hardware device.
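
With KVM/qemu, for example, that boils down to the cache mode on the virtual
drive - something along these lines (the image path is hypothetical):

    # bypass the host page cache so the guest's writes go to the real storage
    qemu-kvm -drive file=/var/lib/vm/guest.img,if=virtio,cache=none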


J


Re: SSD Optimizations

2010-03-12 Thread Stephan von Krawczynski
On Fri, 12 Mar 2010 02:07:40 +0100
Hubert Kario h...@qbs.com.pl wrote:

  [...]
  If the FS were to be smart and know about the 256kb requirement, it
  would do a read/modify/write cycle somewhere and then write the 4KB.
 
 If all the free blocks have been TRIMmed, FS should pick a completely free 
 erasure size block and write those 4KiB of data.
 
 Correct implementation of wear leveling in the drive should notice that the
 write is entirely inside a free block and make just a write cycle adding zeros
 to the end of supplied data.

Your assumption here is that your _addressed_ block layout is completely
identical to the SSD's disk layout. Otherwise you cannot know where a free
erasure block is located and how to address it from the FS.
I really wonder what this assumption is based on. You still think an SSD is a
true disk with linear addressing. I doubt that very much. Even on true
spinning disks your assumption is wrong for relocated sectors. Which basically
means that every disk controller firmware has been fiddling around with the physical
layout for decades. Please accept that you cannot do a disk's job in the FS. The
more advanced technology gets, the more disks become black boxes with a defined
software interface. Use this interface and drop the idea of having inside
knowledge of such a device. That's other people's work. If you want to design
smart SSD controllers, hire on at a company that builds those.

-- 
Regards,
Stephan


Re: SSD Optimizations

2010-03-12 Thread Hubert Kario
On Friday 12 March 2010 10:15:28 Stephan von Krawczynski wrote:
 On Fri, 12 Mar 2010 02:07:40 +0100
 
 Hubert Kario h...@qbs.com.pl wrote:
   [...]
   If the FS were to be smart and know about the 256kb requirement, it
   would do a read/modify/write cycle somewhere and then write the 4KB.
  
  If all the free blocks have been TRIMmed, FS should pick a completely
  free erasure size block and write those 4KiB of data.
  
  Correct implementation of wear leveling in the drive should notice that
  the write is entirely inside a free block and make just a write cycle
  adding zeros to the end of supplied data.
 
 Your assumption here is that your _addressed_ block layout is completely
 identical to the SSDs disk layout.
 Else you cannot know where a free
 erasure block is located and how to address it from FS.
 I really wonder what this assumption is based on. You still think a SSD is
 a true disk with linear addressing. I doubt that very much.

I made no such assumptions.
I'm sure that the linearity on the ATA LBA level isn't so linear on the device 
level, especially after wear-leveling takes its toll, but I assume that the 
smallest block of data that the translation layer can address is erase-block 
sized and that all the erase-blocks are equal in size. Otherwise the algorithm 
would be needlessly complicated, which would make it both slower and more error 
prone.

 Even on true
 spinning disks your assumption is wrong for relocated sectors.

Which we don't have to worry about, because if the drive has less than 5 of 
'em, the impact of hitting them is marginal, and if there are more, the user 
has a much more pressing problem than the performance of the drive or FS.

 Which
 basically means that every disk controller firmware fiddles around with
 the physical layout since decades. Please accept that you cannot do a
 disks' job in FS. The more advanced technology gets the more disks become
 black boxes with a defined software interface. Use this interface and drop
 the idea of having inside knowledge of such a device. That's other
 peoples' work. If you want to design smart SSD controllers hire at a
 company that builds those.

And I don't think that doing the disk's job in the FS is a good idea, but I think 
that we should be able to minimise the impact of the translation layer.

The way to do this is to treat the device as a block device with sectors the 
size of erase-blocks. That's nothing too fancy, don't you think?

-- 
Hubert Kario
QBS - Quality Business Software
ul. Ksawerów 30/85
02-656 Warszawa
POLAND
tel. +48 (22) 646-61-51, 646-74-24
fax +48 (22) 646-61-50


Re: SSD Optimizations

2010-03-11 Thread Daniel J Blueman
On Wed, Mar 10, 2010 at 11:13 PM, Gordan Bobic gor...@bobich.net wrote:
 Marcus Fritzsch wrote:

 Hi there,

 On Wed, Mar 10, 2010 at 8:49 PM, Gordan Bobic gor...@bobich.net wrote:

 [...]
 Are there similar optimizations available in BTRFS?

 There is an SSD mount option available[1].

 [1] http://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options

 But what _exactly_ does it do?

Chris explains the change to favour spatial locality in allocator
behaviour with '-o ssd'. '-o ssd_spread' does the opposite, where
RMW cycles carry a higher penalty. Elsewhere IIRC, Chris also said BTRFS
attempts to submit 128KB BIOs where possible (or wishful thinking?):

http://markmail.org/message/4sq4uco2lghgxzzz
-- 
Daniel J Blueman


Re: SSD Optimizations

2010-03-11 Thread Hubert Kario
On Thursday 11 March 2010 08:38:53 Sander wrote:
 Hello Gordan,
 
 Gordan Bobic wrote (ao):
  Mike Fedyk wrote:
  On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic gor...@bobich.net wrote:
  Are there options available comparable to ext2/ext3 to help reduce
  wear and improve performance?
 
 With SSDs you don't have to worry about wear.

Sorry, but you do have to worry about wear. I was able to destroy a relatively 
new SD card (2007 or early 2008) just by writing to the first 10MiB over and 
over again for two or three days. The end of the card still works without 
problems, but about 10 sectors at the beginning give write errors.
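
The abuse in question was nothing more sophisticated than a loop like this
(obviously destructive - /dev/sdX being the card):

    # hammer the first 10 MiB until the card starts returning write errors
    while dd if=/dev/zero of=/dev/sdX bs=1M count=10 oflag=direct conv=fsync; do
        :
    done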

And with journaled file systems that write over and over again on the same spot 
you do have to worry about wear leveling. It depends on the underlying block 
allocation algorithm, but I'm sure that most of the cheap SSDs do wear 
leveling only inside big blocks, not across the whole drive, making it much 
easier to hit the 10 000-100 000 erase cycles boundary.

Still, I think that if you can prolong the life of hardware without noticeable 
performance degradation, you should do it. If nothing else, it may help a drive 
with some defects last those 3-5 years between upgrades without any problems.

 
  And while I appreciate hopeful remarks along the lines of I think
  you'll get more out of btrfs, I am really after specifics of what
  the ssd mount option does, and what features comparable to the
  optimizations that can be done with ext2/3/4 (e.g. the mentioned
  stripe-width option) are available to get the best possible
  alignment of data and metadata to increase both performance and life
  expectancy of a SSD.
 
 Alignment is about the partition, not the fs, and thus taken care of
 with fdisk and the like.
 
 If you don't create a partition, the fs is aligned with the SSD.

But it does not align internal FS structures to the SSD erase block size, and 
that's what Gordan asked for.

And sorry Gordan, I don't know. But there's a 'ssd_spread' option that tries 
to allocate blocks as far as possible (within reason) from each other. That 
should, in most cases, make the fs structures reside on an erase block by 
themselves.
I'm afraid that you'll need to dive into the code to know about block 
alignment or one of the developers will need to provide us with info.

 
  Also, for drives that don't support TRIM, is there a way to make the
  FS apply aggressive re-use of erased space (in order to help the
  drive's internal wear-leveling)?
 
 TRIM has nothing to do with wear-leveling (although it helps reducing
 wear).
 TRIM lets the OS tell the disk which blocks are not in use anymore, and
 thus don't have to be copied during a rewrite of the blocks.
 Wear-leveling is the SSD making sure all blocks are more or less equally
 written to avoid continuous load on the same blocks.

Isn't this all about wear leveling? TRIM has no meaning for magnetic media. 
It's used to tell the drive which parts of the medium contain only junk data and 
can be used in block rotation, making the wear-leveling easier and more 
effective.

-- 
Hubert Kario
QBS - Quality Business Software
ul. Ksawerów 30/85
02-656 Warszawa
POLAND
tel. +48 (22) 646-61-51, 646-74-24
fax +48 (22) 646-61-50


Re: SSD Optimizations

2010-03-11 Thread Stephan von Krawczynski
On Thu, 11 Mar 2010 11:59:57 +0100
Hubert Kario h...@qbs.com.pl wrote:

 On Thursday 11 March 2010 08:38:53 Sander wrote:
  Hello Gordan,
  
  Gordan Bobic wrote (ao):
   Mike Fedyk wrote:
   On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic gor...@bobich.net wrote:
   Are there options available comparable to ext2/ext3 to help reduce
   wear and improve performance?
  
  With SSDs you don't have to worry about wear.
 
 Sorry, but you do have to worry about wear. I was able to destroy a 
 relatively 
 new SD card (2007 or early 2008) just by writing on the first 10MiB over and 
 over again for two or three days. The end of the card still works without 
 problems but about 10 sectors on the beginning give write errors.

Sorry, the topic was SSD, not SD. SSDs have controllers that contain heavy
closed magic to circumvent all kinds of troubles you get when using classical
flash and SD cards.
Honestly I would just drop the idea of an SSD option simply because the
vendors implement all kinds of neat strategies in their devices. So in the end
you cannot really tell if the option does something constructive and not
destructive in combination with a SSD controller.
Of course you may well discuss an option for passive flash devices like
ide-CF/SD or the like. There is no controller involved, so your fs
implementation may well work out.

-- 
Regards,
Stephan



Re: SSD Optimizations

2010-03-11 Thread Gordan Bobic
On Thu, 11 Mar 2010 08:38:53 +0100, Sander san...@humilis.net wrote:
 Are there options available comparable to ext2/ext3 to help reduce
 wear and improve performance?
 
 With SSDs you don't have to worry about wear.

And if you believe that, you clearly swallowed the marketing spiel hook,
line and sinker without enough real world experience to show you otherwise.
But I'm not going to go off on a tangent now enumerating various victories
of marketing over mathematics and empirical evidence relating to currently
popular technologies.

In short - I have several dead SSDs of various denominations that
demonstrate otherwise - and all within their warranty period, and not
having been used in pathologically write-heavy environments. You do have to
worry about wear. Operations that increase wear also reduce speed (erasing
a block is slow, and if the disk is fully tainted you cannot write without
erasing), so you doubly have to worry about it.

Also remember that hardware sectors are 512 bytes, and FS blocks tend to
be 4096 bytes. It is thus entirely plausible that if you aren't careful
you'll end up with blocks straddling two erase block boundaries. If that
happens, you'll make wear twice as bad because you are facing a situation
where you may need to erase and write two blocks rather than one. Half the
performance, twice the wear.
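
Which is also why it is cheap insurance to at least start the partition on a
generous power-of-two boundary, e.g. (the device name is a placeholder, parted
2.x syntax):

    # 1 MiB-aligned partition start, so the FS block grid lines up with the
    # erase blocks instead of straddling them from the outset
    parted -s /dev/sdX mklabel gpt
    parted -s -a optimal /dev/sdX mkpart primary 1MiB 100%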

 And while I appreciate hopeful remarks along the lines of I think
 you'll get more out of btrfs, I am really after specifics of what
 the ssd mount option does, and what features comparable to the
 optimizations that can be done with ext2/3/4 (e.g. the mentioned
 stripe-width option) are available to get the best possible
 alignment of data and metadata to increase both performance and life
 expectancy of a SSD.
 
 Alignment is about the partition, not the fs, and thus taken care of
 with fdisk and the like.
 
 If you don't create a partition, the fs is aligned with the SSD.

I'm talking about internal FS data structures, not the partition
alignment.

 Also, for drives that don't support TRIM, is there a way to make the
 FS apply aggressive re-use of erased space (in order to help the
 drive's internal wear-leveling)?
 
 TRIM has nothing to do with wear-leveling (although it helps reducing
 wear).

That's self-contradictory. If it helps reduce wear, it has something to do
with wear leveling.

 TRIM lets the OS tell the disk which blocks are not in use anymore, and
 thus don't have to be copied during a rewrite of the blocks.
 Wear-leveling is the SSD making sure all blocks are more or less equally
 written to avoid continuous load on the same blocks.

And thus it is impossible to do wear leveling when all blocks have been
written to once without TRIM. So I'd say that in the long term, without
TRIM there is no wear leveling. That makes them pretty related.

So considering that there are various nuggets of opinion floating around
(correct or otherwise) saying that ext4 has support for TRIM, I'd like to
know whether there is similar support in BTRFS at the moment?

Gordan


Re: SSD Optimizations

2010-03-11 Thread Stephan von Krawczynski
On Thu, 11 Mar 2010 12:17:30 +
Gordan Bobic gor...@bobich.net wrote:

 On Thu, 11 Mar 2010 12:31:03 +0100, Stephan von Krawczynski
 sk...@ithnet.com wrote:
   On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic gor...@bobich.net wrote:
   Are there options available comparable to ext2/ext3 to help reduce
   wear and improve performance?
   
   With SSDs you don't have to worry about wear.
  
   Sorry, but you do have to worry about wear. I was able to destroy a
   relatively new SD card (2007 or early 2008) just by writing on the first
   10MiB over and over again for two or three days. The end of the card still
   works without problems but about 10 sectors on the beginning give write
   errors.
  
  Sorry, the topic was SSD, not SD.
 
 SD == SSD with an SD interface.

That really is quite a statement. You really talk of a few-bucks SD card (like
the one in my Android phone) as an SSD comparable with an Intel XE, only with a
different interface? Come on, stay serious. The product is not made of just
SLCs and some raw logic.
 
  SSDs have controllers that contain heavy
  closed magic to circumvent all kinds of troubles you get when using
  classical flash and SD cards.
 
 There is absolutely no basis for thinking that SD cards don't contain wear
 leveling logic. SD standard, and thus SD cards support a lot of fancy copy
 protection capabilities, which means there is a lot of firmware involvement
 on SD cards. It is unlikely that any reputable SD card manufacturer
 wouldn't also build wear leveling logic into it.

I really won't guess about what is built into an SD or even a CF card. But
hopefully we agree that there is a significant difference compared to a
product that calls itself a _disk_.
 
  Honestly I would just drop the idea of an SSD option simply because the
  vendors implement all kinds of neat strategies in their devices. So in
 the
  end you cannot really tell if the option does something constructive and
 not
  destructive in combination with a SSD controller.
 
 You can make an educated guess. For starters given that visible sector
 sizes are not equal to FS block sizes, it means that FS block sizes can
 straddle erase block boundaries without the flash controller, no matter how
 fancy, being able to determine this. Thus, at the very least, aligning FS
 structures so that they do not straddle erase block boundaries is useful in
 ALL cases. Thinking otherwise is just sticking your head in the sand
 because you cannot be bothered to think.

And your guess is that Intel engineers had no clue when designing the XE,
including its controller? You think they did not know what you and I know, and
therefore pray every day that some smart fs designer falls from heaven and
saves their product from dying in the meantime? Really?
 
  Of course you may well discuss about an option for passive flash devices
  like ide-CF/SD or the like. There is no controller involved so your fs
  implementation may well work out.
 
 I suggest you educate yourself on the nature of IDE and CF (which is just
 IDE with a different connector). There most certainly are controllers
 involved. The days when disks (mechanical or solid state) didn't integrate
 controllers ended with MFM/RLL and ESDI disks some 20+ years ago.

I suggest you don't lecture someone who has administered some hundred boxes
based on CF and SSD media for _years_ about the pros and cons of the
respective implementations and their long-term usage.
Sorry, the world is not built out of paper; sometimes you meet the hard facts.
And one of them is that the ssd option in a fs has very likely already been
overtaken by the SSD controller designers and is mostly _superfluous_. The
market has already decided to make SSDs compatible with standard fs layouts.

-- 
Regards,
Stephan



Re: SSD Optimizations

2010-03-11 Thread Gordan Bobic
On Thu, 11 Mar 2010 13:59:09 +0100, Stephan von Krawczynski
sk...@ithnet.com wrote:

On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic
gor...@bobich.net
wrote:
Are there options available comparable to ext2/ext3 to help
 reduce
wear and improve performance?
   
   With SSDs you don't have to worry about wear.
  
  Sorry, but you do have to worry about wear. I was able to destroy a
  relatively
  new SD card (2007 or early 2008) just by writing on the first 10MiB
 over
  and
  over again for two or three days. The end of the card still works
  without
  problems but about 10 sectors on the beginning give write errors.
  
  Sorry, the topic was SSD, not SD.
 
 SD == SSD with an SD interface.
 
 That really is quite a statement. You really talk of a few-bucks SD card
 (like the one in my android handy) as an SSD comparable with Intel XE
only with
 different interface? Come on, stay serious. The product is not only made
of
 SLCs and some raw logic.

I am saying that there is no reason for the firmware in an SD card not to
be as advanced. If the manufacturer has some advanced logic in their SATA
SSD, I cannot see any valid engineering reason not to apply the same logic
in an SD product.

  SSDs have controllers that contain heavy
  closed magic to circumvent all kinds of troubles you get when using
  classical flash and SD cards.
 
 There is absolutely no basis for thinking that SD cards don't contain
 wear
 leveling logic. SD standard, and thus SD cards support a lot of fancy
 copy
 protection capabilities, which means there is a lot of firmware
 involvement
 on SD cards. It is unlikely that any reputable SD card manufacturer
 wouldn't also build wear leveling logic into it.
 
 I really don't guess about what is built into an SD or even CF card. But
we
 hopefully agree that there is a significant difference compared to a
 product that calls itself a _disk_.

We don't agree on that. Not at all. I don't see any reason why a CF card
and an IDE SSD made by the same manufacturer should have any difference
between them other than capacity and the physical package.

  Honestly I would just drop the idea of an SSD option simply because
the
  vendors implement all kinds of neat strategies in their devices. So
in
 the
  end you cannot really tell if the option does something constructive
  and not destructive in combination with a SSD controller.
 
 You can make an educated guess. For starters given that visible sector
 sizes are not equal to FS block sizes, it means that FS block sizes can
 straddle erase block boundaries without the flash controller, no matter
 how
 fancy, being able to determine this. Thus, at the very least, aligning
FS
 structures so that they do not straddle erase block boundaries is
useful
 in
 ALL cases. Thinking otherwise is just sticking your head in the sand
 because you cannot be bothered to think.
 
 And your guess is that intel engineers had no glue when designing the XE
 including its controller? You think they did not know what you and me
know
 and
 therefore pray every day that some smart fs designer falls from heaven
and
 saves their product from dying in between? Really?

I am saying that there are problems that CANNOT be solved on the disk
firmware level. Some problems HAVE to be addressed higher up the stack.

  Of course you may well discuss about an option for passive flash
  devices
  like ide-CF/SD or the like. There is no controller involved so your
fs
  implementation may well work out.
 
 I suggest you educate yourself on the nature of IDE and CF (which is
just
 IDE with a different connector). There most certainly are controllers
 involved. The days when disks (mechanical or solid state) didn't
 integrate
 controllers ended with MFM/RLL and ESDI disks some 20+ years ago.
 
 I suggest you don't talk to someone administering some hundred boxes
based
 on
 CF and SSD mediums for _years_ about pro and con of the respective
 implementation and its long term usage.
 Sorry, the world is not built out of paper, sometimes you meet the hard
 facts.
 And one of it is that the ssd option in fs is very likely already
overrun
 by
 the ssd controller designers and mostly _superfluous_. The market has
 already decided to make SSDs compatible to standard fs layouts.

It seems to me that you haven't done any analysis of comparative long-term
failure rates between SSDs used with default layouts (Default? Really? You
mean you don't apply any special partitioning on your hundreds of servers?)
and those with carefully aligned FS-es. Just because defaults may be good
enough for your use case doesn't mean that somebody with a use case that's
harder on the flash will observe the same reliability, or deem the
unoptimized performance figures good enough.

Gordan


Re: SSD Optimizations

2010-03-11 Thread Hubert Kario
On Thursday 11 March 2010 14:20:23 Gordan Bobic wrote:
 On Thu, 11 Mar 2010 13:59:09 +0100, Stephan von Krawczynski
 
 sk...@ithnet.com wrote:
 On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic
 gor...@bobich.net
 
 wrote:
 Are there options available comparable to ext2/ext3 to help
 
  reduce
 
 wear and improve performance?
   
With SSDs you don't have to worry about wear.
  
   Sorry, but you do have to worry about wear. I was able to destroy a
   relatively
   new SD card (2007 or early 2008) just by writing on the first 10MiB
 
  over
 
   and
   over again for two or three days. The end of the card still works
   without
   problems but about 10 sectors on the beginning give write errors.
  
   Sorry, the topic was SSD, not SD.
 
  SD == SSD with an SD interface.
 
  That really is quite a statement. You really talk of a few-bucks SD card
  (like the one in my android handy) as an SSD comparable with Intel XE
 
 only with
 
  different interface? Come on, stay serious. The product is not only made
 
 of
 
  SLCs and some raw logic.
 
 I am saying that there is no reason for the firmware in an SD card to not
 be as advanced. If the manufacturer has some advanced logic in their SATA
 SSD, I cannot see any valid engineering reason to not apply the same logic
 in a SD product.

The _SD_standard_ states that the media has to implement wear-leveling.
So any card with an SD logo implements it.

As I stated previously, the algorithms used in SD cards may not be as advanced
as those in top-of-the-line Intel SSDs, but I bet they don't differ by much
from the ones used in the cheapest SSD drives.

Besides, why shouldn't we help the drive firmware by
- writing the data only in erase-block-sized chunks
- trying to write blocks that are smaller than the erase block in a way that
won't cross an erase-block boundary
- using TRIM on deallocated parts of the drive

This will not only increase the life of the SSD but also increase its
performance.
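
For what it's worth, the kernel already exports whatever the device chooses
to report about its preferred I/O sizes, so the obvious cases can at least be
checked (a sketch; sdX is a placeholder, and many SSDs simply report 0 here,
i.e. nothing useful):

    cat /sys/block/sdX/queue/logical_block_size   # addressable sector size
    cat /sys/block/sdX/queue/physical_block_size  # internal sector size, if reported
    cat /sys/block/sdX/queue/minimum_io_size      # preferred minimum I/O granularity
    cat /sys/block/sdX/queue/optimal_io_size      # preferred large-I/O granularity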

 
   Honestly I would just drop the idea of an SSD option simply because
 
 the
 
   vendors implement all kinds of neat strategies in their devices. So
 
 in
 
  the
 
   end you cannot really tell if the option does something constructive
   and not destructive in combination with a SSD controller.
 
  You can make an educated guess. For starters given that visible sector
  sizes are not equal to FS block sizes, it means that FS block sizes can
  straddle erase block boundaries without the flash controller, no matter
  how
  fancy, being able to determine this. Thus, at the very least, aligning
 
 FS
 
  structures so that they do not straddle erase block boundaries is
 
 useful
 
  in
  ALL cases. Thinking otherwise is just sticking your head in the sand
  because you cannot be bothered to think.
 
  And your guess is that intel engineers had no glue when designing the XE
  including its controller? You think they did not know what you and me
 
 know
 
  and
  therefore pray every day that some smart fs designer falls from heaven
 
 and
 
  saves their product from dying in between? Really?
 
 I am saying that there are problems that CANNOT be solved on the disk
 firmware level. Some problems HAVE to be addressed higher up the stack.

Exactly, you can't assume that the SSD's firmware understands any and all file
system layouts, especially if they are on fragmented LVM or other logical
volume manager partitions.

-- 
Hubert Kario
QBS - Quality Business Software
02-656 Warszawa, ul. Ksawerów 30/85
tel. +48 (22) 646-61-51, 646-74-24
www.qbs.com.pl

Quality Management System
compliant with the ISO 9001:2000 standard


Re: SSD Optimizations

2010-03-11 Thread Chris Mason
On Wed, Mar 10, 2010 at 07:49:34PM +, Gordan Bobic wrote:
 I'm looking to try BTRFS on a SSD, and I would like to know what SSD
 optimizations it applies. Is there a comprehensive list of what ssd
 mount option does? How are the blocks and metadata arranged? Are
 there options available comparable to ext2/ext3 to help reduce wear
 and improve performance?
 
 Specifically, on ext2 (journal means more writes, so I don't use
 ext3 on SSDs, since fsck typically only takes a few seconds when
 access time is  100us), I usually apply the
 -b 4096 -E stripe-width = (erase_block/4096)
 parameters to mkfs in order to reduce the multiple erase cycles on
 the same underlying block.
 
 Are there similar optimizations available in BTRFS?

All devices (raid, ssd, single spindle) tend to benefit from big chunks
of writes going down close together on disk.  This is true for different
reasons on each one, but it is still the easiest way to optimize writes.
COW filesystems like btrfs are very well suited to send down lots of big writes
because we're always reallocating things.

For traditional storage, we also need to keep blocks from one file (or
files in a directory) close together to reduce seeks during reads.  SSDs
have no such restrictions, and so the mount -o ssd related options in
btrfs focus on tossing out tradeoffs that slow down writes in hopes of
reading faster later.

Someone already mentioned the mount -o ssd and ssd_spread options.
Mount -o ssd is targeted at faster SSD that is good at wear leveling and
generally just benefits from having a bunch of data sent down close
together.  In mount -o ssd, you might find a write pattern like this:

block N, N+2, N+3, N+4, N+6, N+7, N+16, N+17, N+18, N+19, N+20 ...

It's a largely contiguous chunk of writes, but there may be gaps.  Good
ssds don't really care about the gaps, and they benefit more from the
fact that we're preferring to reuse blocks that had once been written
than to go off and find completely contiguous areas of the disk to
write (which are more likely to have never been written at all).

mount -o ssd_spread is much more strict.  You'll get N,N+2,N+3,N+4,N+5
etc because crummy ssds really do care about the gaps.

Now, btrfs could go off and probe for the erasure size and work very
hard to align things to it.  As someone said, alignment of the partition
table is very important here as well.  But for modern ssd this generally
matters much less than just doing big ios and letting the little log
structured squirrel inside the device figure things out.

For trim, we do have mount -o discard.  It does introduce a run time
performance hit (this varies wildly from device to device) and we're
tuning things as discard capable devices become more common.  If anyone
is looking for a project it would be nice to have an ioctl that triggers
free space discards in bulk.
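
To make the above concrete, a hedged example of the variants (device and
mount point are placeholders; -o discard only helps on TRIM-capable devices):

    mount -o ssd         /dev/sdX1 /mnt   # decent SSD: big, mostly contiguous writes, gaps allowed
    mount -o ssd_spread  /dev/sdX1 /mnt   # crummy SSD: insist on strictly contiguous allocations
    mount -o ssd,discard /dev/sdX1 /mnt   # additionally send discards for freed space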

-chris



Re: SSD Optimizations

2010-03-11 Thread Sander
Stephan von Krawczynski wrote (ao):
 Honestly I would just drop the idea of an SSD option simply because the
 vendors implement all kinds of neat strategies in their devices. So in the end
 you cannot really tell if the option does something constructive and not
 destructive in combination with a SSD controller.

My understanding of the ssd mount option is also that the fs doesn't try
to do all kinds of smart (and potentially expensive) things which make
sense for rotating media to reduce seeks and the like.

Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net


Re: SSD Optimizations

2010-03-11 Thread Stephan von Krawczynski
On Thu, 11 Mar 2010 15:01:55 +0100
Hubert Kario h...@qbs.com.pl wrote:

 [...]
 The _SD_standard_ states that the media has to implement wear-leveling.
 So any card with an SD logo implements it.
 
 As I stated previously, the algorithms used in SD cards may not be as 
 advanced 
 as those in top-of-the-line Intel SSDs, but I bet they don't differ by much 
 to 
 the ones used in cheapest SSD drives.

Well, we are all pretty sure about that. And that is exactly the reason why
these are not surviving the market pressure. Why should one care about bad
products that are possibly already extinct because of their bad performance
by the time the fs is production-ready some day?
 
 Besides, why shouldn't we help the drive firmware by 
 - writing the data only in erase-block sizes
 - trying to write blocks that are smaller than the erase-block in a way that 
 won't cross the erase-block boundary

Because if the designing engineer of a good SSD controller wasn't able to cope
with that, he would have no chance to design a second one.

 - using TRIM on deallocated parts of the drive

Another story. That is a designed part of a software interface between fs and
drive firmware, whose usage pattern both sides agreed on. The above points, in
contrast, are pure guesswork based on dumb, old hardware and its behaviour.
 
 This will not only increase the life of the SSD but also increase its 
 performance.

TRIM: maybe yes. Rest: pure handwaving.

 [...]
   And your guess is that intel engineers had no glue when designing the XE
   including its controller? You think they did not know what you and me
   know and
   therefore pray every day that some smart fs designer falls from heaven
   and saves their product from dying in between? Really?
  
  I am saying that there are problems that CANNOT be solved on the disk
  firmware level. Some problems HAVE to be addressed higher up the stack.
 
 Exactly, you can't assume that the SSDs firmware understands any and all file 
 system layouts, especially if they are on fragmented LVM or other logical 
 volume manager partitions.

Hopefully the firmware understands exactly no fs layout at all. That would be
braindead. Instead it should understand how to arrange incoming and outgoing
data in a way that meets its own technical requirements as well as possible.
This is no spinning disk; it is completely irrelevant what the data layout
looks like as long as the controller finds its way through and copes best
with read/write/erase cycles. It may well use additional RAM for caching and
data reordering.
Do you really believe ascending block numbers are placed at ascending
addresses inside the disk (as an example)? Why should they be? What does that
mean for fs block ordering? If you don't know what a controller does to your
data ordering anyway, how do you want to help it with its job?
Please accept that we are _not_ talking about trivial flash memory here, or
pseudo-SSDs consisting of SD cards. The market has already evolved better
products. The dinosaurs are extinct even if some still look alive.

 -- 
 Hubert Kario

-- 
Regards,
Stephan



Re: SSD Optimizations

2010-03-11 Thread Asdo

Gordan Bobic wrote:

TRIM lets the OS tell the disk which blocks are not in use anymore, and
thus don't have to be copied during a rewrite of the blocks.
Wear-leveling is the SSD making sure all blocks are more or less equally
written to avoid continuous load on the same blocks.



And thus it is impossible to do wear leveling when all blocks have been
written to once without TRIM. So I'd say that in the long term, without
TRIM there is no wear leveling. That makes them pretty related.
  

I'm no expert on SSDs, however:

1- I think the SSD would rewrite once-written blocks to other locations,
so as to reuse the same physical blocks for wear levelling. The
written-once blocks are very good candidates because their write-count
is 1.


2- I think SSDs expose a smaller usable size than what they physically
have. That way they always have some spare blocks to move data to,
so as to free blocks which have a low write-count.


3- If you think #2 is not enough you can leave part of the SSD disk 
unused, by leaving unused space after the last partition.




Actually, after considering #1 and #2, I don't think TRIM is really
needed for SSDs; are you sure it is really needed? I think it's more of
an optimization, but to be useful it needs to be very fast: faster than
an internal block rewrite by the SSD's wear levelling, and fast enough
as a SATA/SAS command that the computer is not significantly slowed down
by issuing it. Instead, IIRC, I read something about it being slow, and
maybe it even required FUA or a barrier or a flush? I don't remember
exactly.


There is one place where TRIM would be very useful though, and it's not 
for SSDs, but it's in virtualization: if the Virtual Machine frees 
space, the VM file system should use TRIM to signal to the host that 
some space is unused. The host should have a way to tell its filesystem 
that the VM-disk-file has a new hole in that position, so that disk 
space can be freed on the host for use for another VM. This would allow 
much greater overcommit of disk spaces to virtual machines.


There's probably no need for TRIM support itself on the host
filesystem, but another mechanism is needed that allows sparsifying an
existing file by creating a hole in it (which I think is not possible with
the filesystem syscalls we have now, correct me if I'm wrong). There
*is* a need for TRIM support in the guest filesystem though.
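
(As it happens, a mechanism along exactly these lines was later added to
Linux as FALLOC_FL_PUNCH_HOLE for fallocate(). A host-side sketch, assuming
a kernel and filesystem that implement it; the image name is just an
example:)

    # punch a 1 MiB hole at offset 4 MiB in the backing file of a guest disk
    fallocate --punch-hole --offset $((4*1024*1024)) --length $((1*1024*1024)) guest-disk.img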




Re: SSD Optimizations

2010-03-11 Thread Gordan Bobic
On Thu, 11 Mar 2010 16:35:33 +0100, Stephan von Krawczynski
sk...@ithnet.com wrote:

 Besides, why shouldn't we help the drive firmware by 
 - writing the data only in erase-block sizes
 - trying to write blocks that are smaller than the erase-block in a way
 that won't cross the erase-block boundary
 
 Because if the designing engineer of a good SSD controller wasn't able
to
 cope with that he will have no chance to design a second one.

You seem to be confusing quality of implementation with theoretical
possibility.

 This will not only increase the life of the SSD but also increase its 
 performance.
 
 TRIM: maybe yes. Rest: pure handwaving.
 
 [...]
   And your guess is that intel engineers had no glue when designing
   the XE
   including its controller? You think they did not know what you and
me
   know and
   therefore pray every day that some smart fs designer falls from
   heaven
   and saves their product from dying in between? Really?
  
  I am saying that there are problems that CANNOT be solved on the disk
  firmware level. Some problems HAVE to be addressed higher up the
stack.
 
 Exactly, you can't assume that the SSDs firmware understands any and
all
 file
 system layouts, especially if they are on fragmented LVM or other
 logical
 volume manager partitions.
 
 Hopefully the firmware understands exactly no fs layout at all. That
would
 be
 braindead. Instead it should understand how to arrange incoming and
 outgoing
 data in a way that its own technical requirements are met as perfect as
 possible. This is no spinning disk, it is completely irrelevant what the
 data
 layout looks like as long as the controller finds its way through and
copes
 best with read/write/erase cycles. It may well use additional RAM for
 caching and data reordering.
 Do you really believe ascending block numbers are placed in ascending
 addresses inside the disk (as an example)? Why should they? What does
that
 mean for fs block ordering? If you don't know anyway what a controller
 does to
 your data ordering, how do you want to help it with its job?
 Please accept that we are _not_ talking about trivial flash mem here or
 pseudo-SSDs consisting of sd cards. The market has already evolved
better
 products. The dinosaurs are extincted even if some are still looking
alive.

I am assuming that you are being deliberately facetious here (the
alternative is less kind). The simple fact is that you cannot come up with
some magical data (re)ordering method that nullifies problems of common
use-cases that are quite nasty for flash based media.

For example - you have a disk that has had all its addressable blocks
tainted. A new write comes in - what do you do with it? Worse, a write
comes in spanning two erase blocks as a consequence of the data
re-alignment in the firmware. You have no choice but to wipe them both and
re-write the data. You'd be better off not doing the magic and assuming
that the FS is sensibly aligned.

Having a large chunk of spare non-addressable space for this doesn't
necessarily help you, either, unless it is about the same size as the
addressable space (worst case scenario; if you accept that the vast
majority of FS-es use 4KB block sizes, you can cut a corner there by a
factor of 8). All of that adds to cost - flash is still expensive.

The bottom line is that you _cannot_ solve wear-leveling completely just
in firmware. There is no doubt you can get some of the way there, but it is
mathematically impossible to solve completely without intervention from
further up the stack. Since some black-box firmware optimizations may quite
conceivably make the wear problem worse, it makes perfect sense to just
hopefully assume that the FS is trying to help - it's unlikely to make
things worse and may well make things a lot better.

Gordan


Re: SSD Optimizations

2010-03-11 Thread Gordan Bobic
On Thu, 11 Mar 2010 14:42:40 +0100, Asdo a...@shiftmail.org wrote:

 1- I think the SSD would rewrite once-written blocks to other locations,

 so to reuse the same physical blocks for wear levelling. The 
 written-once blocks are very good candidates because their write-count 
 is 1

There are likely to be millions of blocks with the same write count. How
do you pick the optimal ones?

 2- I think SSDs show you a smaller usable size than what they physically

 have. In this way they always have some more blocks for moving data to, 
 so to free blocks which have a low write-count.

I'm pretty sure that is the case, too. However, to be able to deal with
the worst case scenario you would have to effectively double the amount of
flash (only expose half of it as addressable). With some corner cutting and
assumptions about file systems' block sizes (usually 4KB these days), you
can get away with less, but that's dodgy until we get bigger hardware
sectors as standard.

 3- If you think #2 is not enough you can leave part of the SSD disk 
 unused, by leaving unused space after the last partition.

That is true, but I'd rather apply some higher-level logic to this than
expect the firmware to make a guess about things it has absolutely no
way of knowing for sure.

 Actually, after considering #1 and #2, I don't think TRIM is really 
 needed for SSDs, are you sure it is really needed?

I don't think there's any doubt that trim helps flash longevity.

 I think it's more a 
 kind of optimization, but it needs to be very fast for it to be useful 
 as an optimization, faster than an internal block rewrite by the SSD 
 wear levelling, and so fast as SATA/SAS command that the computer is not

 significantly slowed down by using it. Instead IIRC I read something 
 about it being slow and maybe was even requiring FUA or barrier or 
 flush? I don't remember exactly.

There is no obligation on the part of the disk to do anything in response to
the trim command, IIRC. It is advisory. It doesn't have to clear the blocks
online. In fact, trim is sector based, and it is unlikely an SSD would act
until it can sensibly free an entire erase block.

 There is one place where TRIM would be very useful though, and it's not 
 for SSDs, but it's in virtualization: if the Virtual Machine frees 
 space, the VM file system should use TRIM to signal to the host that 
 some space is unused. The host should have a way to tell its filesystem 
 that the VM-disk-file has a new hole in that position, so that disk 
 space can be freed on the host for use for another VM. This would allow 
 much greater overcommit of disk spaces to virtual machines.

Indeed, I brought this very point up on the KVM mailing list a while back.

Gordan


Re: SSD Optimizations

2010-03-11 Thread Gordan Bobic
On Thu, 11 Mar 2010 09:21:30 -0500, Chris Mason chris.ma...@oracle.com
wrote:
 On Wed, Mar 10, 2010 at 07:49:34PM +, Gordan Bobic wrote:
 I'm looking to try BTRFS on a SSD, and I would like to know what SSD
 optimizations it applies. Is there a comprehensive list of what ssd
 mount option does? How are the blocks and metadata arranged? Are
 there options available comparable to ext2/ext3 to help reduce wear
 and improve performance?
 
 Specifically, on ext2 (journal means more writes, so I don't use
 ext3 on SSDs, since fsck typically only takes a few seconds when
 access time is  100us), I usually apply the
 -b 4096 -E stripe-width = (erase_block/4096)
 parameters to mkfs in order to reduce the multiple erase cycles on
 the same underlying block.
 
 Are there similar optimizations available in BTRFS?
 
 All devices (raid, ssd, single spindle) tend to benefit from big chunks
 of writes going down close together on disk.  This is true for different
 reasons on each one, but it is still the easiest way to optimize writes.
 COW filesystems like btrfs are very well suited to send down lots of big
 writes because we're always reallocating things.

Doesn't this mean _more_ writes? If that's the case, then that would make
btrfs a _bad_ choice for flash based media with limited write cycles.

 For traditional storage, we also need to keep blocks from one file (or
 files in a directory) close together to reduce seeks during reads.  SSDs
 have no such restrictions, and so the mount -o ssd related options in
 btrfs focus on tossing out tradeoffs that slow down writes in hopes of
 reading faster later.
 
 Someone already mentioned the mount -o ssd and ssd_spread options.
 Mount -o ssd is targeted at faster SSD that is good at wear leveling and
 generally just benefits from having a bunch of data sent down close
 together.  In mount -o ssd, you might find a write pattern like this:
 
 block N, N+2, N+3, N+4, N+6, N+7, N+16, N+17, N+18, N+19, N+20 ...
 
 It's a largely contiguous chunk of writes, but there may be gaps.  Good
 ssds don't really care about the gaps, and they benefit more from the
 fact that we're preferring to reuse blocks that had once been written
 than to go off and find completely contiguous areas of the disk to
 write (which are more likely to have never been written at all).
 
 mount -o ssd_spread is much more strict.  You'll get N,N+2,N+3,N+4,N+5
 etc because crummy ssds really do care about the gaps.
 
 Now, btrfs could go off and probe for the erasure size and work very
 hard to align things to it.  As someone said, alignment of the partition
 table is very important here as well.  But for modern ssd this generally
 matters much less than just doing big ios and letting the little log
 structured squirrel inside the device figure things out.

Thanks, that's quite helpful. Can you provide any insight into alignment
of FS structures in such a way that they do not straddle erase block
boundaries?

 For trim, we do have mount -o discard.  It does introduce a run time
 performance hit (this varies wildly from device to device) and we're
 tuning things as discard capable devices become more common.  If anyone
 is looking for a project it would be nice to have an ioctl that triggers
 free space discards in bulk.

Are you saying that -o discard implements trim support?

Gordan


Re: SSD Optimizations

2010-03-11 Thread Chris Mason
On Thu, Mar 11, 2010 at 04:03:59PM +, Gordan Bobic wrote:
 On Thu, 11 Mar 2010 16:35:33 +0100, Stephan von Krawczynski
 sk...@ithnet.com wrote:
 
  Besides, why shouldn't we help the drive firmware by 
  - writing the data only in erase-block sizes
  - trying to write blocks that are smaller than the erase-block in a way
  that won't cross the erase-block boundary
  
  Because if the designing engineer of a good SSD controller wasn't able
 to
  cope with that he will have no chance to design a second one.
 
 You seem to be confusing quality of implementation with theoretical
 possibility.
 
  This will not only increase the life of the SSD but also increase its 
  performance.
  
  TRIM: maybe yes. Rest: pure handwaving.
  
  [...]
And your guess is that intel engineers had no glue when designing
the XE
including its controller? You think they did not know what you and
 me
know and
therefore pray every day that some smart fs designer falls from
heaven
and saves their product from dying in between? Really?
   
   I am saying that there are problems that CANNOT be solved on the disk
   firmware level. Some problems HAVE to be addressed higher up the
 stack.
  
  Exactly, you can't assume that the SSDs firmware understands any and
 all
  file
  system layouts, especially if they are on fragmented LVM or other
  logical
  volume manager partitions.
  
  Hopefully the firmware understands exactly no fs layout at all. That
 would
  be
  braindead. Instead it should understand how to arrange incoming and
  outgoing
  data in a way that its own technical requirements are met as perfect as
  possible. This is no spinning disk, it is completely irrelevant what the
  data
  layout looks like as long as the controller finds its way through and
 copes
  best with read/write/erase cycles. It may well use additional RAM for
  caching and data reordering.
  Do you really believe ascending block numbers are placed in ascending
  addresses inside the disk (as an example)? Why should they? What does
 that
  mean for fs block ordering? If you don't know anyway what a controller
  does to
  your data ordering, how do you want to help it with its job?
  Please accept that we are _not_ talking about trivial flash mem here or
  pseudo-SSDs consisting of sd cards. The market has already evolved
 better
  products. The dinosaurs are extincted even if some are still looking
 alive.
 
 I am assuming that you are being deliberately facetious here (the
 alternative is less kind). The simple fact is that you cannot come up with
 some magical data (re)ordering method that nullifies problems of common
 use-cases that are quite nasty for flash based media.
 
 For example - you have a disk that has had all it's addressable blocks
 tainted. A new write comes in - what do you do with it? Worse, a write
 comes in spanning two erase blocks as a consequence of the data
 re-alignment in the firmware. You have no choice but to wipe them both and
 re-write the data. You'd be better off not doing the magic and assuming
 that the FS is sensibly aligned.

Ok, how exactly would the FS help here?  We have a device with a 256kb
erasure size, and userland does a 4k write followed by an fsync.

If the FS were to be smart and know about the 256kb requirement, it
would do a read/modify/write cycle somewhere and then write the 4KB.

The underlying implementation is the same in the device.  It picks a
destination, reads it then writes it back.  You could argue (and many
people do) that this operation is risky and has a good chance of
destroying old data.  Perhaps we're best off if the FS does the rmw
cycle instead into an entirely safe location.

It's a great place for research and people are definitely looking at it.

But with all of that said, it has nothing to do with alignment or trim.
Modern ssds are a raid device with a large stripe size, and someone
somewhere is going to do a read/modify/write to service any small write.
You can force this up to the FS or the application, it'll happen
somewhere.

The filesystem metadata writes are a very small percentage of the
problem overall.  Sure we can do better and try to force larger metadata
blocks.  This was the whole point behind btrfs' support for large tree
blocks, which I'll be enabling again shortly.

-chris


Re: SSD Optimizations

2010-03-11 Thread Martin K. Petersen
 Gordan == Gordan Bobic gor...@bobich.net writes:

Gordan I fully agree that it's important for wear leveling on flash
Gordan media, but from the security point of view, I think TRIM would
Gordan be a useful feature on all storage media. If the erased blocks
Gordan were trimmed it would provide a potentially useful feature of
Gordan securely erasing the sectors that are no longer used. It would
Gordan be useful and much more transparent than the secure erase
Gordan features that only operate on the entire disk. Just MHO.

Except there are no guarantees that TRIM does anything, even if the
drive claims to support it.

There are a couple of IDENTIFY DEVICE knobs that indicate whether the
drive deterministically returns data after a TRIM.  And whether the
resulting data is zeroes.  We query these values and report them to the
filesystem.

However, testing revealed several devices that reported the right thing
but which did in fact return the old data afterwards.

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: SSD Optimizations

2010-03-11 Thread Chris Mason
On Thu, Mar 11, 2010 at 04:18:48PM +, Gordan Bobic wrote:
 On Thu, 11 Mar 2010 09:21:30 -0500, Chris Mason chris.ma...@oracle.com
 wrote:
  On Wed, Mar 10, 2010 at 07:49:34PM +, Gordan Bobic wrote:
  I'm looking to try BTRFS on a SSD, and I would like to know what SSD
  optimizations it applies. Is there a comprehensive list of what ssd
  mount option does? How are the blocks and metadata arranged? Are
  there options available comparable to ext2/ext3 to help reduce wear
  and improve performance?
  
  Specifically, on ext2 (journal means more writes, so I don't use
  ext3 on SSDs, since fsck typically only takes a few seconds when
  access time is  100us), I usually apply the
  -b 4096 -E stripe-width = (erase_block/4096)
  parameters to mkfs in order to reduce the multiple erase cycles on
  the same underlying block.
  
  Are there similar optimizations available in BTRFS?
  
  All devices (raid, ssd, single spindle) tend to benefit from big chunks
  of writes going down close together on disk.  This is true for different
  reasons on each one, but it is still the easiest way to optimize writes.
  COW filesystems like btrfs are very well suited to send down lots of big
  writes because we're always reallocating things.
 
 Doesn't this mean _more_ writes? If that's the case, then that would make
 btrfs a _bad_ choice for flash based media with limite write cycles.

It just means that when we do write, we don't overwrite the existing
data in the file.  We allocate a new block instead and write there
(freeing the old one).
This gives us a lot of control over grouping writes together, instead of
being restricted to the layout from when the file was first created.

It also fragments the files much more, but this isn't an issue on ssd.

 
  For traditional storage, we also need to keep blocks from one file (or
  files in a directory) close together to reduce seeks during reads.  SSDs
  have no such restrictions, and so the mount -o ssd related options in
  btrfs focus on tossing out tradeoffs that slow down writes in hopes of
  reading faster later.
  
  Someone already mentioned the mount -o ssd and ssd_spread options.
  Mount -o ssd is targeted at faster SSD that is good at wear leveling and
  generally just benefits from having a bunch of data sent down close
  together.  In mount -o ssd, you might find a write pattern like this:
  
  block N, N+2, N+3, N+4, N+6, N+7, N+16, N+17, N+18, N+19, N+20 ...
  
  It's a largely contiguous chunk of writes, but there may be gaps.  Good
  ssds don't really care about the gaps, and they benefit more from the
  fact that we're preferring to reuse blocks that had once been written
  than to go off and find completely contiguous areas of the disk to
  write (which are more likely to have never been written at all).
  
  mount -o ssd_spread is much more strict.  You'll get N,N+2,N+3,N+4,N+5
  etc because crummy ssds really do care about the gaps.
  
  Now, btrfs could go off and probe for the erasure size and work very
  hard to align things to it.  As someone said, alignment of the partition
  table is very important here as well.  But for modern ssd this generally
  matters much less than just doing big ios and letting the little log
  structured squirrel inside the device figure things out.
 
 Thanks, that's quite helpful. Can you provide any insight into alignment
 of FS structures in such a way that they do not straddle erase block
 boundaries?

We align on 4k (but partition alignment can defeat this).  We don't
attempt to understand or guess at erasure blocks.  Unless the filesystem
completely takes over the FTL duties, I don't think it makes sense to do
more than send large writes whenever we can.
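
As an aside, checking whether the partition start defeats that 4k alignment
is cheap (a sketch; sdX/sdX1 are placeholders):

    start=$(cat /sys/block/sdX/sdX1/start)   # partition start in 512-byte sectors
    echo $(( start % 8 ))      # 0 means the partition is 4 KiB aligned
    echo $(( start % 2048 ))   # 0 means 1 MiB aligned, which covers typical erase blocks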

The raid 5/6 patches will add more knobs for strict alignment, but I'd
be very surprised if they made a big difference on modern ssd.

 
  For trim, we do have mount -o discard.  It does introduce a run time
  performance hit (this varies wildly from device to device) and we're
  tuning things as discard capable devices become more common.  If anyone
  is looking for a project it would be nice to have an ioctl that triggers
  free space discards in bulk.
 
 Are you saying that -o discard implements trim support?

Yes, it sends trim/discards down to devices that support it.

-chris



Re: SSD Optimizations

2010-03-11 Thread Martin K. Petersen
 Gordan == Gordan Bobic gor...@bobich.net writes:

Gordan SD == SSD with an SD interface.

No, not really.

It is true that conceivably you could fit a sophisticated controller in
an SD card form factor.  But fact is that takes up space which could
otherwise be used for flash.  There may also be power consumption/heat
dissipation concerns.

Most SD card controllers have very, very simple wear leveling that in
most cases rely on the filesystem being FAT.  These cards are aimed at
cameras, MP3 players, etc. after all. And consequently it's trivial to
wear out an SD card by writing a block over and over.

The same is kind of true for Compact Flash.  There are two types of
cards, I prefer to think of them as camera grade and industrial.  Camera
grade CF is really no different from SD cards or any other consumer
flash form factor.

Industrial CF cards have controllers with sophisticated wear leveling.
Usually this is not quite as clever as a big SSD, but it is close
enough that you can treat the device as a real disk drive.  I.e. it has
multiple channels working in parallel unlike the consumer devices.

As a result of the smarter controller logic and the bigger bank of spare
flash, industrial cards are much smaller in capacity.  Typically in the
1-4 GB range.  But they are in many cases indistinguishable from a real
SSD in terms of performance and reliability.


Gordan You can make an educated guess. For starters given that visible
Gordan sector sizes are not equal to FS block sizes, it means that FS
Gordan block sizes can straddle erase block boundaries without the
Gordan flash controller, no matter how fancy, being able to determine
Gordan this. Thus, at the very least, aligning FS structures so that
Gordan they do not straddle erase block boundaries is useful in ALL
Gordan cases. Thinking otherwise is just sticking your head in the sand
Gordan because you cannot be bothered to think.

There are no means of telling what the erase block size is.  None.  We
have no idea.  The vendors won't talk.  It's part of their IP.

Also, there is no point in being hung up on the whole erase block thing.
Only crappy SSDs use block mapping where that matters.  These devices
will die a horrible death soon enough.  Good SSDs use a technique akin
to logging filesystems in which the erase block size and all other other
physical characteristics don't matter (from a host perspective).

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: SSD Optimizations

2010-03-11 Thread Stephan von Krawczynski
On Thu, 11 Mar 2010 15:39:05 +0100
Sander san...@humilis.net wrote:

 Stephan von Krawczynski wrote (ao):
  Honestly I would just drop the idea of an SSD option simply because the
  vendors implement all kinds of neat strategies in their devices. So in the 
  end
  you cannot really tell if the option does something constructive and not
  destructive in combination with a SSD controller.
 
 My understanding of the ssd mount option is also that the fs doens't try
 to do all kinds of smart (and potential expensive) things which make
 sense for rotating media to reduce seeks and the like.
 
   Sander

Such an optimization sounds valid at first sight. But think it through: how
does the fs really know which seeks are needed during some operation? If your
disk is a single-platter one, your seeks are completely different from a
multi-platter one. So even a simple case is more or less unpredictable. If
you consider a RAID or SAN as the underlying device, it should be clear that
trying to optimize for certain device types is just a sham. What does that
tell you? The optimization was a pure waste of work hours in the first place.
In fact, if you look at this list, a lot of the discussions going on are
highly academic and have no real usage scenario.
Sometimes trying to be super-smart is indeed not useful (for a fs) ...
-- 
Regards,
Stephan


Re: SSD Optimizations

2010-03-11 Thread Chris Mason
On Thu, Mar 11, 2010 at 06:35:06PM +0100, Stephan von Krawczynski wrote:
 On Thu, 11 Mar 2010 15:39:05 +0100
 Sander san...@humilis.net wrote:
 
  Stephan von Krawczynski wrote (ao):
   Honestly I would just drop the idea of an SSD option simply because the
   vendors implement all kinds of neat strategies in their devices. So in 
   the end
   you cannot really tell if the option does something constructive and not
   destructive in combination with a SSD controller.
  
  My understanding of the ssd mount option is also that the fs doens't try
  to do all kinds of smart (and potential expensive) things which make
  sense for rotating media to reduce seeks and the like.
  
  Sander
 
 Such an optimization sounds valid on first sight. But re-think closely: how
 does the fs really know about seeks needed during some operation?

Well the FS makes a few assumptions (in the nonssd case).  First it
assumes the storage is not a memory device.  If things would fit in
memory we wouldn't need filesytems in the first place.

Then it assumes that adjacent blocks are cheap to read and blocks that
are far away are expensive to read.  Given expensive raid controllers,
cache, and everything else, you're correct that sometimes this
assumption is wrong.  But, on average seeking hurts.  Really a lot.

We try to organize files such that files that are likely to be read
together are found together on disk.  Btrfs is fairly good at this
during file creation and not as good as ext*/xfs as files are
overwritten and modified again and again (due to cow).

If you turn mount -o ssd on for your drive and do a test, you might not
notice much difference right away.  ssds tend to be pretty good right
out of the box.  Over time it tends to help, but it is a very hard thing
to benchmark in general.
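
If anyone wants to try it anyway, a rough sketch of such a test (placeholders
throughout; real conclusions need a proper benchmark and an aged filesystem):

    mount -o ssd /dev/sdX1 /mnt            # repeat without the option for comparison
    sync
    time sh -c 'tar xf some-big-tree.tar -C /mnt && sync'   # any write-heavy workload will do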

-chris


SSD Optimizations

2010-03-10 Thread Gordan Bobic
I'm looking to try BTRFS on a SSD, and I would like to know what SSD 
optimizations it applies. Is there a comprehensive list of what ssd 
mount option does? How are the blocks and metadata arranged? Are there 
options available comparable to ext2/ext3 to help reduce wear and 
improve performance?


Specifically, on ext2 (journalling means more writes, so I don't use ext3 on
SSDs, since fsck typically only takes a few seconds when access time is
under 100us), I usually apply the

-b 4096 -E stripe-width = (erase_block/4096)
parameters to mkfs in order to reduce the multiple erase cycles on the 
same underlying block.
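
As a worked example of that calculation (the 512 KiB erase block is only an
assumption; the real figure varies by drive and is rarely published):
512 KiB / 4 KiB = 128, so:

    mkfs.ext2 -b 4096 -E stripe-width=128 /dev/sdX1   # sdX1 is a placeholder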


Are there similar optimizations available in BTRFS?

Gordan


Re: SSD Optimizations

2010-03-10 Thread Marcus Fritzsch
Hi there,

On Wed, Mar 10, 2010 at 8:49 PM, Gordan Bobic gor...@bobich.net wrote:
 [...]
 Are there similar optimizations available in BTRFS?

There is an SSD mount option available[1].

Cheers,
Marcus

[1] http://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options


Re: SSD Optimizations

2010-03-10 Thread Mike Fedyk
On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic gor...@bobich.net wrote:
 I'm looking to try BTRFS on a SSD, and I would like to know what SSD
 optimizations it applies. Is there a comprehensive list of what ssd mount
 option does? How are the blocks and metadata arranged? Are there options
 available comparable to ext2/ext3 to help reduce wear and improve
 performance?

 Specifically, on ext2 (journal means more writes, so I don't use ext3 on
 SSDs, since fsck typically only takes a few seconds when access time is 
 100us), I usually apply the
 -b 4096 -E stripe-width = (erase_block/4096)
 parameters to mkfs in order to reduce the multiple erase cycles on the same
 underlying block.

 Are there similar optimizations available in BTRFS?

I think you'll get more out of btrfs, but another thing you can look
into is ext4 without the journal.  Support was added for that recently
(thanks to google).


Re: SSD Optimizations

2010-03-10 Thread Gordan Bobic

Marcus Fritzsch wrote:

Hi there,

On Wed, Mar 10, 2010 at 8:49 PM, Gordan Bobic gor...@bobich.net wrote:

[...]
Are there similar optimizations available in BTRFS?


There is an SSD mount option available[1].

[1] http://btrfs.wiki.kernel.org/index.php/Getting_started#Mount_Options


But what _exactly_ does it do?

Is there a way to leverage any knowledge of erase block size at file 
system creation time?


Are there any special parameters that might affect locations of 
superblocks and metadata?


Is there a way to ensure they don't span erase block boundaries?

What about ATA TRIM command support? Is this available? Is it included 
in the version in Fedora 13?


Gordan


Re: SSD Optimizations

2010-03-10 Thread Gordan Bobic

Mike Fedyk wrote:

On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic gor...@bobich.net wrote:

I'm looking to try BTRFS on a SSD, and I would like to know what SSD
optimizations it applies. Is there a comprehensive list of what ssd mount
option does? How are the blocks and metadata arranged? Are there options
available comparable to ext2/ext3 to help reduce wear and improve
performance?

Specifically, on ext2 (journal means more writes, so I don't use ext3 on
SSDs, since fsck typically only takes a few seconds when access time is 
100us), I usually apply the
-b 4096 -E stripe-width = (erase_block/4096)
parameters to mkfs in order to reduce the multiple erase cycles on the same
underlying block.

Are there similar optimizations available in BTRFS?


I think you'll get more out of btrfs, but another thing you can look
into is ext4 without the journal.  Support was added for that recently
(thanks to google).


How is this different to using mkfs.ext2 from e4fsprogs?

And while I appreciate hopeful remarks along the lines of I think 
you'll get more out of btrfs, I am really after specifics of what the 
ssd mount option does, and what features comparable to the optimizations 
that can be done with ext2/3/4 (e.g. the mentioned stripe-width option) 
are available to get the best possible alignment of data and metadata to 
increase both performance and life expectancy of a SSD.


Also, for drives that don't support TRIM, is there a way to make the FS 
apply aggressive re-use of erased space (in order to help the drive's 
internal wear-leveling)?


I have looked through the documentation and the wiki, but they provide
very little of actual substance.


Gordan


Re: SSD Optimizations

2010-03-10 Thread Sander
Hello Gordan,

Gordan Bobic wrote (ao):
 Mike Fedyk wrote:
 On Wed, Mar 10, 2010 at 11:49 AM, Gordan Bobic gor...@bobich.net wrote:
 Are there options available comparable to ext2/ext3 to help reduce
 wear and improve performance?

With SSDs you don't have to worry about wear.

 And while I appreciate hopeful remarks along the lines of I think
 you'll get more out of btrfs, I am really after specifics of what
 the ssd mount option does, and what features comparable to the
 optimizations that can be done with ext2/3/4 (e.g. the mentioned
 stripe-width option) are available to get the best possible
 alignment of data and metadata to increase both performance and life
 expectancy of a SSD.

Alignment is about the partition, not the fs, and thus taken care of
with fdisk and the like.

If you don't create a partition, the fs is aligned with the SSD.

 Also, for drives that don't support TRIM, is there a way to make the
 FS apply aggressive re-use of erased space (in order to help the
 drive's internal wear-leveling)?

TRIM has nothing to do with wear-leveling (although it helps reducing
wear).
TRIM lets the OS tell the disk which blocks are not in use anymore, and
thus don't have to be copied during a rewrite of the blocks.
Wear-leveling is the SSD making sure all blocks are more or less equally
written to avoid continuous load on the same blocks.

Sander

-- 
Humilis IT Services and Solutions
http://www.humilis.net