Re: Treatise on Formatting FLASH Storage Devices

2009-05-15 Thread Hal Murray
[Apologies for the delay. I typed most of this in a long time ago, then I got 
distracted.]

Mitch: Many thanks for the heads up.  I think I knew about most of the
quirks, but I hadn't noticed the interactions of the alignment preferences
before.

Al Fazio is Intel's flash guru/evangelist.  He gave a talk at Stanford last 
Nov.  He's going after the disk market.
  http://www.stanford.edu/class/ee380/Abstracts/081112.html

You may be able to watch the video here:
  http://www.stanford.edu/class/ee380/fall-schedule-20082009.html

His slides have a lot of good numbers and background info on flash technology.

Intel is now selling flash based disks for laptops.  Expensive, but no 
moving parts.  Also, very low power when they are idle.


What I remember from his talk:
  Raw flash chips leak and wear out.
  Reads contribute to the wear.

  Newer chips are denser and wear out faster.

  For cameras, the numbers are OK.  The total number of writes/reads is
  small because they are all caused by a human pushing a button.  Even if
  you push the button a dozen times in a row, that's tiny relative to a
  disk file system.

  For use as typical PC file systems, the numbers are horrible.

  Intel did a lot of work in this area:
    ECC and such at the word/block level
    Lots of heuristics for wear leveling and avoiding writes
  and probably a few more interesting things that I have forgotten.

There is no seek time on flash disks.  They did a neat demo.  They set up a
big box full of their Flash disks and ran one of the standard database demos
at blindingly fast speed.

---

The wear-out is interesting.  Nothing else in computers works that way.  
Computer geeks don't have any experience thinking about that aspect of a 
problem.

--

Somebody mentioned log structured file systems...

At the hardware level, the basic flash chip is not restricted to writing
pages.  An erase sets each byte to all ones.  Writes can only turn bits
off (1 -> 0).  You can turn off more bits with another write to the same
location.  Thus you get 9 states out of a byte (from 8 bits set down to 0)
before you have to erase again.
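A minimal Python model of the write rule described above, assuming an idealized byte-programmable flash where a program operation can only clear bits (the stored value becomes old AND new) and only an erase brings a byte back to 0xFF. Real NAND parts typically limit how many partial programs a page tolerates, so this is a sketch of the logic, not of any specific chip; `program` and `states_before_erase` are hypothetical names.

```python
ERASED = 0xFF  # an erase sets every bit of a byte to 1


def program(old: int, data: int) -> int:
    """Model a flash write: bits can go from 1 to 0, never back to 1."""
    return old & data


def states_before_erase(start: int = ERASED) -> list[int]:
    """Clear one more bit per write and record every value the byte passes through."""
    states = [start]
    value = start
    while value != 0:
        lowest_set_bit = value & -value            # lowest bit still set to 1
        value = program(value, ~lowest_set_bit & 0xFF)
        states.append(value)
    return states


if __name__ == "__main__":
    chain = states_before_erase()
    print([f"{s:08b}" for s in chain])
    print(len(chain), "states before an erase is required")
```

Note that `program(0x0F, 0xFF)` leaves the byte at 0x0F: writing ones is a no-op, which is why only an erase can start the cycle over.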

So for an append-only file, if you don't need the value 0xFF in your data
(so that it can mark not-yet-written space), you can append with no
extra-write penalties.  (That assumes you are happy without ECC and such,
but you could add ECC at the record/line level.)
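A sketch of the append-only idea in Python, assuming a byte-addressable region that starts out erased (all 0xFF) and records that never contain the byte 0xFF, so the first 0xFF marks the end of the log. The `FlashRegion` class and helper names are hypothetical, and like the text it carries no ECC; a per-record checksum could be added in the same spirit.

```python
class FlashRegion:
    """Hypothetical erased flash region: writes can only clear bits."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray([0xFF] * size)  # freshly erased

    def write(self, offset: int, data: bytes) -> None:
        for i, b in enumerate(data):
            self.cells[offset + i] &= b        # 1 -> 0 only, like real flash

    def erase(self) -> None:
        self.cells[:] = bytes([0xFF]) * len(self.cells)


def append(region: FlashRegion, record: bytes) -> int:
    """Append a record after the last written byte; return its offset."""
    if 0xFF in record:
        raise ValueError("records must not contain 0xFF (it marks unwritten space)")
    end = region.cells.find(0xFF)              # first still-erased byte
    if end == -1 or end + len(record) > len(region.cells):
        raise RuntimeError("region full: erase required before appending")
    region.write(end, record)                  # lands on erased bytes, no erase needed
    return end


def read_log(region: FlashRegion) -> bytes:
    """Everything written so far, i.e. up to the first erased byte."""
    end = region.cells.find(0xFF)
    return bytes(region.cells if end == -1 else region.cells[:end])
```

Each append only writes into erased bytes, so no erase cycle is spent until the region is full.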



-- 
These are my opinions, not necessarily my employer's.  I hate spam.



___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Ted Ts'o writes about aligning FSs to eraseblocks... (Re: Treatise on Formatting FLASH Storage Devices)

2009-02-20 Thread Martin Langhoff
On Wed, Feb 4, 2009 at 11:40 PM, Mitch Bradley w...@laptop.org wrote:
 http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device

hot off the press I think. Interesting hints on how to get the
partitioning right:

http://thunk.org/tytso/blog/2009/02/20/aligning-filesystems-to-an-ssds-erase-block-size/

cheers,


m
-- 
 martin.langh...@gmail.com
 mar...@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread quozl
On Wed, Feb 04, 2009 at 12:40:38AM -1000, Mitch Bradley wrote:
 http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device
 Read it and weep.

+1

Fixed a couple of typos in the last section.

Also, re:

Conversely, if the layout is bad, every cluster write might split two
pages, forcing the FTL to perform four internal I/O operations instead
of one.

Is it therefore four times slower?

-- 
James Cameron mailto:qu...@us.netrek.org http://quozl.netrek.org/


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread david
On Wed, 4 Feb 2009, Mitch Bradley wrote:

 http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device

 Read it and weep.

this completely ignores wear leveling, which is very necessary for just 
about any filesystem, but especially for FAT (which appear to be the only 
filesystems this author is familiar with)

all in all this doesn't seem like a very useful page.

David Lang


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread Benjamin M. Schwartz

da...@lang.hm wrote:
 On Wed, 4 Feb 2009, Mitch Bradley wrote:
 
 http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device

 Read it and weep.
 
 this completely ignores wear leveling, which is very necessary for just 
 about any filesystem, but especially for FAT (which appear to be the only 
 filesystems this author is familiar with)

Umm, what?

To alleviate the wear out problems, the FTL must move data around so
that repeated writes to a given sector don't cause too many writes to the
same NAND page.

Mitch is describing FLASH devices like SD cards.  All such devices have
a built-in microcontroller (the FTL) that performs wear-leveling.
Layering additional wear-leveling filesystems like JFFS2 or UBIFS on top
of the FTL requires a reverse translation (block device -> MTD) and is not
recommended.  e.g.  From http://www.linux-mtd.infradead.org/doc/ubifs.html :

UBIFS was designed to work on top of raw flash, which has nothing to do
with block devices. This is why UBIFS does not work on MMC cards and the
like - they look like block devices to the outside world because they
implement FTL (Flash Translation Layer) support in hardware, which simply
speaking emulates a block device

As for the author only being familiar with FAT, that is hilarious.  Mitch
implemented JFFS2 support in OFW, and wrote this page to explain how he
produced optimal ext2 formatting of FTL FLASH.  Indeed, that is the
subject of
http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device#Screwed-Up_Formatting

--Ben


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread Ignacio Vazquez-Abrams
On Wed, 2009-02-04 at 00:40 -1000, Mitch Bradley wrote:
 http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device
 
 Read it and weep.

It's a great article, but people that aren't very familiar with
filesystems and the filesystem tools are going to read the article, look
at their tools, scratch their heads, decide the whole thing is too hard,
and go on making the same mistakes. It would be useful if actual command
arguments could be given for various sane defaults.

-- 
Ignacio Vazquez-Abrams ivazquez...@gmail.com

PLEASE don't CC me; I'm already subscribed




Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread pgf
da...@lang.hm wrote:
  On Wed, 4 Feb 2009, Mitch Bradley wrote:
  
   http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device
  
   Read it and weep.
  
  this completely ignores wear leveling, which is very necessary for just 
  about any filesystem, but especially for FAT (which appear to be the only 
  filesystems this author is familiar with)
  
  all in all this doesn't seem like a very useful page.

since i'm 99% sure that mitch wrote that page, let me be the
first to disagree.  :-)

i believe that everything he's said is completely spot on.  i
confess i wasn't conscious of the partition vs. erase block
alignment issue until a couple of months ago when i first heard
mitch mention it, but i'm absolutely sure the effect is real.

paul
=-
 paul fox, p...@laptop.org


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread david
On Wed, 4 Feb 2009, Benjamin M. Schwartz wrote:


 da...@lang.hm wrote:
 On Wed, 4 Feb 2009, Mitch Bradley wrote:

 http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device

 Read it and weep.

 this completely ignores wear leveling, which is very necessary for just
 about any filesystem, but especially for FAT (which appear to be the only
 filesystems this author is familiar with)

 Umm, what?

 To alleviate the wear out problems, the FTL must move data around so
 that repeated writes to a given sector don't cause too many writes to the
 same NAND page.

 Mitch is describing FLASH devices like SD cards.  All such devices have
 a built-in microcontroller (the FTL) that performs wear-leveling.
 Layering additional wear-leveling filesystems like JFFS2 or UBIFS on top
 of the FTL requires a reverse translation (block device -> MTD) and is not
 recommended.  e.g.  From http://www.linux-mtd.infradead.org/doc/ubifs.html :

 UBIFS was designed to work on top of raw flash, which has nothing to do
 with block devices. This is why UBIFS does not work on MMC cards and the
 like - they look like block devices to the outside world because they
 implement FTL (Flash Translation Layer) support in hardware, which simply
 speaking emulates a block device

so if the device is performing wear leveling, then the fact that your FAT 
is on the same eraseblock as your partition table should not matter in the 
least, since the wear leveling will avoid stressing any particular part of 
the flash.

as such I see no point in worrying about the partition table being on the 
same eraseblock as a frequently written item.

as for the block boundary not being an eraseblock boundary if the partition 
starts at block 1:

if you use 1k blocks and have 256k eraseblocks, then 1 out of every 256 
writes will generate two erases instead of one

worst case is you use 4k blocks and have 128k eraseblocks, at which point 
1 out of every 32 writes will generate two erases instead of one.

to use the intel terminology, these result in write amplification factors 
of approximately 1.005 and 1.03 respectively.
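David's figures can be checked by brute force. The Python sketch below assumes 512-byte sectors and a partition that starts one sector into the device, so every filesystem block sits 512 bytes off the eraseblock grid, and counts how many blocks straddle an eraseblock boundary. `straddle_fraction` is a hypothetical helper, not part of any tool.

```python
SECTOR = 512  # assumption: 512-byte logical sectors


def straddle_fraction(block_size: int, erase_size: int,
                      offset: int, n_blocks: int) -> float:
    """Fraction of filesystem blocks whose bytes fall in two eraseblocks.

    offset is the byte distance from the start of the eraseblock grid
    to the first filesystem block (e.g. the partition start).
    """
    straddling = 0
    for i in range(n_blocks):
        first = offset + i * block_size        # first byte of block i
        last = first + block_size - 1          # last byte of block i
        if first // erase_size != last // erase_size:
            straddling += 1
    return straddling / n_blocks


if __name__ == "__main__":
    KiB = 1024
    for block, erase in [(1 * KiB, 256 * KiB), (4 * KiB, 128 * KiB)]:
        n = (erase // block) * 64              # 64 eraseblocks' worth of blocks
        misaligned = straddle_fraction(block, erase, SECTOR, n)
        aligned = straddle_fraction(block, erase, 0, n)
        print(f"{block // KiB}k blocks / {erase // KiB}k eraseblocks: "
              f"misaligned {misaligned:.5f}, aligned {aligned:.5f}")
```

For the two cases the misaligned fractions come out to exactly 1/256 and 1/32, i.e. erase factors of 1 + 1/256 (about 1.004) and 1 + 1/32 (about 1.03) under the one-extra-erase-per-straddle assumption, and 0 when the partition is aligned.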

neither of these qualify as a 'flash killer' in my mind.

now, if a FAT or superblock happens to span an eraseblock, then you will 
have a much more significant issue, but nothing that is said in this 
document refers to this problem (and in fact, it indicates that things 
like this follow the start of the partition very closely, which implies 
that unless the partition starts very close to the end of an eraseblock 
it's highly unlikely that these will span eraseblocks)

so I still see this as crying wolf.

as for ubifs, that is designed for when you have access to the raw flash, 
which is not the case for any device where you have a flash translation 
layer in place, so it is really only useful on embedded systems, not on 
commercially available flash drives of any type.

 As for the author only being familiar with FAT, that is hilarious.  Mitch
 implemented JFFS2 support in OFW, and wrote this page to explain how he
 produced optimal ext2 formatting of FTL FLASH.  Indeed, that is the
 subject of
 http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device#Screwed-Up_Formatting

I didn't read carefully enough before I made that comment. my apologies.

David Lang


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread david
On Wed, 4 Feb 2009, da...@lang.hm wrote:

 On Wed, 4 Feb 2009, Benjamin M. Schwartz wrote:



 Umm, what?

 To alleviate the wear out problems, the FTL must move data around so
 that repeated writes to a given sector don't cause too many writes to the
 same NAND page.

 UBIFS was designed to work on top of raw flash, which has nothing to do
 with block devices. This is why UBIFS does not work on MMC cards and the
 like - they look like block devices to the outside world because they
 implement FTL (Flash Translation Layer) support in hardware, which simply
 speaking emulates a block device

 so if the device is performing wear leveling, then the fact that your FAT
 is on the same eraseblock as your partition table should not matter in the
 least, since the wear leveling will avoid stressing any particular part of
 the flash.

 as such I see no point in worrying about the partition table being on the
 same eraseblock as a frequently written item.

 as for the block boundary not being an eraseblock boundary if the partition
 starts at block 1

 if you use 1k blocks and have 256k eraseblocks, then 1 out of every 256
 writes will generate two erases instead of one

 worst case is you use 4k blocks and have 128k eraseblocks, at which point
 1 out of every 32 writes will generate two erases instead of one.

 to use the intel terminology, these result in write amplification factors
 of approximately 1.005 and 1.03 respectively.

 neither of these qualify as a 'flash killer' in my mind.

 now, if a FAT or superblock happens to span an eraseblock, then you will
 have a much more significant issue, but nothing that is said in this
 document refers to this problem (and in fact, it indicates that things
 like this follow the start of the partition very closely, which implies
 that unless the partition starts very close to the end of an eraseblock
 it's highly unlikely that these will span eraseblocks)

 so I still see this as crying wolf.

A far more significant problem would be the use of a journal on flash.
since there are generally two writes to the journal for every write to the
storage (one write to put the data in the journal and one write to mark
the journal entry as completed), and each write to the journal is frequently
pushed out immediately (rather than waiting to consolidate writes) for
safety, the journal gets _far_ more writes than anything else on the
storage device.

so using a full journaling filesystem (ext3 with data=journal, for
example) would produce a write amplification factor of at least 3 (two
journal writes plus the write to the block's final location).
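David's estimate can be written as a small counting model. This Python sketch is a deliberate simplification of full data journaling (it ignores descriptor blocks, checkpointing details, and metadata-only modes): each logical block is written once to the journal and once to its home location, and each transaction adds one commit record. `journal_write_amplification` is a hypothetical name.

```python
from fractions import Fraction


def journal_write_amplification(blocks_per_commit: int) -> Fraction:
    """Physical writes per logical block write under full data journaling.

    Simplified model (assumption): a transaction of n blocks costs
    n journal copies + 1 commit record + n in-place (home) writes.
    """
    if blocks_per_commit < 1:
        raise ValueError("a transaction must contain at least one block")
    n = blocks_per_commit
    journal_copies = n
    commit_records = 1
    home_writes = n
    return Fraction(journal_copies + commit_records + home_writes, n)


if __name__ == "__main__":
    for n in (1, 4, 64):
        print(n, float(journal_write_amplification(n)))
```

With every write flushed immediately (n = 1), the model gives exactly 3; batching many blocks per commit pushes it toward 2, which is still double the writes of an unjournaled layout.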

David Lang



Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread Mitch Bradley
I am the author of the page in question.  To establish my credentials, I 
wrote my first filesystem forensic tool in 1980, to diagnose and repair 
a Unix filesystem that had been damaged by a kernel misconfiguration that 
made it swap on top of the filesystem.  That was when 10 MB disk packs 
the size of garbage can lids cost $5000.

Since then I have written filesystem readers, writers, and forensic 
tools for UFS, ext2, FAT12/16/32, ISO-9660, NFS, Mac HFS, romfs, and 
JFFS2.  I have studied, with an eye toward implementation, the data 
structures for NTFS, UBIFS, cramfs, and squashfs.

da...@lang.hm wrote:

 so if the device is performing wear leveling, then the fact that your 
 FAT is on the same eraseblock as your partition table should not 
 matter in the least, since the wear leveling will avoid stressing any 
 particular part of the flash.

That would be true in a perfect world, but wear leveling is hard to do 
perfectly.  Relocating requires maintaining two copies of the erase 
block, as well as hidden metadata that tells you which copy is which, 
plus a hidden allocation map.  Updating all of these things in a way 
that avoids catastrophic loss of the entire device (due to inconsistent 
metadata) is tricky.  Some FTLs get it (at least mostly) right, many 
don't.  FTL software is, after all, software, so obscure bugs are always 
possible.  Making hardware behave stably during power loss is triply 
difficult.
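To illustrate why the hidden metadata matters, here is a deliberately tiny block-mapped FTL in Python. It is not how any real SD or USB firmware works; it assumes whole-block writes, erase-before-reuse, and a naive "least-worn free block" policy. The `l2p` map, `free` set, and `erase_counts` list stand in for the hidden allocation map and wear metadata described above; a real FTL must persist them atomically across power loss, which this sketch ignores entirely.

```python
class ToyFTL:
    """Hypothetical block-mapped FTL with naive dynamic wear leveling."""

    def __init__(self, n_physical: int, n_logical: int) -> None:
        if n_physical <= n_logical:
            raise ValueError("need spare physical blocks for relocation")
        self.n_logical = n_logical
        self.erase_counts = [0] * n_physical   # hidden wear metadata
        self.l2p: dict[int, int] = {}          # hidden logical -> physical map
        self.free = set(range(n_physical))     # hidden allocation map
        self.store: dict[int, bytes] = {}      # contents of physical blocks

    def write(self, lba: int, payload: bytes) -> None:
        if not 0 <= lba < self.n_logical:
            raise IndexError(lba)
        # Relocate: put the new copy on the least-worn free block.
        target = min(self.free, key=lambda p: (self.erase_counts[p], p))
        self.free.remove(target)
        self.store[target] = payload
        old = self.l2p.get(lba)
        self.l2p[lba] = target                 # real firmware must make this atomic
        if old is not None:
            # Erase the stale copy and return it to the free pool.
            del self.store[old]
            self.erase_counts[old] += 1
            self.free.add(old)

    def read(self, lba: int) -> bytes:
        return self.store[self.l2p[lba]]


if __name__ == "__main__":
    ftl = ToyFTL(n_physical=8, n_logical=4)
    for i in range(1000):                      # hammer one "FAT" sector
        ftl.write(0, f"FAT v{i}".encode())
    print(ftl.read(0), ftl.erase_counts)
```

Although only logical block 0 is ever rewritten, the erases end up spread evenly over all eight physical blocks. What the sketch leaves out is the hard part: if power fails between storing the new copy and updating `l2p`, the device must still be able to work out which copy is current.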

I suspect, based on cryptic hints in various specs and standards that 
I've read, that some FTLs have special optimizations for FAT filesystems 
with  the factory-supplied layout.  If the FAT is in a known nice 
location, you can apply different caching and wear leveling policies to 
that known hot-spot, and perhaps even reduce the overall metadata by 
using the FAT as part of the block-substitution metadata for the data 
area.  Many manufacturers could care less about what Linux hackers want 
to do - their market being ordinary users who stick the device in a 
camera - so such cheat optimizations are fair game from a business 
standpoint.


 as such I see no point in worrying about the partition table being on 
 the same eraseblock as a frequently written item.

Many filesystem layouts can recover from damage to the allocation maps, 
either automatically or with an offline tool.  It's possible to rebuild 
ext2 allocation bitmaps from inode and directory information.  For FAT 
filesystems, there's a backup FAT copy that will at least let you roll 
back to a semi-consistent recent state.  But there's no redundant copy of 
the partition map or the BPB.  If you should lose one of those during a 
botched write, it's bye-bye to all your data, barring mad forensic skills.

In stress testing of some LBA NAND devices, we saw several cases 
where, after a fairly long period, the devices completely locked up and 
lost the ability to read or rewrite the first block.  I had done a bad 
job of partitioning it, because I wasn't paying enough attention when I 
created the test image.  It's unclear what the results would have been 
had the layout been better - the stress test takes several weeks and the 
failures are statistical in nature - but I can't help believing that, 
for a device with a known wear-out mechanism and elaborate workarounds 
to hide that fact, working it harder than necessary will reduce its 
lifetime and possibly trigger microcode bugs that might otherwise cause 
no trouble.


 as for the block boundary not being an eraseblock boundary if the 
 partition starts at block 1

 if you use 1k blocks and have 256k eraseblocks, then 1 out of every 
 256 writes will generate two erases instead of one

 worst case is you use 4k blocks and have 128k eraseblocks, at which 
 point 1 out of every 32 writes will generate two erases instead of one.

 to use the intel terminology, these result in write amplification 
 factors of approximately 1.005 and 1.03 respectively.

 neither of these qualify as a 'flash killer' in my mind.

The main amplification comes not from the erases, but from the writes.  
If the cluster/block space begins in the middle of a FLASH page, then a 
1-block write will involve a read-modify-write of two adjacent pages.  
That is four internal accesses instead of one.  Each such access takes 
between 100 and 200 µs, depending on the degree to which you can 
pipeline the accesses - and read-modify-write is hard to pipeline.  So 
the back-end throughput can easily be reduced by a factor of 4 or even 
more.  The write-amplification factor is 2 by a trivial analysis, and it 
can get worse if you factor in the requirement for writing the pages 
within an erase block sequentially.  The implied coupling between the 
two spanned pages increases the difficulty of replacement-page 
allocation, increasing the probability of garbage collection.
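The four-accesses-instead-of-one accounting can be made concrete with a toy cost model in Python. It assumes the filesystem cluster is the same size as the FLASH page, that a fully covered page costs one program, and that a partially covered page costs a read plus a program; the 100-200 µs per access comes from the text. `internal_accesses` is a hypothetical helper.

```python
def internal_accesses(offset: int, length: int, page_size: int) -> int:
    """Count internal page operations for one host write (toy cost model).

    Assumption: a page the write covers completely needs one program;
    a page it covers only partially needs a read plus a program, because
    the untouched bytes must be preserved (read-modify-write).
    """
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    ops = 0
    for page in range(first_page, last_page + 1):
        page_start = page * page_size
        page_end = page_start + page_size
        covered = min(offset + length, page_end) - max(offset, page_start)
        ops += 1 if covered == page_size else 2
    return ops


if __name__ == "__main__":
    PAGE = 4096  # assumption: 4 KiB FLASH page and 4 KiB filesystem cluster
    for label, offset in (("aligned", 0), ("shifted by one sector", 512)):
        ops = internal_accesses(offset, PAGE, PAGE)
        # 100-200 us per internal access, the range quoted in the text
        print(f"{label}: {ops} accesses, {ops * 100}-{ops * 200} us")
```

Under this model the shifted cluster costs 4 accesses against 1 for the aligned one, matching the factor of 4 in the text; the extra garbage collection caused by coupling the two pages is not modeled, so a real device could do worse.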

The erase amplification factor tracks the write amplification factor.  
You must do at least one erase for every 64 writes, assuming perfect 
efficiency of your 

Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread Benjamin M. Schwartz

Mitch Bradley wrote:
 It has been my experience that USB sticks and SD cards with intact 
 factory formatting tend to last longer and run faster than ones that 
 have been reformatted with random layouts.

This gives us Linux users a bit of a dilemma if we want to use FTL flash
for primary storage.  FAT does not provide the file access permissions,
symlinks, hardlinks, or even case sensitivity, that we desire for most
filesystems on unixy systems.  However, FTL devices behave as a sort of
FAT-oriented black box, full of secret proprietary firmware that loves
FAT.  One obvious proposal, therefore, would be to use FAT for storage,
but wrap it with a layer that implements all our favorite POSIX stuff.

This has been done before for Linux, in the guise of UMSDOS/UVFAT [1][2].
Although that work has fallen out of date, I suspect one could
reimplement it quickly using new linux features such as FUSE.  The
question is: would such an approach be worthwhile?

--Ben

[1] http://linux.voyager.hr/umsdos/
[2] http://en.wikipedia.org/wiki/UMSDOS


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread Mitch Bradley

  http://wiki.laptop.org/go/How_to_Damage_a_FLASH_Storage_Device
  
  Read it and weep.
 

 It's a great article, but people that aren't very familiar with
 filesystems and the filesystem tools are going to read the article, look
 at their tools, scratch their heads, decide the whole thing is too hard,
 and go on making the same mistakes. It would be useful if actual command
 arguments could be given for various sane defaults.

Indeed, I'd like to do that as time permits.  It's sort of an open-ended 
proposition though, because there are so many possibilities.  Deciding 
where to draw the line is tricky.  And for each example, to do it right, 
I'll have to write a mini-treatise on the pertinent assumptions and 
limitations.

For me, the hard thing about writing is finding an appropriate stopping 
point - i.e. there is always more that could be said.  So I tend to 
avoid starting, because I know that the task will grow to fill all 
available time and then some.

For starters, I wanted to lay out the issues and the theory, so at least 
people start to realize that there is a possible problem.  From what 
I've seen, most people are currently totally unaware of it.

At least I did give one concrete recommendation - avoid reformatting if 
you can.



Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread Mitch Bradley
Benjamin M. Schwartz wrote:

 Mitch Bradley wrote:
   
 It has been my experience that USB sticks and SD cards with intact 
 factory formatting tend to last longer and run faster than ones that 
 have been reformatted with random layouts.
 

 This gives us Linux users a bit of a dilemma if we want to use FTL flash
 for primary storage.  FAT does not provide the file access permissions,
 symlinks, hardlinks, or even case sensitivity, that we desire for most
 filesystems on unixy systems.  However, FTL devices behave as a sort of
 FAT-oriented black box, full of secret proprietary firmware that loves
 FAT. 

I think that FTLs are getting better over time, so maybe the 
FAT-specific optimizations are starting to be replaced by more generic 
algorithms.  The rapidly growing market for FLASH-based storage is 
certainly attracting lots of development dollars.

In the absence of FAT-specific optimizations, perhaps properly-aligned 
ext2 layouts will work well.

Another solution is to choose high-quality devices.  I've had good 
results with some models from SanDisk and Transcend.  But sometimes it 
comes at a cost penalty - the really good SanDisk Extreme III SD cards 
cost 2.5x the going rate for commodity cards of the same capacity.  The 
good cards appear to be rather more tolerant of abuse than the El 
Cheapos.  But even with the tough ones, I think it's prudent to treat 
them gently.

  One obvious proposal, therefore, would be to use FAT for storage,
 but wrap it with a layer that implements all our favorite POSIX stuff.
   

Puppy Linux does something like that, using a (FAT, ISO9660, or 
whatever) file as a container for an ext2 filesystem image.

The practice is also rather common in the world of virtualization.  My 
primary Linux system is actually a colinux vm running under Windows 
with the ext2 filesystem image inside an NTFS file (actually three 
files, one for root, one for home, and one for miscellaneous big wads of 
client-specific stuff).  That's proven itself very convenient over 
time; I've transported and mixed-and-matched those filesystem images to 
several different host machines.  Based on that and other pleasant 
experiences with VM filesystem snapshots, I'm of the opinion that a 
quiet revolution is brewing in the way that people deal with filesystems 
inside files.

You could treat an existing FAT filesystem as a very flexible 
partitioning scheme.  You can make any number of partitions, of any 
size, with a hierarchical namespace, with a good collection of tools for 
manipulating them.


 This has been done before for Linux, in the guise of UMSDOS/UVFAT [1][2].
  Although that work has fallen out of date, I suspect one could
 reimplement it quickly using new linux features such as FUSE.  The
 question is: would such an approach be worthwhile?
   

I think that the wrapped-metadata approach has largely run its course.  
There used to be a lot of activity in that area, but I haven't seen much 
of interest lately.  I think that containerization is likely to be the 
happening thing for the near term.

 --Ben

 [1] http://linux.voyager.hr/umsdos/
 [2] http://en.wikipedia.org/wiki/UMSDOS


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread Bobby Powers
On Wed, Feb 4, 2009 at 4:11 PM, Benjamin M. Schwartz
bmsch...@fas.harvard.edu wrote:
 This gives us Linux users a bit of a dilemma if we want to use FTL flash
 for primary storage.  FAT does not provide the file access permissions,
 symlinks, hardlinks, or even case sensitivity, that we desire for most
 filesystems on unixy systems.  However, FTL devices behave as a sort of
 FAT-oriented black box, full of secret proprietary firmware that loves
 FAT.  One obvious proposal, therefore, would be to use FAT for storage,
 but wrap it with a layer that implements all our favorite POSIX stuff.

What about a small script that could do two things:
- determine and dump the factory partitioning data from a device (by
looking at how the FAT filesystem is laid out) to a file (perhaps we
could build up a database for popular FLASH devices, like the SanDisk
Ultra III's?)
- take the factory partitioning data from a device (or dump file) and
create a new partition map and well behaved ext2/3/4/whatever file
system on the device
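One possible starting point for the first half of this script, sketched in Python under stated assumptions: 512-byte sectors, a classic MBR partition table (four 16-byte entries at byte 446, signature 0x55 0xAA at byte 510), and a FAT12/16/32 boot sector whose BPB fields sit at their standard offsets. It reports where each FAT partition starts and where its data area (cluster 2) begins, which is the part of the factory layout that matters for alignment. The function names and the device path are hypothetical; writing the new partition map and ext2 filesystem (the second half of the idea) is not attempted here.

```python
import struct
import sys
from dataclasses import dataclass

SECTOR = 512  # assumption: 512-byte sectors, both on the device and in the BPB
FAT_TYPES = {0x01, 0x04, 0x06, 0x0B, 0x0C, 0x0E}  # MBR type IDs used for FAT


@dataclass
class Partition:
    index: int
    type_id: int
    start_lba: int
    sectors: int


def parse_mbr(sector0: bytes) -> list[Partition]:
    """Return the non-empty primary partitions from a 512-byte MBR."""
    if len(sector0) < SECTOR or sector0[510:512] != b"\x55\xaa":
        raise ValueError("no MBR signature")
    parts = []
    for i in range(4):
        entry = sector0[446 + 16 * i: 446 + 16 * (i + 1)]
        type_id = entry[4]
        start_lba, sectors = struct.unpack_from("<II", entry, 8)
        if type_id != 0 and sectors != 0:
            parts.append(Partition(i + 1, type_id, start_lba, sectors))
    return parts


def fat_data_start(boot: bytes) -> int:
    """Sectors from the partition start to FAT cluster 2 (the data area)."""
    (bytes_per_sec,) = struct.unpack_from("<H", boot, 11)
    (reserved,) = struct.unpack_from("<H", boot, 14)
    num_fats = boot[16]
    (root_entries,) = struct.unpack_from("<H", boot, 17)
    (fat_size,) = struct.unpack_from("<H", boot, 22)
    if fat_size == 0:  # FAT32 stores the FAT size in a 32-bit field
        (fat_size,) = struct.unpack_from("<I", boot, 36)
    root_dir_secs = (root_entries * 32 + bytes_per_sec - 1) // bytes_per_sec
    return reserved + num_fats * fat_size + root_dir_secs


def describe(device_path: str) -> None:
    """Print the factory FAT layout of a device (needs read access to it)."""
    with open(device_path, "rb") as dev:
        for part in parse_mbr(dev.read(SECTOR)):
            if part.type_id not in FAT_TYPES:
                continue
            dev.seek(part.start_lba * SECTOR)
            boot = dev.read(SECTOR)
            cluster_bytes = boot[13] * struct.unpack_from("<H", boot, 11)[0]
            data_lba = part.start_lba + fat_data_start(boot)
            print(f"partition {part.index}: type 0x{part.type_id:02x}, "
                  f"start sector {part.start_lba}, "
                  f"data area at sector {data_lba} ({data_lba * SECTOR} bytes), "
                  f"cluster size {cluster_bytes} bytes")


if __name__ == "__main__":
    describe(sys.argv[1])  # e.g. a hypothetical /dev/sdX for an SD card
```

Collected across many cards, the start-sector and data-area numbers are the kind of data the proposed database would hold. The erase block size itself is not recorded anywhere in the MBR or BPB, so it would still have to be inferred or looked up.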

I'm quite new to this wide world of filesystems and block devices, so
let me know if there are clear or obvious reasons this can't be done,
or why it would be harder than it sounds.

Bobby


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread James Cameron
On Wed, Feb 04, 2009 at 07:51:44PM -0500, Bobby Powers wrote:
 What about a small script that could do two things:

Sounds great.

-- 
James Cameron mailto:qu...@us.netrek.org http://quozl.netrek.org/


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread david
On Wed, 4 Feb 2009, Mitch Bradley wrote:

 da...@lang.hm wrote:
 
 so if the device is performing wear leveling, then the fact that your FAT 
 is on the same eraseblock as your partition table should not matter in the 
 least, since the wear leveling will avoid stressing any particular part of 
 the flash.

 That would be true in a perfect world, but wear leveling is hard to do 
 perfectly.  Relocating requires maintaining two copies of the erase block, as 
 well as hidden metadata that tells you which copy is which, plus a hidden 
 allocation map.  Updating all of these things in a way that avoids 
 catastrophic loss of the entire device (due to inconsistent metadata) is 
 tricky.  Some FTLs get it (at least mostly) right, many don't.  FTL software 
 is, after all, software, so obscure bugs are always possible.  Making 
 hardware behave stably during power loss is triply difficult.

so it sounds like you are basically saying that if the FAT/Superblock gets 
corrupted due to a bug in the FTL software it's easier to recover than if 
the partition table gets corrupted, so isolating the two is a benefit. is
this a fair reading?

I will note that even if you never write to the partition table, that 
eraseblock will migrate around the media (the fact that it's never written 
to makes it a good candidate to swap with a high-usage block). it will move 
less, but it will still move.

 I suspect, based on cryptic hints in various specs and standards that I've 
 read, that some FTLs have special optimizations for FAT filesystems with  the 
 factory-supplied layout.  If the FAT is in a known nice location, you can 
 apply different caching and wear leveling policies to that known hot-spot,

this makes sense

 and perhaps even reduce the overall metadata by using the FAT as part of the 
 block-substitution metadata for the data area.

this I don't understand.

 Many manufacturers could care 
 less about what Linux hackers want to do - their market being ordinary users 
 who stick the device in a camera - so such cheat optimizations are fair 
 game from a business standpoint.

this is definitely true

 as such I see no point in worrying about the partition table being on the 
 same eraseblock as a frequently written item.

 Many filesystem layouts can recover from damage to the allocation maps, 
 either automatically or with an offline tool.  It's possible to rebuild ext2 
 allocation bitmaps from inode and directory information.  For FAT 
 filesystems, there's a backup FAT copy that will at least let you roll back 
 to a semi-consistent recent state.  But there's no redundant copy of the 
 partition map or the BPB.  If you should lose one of those during a botched 
 write, it's bye-bye to all your data, barring mad forensic skills.

I've recovered from partition table mistakes in the past; it's not that 
hard (and in cases like flash, where the media is small enough that 
there is usually only one partition, it becomes as close to trivial as such 
things can be).

 In stress testing of some LBA NAND devices, we saw several cases where, 
 after a fairly long period, the devices completely locked up and lost the 
 ability to read or rewrite the first block.  I had done a bad job of 
 partitioning it, because I wasn't paying enough attention when I created the 
 test image.  It's unclear what the results would have been had the layout 
 been better - the stress test takes several weeks and the failures are 
 statistical in nature - but I can't help believing that, for a device with a 
 known wear-out mechanism and elaborate workarounds to hide that fact, working 
 it harder than necessary will reduce its lifetime and possibly trigger 
 microcode bugs that might otherwise cause no trouble.

interesting datapoint, but not something that I would call conclusive 
(especially when some of the elaborate workarounds you are referring to 
are speculation, not documented)

 as for the block boundary not being an eraseblock boundary if the partition
 starts at block 1
 
 if you use 1k blocks and have 256k eraseblocks, then 1 out of every 256 
 writes will generate two erases instead of one
 
 worst case is you use 4k blocks and have 128k eraseblocks, at which point 1 
 out of every 32 writes will generate two erases instead of one.
 
 to use the intel terminology, these result in write amplification factors
 of approximately 1.004 and 1.03 respectively.
 
 neither of these qualify as a 'flash killer' in my mind.
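The arithmetic in the quoted text can be checked with a few lines of Python. This sketch assumes, as the text does, that the partition is offset by one 512-byte sector so exactly one filesystem block per eraseblock straddles a boundary, that each block write costs one erase (two for the straddling block), and that writes are spread evenly. Under those assumptions the factor is (n+1)/n, with n filesystem blocks per eraseblock: 257/256 for the first case and 33/32 for the second.

```python
"""Erases per block write when one fs block per eraseblock straddles a boundary.

Assumptions (taken from the message, not measured): every block write
costs one erase, the single straddling block costs two, and writes are
spread evenly over all blocks.
"""
from fractions import Fraction

KIB = 1024


def erase_amplification(fs_block: int, erase_block: int) -> Fraction:
    """Average erases per block write for a partition offset by one sector."""
    if erase_block % fs_block:
        raise ValueError("eraseblock size must be a multiple of the block size")
    n = erase_block // fs_block          # fs blocks per eraseblock
    # (n - 1) blocks cost one erase each; the straddling block costs two.
    return Fraction((n - 1) * 1 + 2, n)


if __name__ == "__main__":
    for fs, eb in [(1 * KIB, 256 * KIB), (4 * KIB, 128 * KIB)]:
        amp = erase_amplification(fs, eb)
        print(f"{fs // KIB}k blocks, {eb // KIB}k eraseblocks: "
              f"1 write in {eb // fs} straddles, factor {float(amp):.4f}")
```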

 The main amplification comes not from the erases, but from the writes.  If
 the cluster/block space begins in the middle of a FLASH page, then a 1-block
 write will involve a read-modify-write of two adjacent pages.  That is four
 internal accesses instead of one.  Each such access takes between 100 and 200
 µs, depending on the degree to which you can pipeline the accesses - and
 read-modify-write is hard to pipeline.  So the back-end throughput can easily
 be reduced by a factor of 4 or even more.  The write-amplification factor is
 2.
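The read-modify-write penalty described above can be modelled with a simple page counter. In this Python sketch a page that the write fully covers is programmed once, while a partly covered page is read, merged and programmed again; that is a simplifying assumption, not a description of any particular FTL. The 4096-byte page size is hypothetical, and the 100-200 µs per access comes from the message.

```python
"""Count internal flash operations caused by one host write.

Simplified model (an assumption, not how any specific FTL behaves): a
page the write covers completely is just programmed; a page it covers
only partly needs a read-modify-write, i.e. one read plus one program.
"""


def page_ops(offset: int, length: int, page_size: int) -> tuple[int, int]:
    """Return (page_reads, page_programs) for `length` bytes written at `offset`."""
    if length <= 0:
        return 0, 0
    first = offset // page_size
    last = (offset + length - 1) // page_size
    reads = programs = 0
    for page in range(first, last + 1):
        start, end = page * page_size, (page + 1) * page_size
        if not (offset <= start and offset + length >= end):
            reads += 1      # partial page: old contents must be read and merged
        programs += 1       # every touched page is programmed once
    return reads, programs


if __name__ == "__main__":
    PAGE = 4096             # hypothetical flash page size
    CLUSTER = PAGE          # one filesystem block per flash page
    for label, offset in [("aligned", 0), ("offset by one sector", 512)]:
        reads, programs = page_ops(offset, CLUSTER, PAGE)
        accesses = reads + programs
        print(f"{label}: {reads} reads + {programs} programs = {accesses} accesses, "
              f"amplification {programs * PAGE / CLUSTER:g}, "
              f"roughly {accesses * 100}-{accesses * 200} us of back-end time")
```

With a one-sector offset the 4096-byte write touches two partial pages: 2 reads plus 2 programs, i.e. four accesses against one for the aligned case, and two pages programmed for one page of data.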

Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread david
On Wed, 4 Feb 2009, Bobby Powers wrote:

 On Wed, Feb 4, 2009 at 4:11 PM, Benjamin M. Schwartz
 bmsch...@fas.harvard.edu wrote:
 This gives us Linux users a bit of a dilemma if we want to use FTL flash
 for primary storage.  FAT does not provide the file access permissions,
 symlinks, hardlinks, or even case sensitivity, that we desire for most
 filesystems on unixy systems.  However, FTL devices behave as a sort of
 FAT-oriented black box, full of secret proprietary firmware that loves
 FAT.  One obvious proposal, therefore, would be to use FAT for storage,
 but wrap it with a layer that implements all our favorite POSIX stuff.

 What about a small script that could do two things:
 - determine and dump the factory partitioning data from a device (by
 looking at how the FAT filesystem is laid out) to a file (perhaps we
 could build up a database for popular FLASH devices, like the SanDisk
 Ultra III's?)
 - take the factory partitioning data from a device (or dump file) and
 create a new partition map and well behaved ext2/3/4/whatever file
 system on the device
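The "determine and dump" half of that proposal might look something like the following Python sketch. It reads the MBR and the first partition's FAT boot sector, computes where the FATs and the data area begin, and prints the result as JSON so it could be collected per device model. It is not an existing tool: the script name, the ALIGN value and the choice of JSON are assumptions, and only MBR entry 0 and the standard FAT12/16/32 BPB fields are handled.

```python
"""Dump the factory FAT layout of an SD card or USB stick (sketch).

Reads the MBR and the first partition's FAT boot sector (BPB) and reports
byte offsets of the partition, the FATs and the data area, plus whether
each sits on an ALIGN boundary.  Only MBR entry 0 is examined; ALIGN is a
hypothetical allocation-unit size, since devices rarely publish theirs.
"""
import json
import struct
import sys

SECTOR = 512                 # MBR LBAs count 512-byte sectors
ALIGN = 4 * 1024 * 1024      # hypothetical allocation unit, in bytes


def first_partition(mbr: bytes) -> tuple[int, int, int]:
    """Return (type, start_lba, sector_count) of MBR partition entry 0."""
    if len(mbr) < SECTOR or mbr[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature")
    ptype = mbr[446 + 4]
    start_lba, count = struct.unpack_from("<II", mbr, 446 + 8)
    return ptype, start_lba, count


def fat_layout(start_lba: int, boot: bytes) -> dict:
    """Byte offsets of the FAT regions, computed from the partition's BPB."""
    (bps, spc, reserved, nfats, root_entries,
     _total16, _media, fat16) = struct.unpack_from("<HBHBHHBH", boot, 11)
    if bps not in (512, 1024, 2048, 4096):
        raise ValueError("boot sector does not look like FAT")
    fat_sectors = fat16 or struct.unpack_from("<I", boot, 36)[0]  # FAT32 @36
    root_sectors = (root_entries * 32 + bps - 1) // bps           # 0 on FAT32
    part = start_lba * SECTOR
    fat = part + reserved * bps
    root = fat + nfats * fat_sectors * bps
    data = root + root_sectors * bps
    return {"partition": part, "fat": fat, "root_dir": root,
            "data": data, "cluster_bytes": spc * bps}


def dump(path: str) -> dict:
    """Collect the layout of the device or image file at `path`."""
    with open(path, "rb") as dev:
        ptype, start, count = first_partition(dev.read(SECTOR))
        dev.seek(start * SECTOR)
        layout = fat_layout(start, dev.read(SECTOR))
    layout["type"] = hex(ptype)
    layout["sectors"] = count
    layout["aligned"] = {k: layout[k] % ALIGN == 0
                         for k in ("partition", "fat", "data")}
    return layout


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python3 fat_layout.py /dev/sdX > card-model.json (needs read access)
    print(json.dumps(dump(sys.argv[1]), indent=2))
```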

all you need to do is to not change the partition table, do a mkfs on the 
existing partition and then use tar or cp to put files there.

no need to involve dump files.

David Lang
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: Treatise on Formatting FLASH Storage Devices

2009-02-04 Thread david
On Wed, 4 Feb 2009, Mitch Bradley wrote:

 Benjamin M. Schwartz wrote:

 Mitch Bradley wrote:

 It has been my experience that USB sticks and SD cards with intact
 factory formatting tend to last longer and run faster than ones that
 have been reformatted with random layouts.


 This gives us Linux users a bit of a dilemma if we want to use FTL flash
 for primary storage.  FAT does not provide the file access permissions,
 symlinks, hardlinks, or even case sensitivity, that we desire for most
 filesystems on unixy systems.  However, FTL devices behave as a sort of
 FAT-oriented black box, full of secret proprietary firmware that loves
 FAT.

 I think that FTLs are getting better over time, so maybe the
 FAT-specific optimizations are starting to be replaced by more generic
 algorithms.  The rapidly growing market for FLASH-based storage is
 certainly attracting lots of development dollars.

 In the absence of FAT-specific optimizations, perhaps properly-aligned
 ext2 layouts will work well.
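For the properly-aligned ext2 layouts suggested above, the key step is choosing a partition start that falls on an eraseblock boundary. A minimal Python sketch follows, assuming 512-byte sectors and an eraseblock (or allocation-unit) size you have to supply yourself, since the device does not report it; both function names are hypothetical.

```python
"""Pick an eraseblock-aligned starting sector for a new ext2 partition.

Sketch under assumptions: 512-byte logical sectors, and an eraseblock
(or allocation-unit) size you already know or have guessed; the device
does not report it.
"""
SECTOR = 512


def aligned_start(min_lba: int, erase_bytes: int) -> int:
    """Smallest LBA >= min_lba whose byte offset is a multiple of erase_bytes."""
    if erase_bytes <= 0 or erase_bytes % SECTOR:
        raise ValueError("eraseblock size must be a positive multiple of 512")
    per_erase = erase_bytes // SECTOR
    return -(-min_lba // per_erase) * per_erase   # ceiling division


def layout_ok(start_lba: int, fs_block: int, erase_bytes: int) -> bool:
    """True if the partition starts on an eraseblock boundary and whole
    filesystem blocks tile each eraseblock, so no block spans two."""
    return (start_lba * SECTOR) % erase_bytes == 0 and erase_bytes % fs_block == 0


if __name__ == "__main__":
    # The traditional DOS start at sector 63 versus a 128 KiB eraseblock.
    start = aligned_start(63, 128 * 1024)
    print(start, layout_ok(63, 4096, 128 * 1024), layout_ok(start, 4096, 128 * 1024))
```

The resulting LBA would then be given to whatever partitioning tool you use, with the filesystem created using 4 KiB blocks (mke2fs -b 4096) so that whole blocks tile each eraseblock.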

 Another solution is to choose high-quality devices.  I've had good
 results with some models from SanDisk and Transcend.  But sometimes it
 comes at a cost penalty - the really good SanDisk Extreme III SD cards
 cost 2.5x the going rate for commodity cards of the same capacity.  The
 good cards appear to be rather more tolerant of abuse than the El
 Cheapo's.  But even with the tough ones, I think it's prudent to treat
 them gently.

the small devices are cheap enough, I'll bet a lot of people would be 
willing to chip in a few bucks if someone were to organize a controlled
test (or possibly one of the hardware websites can be prodded into doing a 
long enough and large enough test to have some valid statistics)

  One obvious proposal, therefore, would be to use FAT for storage,
 but wrap it with a layer that implements all our favorite POSIX stuff.


the problem with this is that your access won't follow the FAT pattern 
anymore, it will follow the pattern of the higher-level filesystem.

also, most setups like this that I have seen concentrate the 'extra' 
metadata for read efficiency, but that's exactly the wrong thing to do for
flash.

David Lang