Re: [Techteam] NAND full issue

2008-07-27 Thread Mitch Bradley

Martin Langhoff wrote:

On Sat, Jul 26, 2008 at 1:00 PM, Daniel Drake [EMAIL PROTECTED] wrote:

unionfs will involve a kernel change.

Erik's got a ko to add to the initrd AIUI.

Have we considered sorting by date and removing from oldest to new until
the threshold is reached? Perhaps excluding starred items.

Both date and size are flawed -- Greg and Cjb have explored the flaws
of both approaches on [EMAIL PROTECTED] The best notes on this are from Mitch so
far - he looked at the FSs and spotted things we can safely delete.
And we cannot query for starred items without starting the journal,
which does not start in no-space conditions.

IMHO, Cjb's script should delete caches and the various files we
know are safe to nuke _before_ we even consider user data.

The big-ticket item is the contents of ~/.sugar/default/data.  According to
Tomeu in the attached message, those files are leaks and could be deleted
on boot without further ado.

In the images that I analyzed, those files represented 50% or more of the
size of /home, ranging from ~500 to ~800 MB.  As a quick fix, just deleting
those leak files on every boot would probably reduce the incidence of
NAND-full reboot failures by two orders of magnitude, which should be
enough to downgrade the problem from critical to a minor annoyance.
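A minimal sketch of that quick fix, in Python (Sugar itself is written in Python, so an interpreter is assumed to be available at boot). It assumes, per Tomeu's message, that everything under ~/.sugar/default/data is a leaked copy that can go; the function names are hypothetical and this is not a script OLPC shipped. It deletes the directory's contents, keeps the directory itself, and returns the apparent number of bytes removed:

```python
import os
import shutil
from pathlib import Path

# Assumption (from Tomeu's message): every entry in this directory is a leak.
LEAK_DIR = Path.home() / ".sugar" / "default" / "data"


def tree_size(path: Path) -> int:
    """Apparent size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            p = os.path.join(root, name)
            if not os.path.islink(p):
                total += os.path.getsize(p)
    return total


def purge_leak_dir(leak_dir: Path = LEAK_DIR) -> int:
    """Delete everything inside leak_dir (keeping the directory); return bytes freed."""
    freed = 0
    if not leak_dir.is_dir():
        return 0
    for entry in leak_dir.iterdir():
        try:
            if entry.is_dir() and not entry.is_symlink():
                size = tree_size(entry)
                shutil.rmtree(entry)
            else:
                size = entry.lstat().st_size
                entry.unlink()
            freed += size  # only counted once the removal succeeded
        except OSError:
            # Best effort at boot: skip anything we cannot stat or remove.
            pass
    return freed
```

Logging the return value on each boot would also show how quickly the leak refills the directory on pre-Update.1 builds.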

Ironically, the presence of that leak (which Tomeu claims is fixed in
Update.1) may inoculate the system against NAND fillup from other causes,
by reserving space that can be reclaimed upon reboot.

That suggests a long-term safety strategy:

a) Determine how much free space is needed to ensure that the system can
boot up to a level where filesystem cleanup and maintenance can be
performed.  For example, let's say the magic amount is 50 MB.

b) Add a 50 MB space-holder file to the filesystem on the first boot (or
as part of the initial image).  Fill it with incompressible data so it
actually occupies the full space.

c) On each boot, if there is less than 50 MB free, delete the space-holder
file to free up space, and boot in maintenance mode so the user or a
helper can clean up.  Continue booting in maintenance mode until the free
space has increased above some threshold.  After the cleanup, recreate a
space-holder file.

Of course, it is still important to automate as much cleanup as can
safely be performed without user intervention, but the space-holder
approach is still a good backstop to avoid trips to a repair center.
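Steps (a) to (c) can be sketched as a single boot-time check. This is an illustration under assumptions, not an agreed design: the 50 MB reserve is Mitch's example figure, while the 150 MB exit threshold, the `/home/.spaceholder` path and the function names are hypothetical. Free space is measured with `os.statvfs` (`f_bavail * f_frsize`, the space available to non-root users), and the holder is filled from `os.urandom` because random bytes do not compress, which matters on a compressing filesystem such as the XO's JFFS2:

```python
import os

RESERVE_BYTES = 50 * 1024 * 1024      # step (a): the "magic amount" (example value)
THRESHOLD_BYTES = 150 * 1024 * 1024   # hypothetical "some threshold" for leaving maintenance mode
HOLDER_PATH = "/home/.spaceholder"    # hypothetical location of the space-holder file
CHUNK = 1024 * 1024


def free_bytes(path: str) -> int:
    """Bytes available to unprivileged users on the filesystem holding path."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def create_holder(path: str, size: int) -> None:
    """Step (b): write `size` bytes of random, hence incompressible, data."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def boot_check(path: str = HOLDER_PATH, reserve: int = RESERVE_BYTES,
               threshold: int = THRESHOLD_BYTES) -> bool:
    """Step (c): return True if this boot should go into maintenance mode."""
    fs_dir = os.path.dirname(os.path.abspath(path))
    free = free_bytes(fs_dir)
    if os.path.exists(path):
        if free >= reserve:
            return False        # normal boot, reserve untouched
        os.remove(path)         # release the reserve so cleanup can run
        return True
    # No holder: either the first boot, or the reserve was released earlier.
    # Stay in maintenance mode until cleanup pushes free space past threshold.
    if free >= threshold:
        create_holder(path, reserve)
        return False
    return True
```

The threshold should be comfortably more than twice the reserve; otherwise recreating the holder could immediately push free space back under the reserve and trigger maintenance mode again on the next boot.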


 - Mitch has identified stray CVS directories. These are safe to nuke.
 - /var/cache/yum
 - ~/.sugar/default/logs
 - ~/.sugar/default/gecko/Cache
 - Someone mentioned large support files in eToys.

Might be worthwhile to nuke large Activities in ~/Activities.

If not enough space is available, then it makes sense to nuke user data.
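The ordering above (known-safe caches first, user data last) could be expressed as a list of tiers that are emptied one directory at a time until a free-space target is met. A sketch under assumptions: only the three cache paths listed above are included (the CVS directories and eToys files are omitted because no paths were given), `/var/cache/yum` needs root, and `reclaim`, `TIERS` and the byte target are hypothetical. The free-space function is a parameter so the logic can be exercised without touching a real filesystem:

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Iterable, List

HOME = Path.home()

# Tiers in the order suggested above; directory contents are deleted,
# the directories themselves are kept.
TIERS: List[List[Path]] = [
    [Path("/var/cache/yum"),
     HOME / ".sugar/default/logs",
     HOME / ".sugar/default/gecko/Cache"],
    # Large activities and user data would be later tiers; deliberately omitted.
]


def free_bytes(path: str = "/") -> int:
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def empty_dir(d: Path) -> None:
    """Remove everything inside d, keeping d itself; errors are ignored."""
    if not d.is_dir():
        return
    for entry in d.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry, ignore_errors=True)
        else:
            try:
                entry.unlink()
            except OSError:
                pass


def reclaim(target: int, tiers: Iterable[Iterable[Path]] = TIERS,
            free: Callable[[], int] = free_bytes) -> bool:
    """Empty directories tier by tier until free() >= target; True on success."""
    if free() >= target:
        return True
    for tier in tiers:
        for d in tier:
            empty_dir(d)
            if free() >= target:
                return True
    return free() >= target
```

Large activities and user data would be appended as later tiers, so they are only reached when the caches are not enough.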

cheers,

m

---BeginMessage---
Hi,

the files in .sugar/default/data are leaks that were fixed in Update.1:

http://dev.laptop.org/ticket/5637

That should reduce flash usage a lot, but I guess we need to continue
anyway with our plans to deal with full file systems.

Should we tell Uruguay to delete the files in that dir after every boot?

Thanks,

Tomeu

On Thu, Jul 24, 2008 at 1:05 AM, Mitch Bradley [EMAIL PROTECTED] wrote:

 2138.IMG shows the same pattern -

   /home/olpc          1.196 GB
   ./.sugar/default    1.111 GB
     ./datastore1214593451.73/store   279 MB
     ./data                           798 MB



 The takeaway point here is that the datastore or journal or whatever is
 filling up pretty darn fast.

 The .sugar/default/data/ directory has a lot of .xo files with the
 following sizes (in bytes): 7267307, 3428082, 1533872, 1298769

 When you drop megabyte+ files on a regular basis, it doesn't take long
 to fill our NAND FLASH.





 Mitch Bradley wrote:
 The analysis of 2145.img is similar, albeit somewhat smaller.

   /home/olpc          1.093 GB
   ./.sugar/default      941 MB
     ./datastore1213821699.21/store   354 MB  (about half of that is in ./preview)
     ./data                           469 MB


 Mitch Bradley wrote:

 ls-r analysis of 25F5.IMG  (Reported file data sizes, not the actual
 space on the NAND media)

   /home/olpc  1.358 GB

 ./Activities                          291 MB
   ./simcity.activity                   18 MB
   ./Xcratch.activity                   16 MB
   ./TuxPaint.activity                  42 MB  (27 MB is stamps)

 ./.sugar                            1.067 GB
   ./default/datastore/store           337 MB
   ./default/data                      602 MB
   ./default/gecko                      52 MB
   ./default/org.laptop.WebActivity     44 MB  (16 MB is click_on_letter, 10 MB is gcompris)
   ./default/org.laptop.RecordActivity/instance  27 MB

 The dates on items in .sugar/default/data range from 2008-04-26 to
 2008-06-28.  The activity is heavily clustered toward a few days,
 mostly 05-25 and 06-2x.
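Breakdowns like the ones above (apparent sizes per directory, not NAND usage) can be reproduced from a mounted image with a short script. A sketch, assuming the image's /home is mounted somewhere readable; `apparent_sizes` is a hypothetical helper that sums `st_size` of regular files under each immediate child of a directory, skipping symlinks, and sorts the result largest first:

```python
import os
from typing import Dict


def apparent_sizes(root: str) -> Dict[str, int]:
    """Map each immediate child of root to the summed st_size (bytes) of its files."""
    totals: Dict[str, int] = {}
    for name in os.listdir(root):
        child = os.path.join(root, name)
        if os.path.islink(child):
            continue
        if os.path.isfile(child):
            totals[name] = os.path.getsize(child)
            continue
        total = 0
        for dirpath, _dirs, files in os.walk(child):  # does not follow symlinked dirs
            for f in files:
                p = os.path.join(dirpath, f)
                if not os.path.islink(p):
                    total += os.path.getsize(p)
        totals[name] = total
    # Largest first, like the listings above.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))
```

The results are in bytes; the listings above do not say whether MB means 10^6 or 2^20 bytes, so pick one convention when comparing. Measuring actual (compressed) space on the NAND would need a different approach.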

 John Watlington wrote:

 

Re: [Techteam] NAND full issue

2008-07-26 Thread Ton van Overbeek
On Fri, Jul 25, 2008 at 9:27 PM, Deepak Saxena [EMAIL PROTECTED] wrote:
 On Jul 25 2008, at 20:00, Daniel Drake was caught saying:
 So unionfs is the formal bug fix for 8.2 going forward, or is it a
 Uruguay-specific thing?

 unionfs will involve a kernel change. Are we planning to shift them from
 2.6.22 to 2.6.25 where unionfs has been included, or are we going to
 backport unionfs to 2.6.22?

 Also, I am a little wary of unionfs, I have used it in the past and
 found it to be buggy and unreliable. It may be better now, but we should
 be cautious.

 I've done an analysis of the UFS code and it may be possible to
 have a standalone unionfs module w/o changes to core kernel. See [1]
 for my email sent to UFS maintainers and list. My concern is that
 by forking the code this way, we're introducing another variable.

 However...  Erik has been using AUFS[2] as UFS was crashing badly and
 not allowing sugar to boot. AUFS is completely standalone and requires
 no changes to the deployed kernel.  This is also non-upstream so we should
 run it through some form of stress test in our desired configuration.

 ~Deepak

 [1] http://www.fsl.cs.sunysb.edu/pipermail/unionfs/2008-July/005895.html
 [2] http://aufs.sourceforge.net/


This might be old news, but Knoppix (the original Linux live CD) changed
from unionfs to aufs some years ago with good results. I suppose you could
ask Klaus Knopper about his experiences with the reliability of aufs. See
www.knopper.net (in German) or www.knoppix.com (in English).

HTH

Ton van Overbeek
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: NAND full issue

2008-07-25 Thread Daniel Drake
Kimberley Quirk wrote:
 OLPC's response is Failsafe for 656, per703, and 8.1.2; and a formal  
 bug fix for 8.2 going forward:

 Uruguay:
 Erik is working with Uruguay on the solution described as Union  
 Mount below. It is important that Uruguay own this bug fix themselves  
 and can maintain it as needed, test it to their satisfaction, decide  
 how to distribute it. This can be delivered as a USB or wireless  
 download. Uruguay also has the choice to use the options supported by  
 OLPC above.

So unionfs is the formal bug fix for 8.2 going forward, or is it a 
Uruguay-specific thing?

unionfs will involve a kernel change. Are we planning to shift them from 
2.6.22 to 2.6.25 where unionfs has been included, or are we going to 
backport unionfs to 2.6.22?

Also, I am a little wary of unionfs, I have used it in the past and 
found it to be buggy and unreliable. It may be better now, but we should 
be cautious.

 RECOVERY SOLUTIONS -
 Automatic Free Space:
 Provide USB bootable build that would free space in some way. Can we  
 identify a class of things that we know can be deleted (like cracklib  
 dictionary of unsafe passwords, large activities). Add a note that a  
 delete is going to happen during boot.

Only works the first time they fill it up, obviously.

 Failsafe:
 Can be inserted in the build, include 'automatic free space'. It opens  
 the datastore and sorts by size, wants to find 50M, pops off the stack  
 deleting stuff from largest to smallest. Can it explain afterwards
 what it has done, or explain ahead of time what it might do? Provide
 options for what to delete.

Have we considered sorting by date and removing from oldest to new until 
the threshold is reached? Perhaps excluding starred items.
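The two deletion orders discussed here (largest first in the Failsafe description, oldest first skipping starred items in Daniel's reply) can be compared with one selection routine. This is a sketch only: as Martin notes, the journal cannot be queried in a no-space condition, so the `Entry` records stand in for datastore metadata obtained some other way, and `pick_victims` and its policy names are hypothetical. It only chooses entries; it deletes nothing:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    name: str
    size: int          # bytes
    mtime: float       # seconds since the epoch
    starred: bool = False


def pick_victims(entries: List[Entry], need: int, policy: str) -> List[Entry]:
    """Choose entries to delete until at least `need` bytes would be freed.

    policy="largest": largest first (Failsafe proposal).
    policy="oldest":  oldest first, never touching starred entries.
    """
    if policy == "largest":
        order = sorted(entries, key=lambda e: e.size, reverse=True)
    elif policy == "oldest":
        order = sorted((e for e in entries if not e.starred), key=lambda e: e.mtime)
    else:
        raise ValueError(f"unknown policy: {policy}")
    victims: List[Entry] = []
    freed = 0
    for e in order:
        if freed >= need:
            break
        victims.append(e)
        freed += e.size
    return victims
```

The caller should sum the sizes of the returned entries: when starred items are skipped, the oldest-first policy can run out of candidates before reaching the target.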

 The Fix: (fix to 7587)
 When the NAND is full, Sugar will boot but not be allowed to write. A  
 notification about space and inability to write needs to be displayed.

...and the space can be freed by deleting activities and journal items?
No unionfs involved?

I feel that is the best way forward.
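The behaviour described for the fix to 7587 (boot, but refuse writes and tell the user why) comes down to a check before each save. A minimal sketch, assuming a hypothetical 5 MB `MIN_FREE` margin and a `notify` callback standing in for whatever notification Sugar would actually display:

```python
import os
from typing import Callable

MIN_FREE = 5 * 1024 * 1024   # hypothetical margin kept free for the system itself
MB = 1024 * 1024


def free_bytes(path: str) -> int:
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def may_write(path: str, nbytes: int,
              notify: Callable[[str], None] = print,
              min_free: int = MIN_FREE) -> bool:
    """Return True if writing nbytes under path leaves at least min_free bytes.

    Otherwise call notify() with a user-facing explanation and return False,
    so the caller can refuse the write instead of failing part-way through.
    """
    free = free_bytes(path)
    if free - nbytes >= min_free:
        return True
    notify(f"Not enough space to save: {free // MB} MB free, "
           f"{(nbytes + min_free + MB - 1) // MB} MB needed. "
           "Delete some activities or Journal entries to continue.")
    return False
```

The check is advisory: free space can change between the check and the write, so the write path still has to handle ENOSPC.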

Daniel



Re: [Techteam] NAND full issue

2008-07-25 Thread Martin Langhoff
On Sat, Jul 26, 2008 at 1:00 PM, Daniel Drake [EMAIL PROTECTED] wrote:
 unionfs will involve a kernel change.

Erik's got a ko to add to the initrd AIUI.

 Have we considered sorting by date and removing from oldest to new until
 the threshold is reached? Perhaps excluding starred items.

Both date and size are flawed -- Greg and Cjb have explored the flaws
of both approaches on [EMAIL PROTECTED] The best notes on this are from Mitch so
far - he looked at the FSs and spotted things we can safely delete.
And we cannot query for starred items without starting the journal,
which does not start in no-space conditions.

IMHO, Cjb's script should delete caches and the various files we
know are safe to nuke _before_ we even consider user data.

 - Mitch has identified stray CVS directories. These are safe to nuke.
 - /var/cache/yum
 - ~/.sugar/default/logs
 - ~/.sugar/default/gecko/Cache
 - Someone mentioned large support files in eToys.

Might be worthwhile to nuke large Activities in ~/Activities.

If not enough space is available, then it makes sense to nuke user data.

cheers,



m
-- 
 [EMAIL PROTECTED] -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff


Re: [Techteam] NAND full issue

2008-07-25 Thread Deepak Saxena
On Jul 25 2008, at 20:00, Daniel Drake was caught saying:
 So unionfs is the formal bug fix for 8.2 going forward, or is it a 
 Uruguay-specific thing?
 
 unionfs will involve a kernel change. Are we planning to shift them from 
 2.6.22 to 2.6.25 where unionfs has been included, or are we going to 
 backport unionfs to 2.6.22?

 Also, I am a little wary of unionfs, I have used it in the past and 
 found it to be buggy and unreliable. It may be better now, but we should 
 be cautious.

I've done an analysis of the UFS code and it may be possible to 
have a standalone unionfs module w/o changes to core kernel. See [1]
for my email sent to UFS maintainers and list. My concern is that
by forking the code this way, we're introducing another variable.

However...  Erik has been using AUFS[2] as UFS was crashing badly and 
not allowing sugar to boot. AUFS is completely standalone and requires
no changes to the deployed kernel.  This is also non-upstream so we should
run it through some form of stress test in our desired configuration.  

~Deepak

[1] http://www.fsl.cs.sunysb.edu/pipermail/unionfs/2008-July/005895.html
[2] http://aufs.sourceforge.net/

-- 
Deepak Saxena - Kernel Developer - [EMAIL PROTECTED]