Re: [Techteam] NAND full issue
Martin Langhoff wrote:

On Sat, Jul 26, 2008 at 1:00 PM, Daniel Drake [EMAIL PROTECTED] wrote:

unionfs will involve a kernel change.

Erik's got a ko to add to the initrd AIUI.

Have we considered sorting by date and removing from oldest to newest until the threshold is reached? Perhaps excluding starred items.

Both date and size are flawed -- Greg and Cjb have explored the flaws of both approaches on [EMAIL PROTECTED]. The best notes on this are from Mitch so far - he looked at the FSs and spotted things we can safely delete. And we cannot query for starred items without starting the journal, which does not start in no-space conditions.

IMHO, Cjb's script should delete caches and the various files we know are safe to nuke _before_ we even consider user data

The big-ticket item is the contents of ~/.sugar/data. According to Tomeu in the attached message, those files are leaks and could be deleted on boot without further ado. In the images that I analyzed, those files represented 50% or more of the size of /home, ranging from ~500 to ~800 MB.

As a quick fix, just deleting those leak files on every boot would probably reduce the incidence of NAND-full reboot failures by two orders of magnitude, which should be enough to downgrade the problem from critical to minor annoyance.

Ironically, the presence of that leak (which Tomeu claims is fixed in Update.1) may inoculate the system against NAND fillup from other causes, by reserving space that can be reclaimed upon reboot. That suggests a long-term safety strategy:

a) Determine how much free space is needed to ensure that the system can boot up to a level where filesystem cleanup and maintenance can be performed. For example, let's say the magic amount is 50 MB.

b) Add a 50 MB space-holder file to the filesystem on the first boot (or as part of the initial image). Fill it with incompressible data so it actually occupies the full space.
c) On each boot, if there is less than 50 MB free, delete the space-holder file to free up space, and boot in maintenance mode so the user or a helper can clean up. Continue booting in maintenance mode until the free space has increased above some threshold. After the cleanup, recreate a space-holder file.

Of course, it is still important to automate as much cleanup as can safely be performed without user intervention, but the space-holder approach is still a good backstop to avoid trips to a repair center.

- Mitch has identified stray CVS directories. These are safe to nuke.
- /var/cache/yum
- ~/.sugar/default/logs
- ~/.sugar/default/gecko/Cache
- Someone mentioned large support files in eToys. Might be worthwhile to nuke large Activities in ~/Activities.

If not enough space is available, then it makes sense to nuke user data.

cheers,

m

---BeginMessage---

Hi,

the files in .sugar/default/data are leaks that were fixed in Update.1: http://dev.laptop.org/ticket/5637

That should reduce flash usage a lot, but I guess we need to continue anyway with our plans to deal with full file systems.

Should we tell Uruguay to delete the files in that dir after every boot?

Thanks,

Tomeu

On Thu, Jul 24, 2008 at 1:05 AM, Mitch Bradley [EMAIL PROTECTED] wrote:

2138.IMG shows the same pattern -

/home/olpc                            1.196 GB
  ./.sugar/default                    1.111 GB
    ./datastore1214593451.73/store    279 MB
    ./data                            798 MB

The takeaway point here is that the datastore or journal or whatever is filling up pretty darn fast.

The .sugar/default/data/ directory has a lot of .xo files with the following sizes: 7267307, 3428082, 1533872, 1298769. When you drop megabyte+ files on a regular basis, it doesn't take long to fill our NAND FLASH.

Mitch Bradley wrote:

The analysis of 2145.img is similar, albeit somewhat smaller.
/home/olpc                            1.093 GB
  ./.sugar/default                    941 MB
    ./datastore1213821699.21/store    354 MB (about half of that is in ./preview)
    ./data                            469 MB

Mitch Bradley wrote:

ls-r analysis of 25F5.IMG (Reported file data sizes, not the actual space on the NAND media)

/home/olpc                                        1.358 GB
  ./Activities                                    291 MB
    ./simcity.activity                            18 MB
    ./Xcratch.activity                            16 MB
    ./TuxPaint.activity                           42 MB (27 MB is stamps)
  ./.sugar                                        1.067 GB
    ./default/datastore/store                     337 MB
    ./default/data                                602 MB
    ./default/gecko                               52 MB
    ./default/org.laptop.WebActivity              44 MB (16 MB is click_on_letter, 10 MB is gcompris)
    ./default/org.laptop.RecordActivity/instance  27 MB

The dates on items in .sugar/default/data range from 2008-04-26 to 2008-06-28. The activity is heavily clustered toward a few days, mostly 05-25 and 06-2x.

John Watlington wrote:
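For illustration only, the space-holder strategy in steps a) to c) of Mitch's message above could be sketched roughly as follows. This is not code from the thread: the function names, the "normal"/"maintenance" return values and the `resume_free` threshold (the thread only says "some threshold") are all assumptions. The 50 MB figure is the thread's own example value, and `os.urandom` is used here as one way of producing incompressible data.

```python
import os
import shutil

HOLDER_SIZE = 50 * 1024 * 1024  # example "magic amount" from step a)
CHUNK = 1024 * 1024             # write the holder 1 MB at a time


def free_space(path):
    """Free bytes on the filesystem containing `path`, as reported by the OS."""
    return shutil.disk_usage(path).free


def create_holder(path, size=HOLDER_SIZE):
    """Step b): fill the holder with random (hence incompressible) bytes."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def boot_check(holder_path, free_bytes, holder_size=HOLDER_SIZE, resume_free=None):
    """Step c): pick a boot mode and manage the holder file.

    `resume_free` is a hypothetical threshold for leaving maintenance mode;
    twice the holder size is an arbitrary default, not a measured value.
    """
    if resume_free is None:
        resume_free = 2 * holder_size
    if os.path.exists(holder_path):
        if free_bytes >= holder_size:
            return "normal"
        # Out of room: give the reserved space back and ask for cleanup.
        os.remove(holder_path)
        return "maintenance"
    # No holder: either the first boot, or cleanup is still in progress.
    if free_bytes >= resume_free:
        create_holder(holder_path, holder_size)
        return "normal"
    return "maintenance"
```

At boot this might be called as `boot_check("/home/.space-holder", free_space("/home"))`, where the holder path is again hypothetical. Passing the free-space figure in as an argument keeps the decision logic testable without actually filling a disk.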
Re: [Techteam] NAND full issue
On Fri, Jul 25, 2008 at 9:27 PM, Deepak Saxena [EMAIL PROTECTED] wrote:

On Jul 25 2008, at 20:00, Daniel Drake was caught saying:

So unionfs is the formal bug fix for 8.2 going forward, or is it a Uruguay-specific thing?

unionfs will involve a kernel change. Are we planning to shift them from 2.6.22 to 2.6.25 where unionfs has been included, or are we going to backport unionfs to 2.6.22?

Also, I am a little wary of unionfs, I have used it in the past and found it to be buggy and unreliable. It may be better now, but we should be cautious.

I've done an analysis of the UFS code and it may be possible to have a standalone unionfs module w/o changes to core kernel. See [1] for my email sent to UFS maintainers and list. My concern is that by forking the code this way, we're introducing another variable.

However... Erik has been using AUFS[2] as UFS was crashing badly and not allowing sugar to boot. AUFS is completely standalone and requires no changes to the deployed kernel. This is also non-upstream so we should run it through some form of stress test in our desired configuration.

~Deepak

[1] http://www.fsl.cs.sunysb.edu/pipermail/unionfs/2008-July/005895.html
[2] http://aufs.sourceforge.net/

This might be old news, but Knoppix (the original linux live CD) changed from unionfs to aufs some years ago with good results. I suppose you could ask Klaus Knopper about his experiences with the reliability of aufs. See www.knopper.net (in German) or www.knoppix.com (in English).

HTH

Ton van Overbeek

___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
Re: NAND full issue
Kimberley Quirk wrote:

OLPC's response is Failsafe for 656, per703, and 8.1.2; and a formal bug fix for 8.2 going forward:

Uruguay: Erik is working with Uruguay on the solution described as Union Mount below. It is important that Uruguay own this bug fix themselves and can maintain it as needed, test it to their satisfaction, decide how to distribute it. This can be delivered as a USB or wireless download. Uruguay also has the choice to use the options supported by OLPC above.

So unionfs is the formal bug fix for 8.2 going forward, or is it a Uruguay-specific thing?

unionfs will involve a kernel change. Are we planning to shift them from 2.6.22 to 2.6.25 where unionfs has been included, or are we going to backport unionfs to 2.6.22?

Also, I am a little wary of unionfs, I have used it in the past and found it to be buggy and unreliable. It may be better now, but we should be cautious.

RECOVERY SOLUTIONS -

Automatic Free Space: Provide USB bootable build that would free space in some way. Can we identify a class of things that we know can be deleted (like cracklib dictionary of unsafe passwords, large activities)? Add a note that a delete is going to happen during boot. Only works the first time they fill it up, obviously.

Failsafe: Can be inserted in the build, include 'automatic free space'. It opens the datastore and sorts by size, wants to find 50M, pops off the stack deleting stuff from largest to smallest. Can it explain afterwards what it has done or explain ahead of time what it might do? Provide options for what to delete.

Have we considered sorting by date and removing from oldest to newest until the threshold is reached? Perhaps excluding starred items.

The Fix: (fix to 7587) When the NAND is full, Sugar will boot but not be allowed to write. A notification about space and inability to write needs to be displayed.

...and the space can be freed by deleting activities and journal items? No unionfs involved? I feel that is the best way forward.
Daniel
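To make the two orderings in the message above concrete, here is a minimal sketch of the Failsafe selection ("sorts by size, wants to find 50M") next to the by-date variant Daniel asks about. The `Entry` record and `pick_victims` function are hypothetical names for illustration; the real datastore layout is not modelled, and starred items are not handled at all.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    """Hypothetical stand-in for one datastore item."""
    path: str
    size: int     # bytes
    mtime: float  # modification time, seconds since the epoch


def pick_victims(entries: List[Entry], needed: int, by: str = "size") -> List[Entry]:
    """Choose entries to delete until at least `needed` bytes are covered.

    by="size": largest first, as in the Failsafe description.
    by="date": oldest first, as in Daniel's suggestion.
    """
    if by == "size":
        ordered = sorted(entries, key=lambda e: e.size, reverse=True)
    elif by == "date":
        ordered = sorted(entries, key=lambda e: e.mtime)
    else:
        raise ValueError("by must be 'size' or 'date'")

    victims = []
    freed = 0
    for entry in ordered:
        if freed >= needed:
            break
        victims.append(entry)
        freed += entry.size
    return victims
```

Both orderings only look at file metadata, so they say nothing about which items the user values; that is one reason either choice can be the wrong one for a given laptop.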
Re: [Techteam] NAND full issue
On Sat, Jul 26, 2008 at 1:00 PM, Daniel Drake [EMAIL PROTECTED] wrote:

unionfs will involve a kernel change.

Erik's got a ko to add to the initrd AIUI.

Have we considered sorting by date and removing from oldest to newest until the threshold is reached? Perhaps excluding starred items.

Both date and size are flawed -- Greg and Cjb have explored the flaws of both approaches on [EMAIL PROTECTED]. The best notes on this are from Mitch so far - he looked at the FSs and spotted things we can safely delete. And we cannot query for starred items without starting the journal, which does not start in no-space conditions.

IMHO, Cjb's script should delete caches and the various files we know are safe to nuke _before_ we even consider user data

- Mitch has identified stray CVS directories. These are safe to nuke.
- /var/cache/yum
- ~/.sugar/default/logs
- ~/.sugar/default/gecko/Cache
- Someone mentioned large support files in eToys. Might be worthwhile to nuke large Activities in ~/Activities.

If not enough space is available, then it makes sense to nuke user data.

cheers,

m
--
[EMAIL PROTECTED] -- School Server Architect
- ask interesting questions
- don't get distracted with shiny stuff
- working code first
- http://wiki.laptop.org/go/User:Martinlanghoff
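As a rough sketch of the ordering proposed in the message above (caches and known-safe files first, user data only afterwards), something like the following could run before any datastore item is touched. It is not Cjb's script. The directory list is taken from the message, but the function names, the `root`/`home` parameters and the choice to remove each directory outright are assumptions, and sizes are reported file sizes rather than actual space on the NAND.

```python
import os
import shutil

# Directories the message lists as safe to delete; "~" means the user's home.
SAFE_DIRS = [
    "/var/cache/yum",
    "~/.sugar/default/logs",
    "~/.sugar/default/gecko/Cache",
]


def tree_size(path):
    """Sum of reported file sizes under `path` (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total


def safe_targets(root, home):
    """List the known-safe directories first, then stray CVS directories."""
    targets = []
    for d in SAFE_DIRS:
        if d.startswith("~/"):
            targets.append(os.path.join(home, d[2:]))
        else:
            targets.append(os.path.join(root, d.lstrip("/")))
    for dirpath, dirnames, _filenames in os.walk(home):
        if "CVS" in dirnames:
            dirnames.remove("CVS")  # no need to descend into it
            targets.append(os.path.join(dirpath, "CVS"))
    return targets


def cleanup(root, home, needed):
    """Delete safe targets in order until `needed` bytes have been freed.

    Returns the number of bytes freed; the caller would consider user data
    only if this is still less than `needed`.
    """
    freed = 0
    for target in safe_targets(root, home):
        if freed >= needed:
            break
        if not os.path.isdir(target):
            continue  # missing, or already removed with a parent directory
        freed += tree_size(target)
        shutil.rmtree(target)
    return freed
```

Large Activities in ~/Activities are left out because the message only raises them as a possibility. Nothing here needs the journal to be running, which matters given that it does not start in no-space conditions.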
Re: [Techteam] NAND full issue
On Jul 25 2008, at 20:00, Daniel Drake was caught saying:

So unionfs is the formal bug fix for 8.2 going forward, or is it a Uruguay-specific thing?

unionfs will involve a kernel change. Are we planning to shift them from 2.6.22 to 2.6.25 where unionfs has been included, or are we going to backport unionfs to 2.6.22?

Also, I am a little wary of unionfs, I have used it in the past and found it to be buggy and unreliable. It may be better now, but we should be cautious.

I've done an analysis of the UFS code and it may be possible to have a standalone unionfs module w/o changes to core kernel. See [1] for my email sent to UFS maintainers and list. My concern is that by forking the code this way, we're introducing another variable.

However... Erik has been using AUFS[2] as UFS was crashing badly and not allowing sugar to boot. AUFS is completely standalone and requires no changes to the deployed kernel. This is also non-upstream so we should run it through some form of stress test in our desired configuration.

~Deepak

[1] http://www.fsl.cs.sunysb.edu/pipermail/unionfs/2008-July/005895.html
[2] http://aufs.sourceforge.net/

--
Deepak Saxena - Kernel Developer - [EMAIL PROTECTED]