Re: NAND out of space crash
I think this is a huge problem. Here in Uruguay they are seeing a flood of machines with this problem, and it will only get worse over time (and we will encounter this in every other deployment soon.)

They desperately need a fix...

wad

On Jul 21, 2008, at 12:55 PM, Greg Smith wrote:
> Hi All,
>
> I found http://dev.laptop.org/ticket/7125 which looks like a good place to track this problem. I marked it blocker for 8.2.0.
>
> Here's what I think we need:
> - Sugar GUI always starts, no matter how much space is free on the NAND.
> - If Sugar starts and you are low on space (exact size tbd) then we should alert the user to start clearing space in the journal.
>
> I think Eben will work on the second part. Can someone solve the first part? Suggested steps would be to propose a solution, get buy in, code it and check it in.
>
> I shouldn't have mentioned partitioning :-( All I meant was that we cannot solve this on upgrade by whacking all user data.
>
> Thanks,
> Greg S
>
> Date: Sat, 19 Jul 2008 12:39:04 -0400
> From: Erik Garrison [EMAIL PROTECTED]
> Subject: Re: NAND out of space crash (was Display warnings in sugar (Emiliano Pastorino))
> To: [EMAIL PROTECTED]
> Cc: devel@lists.laptop.org
> Message-ID: [EMAIL PROTECTED]
> Content-Type: text/plain; charset=us-ascii
>
> On Sat, Jul 19, 2008 at 11:47:21AM -0400, Greg Smith wrote:
> > Hi All,
> >
> > Emiliano has an elegant workaround but crashing the XO on NAND full (to un-recoverable state?) is a heinous bug that affects essentially all users.
> >
> > If someone has the bug ID handy can you send it out and mark it a blocker for 8.2.0 (priority = blocker and keyword includes blocks:8.2.0)?
> >
> > Can I get a design proposal (no re-partitioning please!), scoping and lead engineer on it ASAP?
> >
> > If you have to stop working on something else to do this, let me know what will drop and I'll help weigh the consequences.
>
> My impression is that the long-term benefits of partitioning mean that it's worthwhile to devote effort to it. Are we not going to work on partitioning in the future?
>
> In addition to a more solid solution to the NAND fillup issue, we get the opportunity to improve system performance and upgrade procedures. Partitioning will allow us to test out LZO data compression for the XO's filesystems (excluding /boot and /security). We would expect a significant i/o performance boost from the use of LZO. Additionally, partitioning would improve OFW-level system updates (e.g. copy-nand) by making it far simpler for the update procedure to leave user data intact.
>
> That said there are obviously a lot of troubles with partitioning. Updating an existing system to a partitioned one without mashing user data is a major issue.
>
> Erik

___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
Re: NAND out of space crash
There are two issues here that we should be sure to not intertwingle:

1) whatever behavior Sugar may have when low/out of space, during operation, or at boot time.

2) JFFS2's behavior when the file system is almost full. When it gets almost full, it can spend all its time trying to garbage collect, and you can lose completely (the system sort of gets the slows, and grinds to a halt).

As to 2), there are patches done by Nokia (deployed on the N800 and similar devices) that reserve some extra space and report out of space before the system gets the slows. These are in Dave's incoming queue to merge into JFFS2 the last I heard. I don't know if he's merged them.
 - Jim

On Mon, 2008-07-21 at 13:45 -0300, John Watlington wrote:
> I think this is a huge problem. Here in Uruguay they are seeing a flood of machines with this problem, and it will only get worse over time (and we will encounter this in every other deployment soon.)
>
> They desperately need a fix...
>
> wad

--
Jim Gettys [EMAIL PROTECTED]
One Laptop Per Child
Re: NAND out of space crash
On Mon, Jul 21, 2008 at 12:52 PM, Jim Gettys [EMAIL PROTECTED] wrote:
> There are two issues here that we should be sure to not intertwingle:
> 1) whatever behavior Sugar may have when low/out of space, during operation, or at boot time.

A number of independent issues here:
 a) the initscripts should be sure to unfreeze the dcon if/when X fails to start. This ensures that the system is obviously recoverable (you can recover by rebooting with the check key held down, but this is not obvious!).
 b) sugar should, ideally, start even if flash is full. It is currently failing when writing to ~olpc/.boot_time or some such, and crashing.
 c) once sugar starts, there should be a message indicating that the NAND is critically full.
 d) trying to save new content to the journal should also give an obvious message that the NAND is full.
 e) removing content from the journal should work even if NAND is full.

I think (a), (b), and (e) are critical for 8.2. (c) is being handled independently by Uruguay, and (c) and (d) should be targets for 9.1.

> 2) JFFS2's behavior when the file system is almost full. When it gets almost full, it can spend all its time trying to garbage collect, and you can lose completely (the system sort of gets the slows, and grinds to a halt).
>
> As to 2), there are patches done by Nokia (deployed on the N800 and similar devices) that reserve some extra space and report out of space before the system gets the slows. These are in Dave's incoming queue to merge into JFFS2 the last I heard. I don't know if he's merged them.

These are less critical, IMO. I have filled up NAND, and the slows are not debilitating. The issues above are. We should encourage Dave to fix this issue and the other known JFFS2 bugs (trac #6480, for instance) -- or get dsaxena to do so -- for 9.1.
 --scott

--
 ( http://cscott.net/ )
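Items (c) and (d) both depend on Sugar noticing that the NAND is nearly full. The following is a minimal sketch of such a check, written in present-day Python 3 rather than the Python shipped on the XO in 2008. The function names and both threshold fractions are hypothetical; the thread leaves the exact size "tbd".

```python
import shutil

# Hypothetical thresholds: the thread leaves the exact size "tbd".
LOW_FRACTION = 0.10       # warn the user below 10% free
CRITICAL_FRACTION = 0.02  # treat as critically full below 2% free


def classify(free_bytes, total_bytes):
    """Return 'ok', 'low' or 'critical' for the given free/total byte counts."""
    if total_bytes <= 0:
        return "critical"
    fraction = free_bytes / total_bytes
    if fraction < CRITICAL_FRACTION:
        return "critical"
    if fraction < LOW_FRACTION:
        return "low"
    return "ok"


def classify_path(path="/"):
    """Classify the filesystem that holds `path` using shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return classify(usage.free, usage.total)
```

The result is best treated as advisory: it only reflects what the filesystem reports as free, which may not match what a write can actually use.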
Re: NAND out of space crash
On Jul 21 2008, at 13:39, C. Scott Ananian was caught saying:
> > 2) JFFS2's behavior when the file system is almost full. When it gets almost full, it can spend all its time trying to garbage collect, and you can lose completely (the system sort of gets the slows, and grinds to a halt).
> >
> > As to 2), there are patches done by Nokia (deployed on the N800 and similar devices) that reserve some extra space and report out of space before the system gets the slows. These are in Dave's incoming queue to merge into JFFS2 the last I heard. I don't know if he's merged them.
>
> These are less critical, IMO. I have filled up NAND, and the slows are not debilitating. The issues above are. We should encourage Dave to fix this issue and the other known JFFS2 bugs (trac #6480, for instance) -- or get dsaxena to do so -- for 9.1.

#6480 is fixed as of yesterday, should be in next joyride.

I'll be re-doing Nokia's patches so that they go upstream if we still want them after 8.2 is out; however, I don't think the approach used by them actually helps us. We already have a very limited amount of storage space and reserving space for the root user just reduces what the end user can actually use.

I think analyzing performance of non-JFFS2 file systems and picking a replacement should be a high-priority item for 9.1 update.

~Deepak

--
Deepak Saxena [EMAIL PROTECTED]
Re: NAND out of space crash
On Mon, 2008-07-21 at 09:51 -0700, Deepak Saxena wrote:
> On Jul 21 2008, at 13:39, C. Scott Ananian was caught saying:
> > > 2) JFFS2's behavior when the file system is almost full. When it gets almost full, it can spend all its time trying to garbage collect, and you can lose completely (the system sort of gets the slows, and grinds to a halt).
> > >
> > > As to 2), there are patches done by Nokia (deployed on the N800 and similar devices) that reserve some extra space and report out of space before the system gets the slows. These are in Dave's incoming queue to merge into JFFS2 the last I heard. I don't know if he's merged them.
> >
> > These are less critical, IMO. I have filled up NAND, and the slows are not debilitating. The issues above are. We should encourage Dave to fix this issue and the other known JFFS2 bugs (trac #6480, for instance) -- or get dsaxena to do so -- for 9.1.
>
> #6480 is fixed as of yesterday, should be in next joyride.
>
> I'll be re-doing Nokia's patches so that they go upstream if we still want them after 8.2 is out; however, I don't think the approach used by them actually helps us. We already have a very limited amount of storage space and reserving space for the root user just reduces what the end user can actually use.

IIRC, the issue is the GC runs more and more often the closer to full you run. By reserving some space, you avoid the performance cliff. Since we expect to be running nearly full most of the time, it would seem to me avoiding this cliff is important.

> I think analyzing performance of non-JFFS2 file systems and picking a replacement should be a high-priority item for 9.1 update.

No argument here
 - Jim

> ~Deepak

--
Jim Gettys [EMAIL PROTECTED]
One Laptop Per Child
Re: NAND out of space crash
Hi,

agreed on the action items, not so sure about the roadmap.

On Mon, Jul 21, 2008 at 7:39 PM, C. Scott Ananian [EMAIL PROTECTED] wrote:
> d) trying to save new content to the journal should also give an obvious message that the NAND is full.

Should the DS also reserve some free space?

Regards,
Tomeu
Re: NAND out of space crash
On Mon, 2008-07-21 at 09:51 -0700, Deepak Saxena wrote:
> On Jul 21 2008, at 13:39, C. Scott Ananian was caught saying:
> > > 2) JFFS2's behavior when the file system is almost full. When it gets almost full, it can spend all its time trying to garbage collect, and you can lose completely (the system sort of gets the slows, and grinds to a halt).
> > >
> > > As to 2), there are patches done by Nokia (deployed on the N800 and similar devices) that reserve some extra space and report out of space before the system gets the slows. These are in Dave's incoming queue to merge into JFFS2 the last I heard. I don't know if he's merged them.
> >
> > These are less critical, IMO. I have filled up NAND, and the slows are not debilitating. The issues above are. We should encourage Dave to fix this issue and the other known JFFS2 bugs (trac #6480, for instance) -- or get dsaxena to do so -- for 9.1.
>
> #6480 is fixed as of yesterday, should be in next joyride.

Yeah. Since it was purely cosmetic I figured it might as well just wait to come through 'naturally'.

> I'll be re-doing Nokia's patches so that they go upstream if we still want them after 8.2 is out; however, I don't think the approach used by them actually helps us. We already have a very limited amount of storage space and reserving space for the root user just reduces what the end user can actually use.
>
> I think analyzing performance of non-JFFS2 file systems and picking a replacement should be a high-priority item for 9.1 update.

I'm looking at making btrfs work on pure flash. It looks fairly sane in that respect. Using a 'standard' file system will have benefits...

--
dwmw2
Re: NAND out of space crash
On Mon, Jul 21, 2008 at 1:55 PM, David Woodhouse [EMAIL PROTECTED] wrote:
> > #6480 is fixed as of yesterday, should be in next joyride.
>
> Yeah. Since it was purely cosmetic I figured it might as well just wait to come through 'naturally'.

It's not purely cosmetic: in my testing the bogus accounting affects the output of 'df', so that sugar thinks there is space available, even though writes will all fail due to insufficient space. I should have noted this more clearly in the bug.
 --scott

--
 ( http://cscott.net/ )
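If the reported free space can be wrong, the failed write itself is the more reliable signal. The sketch below illustrates that idea in present-day Python 3. `save_or_report`, `write_fn` and `report_fn` are hypothetical names used only for illustration, not Sugar or datastore APIs.

```python
import errno


def save_or_report(write_fn, report_fn):
    """Run write_fn(); on 'no space left on device' call report_fn, return False.

    Other OSErrors are re-raised so that unrelated failures are not hidden.
    """
    try:
        write_fn()
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            report_fn("Storage is full; delete Journal entries to free space.")
            return False
        raise
    return True
```

Catching only `ENOSPC` keeps the "NAND is full" message specific; a permissions error or missing directory still surfaces as an ordinary exception.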
Re: NAND out of space crash
On Mon, Jul 21, 2008 at 1:39 PM, C. Scott Ananian [EMAIL PROTECTED] wrote:
> A number of independent issues here:

I have edited http://dev.laptop.org/ticket/7125 to clarify the pieces of this bug and to make the component tasks (including #5317) more obvious. I have *not* attempted to set milestones or priorities; that's up to Greg/Michael/the component authors. Clearly some of these items are more critical than others; I agree with Deepak and dwmw2 in that it might be easier/better to fix the root allocation issue in 9.1 by simply moving to a better filesystem, since the slows are not the critical item for this bug.

Anyway, please continue the discussion in trac for #7125 and its children.
 --scott

--
 ( http://cscott.net/ )
Re: NAND out of space crash
On Jul 21 2008, at 13:55, Jim Gettys was caught saying:
> > #6480 is fixed as of yesterday, should be in next joyride.
> >
> > I'll be re-doing Nokia's patches so that they go upstream if we still want them after 8.2 is out; however, I don't think the approach used by them actually helps us. We already have a very limited amount of storage space and reserving space for the root user just reduces what the end user can actually use.
>
> IIRC, the issue is the GC runs more and more often the closer to full you run. By reserving some space, you avoid the performance cliff. Since we expect to be running nearly full most of the time, it would seem to me avoiding this cliff is important.

I can go ahead and apply the existing Nokia patch into the 8.2 kernel as a short-term measure but don't want to arbitrarily choose a reservation size. Dave, do you have a suggestion as to what percentage should be reserved to keep the GC from going out of control? If not, we'll need to run some performance tests to find the sweet spot.

~Deepak

--
Deepak Saxena [EMAIL PROTECTED]
Re: NAND out of space crash
On Mon, Jul 21, 2008 at 01:39:25PM -0400, C. Scott Ananian wrote:
> On Mon, Jul 21, 2008 at 12:52 PM, Jim Gettys [EMAIL PROTECTED] wrote:
> > There are two issues here that we should be sure to not intertwingle:
> > 1) whatever behavior Sugar may have when low/out of space, during operation, or at boot time.
>
> A number of independent issues here:
> ...
> b) sugar should, ideally, start even if flash is full. It is currently failing when writing to ~olpc/.boot_time or some such, and crashing.

In olpc-utils: usr/bin/olpc-session. This was done for performance testing work, and I am unaware of other references to the file.

We can either comment out this stanza or remove it. I have attached patches to do either.

Erik

From 3527ba05f79f2a6543baa004a8b6fbf613dcd735 Mon Sep 17 00:00:00 2001
From: Erik Garrison [EMAIL PROTECTED]
Date: Mon, 21 Jul 2008 14:25:23 -0400
Subject: [PATCH] Stop writing ~/.boot_time at startup so we can improve our chances in NAND-fillup land.

---
 usr/bin/olpc-session |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/usr/bin/olpc-session b/usr/bin/olpc-session
index c50b5f1..a38bd4b 100755
--- a/usr/bin/olpc-session
+++ b/usr/bin/olpc-session
@@ -60,9 +60,10 @@ xset -r 9 -r 220 -r 67 -r 68 -r 69 -r 70 -r 71 -r 72 -r 73 -r 74 -r 79 -r \
 # source custom user session, if present
 [ -f $HOME/.xsession ] && . $HOME/.xsession
 
-# useful for performance work
-mv $HOME/.boot_time $HOME/.boot_time.prev 2>/dev/null
-cat /proc/uptime > $HOME/.boot_time
+# Uncomment the following lines to save a record of our startup time.
+# This is useful for performance work.
+# mv $HOME/.boot_time $HOME/.boot_time.prev 2>/dev/null
+# cat /proc/uptime > $HOME/.boot_time
 
 # finally, run sugar
 exec /usr/bin/ck-xinit-session /usr/bin/sugar
--
1.5.4.3

From 66cbebe1338dd9167d49b69cb71b4911676bb013 Mon Sep 17 00:00:00 2001
From: Erik Garrison [EMAIL PROTECTED]
Date: Mon, 21 Jul 2008 14:21:06 -0400
Subject: [PATCH] Stop writing ~/.boot_time at startup so we can improve our chances in NAND-fillup land.

---
 usr/bin/olpc-session |    4 ----
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/usr/bin/olpc-session b/usr/bin/olpc-session
index c50b5f1..4a82845 100755
--- a/usr/bin/olpc-session
+++ b/usr/bin/olpc-session
@@ -60,9 +60,5 @@ xset -r 9 -r 220 -r 67 -r 68 -r 69 -r 70 -r 71 -r 72 -r 73 -r 74 -r 79 -r \
 # source custom user session, if present
 [ -f $HOME/.xsession ] && . $HOME/.xsession
 
-# useful for performance work
-mv $HOME/.boot_time $HOME/.boot_time.prev 2>/dev/null
-cat /proc/uptime > $HOME/.boot_time
-
 # finally, run sugar
 exec /usr/bin/ck-xinit-session /usr/bin/sugar
--
1.5.4.3
Re: NAND out of space crash
On Mon, Jul 21, 2008 at 2:31 PM, Erik Garrison [EMAIL PROTECTED] wrote:
> > b) sugar should, ideally, start even if flash is full. It is currently failing when writing to ~olpc/.boot_time or some such, and crashing.
>
> In olpc-utils: usr/bin/olpc-session. This was done for performance testing work, and I am unaware of other references to the file.
>
> We can either comment out this stanza or remove it. I have attached patches to do either.

Erik, would you mind claiming #7586 and/or #7587? I don't think we need to remove the boot time code; we just need to make sure that the shell script doesn't exit if it fails.
 --scott

--
 ( http://cscott.net/ )
Re: NAND out of space crash
It sounds like you are working on the root causes.

Today I'm hanging out with the logistics/repair team, and the problem is worse than I thought this morning. They are being inundated with new problems caused by full disk (but weren't really aware that was the cause.)

Since fixes in 8.2 won't help them for months, they need the short term fix (c). I will talk to Fiorella and her team about progress on that tomorrow.

They also need a way of repairing these in the field. Mailing them back to LATU for reflashing is costing a fortune. Over 55% of their returns for repair are fixed by reflashing/reactivating.

There are two problems with a teacher reflashing them:
1) The teachers don't have activation keys for the machines, and Uruguay doesn't want to start giving them out.
2) Currently, there is no monolithic image for Uruguay (I was unaware of this, but they say that first they reflash, then they activate, then they install the Uruguay specific scripts.)

It seems like we should be able to produce an upgrade and customize key that does this in one step, and preserves the activation key for the laptop.

Thoughts?
wad

On Jul 21, 2008, at 2:39 PM, C. Scott Ananian wrote:
> On Mon, Jul 21, 2008 at 12:52 PM, Jim Gettys [EMAIL PROTECTED] wrote:
> > There are two issues here that we should be sure to not intertwingle:
> > 1) whatever behavior Sugar may have when low/out of space, during operation, or at boot time.
>
> A number of independent issues here:
>  a) the initscripts should be sure to unfreeze the dcon if/when X fails to start. This ensures that the system is obviously recoverable (you can recover by rebooting with the check key held down, but this is not obvious!).
>  b) sugar should, ideally, start even if flash is full. It is currently failing when writing to ~olpc/.boot_time or some such, and crashing.
>  c) once sugar starts, there should be a message indicating that the NAND is critically full.
>  d) trying to save new content to the journal should also give an obvious message that the NAND is full.
>  e) removing content from the journal should work even if NAND is full.
>
> I think (a), (b), and (e) are critical for 8.2. (c) is being handled independently by Uruguay, and (c) and (d) should be targets for 9.1.
>
> > 2) JFFS2's behavior when the file system is almost full. When it gets almost full, it can spend all its time trying to garbage collect, and you can lose completely (the system sort of gets the slows, and grinds to a halt).
> >
> > As to 2), there are patches done by Nokia (deployed on the N800 and similar devices) that reserve some extra space and report out of space before the system gets the slows. These are in Dave's incoming queue to merge into JFFS2 the last I heard. I don't know if he's merged them.
>
> These are less critical, IMO. I have filled up NAND, and the slows are not debilitating. The issues above are. We should encourage Dave to fix this issue and the other known JFFS2 bugs (trac #6480, for instance) -- or get dsaxena to do so -- for 9.1.
>  --scott
>
> --
>  ( http://cscott.net/ )
Re: NAND out of space crash
On Mon, Jul 21, 2008 at 3:57 PM, John Watlington [EMAIL PROTECTED] wrote:
> It seems like we should be able to produce an upgrade and customize key that does this in one step, and preserves the activation key for the laptop.

Yes. The issues in the past have just been coordination-related. I believe Emiliano is capable of generating a build image with the Uruguay scripts installed, which is the first half of the problem.
 --scott

--
 ( http://cscott.net/ )
Re: NAND out of space crash
> They are being inundated with new problems caused by full disk (but weren't really aware that was the cause.)
>
> Since fixes in 8.2 won't help them for months, they need the short term fix (c).

Mitch added Forth words to delete files from the NAND flash, after we had similar troubles after Christmas (bug #5744, #5719, #5317):

  Changed 7 months ago by [EMAIL PROTECTED]

  OFW q2d07c and later have the ability to delete files from the JFFS2 filesystem, so long as there is at least one empty page for storing the deletion node.

    ok dir n:\home\olpc\.sugar\default\data\
    ok rm n:\home\olpc\.sugar\default\data\XXX

  where XXX is the name of the file you want to delete.

  [I don't know how often there will be no empty page for the deletion node - I suspect we'll find out.]

I suggest that OLPC figure out a short list of reasonably large files that we supply on NAND, but which aren't actually needed by most students (perhaps a language translation for a language they don't use; or an activity binary that they can easily reinstall later). Include that list along with instructions on how to remove one or more of these files when they get into this jam.

Of course, getting to Forth requires a normal computer (i.e. a developer key, which every child is entitled to, but apparently no children actually get). You can get developer keys, even from a crashed XO that won't boot NAND, using a collector key, web access, and a lot of patience.

Somebody who had the sooper secret OLPC script-signing key could write a Forth script that field teachers could run on crashed lockdown XO's, which would put them into Forth and let them type. (Perhaps if you believe deeply in making security expensive, it can check to see if the NAND is more than 95% full, and only let them type if so. Or it can provide a menu of files for deletion. Or it can limit itself in any number of ways, making it less useful but more quote-unquote safe)

John
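The "short list of reasonably large files" could be drawn up offline from a reference image. Below is a minimal sketch in present-day Python 3, assuming the image's filesystem is mounted somewhere readable. `largest_files` is a hypothetical helper, and deciding which of the listed files are actually safe to remove remains a human judgment.

```python
import heapq
import os


def largest_files(root, count=10):
    """Return up to `count` (size_in_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat measures a symlink itself rather than its target
                sizes.append((os.lstat(path).st_size, path))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return heapq.nlargest(count, sizes)
```

Because JFFS2 can compress file data, the sizes reported here are only a rough guide to how much NAND each file occupies.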
Re: NAND out of space crash
I should've said that just removing a couple of useless or easily replaced files -- rather than reflashing -- means that the kids don't lose all their work when the NAND fills up.

John
Re: NAND out of space crash
On Mon, 2008-07-21 at 10:29 -0700, Deepak Saxena wrote:
> I can go ahead and apply the existing Nokia patch into the 8.2 kernel as a short-term measure but don't want to arbitrarily choose a reservation size. Dave, do you have a suggestion as to what percentage should be reserved to keep the GC from going out of control? If not, we'll need to run some performance tests to find the sweet spot.

I don't have a suggestion. But I'd prefer not to apply the overly complex patch from Artem -- just add a 'root only' threshold and hard-code it for now (we should really expose _all_ the thresholds in sysfs).

--
dwmw2
Re: NAND out of space crash (was Display warnings in sugar (Emiliano Pastorino))
Hi All,

Emiliano has an elegant workaround but crashing the XO on NAND full (to un-recoverable state?) is a heinous bug that affects essentially all users.

If someone has the bug ID handy can you send it out and mark it a blocker for 8.2.0 (priority = blocker and keyword includes blocks:8.2.0)?

Can I get a design proposal (no re-partitioning please!), scoping and lead engineer on it ASAP?

If you have to stop working on something else to do this, let me know what will drop and I'll help weigh the consequences.

Thanks,
Greg S

[EMAIL PROTECTED] wrote:
> Date: Thu, 17 Jul 2008 15:44:56 -0400
> From: C. Scott Ananian [EMAIL PROTECTED]
> Subject: Re: [sugar] Display warnings in sugar
> To: Tomeu Vizoso [EMAIL PROTECTED]
> Cc: devel@lists.laptop.org, Eben Eliason [EMAIL PROTECTED], [EMAIL PROTECTED]
> Message-ID: [EMAIL PROTECTED]
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Thu, Jul 17, 2008 at 5:21 AM, Tomeu Vizoso [EMAIL PROTECTED] wrote:
> > On Thu, Jul 17, 2008 at 2:27 AM, C. Scott Ananian [EMAIL PROTECTED] wrote:
> > > I hope our alert system will use the freedesktop.org standard:
> > > http://www.galago-project.org/specs/notification/index.php
> > > It is widely used in Gnome, and when I last reviewed it seems to be a solid and capable spec.
> >
> > The interfaces in that spec look quite good, although perhaps would benefit from a simpler, alternative API that also abstracts the D-Bus stuff. Perhaps rainbow should do some rate limiting or permissions checking, not sure.
>
> Sure, wrap the actual DBus calls with a simplified sugar/python method if you like, but *please* let's implement a listener for that API so that unmodified applications can interact sensibly with Sugar, and so that our system tools activities can interoperate with non-Sugar window managers.
>
> Similarly, we should really implement that standard freedesktop.org startup notification spec, so we can get sensible notifications and icons for 'ordinary' applications.
>
> --scott
Re: NAND out of space crash (was Display warnings in sugar (Emiliano Pastorino))
On Sat, Jul 19, 2008 at 11:47:21AM -0400, Greg Smith wrote:
> Hi All,
>
> Emiliano has an elegant workaround but crashing the XO on NAND full (to un-recoverable state?) is a heinous bug that affects essentially all users.
>
> If someone has the bug ID handy can you send it out and mark it a blocker for 8.2.0 (priority = blocker and keyword includes blocks:8.2.0)?
>
> Can I get a design proposal (no re-partitioning please!), scoping and lead engineer on it ASAP?
>
> If you have to stop working on something else to do this, let me know what will drop and I'll help weigh the consequences.

My impression is that the long-term benefits of partitioning mean that it's worthwhile to devote effort to it. Are we not going to work on partitioning in the future?

In addition to a more solid solution to the NAND fillup issue, we get the opportunity to improve system performance and upgrade procedures. Partitioning will allow us to test out LZO data compression for the XO's filesystems (excluding /boot and /security). We would expect a significant i/o performance boost from the use of LZO. Additionally, partitioning would improve OFW-level system updates (e.g. copy-nand) by making it far simpler for the update procedure to leave user data intact.

That said there are obviously a lot of troubles with partitioning. Updating an existing system to a partitioned one without mashing user data is a major issue.

Erik
Re: NAND out of space crash (was Display warnings in sugar (Emiliano Pastorino))
On Sat, Jul 19, 2008 at 12:58:13PM -0400, Benjamin M. Schwartz wrote:
> Erik Garrison wrote:
> | On Sat, Jul 19, 2008 at 11:47:21AM -0400, Greg Smith wrote:
> | > Hi All,
> | >
> | > Emiliano has an elegant workaround but crashing the XO on NAND full (to un-recoverable state?) is a heinous bug that affects essentially all users.
> | >
> | > If someone has the bug ID handy can you send it out and mark it a blocker for 8.2.0 (priority = blocker and keyword includes blocks:8.2.0)?
> | >
> | > Can I get a design proposal (no re-partitioning please!), scoping and lead engineer on it ASAP?
> | >
> | > If you have to stop working on something else to do this, let me know what will drop and I'll help weigh the consequences.
> |
> | My impression is that the long-term benefits of partitioning mean that it's worthwhile to devote effort to it. Are we not going to work on partitioning in the future?
>
> Adding partitioning does not automatically solve the NAND fillup problem. The fundamental issue is that Sugar tries to write files on boot, and fails to boot if it cannot write those files. The correct solution is to make sure that Sugar can boot even if it cannot write files. This change is needed in order to enable booting on full NAND, whether or not partitioning is used to separate system and user files.
>
> In short, these issues, while related, are largely decoupled, and can be attacked separately.

You are absolutely correct.

Partitioning can be used to isolate the system filesystem(s) from the effects of user-level data creation, and thus mitigate the risk of fillup of a partition yielding an unbootable system. However, the solution is wholly ineffectual wrt. the fillup issue until we ensure Sugar only needs to write to the partition which we are confident will have space. If we are going to check all the file write requirements of the Sugar shell, we might as well implement the far better solution of enabling Sugar to boot without writing anything.
Below is a patch to Sugar which resolves the only python-side case of a
file write during startup which I was able to find. I couldn't find
reference to the configuration variables saved in _save_session_info
elsewhere in the sugar repository. If these variables are pulled from
the config file after Sugar startup, then this patch is a bad idea on
its own.

diff --git a/src/main.py b/src/main.py
index b1ecc93..1899438 100644
--- a/src/main.py
+++ b/src/main.py
@@ -55,15 +55,19 @@ def _save_session_info():
     #do not rely on it
     #
     session_info_file = os.path.join(env.get_profile_path(), "session.info")
-    f = open(session_info_file, "w")
+    try:
+        f = open(session_info_file, "w")
+
+        cp = ConfigParser()
+        cp.add_section('Session')
+        cp.set('Session', 'dbus_address', os.environ['DBUS_SESSION_BUS_ADDRESS'])
+        cp.set('Session', 'display', gtk.gdk.display_get_default().get_name())
+        cp.write(f)
 
-    cp = ConfigParser()
-    cp.add_section('Session')
-    cp.set('Session', 'dbus_address', os.environ['DBUS_SESSION_BUS_ADDRESS'])
-    cp.set('Session', 'display', gtk.gdk.display_get_default().get_name())
-    cp.write(f)
+        f.close()
+    except IOError, (errno, sterror):
+        logger.error("Could not open session_info_file. %s" % sterror)
 
-    f.close()
 
 def _setup_translations():
     locale_path = os.path.join(config.prefix, 'share', 'locale')

___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
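For readers on a current Python, the idea in the patch (treat the session.info write as best effort, so a full NAND cannot stop startup) might look roughly like the sketch below. This is not the Sugar code: the function name save_session_info, its parameters, and the logger name are hypothetical, and the D-Bus address and display name are passed in rather than read from os.environ and gtk, so that the example stands alone.

```python
import configparser
import logging
import os

logger = logging.getLogger("session")  # hypothetical logger name


def save_session_info(profile_path, dbus_address, display_name):
    """Try to write session.info; return True on success, False on failure.

    A failed write is logged and reported through the return value,
    never raised, so the caller can carry on starting up.
    """
    session_info_file = os.path.join(profile_path, "session.info")

    # interpolation=None stores values verbatim, so a '%' in an
    # address is not treated as interpolation syntax.
    cp = configparser.ConfigParser(interpolation=None)
    cp.add_section("Session")
    cp.set("Session", "dbus_address", dbus_address)
    cp.set("Session", "display", display_name)

    try:
        # The with-block closes the file even if the write fails part-way;
        # an error raised on close is caught by the same except clause.
        with open(session_info_file, "w") as f:
            cp.write(f)
    except OSError as err:  # IOError is an alias of OSError in Python 3
        logger.error("Could not write %s: %s", session_info_file, err)
        return False
    return True
```

Returning a boolean rather than swallowing the error silently lets the caller decide what to do next, for example warn the user that space is low. Whether anything else reads session.info later is the open question raised above, and this sketch does not answer it.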
Re: NAND out of space crash (was Display warnings in sugar (Emiliano Pastorino))
Disclaimer: the attached patch is untested and likely insufficient to
solve this problem.

On Sat, Jul 19, 2008 at 01:39:20PM -0400, Erik Garrison wrote:
> Below is a patch to Sugar which resolves the only python-side case of a
> file write during startup which I was able to find.
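Since the patch is untested, one way to exercise the failure path without actually filling a NAND is to make open() raise ENOSPC in a test. The sketch below assumes a modern Python with unittest.mock; write_best_effort and check_survives_full_disk are hypothetical helpers written for this illustration, not functions from the Sugar tree.

```python
import errno
from unittest import mock


def write_best_effort(path, text):
    """Write text to path; return False instead of raising on OSError."""
    try:
        with open(path, "w") as f:
            f.write(text)
    except OSError:
        return False
    return True


def check_survives_full_disk():
    """Simulate a full filesystem by making open() raise ENOSPC."""
    no_space = OSError(errno.ENOSPC, "No space left on device")
    # While the patch is active, every call to the built-in open() raises.
    with mock.patch("builtins.open", side_effect=no_space):
        return write_best_effort("session.info", "[Session]\n") is False
```

This only covers a failure at open(). On a real full filesystem the error can also surface on write or close, so a test on hardware with the NAND actually filled would still be needed.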