Re: [Sugar-devel] [Dextrose] Stability stuff

2010-09-06 Thread Tomeu Vizoso
On Fri, Sep 3, 2010 at 23:59, Bernie Innocenti wrote:
> On Fri, 03-09-2010 at 11:23 -0400, Martin Abente wrote:
>
>> Well, that's true in theory, assuming all the activities are properly
>> designed for Sugar. In the field, you already know that's not the case.
>> Also, even when activities are implemented in Python through the
>> Activity class, the read and write methods need to be implemented by
>> the programmer. That means it depends on the activity specifics again.
>
> Yes, but if an activity fails to save when Sugar asks it to quit, then
> it's already buggy today: we also have a "Stop" item in the menu of the
> activity frame icon.
>
>
>> > This is also a very good suggestion. We could start by doing this, which
>> > is a lot easier and almost equally effective.
>>
>> I see it this way: why wait to get sick before doing something about it?
>> Preventive medicine is always better. Why wait for the machine to freeze
>> (taking 3 or more minutes until it's back to a usable state), with
>> potential data loss, before doing something about it?
>>
>> Having a message telling kids that the machine is too overloaded should
>> be enough, with recommendations about saving any current work and
>> closing earlier activities.
>>
>> This kind of mechanism should help overall stability, and it makes
>> even more sense when you think about XO-1 scenarios.
>>
>> :)
>
> Yes, I already agreed with you on this. The hard part of this patch
> would be setting a threshold to disallow opening another activity.
> Memory footprint of activities varies wildly. Shall we take the worst
> case, pissing off users who knew what they were doing, or shall we be
> optimistic, risking the current behavior in some cases?
>
> If we also had the "graceful stop on oom" that I was thinking of,
> we could afford to be optimistic in the "oom prevention" code.
>
> Anyway, for now I'd vote for doing what you suggest in the easiest
> possible way, even if it saves the system only 50% of the time. It would
> still be a huge improvement over the current behavior.

Whatever we end up doing, we should not leave much chance of
underestimating the available memory, or we may render Sugar mostly
useless.

This reminds me of when we deployed the free-space warning and users
were able to get into situations where they could not use Sugar
because there was not enough space, but also couldn't remove stuff.

Regards,

Tomeu

___
Sugar-devel mailing list
Sugar-devel@lists.sugarlabs.org
http://lists.sugarlabs.org/listinfo/sugar-devel


Re: [Sugar-devel] [Dextrose] Stability stuff

2010-09-03 Thread Bernie Innocenti
On Fri, 03-09-2010 at 11:23 -0400, Martin Abente wrote:

> Well, that's true in theory, assuming all the activities are properly
> designed for Sugar. In the field, you already know that's not the case.
> Also, even when activities are implemented in Python through the
> Activity class, the read and write methods need to be implemented by
> the programmer. That means it depends on the activity specifics again.

Yes, but if an activity fails to save when Sugar asks it to quit, then
it's already buggy today: we also have a "Stop" item in the menu of the
activity frame icon.


> > This is also a very good suggestion. We could start by doing this, which
> > is a lot easier and almost equally effective.
> 
> I see it this way: why wait to get sick before doing something about it?
> Preventive medicine is always better. Why wait for the machine to freeze
> (taking 3 or more minutes until it's back to a usable state), with
> potential data loss, before doing something about it?
>
> Having a message telling kids that the machine is too overloaded should
> be enough, with recommendations about saving any current work and
> closing earlier activities.
>
> This kind of mechanism should help overall stability, and it makes
> even more sense when you think about XO-1 scenarios.
> 
> :)

Yes, I already agreed with you on this. The hard part of this patch
would be setting a threshold to disallow opening another activity.
Memory footprint of activities varies wildly. Shall we take the worst
case, pissing off users who knew what they were doing, or shall we be
optimistic, risking the current behavior in some cases?

If we also had the "graceful stop on oom" that I was thinking of,
we could afford to be optimistic in the "oom prevention" code.

Anyway, for now I'd vote for doing what you suggest in the easiest
possible way, even if it saves the system only 50% of the time. It would
still be a huge improvement over the current behavior.

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs   - http://sugarlabs.org/



Re: [Sugar-devel] [Dextrose] Stability stuff

2010-09-03 Thread Martin Langhoff
On Fri, Sep 3, 2010 at 11:23 AM, Martin Abente wrote:
> for Sugar. In the field, you already know that's not the case. Also, even
> when activities are implemented in Python through the Activity class,
> the read and write methods need to be implemented by the programmer.
> That means it depends on the activity specifics again.

Well, if the activity doesn't save on close, it won't save on close and
will be messing up user data left and right.

We cannot design the system for brokenness...

cheers,


m
-- 
 martin.langh...@gmail.com
 mar...@laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff


Re: [Sugar-devel] [Dextrose] Stability stuff

2010-09-03 Thread Martin Abente
On Fri, 03 Sep 2010 01:46:02 +0200, Bernie Innocenti wrote:
> On Thu, 02-09-2010 at 09:26 -0400, Martin Abente wrote:
>> Weird, I really tried to trigger it on our last Dextrose build and it
>> never happened.
>
> Perhaps it's gone, but I have not done anything to fix it. The bug seems
> to be in Python, dbus or their dependencies.
>
>
>> The whole idea of killing activities is a little bit controversial, I
>> think. You have to assume too many things about activities. So far, just
>> a few activities in Sugar use all the proper mechanisms, so I am afraid
>> that in most cases kids would just lose their current work.
>
> I thought almost all activities understood the protocol for quitting
> cleanly (probably a dbus message). You can test it by clicking Stop from
> the menu on the icons at the top of the frame. That wouldn't work without
> sending an IPC message of some kind (probably we use dbus because we
> can't stand to use established X11 standards to manage applications).
> 

Well, that's true in theory, assuming all the activities are properly
designed for Sugar. In the field, you already know that's not the case.
Also, even when activities are implemented in Python through the
Activity class, the read and write methods need to be implemented by
the programmer. That means it depends on the activity specifics again.
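
To make the point concrete, here is a rough sketch of what "implemented by the programmer" means. ActivityBase is a hypothetical stand-in, not the real Sugar Activity class (which needs a running Sugar session); only the read_file/write_file hook names come from Sugar, and NotesActivity plus its JSON format are made up for illustration.

```python
import json


class ActivityBase:
    """Hypothetical stand-in for Sugar's Activity class (not the real one)."""

    def read_file(self, file_path):
        # The framework only defines the hook; there is no useful default.
        raise NotImplementedError

    def write_file(self, file_path):
        raise NotImplementedError


class NotesActivity(ActivityBase):
    """Example activity whose whole state is a list of text notes."""

    def __init__(self):
        self.notes = []

    def write_file(self, file_path):
        # Would run when the shell asks the activity to save its state.
        with open(file_path, "w") as f:
            json.dump({"notes": self.notes}, f)

    def read_file(self, file_path):
        # Would run when the activity is resumed from a saved entry.
        with open(file_path) as f:
            self.notes = json.load(f)["notes"]
```

An activity that never overrides these two hooks has nothing to save when it is asked to quit, which is the case I am worried about in the field.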


> 
>> What about this: if the system load is already close to a "critical"
>> point, Sugar could just stop new activities from being launched, with a
>> proper warning and suggestions.
> 
> This is also a very good suggestion. We could start by doing this, which
> is a lot easier and almost equally effective.


I see it this way: why wait to get sick before doing something about it?
Preventive medicine is always better. Why wait for the machine to freeze
(taking 3 or more minutes until it's back to a usable state), with
potential data loss, before doing something about it?

Having a message telling kids that the machine is too overloaded should
be enough, with recommendations about saving any current work and
closing earlier activities.

This kind of mechanism should help overall stability, and it makes
even more sense when you think about XO-1 scenarios.

:)

  




Re: [Sugar-devel] [Dextrose] Stability stuff

2010-09-03 Thread Sascha Silbe
Excerpts from Bernie Innocenti's message of Fri Sep 03 01:46:02 +0200 2010:

> I thought almost all activities understood the protocol for quitting
> cleanly (probably a dbus message). You can test it by clicking Stop from
> the menu on the icons at the top of the frame. That wouldn't work without
> sending an IPC message of some kind (probably we use dbus because we
> can't stand to use established X11 standards to manage applications).

We simply use X11 to close the window:

In jarabe/view/palettes.py:

class CurrentActivityPalette(BasePalette):

    def __stop_activate_cb(self, menu_item):
        self._home_activity.get_window().close(1)


Sascha

--
http://sascha.silbe.org/
http://www.infra-silbe.de/




Re: [Sugar-devel] [Dextrose] Stability stuff

2010-09-02 Thread Bernie Innocenti
On Thu, 02-09-2010 at 09:26 -0400, Martin Abente wrote:
> Weird, I really tried to trigger it on our last Dextrose build and it
> never happened.

Perhaps it's gone, but I have not done anything to fix it. The bug seems
to be in Python, dbus or their dependencies.


> The whole idea of killing activities is a little bit controversial, I
> think. You have to assume too many things about activities. So far, just
> a few activities in Sugar use all the proper mechanisms, so I am afraid
> that in most cases kids would just lose their current work.

I thought almost all activities understood the protocol for quitting
cleanly (probably a dbus message). You can test it by clicking Stop from
the menu on the icons at the top of the frame. That wouldn't work without
sending an IPC message of some kind (probably we use dbus because we
can't stand to use established X11 standards to manage applications).


> What about this: if the system load is already close to a "critical"
> point, Sugar could just stop new activities from being launched, with a
> proper warning and suggestions.

This is also a very good suggestion. We could start by doing this, which
is a lot easier and almost equally effective.

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs   - http://sugarlabs.org/



Re: [Sugar-devel] [Dextrose] Stability stuff

2010-09-02 Thread Martin Abente
Weird, I really tried to trigger it on our last Dextrose build and it
never happened.

The whole idea of killing activities is a little bit controversial, I
think. You have to assume too many things about activities. So far, just
a few activities in Sugar use all the proper mechanisms, so I am afraid
that in most cases kids would just lose their current work.

What about this: if the system load is already close to a "critical"
point, Sugar could just stop new activities from being launched, with a
proper warning and suggestions.
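
A rough sketch of such a check, assuming memory is the "load" we care about and that the shell can read /proc/meminfo before each launch. The function names and the 32 MB threshold are assumptions for illustration, not existing Sugar code, and the right threshold is an open question.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their values in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def reclaimable_kb(fields):
    # Count buffers and page cache along with free memory: looking at
    # MemFree alone would underestimate what the kernel can still hand out.
    # This is only a rough estimate; not all of Cached can be reclaimed.
    return (fields.get("MemFree", 0)
            + fields.get("Buffers", 0)
            + fields.get("Cached", 0))


def can_launch(meminfo_text, min_free_kb=32 * 1024):
    """Return True if a new activity may start (threshold is an assumption)."""
    return reclaimable_kb(parse_meminfo(meminfo_text)) >= min_free_kb
```

The shell would call can_launch(open("/proc/meminfo").read()) before starting an activity and show the warning instead when it returns False.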


Re: [Sugar-devel] [Dextrose] Stability stuff

2010-09-02 Thread Bernie Innocenti
On Tue, 31-08-2010 at 12:00 -0400, Martin Abente wrote:
> Hey guys, 
> 
> I have been testing our last dextrose build for the XO 1, and comparing it
> to the previous os179py (Sugar 0.84) version. I have noticed that the
> kernel included in the os179os provides a mechanism for killing activities
> when the laptop runs out of memory.

This is the kernel out-of-memory killer. It's been in the Linux kernel
since 2.4.x, so all versions of the XO software include it.

The OOM killer makes its guess using heuristics. Sometimes, it could
kill the wrong process, leaving the machine in an unusable state.
Killing processes should be seen as a last-resort action, to recover
from a situation that should never happen. Unfortunately, Sugar does not
have any mechanism to gracefully quit activities when memory is tight. 

Technically, this is not a bug in Sugar or in Dextrose. OOM killing is
the normal behavior of Linux even on servers. It's just that it's too
easy to trigger on the XO.

If you grep the OLPC and Sugar development mailing lists, you'll find
many threads in which this topic was discussed and solutions were
proposed. One such thread happened recently in conjunction with the
discussion of Anish's CPU & Memory meter.

I liked the solution that was proposed last: when memory gets tight,
Sugar simply asks the least recently used activity to quit (and thus
save to the journal). Optionally, we could put on a notification to let
the user know what happened (after the fact).
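
The selection step could look roughly like this. It is only a sketch: ActivityRecord and its fields are hypothetical bookkeeping, not existing Sugar classes, and request_stop() stands in for whatever the shell really does to ask an activity to close.

```python
class ActivityRecord:
    """Hypothetical bookkeeping entry for one running activity."""

    def __init__(self, name, last_used):
        self.name = name
        self.last_used = last_used  # timestamp of the last time it had focus
        self.stop_requested = False

    def request_stop(self):
        # Placeholder: a real shell would ask the activity to close here,
        # giving it the chance to save to the journal first.
        self.stop_requested = True


def pick_least_recently_used(activities, active=None):
    """Return the activity used longest ago, never the one in use now."""
    candidates = [a for a in activities if a is not active]
    if not candidates:
        return None
    return min(candidates, key=lambda a: a.last_used)


def relieve_memory_pressure(activities, active=None):
    """Ask the least recently used activity to stop; return it, or None."""
    victim = pick_least_recently_used(activities, active)
    if victim is not None:
        victim.request_stop()
    return victim
```

The returned record is what the after-the-fact notification would name.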


> I have tried to trigger this mechanism on our last dextrose build, but
> with no results. Is it possible that our last kernel does not include this 
> mechanism? And in that case is there any reason for not including it?

No, all kernels include it. There are a bunch of tunables
in /proc/sys/vm to make the oom killer behave differently. The most
important one is "swappiness", which seems to be set a little bit too
high on the current OLPC kernels (all of them, not just Dextrose).
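
To see what a given build is actually running with, a small helper along these lines can dump the values. read_vm_tunable is a made-up name; swappiness, overcommit_memory and panic_on_oom are standard entries under /proc/sys/vm, but which of them matter for the XO is my assumption.

```python
import os


def read_vm_tunable(name, base="/proc/sys/vm"):
    """Return the integer value of one VM tunable, or None if unreadable."""
    try:
        with open(os.path.join(base, name)) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        # Missing file, non-numeric content, or empty file.
        return None


def dump_vm_tunables(names=("swappiness", "overcommit_memory", "panic_on_oom"),
                     base="/proc/sys/vm"):
    """Return a dict of tunable name -> value (None where unavailable)."""
    return {name: read_vm_tunable(name, base) for name in names}
```

Running dump_vm_tunables() on both the old and the new build would show whether the settings differ between them.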

Note that we're using the very same kernel that OLPC uses on os852, so
bugs should be shared.

-- 
   // Bernie Innocenti - http://codewiz.org/
 \X/  Sugar Labs   - http://sugarlabs.org/
