Re: Stability and Memory Pressure in 8.2

2008-10-07 Thread Tomeu Vizoso
On Wed, Oct 1, 2008 at 12:50 PM, S Page [EMAIL PROTECTED] wrote:
 Tomeu Vizoso wrote:

 Read has serious memory problems because it renders whole pages
 into memory, regardless of which area is being viewed. Any chance the
 first pages of the PDF you opened contained big images?

 Nope, it's a saved Project Gutenberg PDF
 Coradella_Collegiate_Bookshelf_Collection_austen-persuasion.pdf, looks like
 it has one small image.
 Of course when I reproduce everything works fine.

 Is there *anything* testers can run before or after their XO goes funny?  My
 fantasy would be a section in http://wiki.laptop.org/go/Friends_in_testing :

  Run log_mem.py from a console as you start Sugar.  This snapshots memory
 every 10 seconds to a rotating set of files in tmpfs until it detects that
 the machine has serious memory problems, then it runs a detailed
 /sys/procmem dump.  It also runs strace on the OOMKiller thread.  If your XO
 locks up, zip this directory and attach it to bug 4321.

 Without something like that, it'll be hard to get anything more from testers
 than anecdotes.

Agreed, and I think that the steps you outlined before could work great
here. Can you take over this task? Or is there someone else who feels
more capable?

Thanks,

Tomeu
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: Stability and Memory Pressure in 8.2

2008-10-07 Thread Sayamindu Dasgupta
On Tue, Sep 30, 2008 at 1:50 PM, Tomeu Vizoso [EMAIL PROTECTED] wrote:
 On Tue, Sep 30, 2008 at 4:13 AM, S Page [EMAIL PROTECTED] wrote:
 On September 8th, Michael Stone wrote:
 Kim, Greg, and I have concluded that the instability we experience under
 memory-pressure in 8.2-759 and similar is the single hard issue that
 we wish to _attempt_ to address before releasing 8.2 on current
 timeframes.

 How did it go?

 I was going through my journal in 8.2-763.  Browse and Paint open,
 accidentally started Read; suddenly the cursor stopped moving and the XO
 became completely unresponsive.  I assume it's memory, but we never learned how
 to tell.

 Over two minutes later the first page of the PDF appeared and *then*
 immediately Sugar restarted.

 Just one datapoint.

 Thanks. Read has serious memory problems because it renders whole pages
 into memory, regardless of which area is being viewed. Any chance the
 first pages of the PDF you opened contained big images?



I noticed some similar-sounding freezes with Read, and it appeared to
me that the initial fit-to-screen-width takes up a significant amount of
resources. Of course, I'm not sure that this is the reason, but given
the problems we are having with zoom, it may be one of the
contributing factors (apart from the usual overhead of opening the
file and the initial rendering of the pages).

Thanks,
Sayamindu



-- 
Sayamindu Dasgupta
[http://sayamindu.randomink.org/ramblings]


Re: Stability and Memory Pressure in 8.2

2008-10-01 Thread S Page
Tomeu Vizoso wrote:

 Read has serious memory problems because it renders whole pages
 into memory, regardless of which area is being viewed. Any chance the
 first pages of the PDF you opened contained big images?

Nope, it's a saved Project Gutenberg PDF 
Coradella_Collegiate_Bookshelf_Collection_austen-persuasion.pdf, looks 
like it has one small image.
Of course when I reproduce everything works fine.

Is there *anything* testers can run before or after their XO goes funny?
My fantasy would be a section in
http://wiki.laptop.org/go/Friends_in_testing :

   Run log_mem.py from a console as you start Sugar.  This snapshots 
memory every 10 seconds to a rotating set of files in tmpfs until it 
detects that the machine has serious memory problems, then it runs a 
detailed /sys/procmem dump.  It also runs strace on the OOMKiller 
thread.  If your XO locks up, zip this directory and attach it to bug 4321.
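log_mem.py does not exist yet; it is the wish described above. A minimal sketch of just the snapshot part, assuming /proc/meminfo is a good-enough signal and that MemFree + Buffers + Cached is a rough measure of reclaimable memory (an assumption: on the XO, tmpfs pages are counted in Cached but cannot be dropped). The names log_mem, parse_meminfo and available_kb, the 8 MB low-water mark and the meminfo.NN file names are all hypothetical; the detailed dump and the strace of the OOM killer are left out.

```python
import os
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text ('Name:  123 kB') into {name: kB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info

def available_kb(info):
    """Rough estimate of reclaimable memory: free RAM plus buffers and page cache.
    Assumption: on the XO some 'Cached' pages (e.g. tmpfs) cannot really be dropped."""
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)

def log_mem(log_dir, interval=10, keep=30, low_water_kb=8 * 1024):
    """Every `interval` seconds, copy /proc/meminfo into a ring of `keep` files
    under log_dir (ideally on tmpfs).  Returns the parsed snapshot once the
    estimate drops below low_water_kb, so the caller can run a detailed dump."""
    if not os.path.isdir(log_dir):
        os.makedirs(log_dir)
    n = 0
    while True:
        with open("/proc/meminfo") as f:
            text = f.read()
        # Overwrite the oldest slot in the ring: meminfo.00 ... meminfo.(keep-1)
        slot = os.path.join(log_dir, "meminfo.%02d" % (n % keep))
        with open(slot, "w") as f:
            f.write("# %d\n%s" % (int(time.time()), text))
        info = parse_meminfo(text)
        if available_kb(info) < low_water_kb:
            return info
        n += 1
        time.sleep(interval)
```

Started from a console as Sugar comes up, e.g. `log_mem("/tmp/log_mem")`, it would leave the last few minutes of snapshots behind for a bug report; what to collect once it returns is still an open question.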

Without something like that, it'll be hard to get anything more from 
testers than anecdotes.

Sincerely,
--
=S


Re: Stability and Memory Pressure in 8.2

2008-09-30 Thread Tomeu Vizoso
On Tue, Sep 30, 2008 at 4:13 AM, S Page [EMAIL PROTECTED] wrote:
 On September 8th, Michael Stone wrote:
 Kim, Greg, and I have concluded that the instability we experience under
 memory-pressure in 8.2-759 and similar is the single hard issue that
 we wish to _attempt_ to address before releasing 8.2 on current
 timeframes.

 How did it go?

 I was going through my journal in 8.2-763.  Browse and Paint open,
 accidentally started Read; suddenly the cursor stopped moving and the XO
 became completely unresponsive.  I assume it's memory, but we never learned how
 to tell.

 Over two minutes later the first page of the PDF appeared and *then*
 immediately Sugar restarted.

 Just one datapoint.

Thanks. Read has serious memory problems because it renders whole pages
into memory, regardless of which area is being viewed. Any chance the
first pages of the PDF you opened contained big images?

Regards,

Tomeu


Re: Stability and Memory Pressure in 8.2

2008-09-29 Thread S Page
On September 8th, Michael Stone wrote:
 Kim, Greg, and I have concluded that the instability we experience under
 memory-pressure in 8.2-759 and similar is the single hard issue that
 we wish to _attempt_ to address before releasing 8.2 on current
 timeframes.

How did it go?

I was going through my journal in 8.2-763.  Browse and Paint open, 
accidentally started Read; suddenly the cursor stopped moving and the XO
became completely unresponsive.  I assume it's memory, but we never learned how
to tell.

Over two minutes later the first page of the PDF appeared and *then* 
immediately Sugar restarted.

Just one datapoint.
Yours sincerely,
--
=S Page


Re: Stability and Memory Pressure in 8.2

2008-09-14 Thread Marco Pesenti Gritti
On Sun, Sep 14, 2008 at 6:42 AM, James Cameron [EMAIL PROTECTED] wrote:
 I recall someone noticed that the animated activity icon was redrawing
 the whole screen.  I think it got fixed.  Since it got fixed, I haven't
 seen as many OOMs during olpc-update.

It was not fixed.

http://dev.laptop.org/ticket/8000

Actually, we really should try to fix this one for 8.2.

Marco


Re: Stability and Memory Pressure in 8.2

2008-09-11 Thread C. Scott Ananian
On Wed, Sep 10, 2008 at 8:13 AM, Tomeu Vizoso [EMAIL PROTECTED] wrote:
 On Wed, Sep 10, 2008 at 2:05 PM, James Cameron [EMAIL PROTECTED] wrote:
 But I did notice one odd thing that I wasn't fully aware of until now
 ... the byte-code of the built-in modules was present, complete with doc
 strings ... for example;

 Yes, we are aware of this one and have a fix on the line:

 https://bugzilla.redhat.com/show_bug.cgi?id=460334

 There has been a thread recently on devel or sugar ml about it.

 If you could help us quantify how much this could help, it would be
 much appreciated.

Here's a quick reference to that previous thread:
  http://lists.laptop.org/pipermail/sugar/2008-August/007969.html

I guess I meant to turn on -OO on joyride, but didn't quite get around
to it; it would require patching/forking our numpy and python, and
then tweaking the sugar-shell startup to use -OO.  It looked like this
would save ~6M, but I don't know yet how much extra NAND space it
would take for the .pyo files.  I might be able to experiment and make
a build or two on the faster branch to quantify this.
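As a rough illustration of what -OO changes at runtime, here is a sketch that runs a tiny probe program with and without the flag and reports whether a docstring survives (the same kind of docstring James found duplicated in the Journal's heap). It assumes a modern Python 3 interpreter rather than the XO's 2.5; PROBE and run_probe are illustrative names.

```python
import subprocess
import sys

# A tiny program that reports whether its docstring survived compilation,
# and which optimization level the interpreter is running at.
PROBE = (
    "def f():\n"
    "    'A docstring that -OO should strip.'\n"
    "print(repr(f.__doc__))\n"
    "import sys; print(sys.flags.optimize)\n"
)

def run_probe(*flags):
    """Run PROBE in a fresh interpreter with the given flags; return its output lines.
    -E keeps a stray PYTHONOPTIMIZE environment variable from skewing the result."""
    cmd = [sys.executable, "-E"] + list(flags) + ["-c", PROBE]
    return subprocess.check_output(cmd).decode().splitlines()
```

`run_probe()` should report the docstring and optimization level 0, while `run_probe("-OO")` should report None and 2. Keep in mind that -OO also removes assert statements, so any code that relies on asserts for real checks would behave differently.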
 --scott

-- 
 ( http://cscott.net/ )


Re: Stability and Memory Pressure in 8.2

2008-09-11 Thread Tomeu Vizoso
On Thu, Sep 11, 2008 at 5:30 PM, C. Scott Ananian [EMAIL PROTECTED] wrote:
 On Wed, Sep 10, 2008 at 8:13 AM, Tomeu Vizoso [EMAIL PROTECTED] wrote:
 On Wed, Sep 10, 2008 at 2:05 PM, James Cameron [EMAIL PROTECTED] wrote:
 But I did notice one odd thing that I wasn't fully aware of until now
 ... the byte-code of the built-in modules was present, complete with doc
 strings ... for example;

 Yes, we are aware of this one and have a fix on the line:

 https://bugzilla.redhat.com/show_bug.cgi?id=460334

 There has been a thread recently on devel or sugar ml about it.

 If you could help us quantify how much this could help, it would be
 much appreciated.

 Here's a quick reference to that previous thread:
  http://lists.laptop.org/pipermail/sugar/2008-August/007969.html

 I guess I meant to turn on -OO on joyride, but didn't quite get around
 to it; it would require patching/forking our numpy and python, and
 then tweaking the sugar-shell startup to use -OO.  It looked like this
 would save ~6M, but I don't know yet how much extra NAND space it
 would take for the .pyo files.  I might be able to experiment and make
 a build or two on the faster branch to quantify this.

It would be great if you could look into it. I guess we could drop the
.pyc files and use the .pyo instead.
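For the NAND-space side of the question, one quick comparison is the size of marshalled bytecode compiled at optimize=0, 1 and 2 (2 is the -OO level). This is a sketch assuming the marshalled code object is a fair proxy for .pyc/.pyo size (the real files add a small header); bytecode_size and SAMPLE are made up for illustration. Note that Python 3.5 and later write -OO bytecode as .opt-2.pyc files in __pycache__ rather than as .pyo.

```python
import marshal

def bytecode_size(source, optimize):
    """Size in bytes of the marshalled code object, roughly the body of a .pyc/.pyo."""
    code = compile(source, "<example>", "exec", optimize=optimize)
    return len(marshal.dumps(code))

# A made-up module with a long docstring and an assert.
SAMPLE = '''
def convert(x, base=10):
    """Convert a string or number to an integer, if possible.

    A long docstring like this one is kept in memory by default,
    but stripped when compiling with optimize=2 (the -OO level).
    """
    assert base >= 2
    return int(x, base)
'''
```

For a real tree, `compileall.compile_dir(path, optimize=2)` would produce the optimized files so their total size on disk can be compared with the plain .pyc files.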

Thanks,

Tomeu


Re: Stability and Memory Pressure in 8.2

2008-09-11 Thread Tomeu Vizoso
On Thu, Sep 11, 2008 at 6:51 PM, C. Scott Ananian [EMAIL PROTECTED] wrote:
 On Thu, Sep 11, 2008 at 12:16 PM, Tomeu Vizoso [EMAIL PROTECTED] wrote:
 On Thu, Sep 11, 2008 at 5:30 PM, C. Scott Ananian [EMAIL PROTECTED] wrote:
 On Wed, Sep 10, 2008 at 8:13 AM, Tomeu Vizoso [EMAIL PROTECTED] wrote:
 On Wed, Sep 10, 2008 at 2:05 PM, James Cameron [EMAIL PROTECTED] wrote:
 But I did notice one odd thing that I wasn't fully aware of until now
 ... the byte-code of the built-in modules was present, complete with doc
 strings ... for example;

 Yes, we are aware of this one and have a fix on the line:

 https://bugzilla.redhat.com/show_bug.cgi?id=460334

 There has been a thread recently on devel or sugar ml about it.

 If you could help us quantify how much this could help, it would be
 much appreciated.

 Here's a quick reference to that previous thread:
  http://lists.laptop.org/pipermail/sugar/2008-August/007969.html

 I guess I meant to turn on -OO on joyride, but didn't quite get around
 to it; it would require patching/forking our numpy and python, and
 then tweaking the sugar-shell startup to use -OO.  It looked like this
 would save ~6M, but I don't know yet how much extra NAND space it
 would take for the .pyo files.  I might be able to experiment and make
 a build or two on the faster branch to quantify this.

 It would be great if you could look into it. I guess we could drop the
 .pyc files and use the .pyo instead.

 http://dev.laptop.org/ticket/8431 now tracks the issue.

 I've started by putting appropriately patched versions of python and
 numpy into joyride, so you can experiment with -OO on a joyride image
 without having to worry about these particular bugs.  I've confirmed
 that python 2.5.2 and numpy 1.2.0 already have/will have the relevant
 patches, so we probably won't need the fork by our next major release.
  --scott

 p.s. does anyone know why fedora isn't using python 2.5.2 yet?  It was
 released in February '08; I'm surprised that it's not in F9 or F10.

I think the plan is to get one more 2.5.1 build into rawhide with some new
patches (including the -OO fix) and then do a 2.5.2 rpm. If 2.5.2
brings more trouble than can be solved for F10, then we can go
back to 2.5.1+patches.

Regards,

Tomeu


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread Marco Pesenti Gritti
On Wed, Sep 10, 2008 at 4:29 AM, Gary C Martin [EMAIL PROTECTED] wrote:

 OK, news is not great on the Activity front...

 SUMMARY: 759 vs 711: each Activity instance in 759 consumes an average
 of 1Mb more memory than the same Activity running in 711, with
 Write-57 reportedly taking significantly more than that (perhaps ~7Mb).

 Is top and/or ps memory usage calculated in the same way between these
 builds? Could make collecting real data pretty painful.

Unfortunately not; there have been changes in the kernel. My
understanding is that private memory will be the same, while
calculation of shared memory has changed. Riccardo has a new kernel
somewhere with instructions on how to install it on 711. That should
make the memory usage comparable.

Marco


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread Marco Pesenti Gritti
On Wed, Sep 10, 2008 at 4:29 AM, Gary C Martin [EMAIL PROTECTED] wrote:
 Well, I was hoping to see the numbers go the other way with the
 rainbow fork trick sharing more module code between Activities. Could
 be worse I guess – I should also test opening N instances of the same
 Activity and see which way memory usage has moved in that scenario.

Now that's worrying. Could you try to disable security (remove
/etc/olpc-security)? That will kill the rainbow trick and comparing
data should tell us if it's helping memory at all.

Marco


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread John Gilmore
When measuring memory usage, cat /proc/XXX/smaps provides the most
accurate info available (as far as I know), and produces directly
comparable results in all OLPC software releases.  XXX is the process
number you're examining (first column of ps output).

The smaps file also tells you how many of the pages in each allocation
are actually *resident* in memory at this instant; how many are
uniquely used by this process (versus shared with other processes);
and how many of them are *dirty* (written by the process).  It also
includes the info that the pmap command produces (from /proc/XXX/maps).
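For a whole-process view, the per-mapping field lines can simply be summed. A minimal sketch, assuming the "Field: N kB" line format shown in the example that follows; smaps_totals and read_smaps are hypothetical helper names, not part of any OLPC tool.

```python
def smaps_totals(text):
    """Sum every 'Field: N kB' line of a /proc/PID/smaps dump across all mappings."""
    totals = {}
    for line in text.splitlines():
        parts = line.split()
        # Field lines look like 'Rss:   244 kB'; mapping headers start with an
        # address range such as 'b6094000-b6101000' and are skipped here.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            key = parts[0][:-1]
            totals[key] = totals.get(key, 0) + int(parts[1])
    return totals

def read_smaps(pid):
    """Return the raw smaps text for a process (needs permission to read it)."""
    with open("/proc/%d/smaps" % pid) as f:
        return f.read()
```

For example, `smaps_totals(read_smaps(pid))["Private_Dirty"]` gives the dirty private memory of one process, and the "Pss" total is what makes figures comparable across processes without double-counting shared pages.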

E.g. part of the output for the sugar-shell process includes:

b6094000-b6101000 r-xp  1f:00 3391   /usr/lib/libcairo.so.2.17.5
Size:            436 kB
Rss:             244 kB
Pss:              70 kB
Shared_Clean:    244 kB
Shared_Dirty:      0 kB
Private_Clean:     0 kB
Private_Dirty:     0 kB
Referenced:      244 kB
b6101000-b6103000 rw-p 0006d000 1f:00 3391   /usr/lib/libcairo.so.2.17.5
Size:              8 kB
Rss:               8 kB
Pss:               8 kB
Shared_Clean:      0 kB
Shared_Dirty:      0 kB
Private_Clean:     0 kB
Private_Dirty:     8 kB
Referenced:        8 kB

This says that there are two parts of the libcairo shared library
mapped into the process's address space.  One is 436 kB and is
readable and executable (it's the code segment).  Of that 436k, 244k
is currently resident in memory (Rss); all of that 244k is shared with
other processes and is clean (not written to).  The second part of
libcairo is only 8k bytes long; it's read/write and not executable
(it's the data segment and the BSS segment).  All 8k has been read into
RAM, and all 8k is private (not shared with other processes) and dirty
(has been modified by the process).  There's a lot more info in there
too, such as what virtual addresses these things are mapped into, and
from what offset within the file.  The size, nm, and objdump -h
commands from the binutils package (yum install binutils) will help you
map these offsets back to the compiler output and thus to the source code.
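The sample above also shows Pss (proportional set size), which the explanation doesn't cover. As far as I know, the kernel computes it by splitting each resident page's size evenly among the processes that map it, so private pages count in full and shared pages count fractionally:

$$\mathrm{Pss} = \sum_{p \,\in\, \text{resident pages}} \frac{\text{page size}}{\text{mapcount}(p)}$$

Applied to the libcairo code segment (244 kB resident, all shared, Pss 70 kB):

$$\frac{\mathrm{Pss}}{\mathrm{Rss}} = \frac{70\ \text{kB}}{244\ \text{kB}} \approx 0.287 \quad\Rightarrow\quad \text{average sharers} \approx \frac{1}{0.287} \approx 3.5$$

Strictly, 3.5 is the harmonic mean of the per-page share counts. The data segment is entirely private, so its Pss equals its Rss (8 kB). Summing Pss over all processes counts each page exactly once, which is why Pss-based tools such as ps_mem give totals that add up.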

Dirty memory is particularly pernicious on the XO, since the XO has
nowhere to keep it except in RAM.  On normal Linux systems, when dirty
pages are not expected to be used again soon, they can be paged out to
the swap space on disk.  On the XO, which has no swap space, those
pages burn up RAM permanently, even if the process goes to sleep for a
year and never wakes up again.  The dirty memory is released only when
the process exits (or when the process explicitly unmaps it, which
seldom happens).  These long-term dirty pages produce more memory
pressure (less available memory) for all the other processes that are
actually active and getting work done for the user.
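Since dirty pages are what stay pinned in RAM on a swapless XO, a natural first step is to rank a process's mappings by Private_Dirty. A small sketch, assuming the header/field layout of the smaps output above; dirty_mappings is a hypothetical helper name.

```python
def dirty_mappings(text, top=5):
    """Return [(private_dirty_kb, mapping_header), ...] for the `top` dirtiest mappings."""
    results = []
    header = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if "-" in parts[0] and not parts[0].endswith(":"):
            # Mapping header, e.g. '0824f000-08be9000 rw-p 0824f000 00:00 0 [heap]'
            header = line.strip()
        elif parts[0] == "Private_Dirty:" and header is not None:
            results.append((int(parts[1]), header))
    # Largest Private_Dirty first.
    results.sort(reverse=True)
    return results[:top]
```

Run against a Python process, this tends to put the [heap] mapping at the top, which is where the interesting questions about what is actually occupying it begin.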

When a Linux system gets memory pressure, it tosses out whatever it
can to swap space (nothing, on the XO) and then it starts throwing
away pages that it knows it can later re-read from the file system.
The resident, clean pages that are mapped from files are what get
thrown away (like the first segment of libcairo above).  When the XO
runs low on memory, this means that it throws away a lot of pages
containing executable code.  If the code on those pages is
subsequently executed, those pages will be read back in from the file
system.  Note that reading in a 4k page from the JFFS2 compressed
filesystem is not a cheap operation; a lot of system CPU time goes
into decompressing it (compared to an ordinary Linux system with a
hard disk and an ext3 filesystem).  Throwing away code pages and then
immediately reading them back in again, over and over, thrashing,
may be why the XO gets very slow when memory is tight.

It may be possible and useful to store some commonly used executables
and shared libraries as uncompressed files in jffs2, making them much
faster to page back in from Flash.  Nobody has tried doing this, as
far as I know.

I don't know how to instrument the kernel virtual memory subsystem to
gain visibility into which pages are being discarded when, and which
are being read in later.  I think that info would be extremely useful
for debugging 8.2's low-memory hangs.  Thrashing would become obvious
if you see the same pages being read in over and over.  Of course,
when the machine is thrashing, it's hard to see any output from its
kernel...
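Per-page visibility would need kernel work, but a coarse, system-wide thrashing signal is already available: the major page fault counter. Roughly, each major fault means a page had to be read back from storage (on the XO, decompressed from JFFS2), so a sustained high rate while free memory is low fits the thrashing picture. A sketch assuming the kernel exposes pgmajfault in /proc/vmstat, as current Linux kernels do (not verified on the 8.2 kernel); read_vmstat, major_fault_rate and sample_major_faults are hypothetical names. Per-process counts are in the majflt field (field 12) of /proc/PID/stat.

```python
import time

def read_vmstat(text):
    """Parse /proc/vmstat ('name value' per line) into {name: int}."""
    stats = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if value.strip().isdigit():
            stats[name] = int(value)
    return stats

def major_fault_rate(before, after, seconds):
    """Major page faults per second between two vmstat snapshots."""
    return (after["pgmajfault"] - before["pgmajfault"]) / float(seconds)

def sample_major_faults(seconds=5.0):
    """Measure the system-wide major fault rate over `seconds`."""
    with open("/proc/vmstat") as f:
        before = read_vmstat(f.read())
    time.sleep(seconds)
    with open("/proc/vmstat") as f:
        after = read_vmstat(f.read())
    return major_fault_rate(before, after, seconds)
```

Logging this rate to tmpfs alongside the free-memory figure, before the machine locks up, avoids having to get output from a kernel that is already thrashing.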

John


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread Tomeu Vizoso
On Wed, Sep 10, 2008 at 11:53 AM, John Gilmore [EMAIL PROTECTED] wrote:

 It may be possible and useful to store some commonly used executables
 and shared libraries as uncompressed files in jffs2, making them much
 faster to page back in from Flash.  Nobody has tried doing this, as
 far as I know.

Please, I would love to see this as well...

Thanks,

Tomeu


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread Tomeu Vizoso
On Tue, Sep 9, 2008 at 6:10 AM, Michael Stone [EMAIL PROTECTED] wrote:

  * We need to check carefully for memory-leaks. Three mechanisms which
occur to me include:

Looks like we have regressed on http://dev.laptop.org/ticket/5532 .
Just entered http://dev.laptop.org/ticket/8394 because most of the
details on the older ticket aren't relevant any more. It contains a
fix.

This means we leak 20KB per buddy, so 10 buddies appearing per hour
would match Paul's observations.

We should set up automated tests to check whether we still leak. How
could we resource this?
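For scale: 20 KB per buddy at 10 buddies per hour is 200 KB per hour, or 4800 KB (about 4.7 MB) per day on a 256 MB machine. One possible shape for an automated check, as a sketch: sample a memory figure (for example the sugar-shell Private_Dirty total from smaps) at regular intervals while buddies come and go, fit a least-squares line, and flag the run if the slope exceeds a threshold. leak_slope and looks_leaky are hypothetical names, and the threshold is up to whoever runs the test.

```python
def leak_slope(samples):
    """Least-squares slope of (t, kb) samples, in kB per unit of t."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / float(n)
    mean_kb = sum(kb for _, kb in samples) / float(n)
    num = sum((t - mean_t) * (kb - mean_kb) for t, kb in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples need distinct times")
    return num / den

def looks_leaky(samples, max_kb_per_hour):
    """True if memory grows faster than max_kb_per_hour (t measured in hours)."""
    return leak_slope(samples) > max_kb_per_hour
```

Fitting a line rather than comparing first and last samples makes the check less sensitive to a single noisy reading, such as a garbage-collection pause.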

Regards,

Tomeu


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread riccardo
On Wed, 2008-09-10 at 09:15 +0200, Marco Pesenti Gritti wrote:
 On Wed, Sep 10, 2008 at 4:29 AM, Gary C Martin [EMAIL PROTECTED] wrote:
 
  OK, news is not great on the Activity front...
 
  SUMMARY: 759 vs 711: each Activity instance in 759 consumes an average
  of 1Mb more memory than the same Activity running in 711, with
  Write-57 reportedly taking significantly more than that (perhaps ~7Mb).
 
  Is top and/or ps memory usage calculated in the same way between these
  builds? Could make collecting real data pretty painful.
 
 Unfortunately not; there have been changes in the kernel. My
 understanding is that private memory will be the same, while
 calculation of shared memory has changed. Riccardo has a new kernel
 somewhere with instructions on how to install it on 711. That should
 make the memory usage comparable.
 
 Marco

I used this newer kernel (as it also accounts for Pss) for measurements
with ps_mem on build 703:
http://dev.laptop.org/~rlucchese/utils/703/kernel-2.6.25-20080501.3.olpc.231c7b715f4a8d0.i586.rpm

It can be installed on the xo with:
$ rpm -ivh kernel-rpm
$ cp -a /boot/* /versions/boot/current/boot/

You will also have to update the ram disk image; you can follow the 
instructions at the bottom of
http://wiki.laptop.org/go/Kernel_Building


You may also want to try this patched ps_mem (shows pids and doesn't group 
entries by process name):
http://dev.laptop.org/~rlucchese/utils/ps_mem.py


riccardo




Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread James Cameron
I spent a few hours looking at the second-largest process, the Journal
activity, on Joyride 2412.

VmPeak:    40440 kB
VmSize:    40436 kB
VmLck:         0 kB
VmHWM:     28824 kB
VmRSS:     28824 kB
VmData:    11632 kB
VmStk:       172 kB
VmExe:         4 kB
VmLib:     21992 kB
VmPTE:        48 kB

so it costs 29Mb or so of RSS, most of which is presumably shared.

This is confirmed by smaps, which showed 9Mb or so used by the heap.
That was the main memory cost and the largest Private_Dirty, so I
concentrated on it.

0824f000-08be9000 rw-p 0824f000 00:00 0  [heap]
Size:           9832 kB
Rss:            9608 kB
Pss:            9608 kB
Shared_Clean:      0 kB
Shared_Dirty:      0 kB
Private_Clean:     0 kB
Private_Dirty:  9608 kB
Referenced:     9608 kB

For a while I tried working with /proc/$PID/mem, until I figured out that
it just would not work: read(2) always failed with ESRCH, and the kernel's
mem_read showed that I could only do it to processes that are children of
my process.  Odd, so I abandoned that method.

Then I used gdb's generate-core-file command and wandered through the
heap memory to get an idea of what it might contain.  I did not make a
complete analysis; I need to learn more about the heap structures before
I do so.

But I did notice one odd thing that I wasn't fully aware of until now
... the byte-code of the built-in modules was present, complete with doc
strings ... for example:

(gdb) x/4bs 0x824f78c
0x824f78c:   int(x[, base]) -> integer\n\nConvert a string or number to an integer, if possible.  A floating point\nargument will be truncated towards zero (this does not include a string\nrepresentation of a floating...
0x824f854:   point number!)  When converting a string, use\nthe optional base.  It is an error to supply a base when converting a\nnon-string. If the argument is outside the integer range a long object\nwill be retu...
0x824f91c:   rned instead.
0x824f92a:   

and not just once, but twice:

$ strings journal.core | grep "supply a base when converting"
the optional base.  It is an error to supply a base when converting a
the optional base.  It is an error to supply a base when converting a

Has anyone got an idea of how to measure the heap by usage?

-- 
James Cameron    mailto:[EMAIL PROTECTED]    http://quozl.netrek.org/


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread pgf
tomeu wrote:
  On Wed, Sep 10, 2008 at 2:05 PM, James Cameron [EMAIL PROTECTED] wrote:
  
   Has anyone got an idea of how to measure the heap by usage?
  
  Not from outside python, but from inside we are using heapy:
  
  http://guppy-pe.sourceforge.net/

i started down that path yesterday afternoon, and realized that it
wasn't clear to me how i needed to invoke it.  it seems to want
to be imported before you start the rest of your program, which
sort of forces you into interactive mode.  is that your understanding?
i had been hoping to be able to attach to the sugar shell process,
in the way one might do with gdb.  perhaps that's not possible.

btw, i continued doing monitoring of the machines i had running:
i need to look again after they've been running overnight when i
get to the office, but the growth i was seeing may be network related,
as tomeu suggested yesterday.  (i had at least one case of no growth
at all when i had disabled the wireless.)

paul
=-
 paul fox, [EMAIL PROTECTED]


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread riccardo
Paul, 

On Wed, 2008-09-10 at 08:18 -0400, [EMAIL PROTECTED] wrote:
 tomeu wrote:
   On Wed, Sep 10, 2008 at 2:05 PM, James Cameron [EMAIL PROTECTED] wrote:
   
Has anyone got an idea of how to measure the heap by usage?
   
   Not from outside python, but from inside we are using heapy:
   
   http://guppy-pe.sourceforge.net/
 
 i started down that path yesterday afternoon, and realized that it
 wasn't clear to me how i needed to invoke it.  it seems to want
 to be imported before you start the rest of your program, which
 sort of forces you into interactive mode.  is that your understanding?
 i had been hoping to be able to attach to the sugar shell process,
 in the way one might do with gdb.  perhaps that's not possible.
 

There is a kick-start tutorial on how to use heapy's remote monitor on
page 56 of http://guppy-pe.sourceforge.net/heapy-thesis.pdf

For the shell, I put `import guppy.heapy.RM' before any other
import statement in main.py.

riccardo



Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread Tomeu Vizoso
On Wed, Sep 10, 2008 at 2:37 PM, riccardo [EMAIL PROTECTED] wrote:
 Paul,

 On Wed, 2008-09-10 at 08:18 -0400, [EMAIL PROTECTED] wrote:
 tomeu wrote:
   On Wed, Sep 10, 2008 at 2:05 PM, James Cameron [EMAIL PROTECTED] wrote:
  
Has anyone got an idea of how to measure the heap by usage?
  
   Not from outside python, but from inside we are using heapy:
  
   http://guppy-pe.sourceforge.net/

 i started down that path yesterday afternoon, and realized that it
 wasn't clear to me how i needed to invoke it.  it seems to want
 to be imported before you start the rest of your program, which
 sort of forces you into interactive mode.  is that your understanding?
 i had been hoping to be able to attach to the sugar shell process,
 in the way one might do with gdb.  perhaps that's not possible.


 There is a kick-start tutorial on how to use heapy's remote monitor on
 page 56 of http://guppy-pe.sourceforge.net/heapy-thesis.pdf

 For the shell, I put `import guppy.heapy.RM' before any other
 import statement in main.py.

Another pointer:

http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.monitor

Other ways of using guppy are periodically logging the heap with
gobject.timeout_add, or patching keyhandler.py to print the heap (or a
diff of it) when a key combination is pressed.
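The periodic-logging idea can be sketched with Python 3's standard tracemalloc module in place of guppy; this is a swapped-in technique, not the heapy API, and HeapLogger is a hypothetical name. Like heapy's remote monitor, tracemalloc only sees allocations made after it starts, so it has to be enabled early in startup.

```python
import tracemalloc

class HeapLogger(object):
    """Log the largest allocation-size changes since the previous call."""

    def __init__(self, limit=5, frames=1):
        if not tracemalloc.is_tracing():
            tracemalloc.start(frames)
        self.limit = limit
        self.previous = tracemalloc.take_snapshot()

    def log_diff(self, out=print):
        current = tracemalloc.take_snapshot()
        # Per source line, sorted by the biggest change in allocated size.
        stats = current.compare_to(self.previous, "lineno")
        for stat in stats[:self.limit]:
            out(str(stat))
        self.previous = current
        # Returning True keeps a GLib/gobject timeout source scheduled.
        return True
```

Registered with something like `gobject.timeout_add(60000, logger.log_diff)` (or `GLib.timeout_add_seconds(60, logger.log_diff)` with PyGObject), it should log a diff every minute; that wiring is untested here. The same `log_diff` could be called from a key handler instead.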

Regards,

Tomeu


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread david
On Wed, 10 Sep 2008, Tomeu Vizoso wrote:

 On Wed, Sep 10, 2008 at 11:53 AM, John Gilmore [EMAIL PROTECTED] wrote:

 It may be possible and useful to store some commonly used executables
 and shared libraries as uncompressed files in jffs2, making them much
 faster to page back in from Flash.  Nobody has tried doing this, as
 far as I know.

 Please, I would love to see this as well...

not for this release, but would axfs, with its execute-in-place capability,
be an option in the future for key files? or is the performance
difference compared to ram such that you wouldn't want it in any case?

either way, the profiling it does of which pages are used (and how much) 
could be useful in figuring out what binaries should be stored 
uncompressed.

David Lang


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread pgf
tomeu wrote:
  On Wed, Sep 10, 2008 at 2:37 PM, riccardo [EMAIL PROTECTED] wrote:
   Paul,
  
   On Wed, 2008-09-10 at 08:18 -0400, [EMAIL PROTECTED] wrote:
   i started down that path yesterday afternoon, and realized that it
   wasn't clear to me how i needed to invoke it.  it seems to want
   to be imported before you start the rest of your program, which
   sort of forces you into interactive mode.  is that your understanding?
   i had been hoping to be able to attach to the sugar shell process,
   in the way one might do with gdb.  perhaps that's not possible.
  
  
   There is a kick-start tutorial on how to use heapy's remote monitor on
   page 56 of http://guppy-pe.sourceforge.net/heapy-thesis.pdf
  
   For the shell, I put `import guppy.heapy.RM' before any other
   import statement in main.py.
  
  Another pointer:
  
  http://guppy-pe.sourceforge.net/heapy_Use.html#heapykinds.Use.monitor
  
  Other ways of using guppy are periodically logging the heap with
  gobject.timeout_add, or patching keyhandler.py to print the heap (or a
  diff of it) when a key combination is pressed.

thank you for both of those pointers.

paul
=-
 paul fox, [EMAIL PROTECTED]


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread Michael Stone
On Wed, Sep 10, 2008 at 02:13:24PM +0200, Tomeu Vizoso wrote:
 Not from outside python, but from inside we are using heapy:

 http://guppy-pe.sourceforge.net/

Tomeu already published some guppy RPMs, but here is a git repo with
packaging instructions (Makefiles) should you wish to make any changes.

   http://dev.laptop.org/git/users/mstone/heapy

(Look for the $(TARBALL) rule in Makefile.fedora if you want to change
anything; that Makefile is configured to pull directly from PyPI...)

Michael

  
Here are the resulting RPMs:  

http://dev.laptop.org/~mstone/releases/RPMS/guppy-0.1.8-1.fc9.i386.rpm
http://dev.laptop.org/~mstone/releases/RPMS/guppy-debuginfo-0.1.8-1.fc9.i386.rpm
http://dev.laptop.org/~mstone/releases/SRPMS/guppy-0.1.8-1.fc9.src.rpm



Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread Nate Ridderman
My layman's understanding is that you can't execute in place from the NAND
flash on the XO. XIP requires NOR flash, which is more expensive than NAND
but has faster read speeds. The axfs FAQ mentions this briefly.

Storing some executables and libraries in a separate uncompressed partition
seems more plausible, but I can't speculate on the system performance
impact.

Thanks,
Nate

On Wed, Sep 10, 2008 at 9:38 AM, [EMAIL PROTECTED] wrote:

 On Wed, 10 Sep 2008, Tomeu Vizoso wrote:

  On Wed, Sep 10, 2008 at 11:53 AM, John Gilmore [EMAIL PROTECTED] wrote:
 
  It may be possible and useful to store some commonly used executables
  and shared libraries as uncompressed files in jffs2, making them much
  faster to page back in from Flash.  Nobody has tried doing this, as
  far as I know.
 
  Please, I would love to see this as well...

 not for this release, but would axfs, with its execute-in-place capability,
 be an option in the future for key files? or is the performance
 difference compared to ram such that you wouldn't want it in any case?

 either way, the profiling it does of which pages are used (and how much)
 could be useful in figuring out what binaries should be stored
 uncompressed.

 David Lang


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread Gary C Martin
On 10 Sep 2008, at 08:27, Marco Pesenti Gritti wrote:

 On Wed, Sep 10, 2008 at 4:29 AM, Gary C Martin  
 [EMAIL PROTECTED] wrote:
 Well, I was hoping to see the numbers go the other way with the
 rainbow fork trick sharing more module code between Activities. Could
 be worse I guess – I should also test opening N instances of the same
 Activity and see which way memory usage has moved in that scenario.

 Now that's worrying. Could you try to disable security (remove
 /etc/olpc-security)? That will kill the rainbow trick and comparing
 data should tell us if it's helping memory at all.


Sure, retested build 759 with and without /etc/olpc-security, same  
test procedure as before:

With /etc/olpc-security removed, a reboot, and all five Activities  
launched, free buffers/cache reports 21Mb more is being used (up to  
192Mb from 171Mb). Though looking at each Activity's footprint shows a  
less clear signal where Write-57 and Record-57 actually have a  
considerably smaller footprint; and Calculate-23, Paint-22, and Moon-4  
have a slightly larger footprint (and shared memory is actually  
reported as having increased).

Write-57
with - 15.5% (RES=35m, SHR=13m, DATA=20m)
without - 13.8% (RES=31m, SHR=12m, DATA=18m)

Record-57
with - 14.2% (RES=32m, SHR=14m, DATA=64m)
without - 11.5% (RES=26m, SHR=11m, DATA=61m)

Calculate-23
with - 10.6% (RES=24m, SHR=8m, DATA=15m)
without - 11.3% (RES=25m, SHR=11m, DATA=13m)

Paint-22
with - 10.1% (RES=23m, SHR=8m, DATA=14m)
without - 10.6% (RES=24m, SHR=11m, DATA=12m)

Moon-4
with - 9.7% (RES=22m, SHR=8m, DATA=13m)
without - 10.3% (RES=23m, SHR=11m, DATA=11m)

Also I noticed, for some reason, X uses 6-8Mb less resident memory  
with /etc/olpc-security removed. That was unexpected enough for me to  
re-check the results several times:

/usr/bin/X
with - 7.5% (RES=17m, SHR=13m, DATA=9m)
with - 7.5% (RES=17m, SHR=13m, DATA=9m)

without - 5.1% (RES=11m, SHR=8m, DATA=9m)
without - 4.3% (RES=9m, SHR=7m, DATA=9m)

Hmm, so what actually took up the extra 21Mb in total that the rainbow  
trick does appear to be saving us (considering most of the above items  
all add up as memory savings when disabling the rainbow trick)?

I seem to be generating more questions than answers here!
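One thing that makes per-process numbers hard to reconcile with free: top's RES counts every shared page in full for each process that maps it, so summing RES across activities double-counts shared libraries. Kernels from roughly 2.6.25 on expose a proportional set size (Pss) per mapping in /proc/<pid>/smaps, which splits each shared page among its sharers. The sketch below is Python 3, assumes such a kernel, and uses a made-up two-mapping sample; it is an illustration, not a tool from this thread.

```python
import re

# Illustrative smaps excerpt (two mappings); real files have many more fields.
SAMPLE_SMAPS = """\
08048000-0804c000 r-xp 00000000 1f:00 1234       /usr/bin/python
Size:                 16 kB
Rss:                  12 kB
Pss:                   6 kB
b7e00000-b7f00000 rw-p 00000000 00:00 0
Size:               1024 kB
Rss:                 800 kB
Pss:                 800 kB
"""


def sum_smaps_kb(smaps_text, field="Pss"):
    """Sum one per-mapping field (reported in kB) across all mappings."""
    # Anchored at line start so that e.g. "SwapPss:" is not counted as "Pss:".
    pattern = re.compile(r"^" + re.escape(field) + r":\s+(\d+) kB", re.MULTILINE)
    return sum(int(m.group(1)) for m in pattern.finditer(smaps_text))


def process_total_kb(pid, field="Pss"):
    """Total of `field` for a live process, read from /proc/<pid>/smaps."""
    with open(f"/proc/{pid}/smaps") as f:
        return sum_smaps_kb(f.read(), field)


if __name__ == "__main__":
    print(sum_smaps_kb(SAMPLE_SMAPS))         # Pss total: 806
    print(sum_smaps_kb(SAMPLE_SMAPS, "Rss"))  # Rss total: 812
```

Summing Pss over every process, with and without /etc/olpc-security, would be one way to cross-check the free figures without the shared-page double counting.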

--Gary


Re: Stability and Memory Pressure in 8.2

2008-09-10 Thread Michael Stone
A more accurate test would be to disable the preloading itself rather
than disabling isolation but leaving rainbow loading the libraries. :)

To do that, see lines 31-32 of

   /usr/lib/python2.5/site-packages/rainbow/service.py

You want to set self.preloader_hint = False and comment out the call to
self.preload_common_modules() by putting '#' at the beginning.

Regards,

Michael


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Tomeu Vizoso
On Tue, Sep 9, 2008 at 6:10 AM, Michael Stone [EMAIL PROTECTED] wrote:

  * We need to find out why the oom-killer is not killing things fast
enough. Based on our results, we might consider configuring
/proc/$pid/oom_adj to preferentially kill some processes (e.g., the
foreground [or background?] activities.)

Any reason why killing activities' processes first wouldn't solve the
stability issue? AFAIK, we haven't seen OOM conditions without any
activity open.

Just in case, I'm not saying that the other things on your list aren't
worth doing.

Regards,

Tomeu


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Martin Dengler
On Tue, Sep 09, 2008 at 12:10:53AM -0400, Michael Stone wrote:
 Dear devel@,
 
 Kim, Greg, and I have concluded that the instability we experience under
 memory-pressure in 8.2-759 and similar is the single hard issue that
 we wish to _attempt_ to address before releasing 8.2 on current
 timeframes.
[...]
   * We ought to ponder whether there are any additional dirty hacks we
 can experiment with in order to reduce memory consumption; for
 example, running the Shell and Journal (and DS?) in one process or
 making use of the compressed-caching code published on this list some
 months ago.

Compcache has been working well enough for me for the last six months
to suggest that wider testing wouldn't be a disaster.

-bash-3.2# cat /boot/olpc_build
joyride 2399

-bash-3.2# free
             total       used       free     shared    buffers     cached
Mem:        235716     230356       5360          0       1628      65448
-/+ buffers/cache:     163280      72436
Swap:        58924       2736      56188

-bash-3.2# swapon -s
Filename                Type            Size    Used    Priority
/dev/ramzswap0          partition       58924   2736    100

The trac ticket is http://dev.laptop.org/ticket/28
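For anyone comparing numbers across builds: the "-/+ buffers/cache" row in this (older procps) free output is derived from /proc/meminfo by treating buffers and page cache as reclaimable. A minimal Python 3 sketch of that arithmetic, using the figures above as a check. It mirrors the older procps formula as far as I know; newer versions of free report an "available" column instead.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def buffers_cache_row(info):
    """Return (used, free) as shown on free's '-/+ buffers/cache' row."""
    reclaimable = info["Buffers"] + info["Cached"]
    used = info["MemTotal"] - info["MemFree"]
    return used - reclaimable, info["MemFree"] + reclaimable


# Values from the joyride-2399 output above, rewritten as /proc/meminfo lines.
EXAMPLE = """\
MemTotal:       235716 kB
MemFree:          5360 kB
Buffers:          1628 kB
Cached:          65448 kB
SwapTotal:       58924 kB
SwapFree:        56188 kB
"""

if __name__ == "__main__":
    print(buffers_cache_row(parse_meminfo(EXAMPLE)))  # (163280, 72436)
```

On a live XO, `parse_meminfo(open("/proc/meminfo").read())` gives the same inputs, which makes it easy to log this row periodically.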

 Regards,
 
 Michael

Martin




Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Jim Gettys
There are four classes of things we can/should/could do:
1) understand where our memory is being used.  Individual bugs can have
a large effect.  Something stupid could be hurting us badly, and we
won't know unless we look.  What is more, we need to invest in tools
that allow us to monitor this.
2) there are some band-aids that have been discussed, such as rlimits,
which we can experiment with, and that *might* improve the situation
without the real solutions the next two items go into.
3) the oom killer's default algorithms are pretty terrible, taking
little into account in the choice of what gets killed.  Between
Sugar/Rainbow, and knowledge that the window manager has, one could do
much better.
4) we provide no end user feedback on memory usage, either.  We should
investigate revisiting our previous attempt to give such feedback (the
donut we abandoned), now that Linux can provide much better information
than it could back then.  The users could really help, if only we let
them know a bit about what was going on...
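The rlimit experiment in item 2) (Michael's "setting rlimits on various processes and noting what dies") can be tried from userland without touching Sugar. A minimal Python 3 sketch, assuming a POSIX system: it starts a command with its address-space soft limit (RLIMIT_AS) lowered and reports how it exited. RLIMIT_AS caps virtual address space rather than resident memory, so it is a blunt instrument, and `preexec_fn` should not be used from a multi-threaded parent.

```python
import resource
import subprocess
import sys


def run_with_as_limit(argv, limit_bytes):
    """Run argv with RLIMIT_AS (soft) set to limit_bytes; return (returncode, stderr)."""

    def lower_address_space_limit():
        # Executed in the child after fork() and before exec().
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = limit_bytes if hard == resource.RLIM_INFINITY else min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, hard))

    proc = subprocess.run(argv, preexec_fn=lower_address_space_limit,
                          capture_output=True, text=True)
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    cap = 512 * 1024 * 1024
    # A child that tries to allocate 2 GiB should hit MemoryError under the cap.
    rc, err = run_with_as_limit([sys.executable, "-c", "bytearray(2 * 1024**3)"], cap)
    print(rc, "MemoryError" in err)
```

Lowering only the soft limit keeps the experiment reversible; applying the same idea to an activity would mean setting the limit in whatever launches it.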

In terms of priority: immediately examining what is going on with memory
usage in case we have a bad leak is clearly worthwhile (1).  We need to
budget for tool-building to monitor the situation going forward
immediately.

2) and *possibly* (a beginning on) 3) may be 8.2.1 fodder, but without
feedback from more users, we won't know if this isn't just keys under
the lamppost (e.g., our multiple bug reports about Browse OOMing because
of our amazingly stupid hardware wiki page, which is one of the most
egregious pages I've seen in recent memory).

Doing 3) pretty well I suspect is 9.1 fodder, but only if we start very
soon.  My gut tells me it's some man-months of work.  We might get lucky
and should investigate if any of the embedded folks have something we
can use.  Unfortunately, the Nokia folks I had thought might have
something didn't, when I last checked a year ago.  But we can/should
check a bit first before diving in; it's a year later.
http://dev.laptop.org/ticket/1995

I urge we investigate quickly whether 4) is, in fact, feasible, so that
it can go on the Sugar roadmap in time to be done for 9.1.
- Jim


On Tue, 2008-09-09 at 13:02 +0200, Tomeu Vizoso wrote:
 On Tue, Sep 9, 2008 at 6:10 AM, Michael Stone [EMAIL PROTECTED] wrote:
 
   * We need to find out why the oom-killer is not killing things fast
 enough. Based on our results, we might consider configuring
 /proc/$pid/oom_adj to preferentially kill some processes (e.g., the
 foreground [or background?] activities.)
 
 Any reason why killing first activities' processes wouldn't solve the
 stability issue? AFAIK, we haven't seen OOM conditions without any
 activity open.
 
 Just in case, I'm not saying that isn't worth to do any of the other
 things on your list.
 
 Regards,
 
 Tomeu
-- 
Jim Gettys [EMAIL PROTECTED]
One Laptop Per Child



Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread riccardo
On Tue, 2008-09-09 at 00:10 -0400, Michael Stone wrote:
 Dear devel@,
 
 Kim, Greg, and I have concluded that the instability we experience under
 memory-pressure in 8.2-759 and similar is the single hard issue that
 we wish to _attempt_ to address before releasing 8.2 on current
 timeframes. (We recognize that there are several other issues marked
 as blocking the release but we are confident that they will be resolved
 satisfactorily or are, in a few cases, beyond help.)
 
 Since most other aspects of the release seem to be running smoothly, Kim
 asked me to take a more direct role in organizing our efforts to produce a
 release which avoids memory pressure when possible and which is
 better-behaved when it strikes.
 
 To that end, I would like to ask for your assistance with the following
 questions and tasks:
 
   * We need to determine why we encounter low-memory and out-of-memory
 situations more frequently than in previous releases. 
 
 - This means that we need to measure how our memory consumption
   profile has changed since our previous releases. 
 
   (cscott observes that we were unable to attack the F-9 image size
   issues until we were able to quantify the effect of changes we had
   made or were considering making. Consequently, he suggests that we
   will be unable to attack our current space consumption problems
   until we are able to generate good numbers (and displays).)
 
 - We need to think carefully about (or measure) whether our
   memory-consumption patterns have changed. I am particularly
   skeptical of our widespread use of tmpfsen since the pages consumed
   by files stored on tmpfsen are permanently dirty (and are perhaps
   accounted for differently than pages mapped into processes' address
   spaces?) 
 
 - We need to check the configuration of applications like Browse
   which have configurable caching behavior. (Search for cache or
   capacity in about:config; check for important compile-time
   configuration flags.)
 
 - We need to test in a variety of different network configurations
   in order to determine to what extent the network/presence
   environment affects memory consumption.

   * We need to check carefully for memory-leaks. Three mechanisms which
 occur to me include: 
 
   1) running the system for a period of time, then scanning for
  anomalies either manually or in some automated fashion from
  userland, kernel-land, or OFW (via SysRq or SMM).
   
   2) setting rlimits on various processes and noting what dies
 
   3) using debugging tools like the python garbage collection
  module, guppy/heapy, gdb+macros, valgrind, efence, purify, etc.
  looking for trouble.
 
   * We need to find out why the oom-killer is not killing things fast
 enough. Based on our results, we might consider configuring
 /proc/$pid/oom_adj to preferentially kill some processes (e.g., the
 foreground [or background?] activities.)
 
   * We need to determine whether the oom-killer is killing the right
 processes. (sysctl's vm.oom_dump_tasks can be set to 1 in order to
 get more verbosity from the oom-killer when it fires).
 
   * We ought to ponder whether there are any additional dirty hacks we
 can experiment with in order to reduce memory consumption; for
 example, running the Shell and Journal (and DS?) in one process or
 making use of the compressed-caching code published on this list some
 months ago.
 
   * Random other stuff to think about:
  
 - rlimits, cgroups, and the memory resource controller
 
 - the warnings in the ramfs and tmpfs code about the deadlocks that
   tmpfsen can generate under low- or no-memory conditions.
 
 - whether our kernel overcommits when allocation requests are made?
 
 - whether we can get Browse to behave intelligently when it receives
   BadAlloc errors from X?
 
 - how to run bootchart on the XO
 
 - how to generate decent statistics and graphics (preferably in an
   automated fashion) concerning memory usage as part of our test
   suite
 
 - system-tap's kmalloc2.stp example
 
 In conclusion, more to come once I have some actual data; _please_ feel
 free to assist in collecting it! (though be aware that I may 'volunteer'
 you if I need your help. (That means you, Tomeu, Riccardo, Deepak,
 ...)).
 
 Regards,
 
 Michael


There are some (trivial) tools you may be interested in that I've written
and used, among others, to attack/study these issues:

 * picker [1]
For me it was handier to use than bootchart; it also shows per-process
memory usage.

 * imports timings and alloc statistics [2]
Patch to python that prints timings and mem usage diffs for every
imported module. Original timings patch is from Tomeu.

 * python-allocstatsmodule [3]
Inspired by [2], but can be used inside Python scripts to collect
stats on the heap.
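In the same spirit, and as a lighter alternative to guppy/heapy for the "python garbage collection module" idea in Michael's list, one can snapshot live objects by type with `gc.get_objects()` before and after an operation and diff the counts. A minimal sketch in Python 3 (`collections.Counter` is not available on the Python 2.5 that Sugar runs on, so a plain dict would be needed there). Only objects tracked by the collector, such as containers and class instances, appear; ints and strings generally do not, so this is a coarse signal. `Buddy` below is a hypothetical stand-in class.

```python
import gc
from collections import Counter


def type_counts():
    """Count live, gc-tracked objects by type name after a full collection."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=10):
    """Types whose live count grew the most between two snapshots."""
    # Counter subtraction drops zero and negative differences.
    return (after - before).most_common(top)


if __name__ == "__main__":
    class Buddy:  # hypothetical stand-in for objects a long-running process keeps
        pass

    retained = []
    before = type_counts()
    retained.extend(Buddy() for _ in range(500))
    after = type_counts()
    for name, delta in growth(before, after, top=3):
        print(name, delta)
```

In a long-running process such as the shell, the two snapshots would be taken some minutes apart rather than around a single operation.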

Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread pgf
On Tue, 2008-09-09 at 00:10 -0400, Michael Stone wrote:
 
 - This means that we need to measure how our memory consumption
   profile has changed since our previous releases. 
 
   (cscott observes that we were unable to attack the F-9 image size
   issues until we were able to quantify the effect of changes we had
   made or were considering making. Consequently, he suggests that we
   will be unable to attack our current space consumption problems
   until we are able to generate good numbers (and displays).)

what's the baseline previous release for this comparison?

paul
=-
 paul fox, [EMAIL PROTECTED]


Re: Stability and Memory Pressure in 8.2 ([EMAIL PROTECTED])

2008-09-09 Thread Greg Smith
Hi All,

I recommend build 708 as the baseline.

Thanks,

Greg S

 Date: Tue, 09 Sep 2008 11:34:50 -0400
 From: [EMAIL PROTECTED]
 Subject: Re: Stability and Memory Pressure in 8.2 
 To: devel@lists.laptop.org
 Message-ID: [EMAIL PROTECTED]
 Content-Type: text/plain; charset=us-ascii
 
 On Tue, 2008-09-09 at 00:10 -0400, Michael Stone wrote:
 
 - This means that we need to measure how our memory consumption
   profile has changed since our previous releases. 

   (cscott observes that we were unable to attack the F-9 image size
   issues until we were able to quantify the effect of changes we had
   made or were considering making. Consequently, he suggests that we
   will be unable to attack our current space consumption problems
   until we are able to generate good numbers (and displays).)
 
 what's the baseline previous release for this comparison?
 
 paul
 =-
  paul fox, [EMAIL PROTECTED]
 
 


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Deepak Saxena
   * We need to find out why the oom-killer is not killing things fast
 enough. Based on our results, we might consider configuring
 /proc/$pid/oom_adj to preferentially kill some processes (e.g., the
 foreground [or background?] activities.)

In the cases I've been playing with, browse is the only activity that
is running. Will try bumping its oom_adj to see if this improves OOM
kill latency.

   * We need to determine whether the oom-killer is killing the right
 processes. (sysctl's vm.oom_dump_tasks can be set to 1 in order to
 get more verbosity from the oom-killer when it fires).

From watching top, it appears that we're killing the correct process. For 
example, when running the test case from #8316, OOM killer does not kill 
browse, but just kills the gnash instance which is chewing up RAM.

 - the warnings in the ramfs and tmpfs code about the deadlocks that
   tmpfsen can generate under low- or no-memory conditions.

I have yet to see an actual deadlock. What I saw when trying to
reproduce #3816 is that the OOM killer just takes a very very long
time to kick in.

 - whether our kernel overcommits when allocation requests are made?

By default vm.overcommit_memory is set to 0, which will refuse obvious
overcommits of address space. I will try setting this to 3 along with
vm.overcommit_ratio to 0 to force no overcommit at all and see how the
system reacts.
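For reference, the kernel's overcommit-accounting documentation describes three values for vm.overcommit_memory: 0 (heuristic, the default), 1 (always overcommit) and 2 (strict accounting). In mode 2 the commit limit is swap plus overcommit_ratio percent of RAM (ignoring huge pages; the default ratio is 50), so a ratio of 0 leaves only swap. /proc/meminfo reports the resulting CommitLimit alongside Committed_AS. A small Python 3 sketch of that arithmetic, using the 235716 kB MemTotal from Martin's free output as example input; the helper names are illustrative.

```python
OVERCOMMIT_MODES = {
    0: "heuristic: refuse only obvious overcommits (default)",
    1: "always overcommit",
    2: "strict: commit limit = swap + overcommit_ratio% of RAM",
}


def strict_commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """CommitLimit enforced under vm.overcommit_memory=2 (huge pages ignored)."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


def commit_headroom_kb(meminfo_text):
    """CommitLimit minus Committed_AS, from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("CommitLimit", "Committed_AS"):
            fields[key] = int(rest.split()[0])
    return fields["CommitLimit"] - fields["Committed_AS"]


def current_vm_setting(name):
    """Read an integer sysctl such as overcommit_memory from /proc/sys/vm."""
    with open(f"/proc/sys/vm/{name}") as f:
        return int(f.read().split()[0])


if __name__ == "__main__":
    ram_kb = 235716  # XO MemTotal from the free output earlier in the thread
    for ratio in (0, 50, 100):
        print(ratio, strict_commit_limit_kb(ram_kb, 0, ratio))
```

With no swap configured, mode 2 with a ratio of 0 yields a commit limit of 0 kB, which is why the ratio matters as much as the mode.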

~Deepak



Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Daniel Drake
On Tue, 2008-09-09 at 00:10 -0400, Michael Stone wrote:
 - whether we can get Browse to behave intelligently when it receives
   BadAlloc errors from X?

I have no doubt that Browse/xulrunner has room for improvement with
memory usage but this is not where you should be looking. These BadAlloc
messages are true errors generated when the application requests pixmaps
outside of the coordinate range accepted by X (this is well
documented). 

This is a real bug in the code, not a memory pressure issue. Such
requests should never be generated, and the application crashing is
probably the behaviour we want.

Daniel




Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread C. Scott Ananian
On Tue, Sep 9, 2008 at 11:34 AM,  [EMAIL PROTECTED] wrote:
 On Tue, 2008-09-09 at 00:10 -0400, Michael Stone wrote:

 - This means that we need to measure how our memory consumption
   profile has changed since our previous releases.

   (cscott observes that we were unable to attack the F-9 image size
   issues until we were able to quantify the effect of changes we had
   made or were considering making. Consequently, he suggests that we
   will be unable to attack our current space consumption problems
   until we are able to generate good numbers (and displays).)

 what's the baseline previous release for this comparison?

update.1 (703, 708, 713 or your choice)

You could also try using the pre-F-9 merge joyrides for comparison,
but that presupposes that our memory problems are a side-effect of the
F-9 merge, and I don't think that we have any evidence of this yet.
 --scott

-- 
 ( http://cscott.net/ )


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread C. Scott Ananian
On Tue, Sep 9, 2008 at 7:02 AM, Tomeu Vizoso [EMAIL PROTECTED] wrote:
 On Tue, Sep 9, 2008 at 6:10 AM, Michael Stone [EMAIL PROTECTED] wrote:

  * We need to find out why the oom-killer is not killing things fast
enough. Based on our results, we might consider configuring
/proc/$pid/oom_adj to preferentially kill some processes (e.g., the
foreground [or background?] activities.)

 Any reason why killing first activities' processes wouldn't solve the
 stability issue? AFAIK, we haven't seen OOM conditions without any
 activity open.

Yes, we have.  In particular, if you update your system and then leave
it for a while, and later click the software update control panel, you
end up OOMing in the control panel.  Sugar restarts and reports are
that software update works fine the second time.  So this might well
be a sugar leak; killing 'sugar' is not good for stability.
 --scott

-- 
 ( http://cscott.net/ )


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Tomeu Vizoso
On Tue, Sep 9, 2008 at 7:39 PM, C. Scott Ananian [EMAIL PROTECTED] wrote:
 On Tue, Sep 9, 2008 at 7:02 AM, Tomeu Vizoso [EMAIL PROTECTED] wrote:
 On Tue, Sep 9, 2008 at 6:10 AM, Michael Stone [EMAIL PROTECTED] wrote:

  * We need to find out why the oom-killer is not killing things fast
enough. Based on our results, we might consider configuring
/proc/$pid/oom_adj to preferentially kill some processes (e.g., the
foreground [or background?] activities.)

 Any reason why killing first activities' processes wouldn't solve the
 stability issue? AFAIK, we haven't seen OOM conditions without any
 activity open.

 Yes, we have.  In particular, if you update your system and then leave
 it for a while, and later click the software update control panel, you
 end up OOMing in the control panel.  Sugar restarts and reports are
 that software update works fine the second time.  So this might well
 be a sugar leak; killing 'sugar' is not good for stability.

That sounds pretty awful. Do we have a ticket with precise
instructions on how to reproduce it? Roughly how long does one need
to wait after updating Sugar?

Regards,

Tomeu


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Jim Gettys
On Tue, 2008-09-09 at 13:10 -0400, Daniel Drake wrote:
 On Tue, 2008-09-09 at 00:10 -0400, Michael Stone wrote:
  - whether we can get Browse to behave intelligently when it receives
BadAlloc errors from X?
 
 I have no doubt that Browse/xulrunner has room for improvement with
 memory usage but this is not where you should be looking. These BadAlloc
 messages are true errors generated when the application requests pixmaps
 outside of the coordinate range accepted by X (this is well
 documented). 
 
 This is a real bug in the code, not a memory pressure issue. Such
 requests should never be generated, and the application crashing is
 probably the behaviour we want.

For the specific BadAlloc of the page in our wiki, it is not coordinate
out of range, but that the images on that page are so huge as to cause X
to get an allocation failure from the OS, and that gets reflected back to
the client.  Otherwise we'd have gotten a BadValue error.  At one point,
X11 was pretty carefully checked to work in the face of failures to
allocate memory (dunno how true that is today).

Whether Firefox should be so silly as to even be asking (the images are
huge) and asking the X server to rescale them (also very questionable)
is something that can/should be taken up with the Firefox folks, but not
something we're going to (be able to) fix on our own.  The embedded
mozilla folks (there are such people at long last) are the logical
people to own this headache.
   - Jim

-- 
Jim Gettys [EMAIL PROTECTED]
One Laptop Per Child



Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Michael Stone
On Tue, Sep 09, 2008 at 01:10:57PM -0400, Daniel Drake wrote:
On Tue, 2008-09-09 at 00:10 -0400, Michael Stone wrote:
 - whether we can get Browse to behave intelligently when it receives
   BadAlloc errors from X?

I have no doubt that Browse/xulrunner has room for improvement with
memory usage but this is not where you should be looking. These BadAlloc
messages are true errors generated when the application requests pixmaps
outside of the coordinate range accepted by X (this is well
documented). 

This is a real bug in the code, not a memory pressure issue. 

Fine. How does the X server report failures to allocate memory on behalf
of clients? How does Browse respond?

Such requests should never be generated, and the application crashing
is probably the behaviour we want.

I'll grant that it may be helpful for finding the issue in the first
place, but I would much rather that we ship a Browse which displayed
what it can display without crashing.

Michael


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread pgf
c. scott ananian wrote:
  On Tue, Sep 9, 2008 at 7:02 AM, Tomeu Vizoso [EMAIL PROTECTED] wrote:
   stability issue? AFAIK, we haven't seen OOM conditions without any
   activity open.
  
  Yes, we have.  In particular, if you update your system and then leave
  it for a while, and later click the software update control panel, you
  end up OOMing in the control panel.  Sugar restarts and reports are
  that software update works fine the second time.  So this might well
  be a sugar leak; killing 'sugar' is not good for stability.


i think there's definitely a sugar shell leak.  here's some
partial data, gathered from a few machines on my desk right now.

(be careful with the column headings -- i rearranged partway through
to get separate CODE and DATA columns.)

(also, don't do an absolute compare between the 708 build and the
759 build -- the latter is chock full of activities, the former
has none at all.)


build 708:
top - 17:45:17 up 59 min,  3 users,  load average: 0.03, 0.05, 0.01
 PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
1741 olpc  15   0 53128  27m  13m    4  14m 12.2 python

same build 708, roughly twenty minutes later:
top - 18:03:16 up  1:17,  3 users,  load average: 0.01, 0.01, 0.00
 PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
1741 olpc  15   0 53308  28m  13m    4  14m 12.3 python

build 759:
top - 12:20:00 up 39 min,  4 users,  load average: 0.00, 0.06, 0.11
 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
1461 olpc  20   0 60576  33m  14m S  0.3 14.5   0:48.38 python

same build 759, almost two hours later:
top - 14:04:11 up  2:23,  3 users,  load average: 0.04, 0.06, 0.08
 PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
1461 olpc  20   0 65964  38m  14m    4  23m 16.7 python

finally, i have a joyride-2263, which has been up for 6 days.  i
don't have copy/paste access to it, but the sugar shell is currently
taking 99.6m VIRT, 64m RES, 14m SHR, and is using 28% of system memory.

paul

p.s.  in addition, i think a lot of system processes have grown
somewhat.  for instance, login now has 100k more DATA space in
759 than it had in 708.  others (e.g., xinit) haven't grown at
all.  (also measured with top.)


=-
 paul fox, [EMAIL PROTECTED]


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread pgf
i wrote:
  
  i think there's definitely a sugar shell leak.  here's some
  partial data, gathered from a few machines on my desk right now.
  
  (be careful with the column headings -- i rearranged partway through
  to get separate CODE and DATA columns.)
  
  (also, don't do an absolute compare between the 708 build and the
  759 build -- the latter is chock full of activities, the former
  has none at all.)
  
  
  build 708:
  top - 17:45:17 up 59 min,  3 users,  load average: 0.03, 0.05, 0.01
   PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
  1741 olpc  15   0 53128  27m  13m    4  14m 12.2 python
  
  same build 708, roughly twenty minutes later:
  top - 18:03:16 up  1:17,  3 users,  load average: 0.01, 0.01, 0.00
   PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
  1741 olpc  15   0 53308  28m  13m    4  14m 12.3 python

another hour later on 708:
top - 19:06:19 up  2:21,  3 users,  load average: 0.00, 0.00, 0.00
 PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
1741 olpc  15   0 53576  28m  13m    4  15m 12.3 python

call it 200 KB/hour?

  
  build 759:
  top - 12:20:00 up 39 min,  4 users,  load average: 0.00, 0.06, 0.11
   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
  1461 olpc  20   0 60576  33m  14m S  0.3 14.5   0:48.38 python
  
  same build 759, almost two hours later:
  top - 14:04:11 up  2:23,  3 users,  load average: 0.04, 0.06, 0.08
   PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
  1461 olpc  20   0 65964  38m  14m    4  23m 16.7 python

and another hour on 759:

top - 15:07:25 up  3:27,  3 users,  load average: 0.00, 0.04, 0.02
 PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
1461 olpc  20   0 70468  42m  14m    4  28m 18.6 python

seems more like 4.5 MB/hour.

(there are a lot of variables in play here -- the main thing is
that something's certainly leaking.)
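Rates like these can be gathered with less hand work. A minimal Python 3 sketch, assuming a Linux /proc: it reads the Vm* lines from /proc/<pid>/status (VmSize is the figure top shows as VIRT, VmRSS the one it shows as RES) and turns a before/after pair into kB per hour. The sampling interval and helper names are illustrative, not an existing OLPC tool.

```python
import re
import time

VM_LINE = re.compile(r"^(Vm\w+):\s+(\d+) kB", re.MULTILINE)


def vm_fields_kb(status_text):
    """Extract VmSize, VmRSS, VmData, ... (in kB) from /proc/<pid>/status text."""
    return {name: int(kb) for name, kb in VM_LINE.findall(status_text)}


def rate_kb_per_hour(kb_before, kb_after, minutes):
    """Average growth between two samples taken `minutes` apart."""
    return (kb_after - kb_before) * 60.0 / minutes


def sample(pid, interval_s=600, count=7):
    """Yield (elapsed_seconds, vm_fields) every interval_s seconds, count times."""
    start = time.time()
    for i in range(count):
        with open(f"/proc/{pid}/status") as f:
            yield time.time() - start, vm_fields_kb(f.read())
        if i + 1 < count:
            time.sleep(interval_s)


if __name__ == "__main__":
    # VIRT of the 759 shell at 14:04:11 and 15:07:25, about 63 minutes apart.
    print(round(rate_kb_per_hour(65964, 70468, 63)))  # 4290
```

Applied to the 708 figures (53308 to 53576 kB over a similar ~63 minutes), the same arithmetic gives about 255 kB/hour.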

paul
=-
 paul fox, [EMAIL PROTECTED]


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Tomeu Vizoso
On Tue, Sep 9, 2008 at 9:13 PM,  [EMAIL PROTECTED] wrote:
 i wrote:
  
   i think there's definitely a sugar shell leak.  here's some
   partial data, gathered from a few machines on my desk right now.
  
   (be careful with the column headings -- i rearranged partway through
   to get separate CODE and DATA columns.)
  
   (also, don't do an absolute compare between the 708 build and the
   759 build -- the latter is chock full of activities, the former
   has none at all.)
  
  
   build 708:
   top - 17:45:17 up 59 min,  3 users,  load average: 0.03, 0.05, 0.01
    PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
   1741 olpc  15   0 53128  27m  13m    4  14m 12.2 python
  
   same build 708, roughly twenty minutes later:
   top - 18:03:16 up  1:17,  3 users,  load average: 0.01, 0.01, 0.00
    PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
   1741 olpc  15   0 53308  28m  13m    4  14m 12.3 python

 another hour later on 708:
 top - 19:06:19 up  2:21,  3 users,  load average: 0.00, 0.00, 0.00
  PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
 1741 olpc  15   0 53576  28m  13m    4  15m 12.3 python

 call it 200 KB/hour?

  
   build 759:
   top - 12:20:00 up 39 min,  4 users,  load average: 0.00, 0.06, 0.11
    PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
   1461 olpc  20   0 60576  33m  14m S  0.3 14.5   0:48.38 python
  
   same build 759, almost two hours later:
   top - 14:04:11 up  2:23,  3 users,  load average: 0.04, 0.06, 0.08
    PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
   1461 olpc  20   0 65964  38m  14m    4  23m 16.7 python

 and another hour on 759:

 top - 15:07:25 up  3:27,  3 users,  load average: 0.00, 0.04, 0.02
  PID USER  PR  NI  VIRT  RES  SHR CODE DATA %MEM COMMAND
 1461 olpc  20   0 70468  42m  14m    4  28m 18.6 python

 seems more like 4.5 MB/hour.

 (there are a lot of variables in play here -- the main thing is
 that something's certainly leaking.)

The shell shouldn't be doing anything while idle, so checking whether
network activity is the trigger would help here.

Thanks,

Tomeu


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread pgf
tomeu wrote:
  On Tue, Sep 9, 2008 at 9:13 PM,  [EMAIL PROTECTED] wrote:
   (there are a lot of variables in play here -- the main thing is
   that something's certainly leaking.)
  
  The shell shouldn't be doing anything while idle, so checking if the
  trigger is activity network would help here.

point of reference:  on irc you mentioned the buddy list had
been an issue in the past.  does the sugar shell maintain that
even when that screen isn't visible?

paul
=-
 paul fox, [EMAIL PROTECTED]


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Tomeu Vizoso
On Tue, Sep 9, 2008 at 9:24 PM,  [EMAIL PROTECTED] wrote:
 tomeu wrote:
   On Tue, Sep 9, 2008 at 9:13 PM,  [EMAIL PROTECTED] wrote:
(there are a lot of variables in play here -- the main thing is
that something's certainly leaking.)
  
   The shell shouldn't be doing anything while idle, so checking if the
   trigger is activity network would help here.

 point of reference:  on irc you mentioned the buddy list had
 been an issue in the past.  does the sugar shell maintain that
 even when that screen isn't visible?

Some history: http://dev.laptop.org/ticket/5532

Info about buddies is permanently stored in the presence service, in
the PS wrapper in the sugar shell and in the view. None of this data
gets released due to the user switching views.

Regards,

Tomeu


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Martin Dengler
On Tue, Sep 09, 2008 at 03:13:28PM -0400, [EMAIL PROTECTED] wrote:
 [759 sugar shell leak] seems more like 4.5 MB/hour.

joyride-2399 sitting back at home with no activities, doing nothing
all day:

-bash-3.2# uptime
 18:14:19 up 20:46,  8 users,  load average: 0.15, 0.09, 0.12
-bash-3.2# /home/olpc/bin/ps_mem.py | grep python
 70.1 MiB +   6.6 MiB =  76.7 MiB   python (5)
[...time passes...]
-bash-3.2# uptime
 19:52:08 up 22:24,  8 users,  load average: 0.08, 0.07, 0.01
-bash-3.2# /home/olpc/bin/ps_mem.py | grep python
 70.3 MiB +   6.6 MiB =  76.8 MiB   python (5)

 paul

Martin





Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread C. Scott Ananian
On Tue, Sep 9, 2008 at 4:01 PM, Marco Pesenti Gritti [EMAIL PROTECTED] wrote:
 A couple of low risk fixes which could save ~6 mb at startup:

 Remove numpy usage from the shell
 http://dev.laptop.org/ticket/8372
 (has patch)

 gst usage in the shell wastes 2.6mb
 http://dev.laptop.org/ticket/8375

These seem obvious and low risk.  +1 from me.

(We should be careful to test the numpy removal w/ differing locales,
to ensure #5559 doesn't regress.)
 --scott

-- 
 ( http://cscott.net/ )


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Deepak Saxena
On Tue, Sep 09, 2008 at 05:10:41PM +, Deepak Saxena wrote:
* We need to find out why the oom-killer is not killing things fast
  enough. Based on our results, we might consider configuring
  /proc/$pid/oom_adj to preferentially kill some processes (e.g., the
  foreground [or background?] activities.)
 
 In the cases I've been playing with, browse is the only activity that
 is running. Will try bumping its oom_adj to see if this improves OOM
 kill latency.

Did 'echo 15 > /proc/<pid>/oom_adj' and this does not help much. The
system starts getting laggy at the point where about 3M of memory remains
(according to top), but the OOM killer does not actually kick in until
an allocation fails, which happens some time later. We need to capture
what is happening at the kernel level during this window, though I don't
think that fixing this at the OOM killer layer is doable for 8.2.
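The oom_adj experiment above can be scripted. A minimal sketch, assuming the legacy per-process /proc/<pid>/oom_adj interface of 2.6-era kernels (range -17 to 15, where -17 exempts the process from the OOM killer); `find_pids`, `set_oom_adj` and the `proc_root` parameter are hypothetical names, the latter added only so the code can be exercised against a scratch directory instead of the live /proc.

```python
import os

OOM_ADJ_MIN, OOM_ADJ_MAX = -17, 15  # legacy oom_adj range; -17 exempts a process


def find_pids(cmdline_substring: str, proc_root: str = "/proc") -> list:
    """Return PIDs whose command line contains `cmdline_substring`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                cmdline = f.read().replace(b"\0", b" ").decode("utf-8", "replace")
        except OSError:
            continue  # the process exited or its cmdline is unreadable
        if cmdline_substring in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def set_oom_adj(pid: int, value: int, proc_root: str = "/proc") -> str:
    """Equivalent of `echo VALUE > /proc/PID/oom_adj`; returns the path written.

    On a live system this needs sufficient privileges (e.g. root).
    """
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError(f"oom_adj must be in [{OOM_ADJ_MIN}, {OOM_ADJ_MAX}]")
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return path
```

For example, `for pid in find_pids("Browse"): set_oom_adj(pid, 15)` would mark every process whose command line mentions "Browse" (a hypothetical match string) as the preferred OOM victim. As reported above, this changes which process gets killed, not how late the OOM killer fires.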

 I have yet to see an actual deadlock. What I saw when trying to
 reproduce #3816 is that the OOM killer just takes a very very long
 time to kick in.
 
  - whether our kernel overcommits when allocation requests are made?
 
 By default vm.overcommit_memory is set to 0, which will refuse obvious
 overcommits of address space. I will try setting this to 3 along with
 vm.overcommit_ratio to 0 to force no overcommit at all and see how the
 system reacts.

This didn't quite do what I expected, as I misread the docs.

If we set overcommit_ratio=100 and overcommit_memory=3, the kernel will
not overcommit memory, and we end up either with Browse crashing gracefully
without bogging down the whole system, or with Browse gracefully ignoring
any user input in the address bar, probably due to a failed allocation of
some sort when creating a new web page instance.
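The effect of these two knobs can be worked out from the commit-limit formula in the kernel's overcommit-accounting documentation, which describes mode 2 as the strict "never overcommit" mode: CommitLimit = (RAM - huge pages) * overcommit_ratio / 100 + swap, and a new allocation is refused once Committed_AS would exceed that limit. A minimal sketch, assuming no swap and no huge pages; the MemTotal value is a hypothetical round number, not a measurement, and the sketch ignores that the kernel works in pages and holds back small extra reserves.

```python
def commit_limit_kib(mem_total_kib: int, swap_total_kib: int,
                     overcommit_ratio: int, hugetlb_kib: int = 0) -> int:
    """CommitLimit under strict accounting, in KiB (integer arithmetic)."""
    return (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_total_kib


def allocation_allowed(committed_as_kib: int, request_kib: int, limit_kib: int) -> bool:
    """True if committing `request_kib` more still fits under the limit."""
    return committed_as_kib + request_kib <= limit_kib


MEM_TOTAL_KIB = 230_000  # hypothetical MemTotal; check /proc/meminfo on a real XO
SWAP_KIB = 0             # assumption: no swap configured

for ratio in (0, 50, 100):
    limit = commit_limit_kib(MEM_TOTAL_KIB, SWAP_KIB, ratio)
    print(f"overcommit_ratio={ratio:3d}: CommitLimit={limit} KiB")
```

With no swap, overcommit_ratio=0 makes the limit zero, so essentially every new commitment is refused; overcommit_ratio=100 lets commitments grow up to physical RAM and no further.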

~Deepak



Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Mikus Grinbergs
 Remove numpy usage from the shell

I have not been following this thread - but:

There were several Activities (not just Measure) which used 
'numeric'.  Then 'numeric' was removed from the builds.  I don't 
know what those Activities are using now.  My concern is that if 
they happened to switch to using 'numpy' in place of using 'numeric',
then having no numpy might also cause ripples.

mikus



Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Marco Pesenti Gritti
numpy will still work fine for activities.

Marco

On Tue, Sep 9, 2008 at 10:52 PM, Mikus Grinbergs [EMAIL PROTECTED] wrote:
 Remove numpy usage from the shell

 I have not been following this thread - but:

 There were several Activities (not just Measure) which used
 'numeric'.  Then 'numeric' was removed from the builds.  I don't
 know what those Activities are using now.  My concern is that if
 they happened to switch to using 'numpy' in place of using 'numeric'
 then no numpy might also cause ripples.

 mikus




Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Gary C Martin
On 9 Sep 2008, at 05:10, Michael Stone wrote:

 * We need to determine why we encounter low-memory and out-of-memory
situations more frequently than in previous releases.

- This means that we need to measure how our memory consumption
  profile has changed since our previous releases.

  (cscott observes that we were unable to attack the F-9 image size
  issues until we were able to quantify the effect of changes we had
  made or were considering making. Consequently, he suggests that we
  will be unable to attack our current space consumption problems
  until we are able to generate good numbers (and displays).)

- We need to think carefully about (or measure) whether our
  memory-consumption patterns have changed.


SUMMARY: after a clean boot (no running Activities), 759 uses only ~16 MB
more RAM than 711.

Just some very quick general observations comparing build 771 and build
759 running on XO hardware. Tests were taken after clean reboots, allowing
things to settle for ~5 min before collecting stats, with no Activities
or UI use (data was collected via a remote ssh session). The jabber
server was set to an unreachable name, there were no local Salut buddies,
and the network connection was to an AP, with ~4 other APs visible in my
neighbourhood.

Using free, the reported buffers/cache figure is generally the more
interesting value. After a clean boot 759 is now using an extra 16 MB
(up to 115 MB). The reported total has gone up 80k, so I guess the
kernel is a little smaller :-) The reported free memory is down by 8 MB
(down to 47 MB), indicating better use of available memory (caches went
up by that same ~8 MB, and buffers by about another 100K).
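For reference, the buffers/cache view read from free here can be reproduced from /proc/meminfo: in procps free of this era, the "-/+ buffers/cache" used figure is MemTotal - MemFree - Buffers - Cached. A minimal sketch; the sample values are invented for illustration and are not the measurements above.

```python
def parse_meminfo(text: str) -> dict:
    """Map /proc/meminfo field names to their numeric values (kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def free_summary_kib(info: dict) -> dict:
    """Reproduce the used/free split that free(1) reports, in kB."""
    used = info["MemTotal"] - info["MemFree"]
    cache = info["Buffers"] + info["Cached"]
    return {
        "used": used,                                # includes buffers and page cache
        "used_minus_cache": used - cache,            # '-/+ buffers/cache' used
        "free_plus_cache": info["MemFree"] + cache,  # '-/+ buffers/cache' free
    }


SAMPLE = """MemTotal:       230000 kB
MemFree:         48000 kB
Buffers:          4000 kB
Cached:          60000 kB
"""
print(free_summary_kib(parse_meminfo(SAMPLE)))
```

On a live system, pass the contents of /proc/meminfo instead of SAMPLE; taking the same snapshot on each build gives directly comparable numbers.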

As far as processes are concerned, /usr/bin/sugar-shell is initially
the hungriest: in 711 it starts out at 12.2% of total memory (RES=28m,
SHR=12m, DATA=15m). In 759 it has gone up to 14.3% (RES=33m, SHR=14m,
DATA=18m).

Working down the list, the Journal is next: in 711 it starts out at 8.7%
of total memory (RES=20m, SHR=10m, DATA=1m). In 759 it has gone up to
10.1% (RES=23m, SHR=11m, DATA=11m). These figures are with an empty
Journal, due to the break in compatibility when switching between these
builds :-(

Next for 759 is more interesting, as it reflects the changes to rainbow
and (I assume) the pre-loading of commonly used modules for Activity
efficiency (I need to test Activity usage changes separately from this
email**). /usr/sbin/rainbow-daemon in 711 is just 3.1% (RES=7m, SHR=1m,
DATA=6m), while in 759 it is up at 9.6% (RES=22m, SHR=10m, DATA=11m).

Other processes, such as /usr/bin/datastore-service and
/usr/bin/sugar-presence-service, have grown by small amounts, while
/usr/bin/sugar-shell-service has shrunk slightly - nothing exciting.

FWIW: the pmap tool seems like it might show interesting data for
comparisons (it lists where a specific PID's memory is going at the
library level). Most of the interesting stuff is hidden in [ anon ]
blocks, but knowing all the libs referenced and their sizes should be
of use. I have been experimenting a little with a script to collect and
compare data for all processes between builds; I need to find a clear
way to visualise the results in a useful (not an 'oh my god spiders with
pens are attacking') way.
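A pmap comparison of this kind could start from something like the following: sum the mapping sizes per library (or [ anon ]) for one process on each build, then sort by the change. A minimal sketch, assuming the default procps `pmap PID` output format (address, size in K, permissions, mapping name); the function names and sample lines are hypothetical, not real 711/759 data.

```python
from collections import defaultdict


def pmap_sizes_kib(pmap_output: str) -> dict:
    """Sum mapping sizes (KiB) per mapping name from default `pmap PID` output."""
    sizes = defaultdict(int)
    for line in pmap_output.splitlines():
        fields = line.split()
        # Mapping lines look like '08048000      4K r-x--  /usr/bin/python';
        # the 'PID:   command' header and ' total  NNNK' footer are skipped.
        if len(fields) < 4 or not fields[1].endswith("K") or not fields[1][:-1].isdigit():
            continue
        name = " ".join(fields[3:])  # keeps '[ anon ]' as a single name
        sizes[name] += int(fields[1][:-1])
    return dict(sizes)


def compare_builds(old: dict, new: dict) -> list:
    """Return (mapping, old_kib, new_kib, delta_kib) rows, largest growth first."""
    rows = []
    for name in set(old) | set(new):
        before, after = old.get(name, 0), new.get(name, 0)
        rows.append((name, before, after, after - before))
    return sorted(rows, key=lambda row: (-row[3], row[0]))


OLD = """2001:   python sugar-shell
08048000      4K r-x--  /usr/bin/python
b7000000   1024K rw---    [ anon ]
b7400000    512K r-x--  /usr/lib/libexample.so
 total     1540K
"""
NEW = """2002:   python sugar-shell
08048000      4K r-x--  /usr/bin/python
b7000000   2048K rw---    [ anon ]
b7500000    512K rw---    [ anon ]
b7400000    512K r-x--  /usr/lib/libexample.so
b7800000    256K r-x--  /usr/lib/libnewthing.so
 total     3332K
"""
for row in compare_builds(pmap_sizes_kib(OLD), pmap_sizes_kib(NEW)):
    print(row)
```

On the XOs, running `subprocess.check_output(["pmap", str(pid)], text=True)` once per build would provide the two inputs; a sorted table of deltas may be easier to read than a graph.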

**I'll try and test several Activity versions that can run on both  
builds and see how their individual resources have changed, will post  
later.

--Gary


Re: Stability and Memory Pressure in 8.2

2008-09-09 Thread Gary C Martin
On 10 Sep 2008, at 00:11, Gary C Martin wrote:

 SUMMARY: 759 vs 711 is only eating an extra ~16Mb of ram after a clean
 boot (no running Activities)


 **I'll try and test several Activity versions that can run on both
 builds and see how their individual resources have changed, will post
 later.

OK, news is not great on the Activity front...

SUMMARY: each Activity instance in 759 consumes on average about 1 MB
more memory than the same Activity running in 711, with Write-57
reportedly taking significantly more than that (perhaps ~7 MB).

Are the top and/or ps memory figures calculated the same way in both
builds? If not, collecting real data could be pretty painful.
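One way to sidestep differences in how top and ps report memory is to read the kernel's per-mapping accounting directly from /proc/<pid>/smaps and total it yourself. A minimal sketch, assuming the kernel exposes the standard smaps fields (Rss, the Shared_/Private_ counters, and Pss on kernels new enough to have it); `smaps_totals_kib` is a hypothetical helper and the sample text is invented.

```python
FIELDS = ("Rss", "Pss", "Shared_Clean", "Shared_Dirty", "Private_Clean", "Private_Dirty")


def smaps_totals_kib(smaps_text: str) -> dict:
    """Sum selected per-mapping kB counters in a /proc/<pid>/smaps dump."""
    totals = dict.fromkeys(FIELDS, 0)
    for line in smaps_text.splitlines():
        name, sep, rest = line.partition(":")
        # Mapping header lines ('08048000-08049000 r-xp ...') never match a field name.
        if sep and name in totals:
            totals[name] += int(rest.split()[0])  # values are reported in kB
    totals["Private"] = totals["Private_Clean"] + totals["Private_Dirty"]
    return totals


SAMPLE = """08048000-08049000 r-xp 00000000 1f:00 1234       /usr/bin/python
Size:                  4 kB
Rss:                   4 kB
Pss:                   1 kB
Shared_Clean:          4 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:         0 kB
b7000000-b7100000 rw-p b7000000 00:00 0
Size:               1024 kB
Rss:                 900 kB
Pss:                 900 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       900 kB
"""
print(smaps_totals_kib(SAMPLE))
```

Comparing the Private and Pss totals per process between builds depends only on what the kernel reports, not on how a particular top or ps version derives its RES and SHR columns.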

Tests were taken after clean reboots and allowing things to settle  
(~5min); five activities were launched in order Moon-4, Write-57,  
Record-57, Paint-20, and Calculate-23; Journal was made the current  
Activity and the view was switched to home; data collected via a  
remote ssh session. I wanted to test Browse, as it's a known memory eater
(well, most browsers are), but will need to dig out the most recent
version that works with 711 for a reasonable comparison.

With all five Activities launched, free's buffers/cache figure reported
5m more memory in use under 759. Looking at each Activity's footprint
shows that under 759 every Activity has less shared memory and more
resident and data memory.

Write-57
759 - 15.5% (RES=35m, SHR=13m, DATA=20m)
711 - 12.4% (RES=28m, SHR=15m, DATA=11m)

Record-57
759 - 14.2% (RES=32m, SHR=14m, DATA=64m)
711 - 13.1% (RES=30m, SHR=16m, DATA=61m)

Calculate-23
759 - 10.6% (RES=24m, SHR=8m, DATA=15m)
711 - 10.1% (RES=23m, SHR=10m, DATA=11m)

Paint-20
759 - 10.1% (RES=23m, SHR=8m, DATA=14m)
711 - 9.6% (RES=22m, SHR=10m, DATA=10m)

Moon-4
759 - 9.7% (RES=22m, SHR=8m, DATA=13m)
711 - 9.2% (RES=21m, SHR=11m, DATA=10m)
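The per-Activity figures above can be reduced to deltas to check the summary line. A small sketch that only rearranges the numbers reported above (top's RES/SHR/DATA, in whole megabytes); because top rounds to 1m, a 1 MB difference is within rounding error.

```python
from statistics import mean

# (RES, SHR, DATA) in MB as reported by top, copied from the measurements above.
MEASUREMENTS = {
    "Write-57":     {"759": (35, 13, 20), "711": (28, 15, 11)},
    "Record-57":    {"759": (32, 14, 64), "711": (30, 16, 61)},
    "Calculate-23": {"759": (24, 8, 15),  "711": (23, 10, 11)},
    "Paint-20":     {"759": (23, 8, 14),  "711": (22, 10, 10)},
    "Moon-4":       {"759": (22, 8, 13),  "711": (21, 11, 10)},
}


def deltas(measurements: dict) -> dict:
    """Per-Activity (RES, SHR, DATA) change from build 711 to build 759, in MB."""
    return {
        name: tuple(new - old for new, old in zip(builds["759"], builds["711"]))
        for name, builds in measurements.items()
    }


d = deltas(MEASUREMENTS)
for name, (res, shr, data) in d.items():
    print(f"{name:13s} RES {res:+d}  SHR {shr:+d}  DATA {data:+d}")
print("mean RES delta, all:", mean(r for r, _, _ in d.values()))
print("mean RES delta, excluding Write:",
      mean(r for n, (r, _, _) in d.items() if n != "Write-57"))
```

Excluding Write-57, the mean RES increase is 1.25 MB, matching the "~1 MB per Activity" summary; including it, the mean is 2.4 MB. Every Activity shows negative SHR and positive DATA deltas.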

Well, I was hoping to see the numbers go the other way, with the
rainbow fork trick sharing more module code between Activities. Could
be worse, I guess. I should also test opening N instances of the same
Activity and see which way memory usage has moved in that scenario.

--Gary

P.S. Nobody spotted my intentional 771 mistake in the last email; it
was obviously meant to be 711 :)