Re: Yum random crashes in XO4 f20 images

2014-09-09 Thread James Cameron
G'day Peter,

Thanks for any ideas you may have.

The problem also reproduces on the OLPC Fedora 20 image for XO-4:

http://build.laptop.org/14.1.0/os1/xo-4/41001o4.zd (552 MB)

*** Error in `/usr/bin/python': free(): invalid pointer: 0x047c79ae ***
======= Backtrace: =========
/lib/libc.so.6(+0x6c8b4)[0xb6c828b4]
/lib/libc.so.6(+0x754e8)[0xb6c8b4e8]
======= Memory map: ========
[...]

The error varies in detail, but always suggests corruption of heap or
pointers to heap.

The triggering conditions are interactive use of yum, yum update, or
yum used by olpc-os-builder.  The latter is a simple reproducer for me.

I'm reproducing it on an XO-4, with 2GB of RAM, no swap, 8 GB eMMC, 8
GB USB flash drive.

While memory demand by yum is large by comparison to other programs,
the available memory at the time of failure is ample.  There are no
kernel out of memory (OOM) events.  It seems more likely to occur when
the filesystem cache is under heavy demand.
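
For anyone wanting to repeat the OOM check, here is a minimal sketch. It assumes the kernel reports OOM events with its usual "invoked oom-killer" or "Out of memory:" messages; the helper name count_oom_events is hypothetical and not part of any OLPC tool.

```shell
#!/bin/sh
# Count kernel OOM-killer messages in a log given as a file, or on stdin.
# Assumption: OOM events appear as "invoked oom-killer" or "Out of memory:".
count_oom_events() {
    # grep -c prints the number of matching lines. It exits 1 when there
    # are no matches, so "|| true" keeps the function usable under "set -e".
    grep -c -i -E 'invoked oom-killer|out of memory:' "$@" || true
}
```

Typical use right after a failure would be "dmesg | count_oom_events" together with "free -m" to record available memory. A count of 0 is consistent with the observation above that no OOM events occur.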

The method to recreate the problem was:

1.  install the system image 41001o4.zd using fs-update and then boot,

2.  configure wireless network,

3.  "yum install -y git olpc-os-builder"

4.  clone the master branch of
git://dev.laptop.org/projects/olpc-os-builder
(last verified with b87e6ee)

5.  run "./osbuilder.py examples/olpc-os-14.1.0-xo4.ini" repeatedly
until the error occurs (usually within about five attempts).
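
The "run repeatedly" step can be sketched as a small loop. This is only a sketch: the function name repeat_until_failure is hypothetical, and it assumes osbuilder.py exits with a non-zero status when yum aborts, which has not been confirmed here.

```shell
#!/bin/sh
# Run a command up to MAX times and stop at the first failing attempt.
# Usage: repeat_until_failure MAX COMMAND [ARGS...]
# Returns 0 if a failure was observed, 1 if every attempt succeeded.
repeat_until_failure() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        echo "attempt $attempt of $max" >&2
        # Assumption: the command signals the crash via a non-zero exit.
        if ! "$@"; then
            echo "command failed on attempt $attempt" >&2
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "no failure in $max attempts" >&2
    return 1
}
```

From the olpc-os-builder checkout this would be invoked as "repeat_until_failure 10 ./osbuilder.py examples/olpc-os-14.1.0-xo4.ini". The limit of 10 is arbitrary, chosen to sit comfortably above the roughly five attempts usually needed.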


I've also tried running under valgrind, but that fails with an illegal
instruction error.  It is quite likely I'm not using valgrind correctly.
http://dev.laptop.org/~quozl/z/1XRYtO.txt

The workaround at the moment is to build our Fedora 20 images on
Fedora 18.  Fedora 18 shows no sign of the problem.  I'm worried that
a low probability heap corruptor may cause instability of applications
in the field.

The exact same kernel is being used for Fedora 18 and Fedora 20.

On Tue, Sep 09, 2014 at 03:55:24PM +0100, Peter Robinson wrote:
> What version of OOB are you using, and what config files? I can try
> and recreate the problem here on other devices.

-- 
James Cameron
http://quozl.linux.org.au/
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: Yum random crashes in XO4 f20 images

2014-09-09 Thread Martin Abente Lahaye
Thanks for the quick response,

On Tue, Sep 9, 2014 at 10:55 AM, Peter Robinson wrote:

> Hi Martin,
>
> > I have been building a Fedora 20 image for the XO4, and I am seeing
> > memory corruption problems while running yum in these images (please check
> > the logs [1,2]).
>
> The logs don't mean anything to me. What device are you running it on,
> how much memory, do you have swap enabled?
>

It is running on an XO4 with 999MB of RAM and no swap. Maybe James can explain
better what the other specs of the XO4 are.


>
> > To give you some context, we are using olpc-os-builder (master [3]) with
> > fc20 repositories plus a few hand-crafted packages such as the kernel,
> > systemd and xorg, taken from previous Daniel Narvaez efforts [4,5].
> >
> > These crashes happen randomly when yum is running, either when calling
> > yum update or when olpc-os-builder is installing system packages.
>
> Yum isn't the most memory friendly, and some of the post update
> scripts aren't either, you need to make sure there's enough
> memory/swap.
>

I am not so sure this is really related to memory usage, as it happens even
when I am installing just a few packages (stored on the local filesystem)
via "yum update /packages/*.rpm".

>
> > James Cameron, who spent some time researching this issue, speculates
> > that this problem could be caused by: (a) using an older kernel that was
> > (possibly) compiled with different options compared to f20's, or (b) a
> > faulty glibc library.
> >
> > I was wondering if this could be related to something else, something more
> > specific to the yum or Python ARM binaries (?).
>
> Unlikely, I've not seen issues elsewhere. But I need more details.
>

What kind of details would you need? I can try reproducing and send what
you need.


>
> > I would sincerely appreciate any guidance you can provide to start
> > discarding possibilities and try to debug this issue.
> >
> > Thanks in advance for any help you can provide!
>
> What version of OOB are you using, and what config files? I can try
> and recreate the problem here on other devices.
>

This problem occurs in all f20 images for the XO4 that we or others have
created.

The latest images were created using:

* OLPC's OOB master branch: git://dev.laptop.org/projects/olpc-os-builder
* OLPC's .ini file:
http://dev.laptop.org/git/projects/olpc-os-builder/tree/examples/olpc-os-14.1.0-xo4.ini

In case you have an XO4 with you and have the time to try this out, you can
download an image from
http://system.one-education.org/au2a/images/testing/40002au4/


> Peter
>


Re: Yum random crashes in XO4 f20 images

2014-09-09 Thread Peter Robinson
Hi Martin,

> I have been building a Fedora 20 image for the XO4, and I am seeing
> memory corruption problems while running yum in these images (please check
> the logs [1,2]).

The logs don't mean anything to me. What device are you running it on,
how much memory, do you have swap enabled?

> To give you some context, we are using olpc-os-builder (master [3]) with
> fc20 repositories plus a few hand-crafted packages such as the kernel,
> systemd and xorg, taken from previous Daniel Narvaez efforts [4,5].
>
> These crashes happen randomly when yum is running, either when calling
> yum update or when olpc-os-builder is installing system packages.

Yum isn't the most memory friendly, and some of the post update
scripts aren't either, you need to make sure there's enough
memory/swap.

> James Cameron, who spent some time researching this issue, speculates
> that this problem could be caused by: (a) using an older kernel that was
> (possibly) compiled with different options compared to f20's, or (b) a
> faulty glibc library.
>
> I was wondering if this could be related to something else, something more
> specific to the yum or Python ARM binaries (?).

Unlikely, I've not seen issues elsewhere. But I need more details.

> I would sincerely appreciate any guidance you can provide to start
> discarding possibilities and try to debug this issue.
>
> Thanks in advance for any help you can provide!

What version of OOB are you using, and what config files? I can try
and recreate the problem here on other devices.

Peter