Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-07-16 Thread Sridhar Dhanapalan
Deepak is in India. I have also been able to replicate the problem
using XOs manufactured in the same week (SN SHC037x). It's
probably easiest if I send my SD cards to James.

James, I'll contact you separately about this.

Regards,
Sridhar


On 14 July 2012 03:32, Martin Langhoff mar...@laptop.org wrote:
 Hi folks,

 where is Deepak Muddhaa based? Any reason his failing XO and SD card
 can't be traded for good ones, and the failing units shipped to James,
 Miami or Boston, where we can look at things at a lower level?

 We'll gladly provide a replacement unit.

 I appreciate all the analysis, but it is apparent that it is being
 done on rather poor data. Hands-on debugging wins.

 cheers,




 m

 On Sun, Jun 24, 2012 at 11:32 PM, James Cameron qu...@laptop.org wrote:
 Thanks for your reply!

 On Mon, Jun 25, 2012 at 11:16:26AM +1000, Sridhar Dhanapalan wrote:
 On 21 June 2012 16:14, James Cameron qu...@laptop.org wrote:
  On Thu, Jun 21, 2012 at 02:37:35PM +1000, Sridhar Dhanapalan wrote:
  On 16 June 2012 17:08, James Cameron qu...@laptop.org wrote:
   That means the hang should not exceed 15 seconds.  Is this what you
   find?  If not, then this casts doubt on your solution.
 
  I'm going to propose something extremely hackish: [...]
 
  Just to remind you that I'm still interested to know if the hang you
  observe exceeds 15 seconds or not.  I've not had the time to reproduce
  this hang yet.  Building a mental model of the problem is important to
  me, because I can sometimes resolve a problem if I have a good model.

 Yes; we have left it for several minutes and no shutdown has
 occurred.

 Ooh, I'm surprised.

 This observation, and the statistical results from your temporary
 solution (a delay), imply a combined effect of both the
 processes not yet terminated and the umount, leading to a hang of
 the umount process.

 I can't think of a hack that would meet the requirements:

 - survive the process deletion steps, and

 - detect the stalled umount process.

 I guess you might try remounting the filesystem -o sync, just to
 further shift the timing.

 The problem needs a kernel developer to reproduce it.

 Do you have a way to encourage the problem to occur?  If it can be
 made to occur on a higher percentage of shutdowns, it becomes easier
 to debug.  For instance, there is a two second delay in the code, so
 does the hang occur more frequently if this is reduced to zero?

  The XO-1.75 CPU has a hardware watchdog that could be used for this,
  but you aren't likely to ever have a heat problem with XO-1.75.

 That is interesting. Why is that?

 I take it you mean why won't you have a heat problem with XO-1.75.
 There are two new characteristics of the XO-1.75 over the XO-1.5:


 1.  the maximum power draw of the XO-1.75 at full utilisation is a
 long way below that of the XO-1.5.  In a scenario where the laptop is
 powered on and insulated from cooling air flow, this means:

 1.a. the temperature rise toward equilibrium will be slower,

 1.b. the equilibrium temperature will be lower for a given level of
 insulation, (stacking, or cloth covers, or both),

 1.c. the insulation will have to be far greater to achieve the same
 equilibrium temperature.


 2.  the XO-1.75 has a thermal protection feature that forces the power
 off if the temperature of the CPU exceeds 85 degrees C, rather than
 slowing or stopping the CPU as on XO-1.5.  In a scenario where the
 laptop is powered on and insulated from cooling air flow, this means:

 2.a. the temperature rise will be interrupted by a sudden loss of
 input heat, rather than be slowed by a gradual loss of input heat,

 2.b. the insulation will have to be far far greater to achieve the
 same equilibrium temperature.


 In this scenario, the heat spreader has very little bearing on the
 matter.  The heat spreader relies on cooling air flow to the top of
 the case.  If there is no air flow, the heat spreader is ineffective.

 The new thermal protection feature isn't a perfect protection; the
 battery charge circuit remains powered.  So a laptop held between very
 good insulation (e.g. thick polystyrene with sealed edges) with a flat
 battery will still heat up, but not nearly as much as one with an
 active CPU.

 (Please, test this yourselves with an IR thermometer.  If you don't
 have one, the closest in Sydney to you would be at the Jaycar store
 at 127 York St.)

 --
 James Cameron
 http://quozl.linux.org.au/



 --
  mar...@laptop.org -- Software Architect - OLPC
  - ask interesting questions
  - don't get distracted with shiny stuff  - working code first
  - http://wiki.laptop.org/go/User:Martinlanghoff
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-07-13 Thread Martin Langhoff
Hi folks,

where is Deepak Muddhaa based? Any reason his failing XO and SD card
can't be traded for good ones, and the failing units shipped to James,
Miami or Boston, where we can look at things at a lower level?

We'll gladly provide a replacement unit.

I appreciate all the analysis, but it is apparent that it is being
done on rather poor data. Hands-on debugging wins.

cheers,




m

On Sun, Jun 24, 2012 at 11:32 PM, James Cameron qu...@laptop.org wrote:
 Thanks for your reply!

 On Mon, Jun 25, 2012 at 11:16:26AM +1000, Sridhar Dhanapalan wrote:
 On 21 June 2012 16:14, James Cameron qu...@laptop.org wrote:
  On Thu, Jun 21, 2012 at 02:37:35PM +1000, Sridhar Dhanapalan wrote:
  On 16 June 2012 17:08, James Cameron qu...@laptop.org wrote:
   That means the hang should not exceed 15 seconds.  Is this what you
   find?  If not, then this casts doubt on your solution.
 
  I'm going to propose something extremely hackish: [...]
 
  Just to remind you that I'm still interested to know if the hang you
  observe exceeds 15 seconds or not.  I've not had the time to reproduce
  this hang yet.  Building a mental model of the problem is important to
  me, because I can sometimes resolve a problem if I have a good model.

 Yes; we have left it for several minutes and no shutdown has
 occurred.

 Ooh, I'm surprised.

 This observation, and the statistical results from your temporary
 solution (a delay), imply a combined effect of both the
 processes not yet terminated and the umount, leading to a hang of
 the umount process.

 I can't think of a hack that would meet the requirements:

 - survive the process deletion steps, and

 - detect the stalled umount process.

 I guess you might try remounting the filesystem -o sync, just to
 further shift the timing.

 The problem needs a kernel developer to reproduce it.

 Do you have a way to encourage the problem to occur?  If it can be
 made to occur on a higher percentage of shutdowns, it becomes easier
 to debug.  For instance, there is a two second delay in the code, so
 does the hang occur more frequently if this is reduced to zero?

  The XO-1.75 CPU has a hardware watchdog that could be used for this,
  but you aren't likely to ever have a heat problem with XO-1.75.

 That is interesting. Why is that?

 I take it you mean why won't you have a heat problem with XO-1.75.
 There are two new characteristics of the XO-1.75 over the XO-1.5:


 1.  the maximum power draw of the XO-1.75 at full utilisation is a
 long way below that of the XO-1.5.  In a scenario where the laptop is
 powered on and insulated from cooling air flow, this means:

 1.a. the temperature rise toward equilibrium will be slower,

 1.b. the equilibrium temperature will be lower for a given level of
 insulation, (stacking, or cloth covers, or both),

 1.c. the insulation will have to be far greater to achieve the same
 equilibrium temperature.


 2.  the XO-1.75 has a thermal protection feature that forces the power
 off if the temperature of the CPU exceeds 85 degrees C, rather than
 slowing or stopping the CPU as on XO-1.5.  In a scenario where the
 laptop is powered on and insulated from cooling air flow, this means:

 2.a. the temperature rise will be interrupted by a sudden loss of
 input heat, rather than be slowed by a gradual loss of input heat,

 2.b. the insulation will have to be far far greater to achieve the
 same equilibrium temperature.


 In this scenario, the heat spreader has very little bearing on the
 matter.  The heat spreader relies on cooling air flow to the top of
 the case.  If there is no air flow, the heat spreader is ineffective.

 The new thermal protection feature isn't a perfect protection; the
 battery charge circuit remains powered.  So a laptop held between very
 good insulation (e.g. thick polystyrene with sealed edges) with a flat
 battery will still heat up, but not nearly as much as one with an
 active CPU.

 (Please, test this yourselves with an IR thermometer.  If you don't
 have one, the closest in Sydney to you would be at the Jaycar store
 at 127 York St.)

 --
 James Cameron
 http://quozl.linux.org.au/



-- 
 mar...@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-24 Thread Sridhar Dhanapalan
On 21 June 2012 16:14, James Cameron qu...@laptop.org wrote:
 On Thu, Jun 21, 2012 at 02:37:35PM +1000, Sridhar Dhanapalan wrote:
 On 16 June 2012 17:08, James Cameron qu...@laptop.org wrote:
  That means the hang should not exceed 15 seconds.  Is this what you
  find?  If not, then this casts doubt on your solution.

 I'm going to propose something extremely hackish: [...]

 Just to remind you that I'm still interested to know if the hang you
 observe exceeds 15 seconds or not.  I've not had the time to reproduce
 this hang yet.  Building a mental model of the problem is important to
 me, because I can sometimes resolve a problem if I have a good model.

Yes; we have left it for several minutes and no shutdown has occurred.

If you disable the boot/shutdown animation, the shutdown sequence
stops at this: 
http://dev.laptop.org.au/attachments/download/914/hang-on-shutdown.jpg

That image is an attachment on the main issue:
http://dev.laptop.org.au/issues/1033


 The XO-1.75 CPU has a hardware watchdog that could be used for this,
 but you aren't likely to ever have a heat problem with XO-1.75.

That is interesting. Why is that?

Thanks,
Sridhar


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-24 Thread James Cameron
Thanks for your reply!

On Mon, Jun 25, 2012 at 11:16:26AM +1000, Sridhar Dhanapalan wrote:
 On 21 June 2012 16:14, James Cameron qu...@laptop.org wrote:
  On Thu, Jun 21, 2012 at 02:37:35PM +1000, Sridhar Dhanapalan wrote:
  On 16 June 2012 17:08, James Cameron qu...@laptop.org wrote:
   That means the hang should not exceed 15 seconds.  Is this what you
   find?  If not, then this casts doubt on your solution.
 
  I'm going to propose something extremely hackish: [...]
 
  Just to remind you that I'm still interested to know if the hang you
  observe exceeds 15 seconds or not.  I've not had the time to reproduce
  this hang yet.  Building a mental model of the problem is important to
  me, because I can sometimes resolve a problem if I have a good model.
 
 Yes; we have left it for several minutes and no shutdown has
 occurred.

Ooh, I'm surprised.

This observation, and the statistical results from your temporary
solution (a delay), imply a combined effect of both the
processes not yet terminated and the umount, leading to a hang of
the umount process.

I can't think of a hack that would meet the requirements:

- survive the process deletion steps, and

- detect the stalled umount process.

I guess you might try remounting the filesystem -o sync, just to
further shift the timing.
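
A minimal sketch of the remount suggested above, assuming /home is the
filesystem involved (as in the earlier photograph) and that the command is
run as root on the XO.  The MOUNT variable and the remount_sync name are
illustrative only; MOUNT exists so the command can be dry-run with echo.

```shell
# Remount a filesystem with synchronous writes (sketch only).
# MOUNT may be overridden, e.g. MOUNT=echo, to print instead of act.
MOUNT=${MOUNT:-mount}

remount_sync() {
    # $1 = mount point to remount; /home is an assumption here.
    "$MOUNT" -o remount,sync "$1"
}

# Usage on the laptop, as root:
#   remount_sync /home
```

With -o sync each write goes to the card before the call returns, so this
mostly changes timing, which is the point of the experiment.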

The problem needs a kernel developer to reproduce it.

Do you have a way to encourage the problem to occur?  If it can be
made to occur on a higher percentage of shutdowns, it becomes easier
to debug.  For instance, there is a two second delay in the code, so
does the hang occur more frequently if this is reduced to zero?

  The XO-1.75 CPU has a hardware watchdog that could be used for this,
  but you aren't likely to ever have a heat problem with XO-1.75.
 
 That is interesting. Why is that?

I take it you mean why won't you have a heat problem with XO-1.75.
There are two new characteristics of the XO-1.75 over the XO-1.5:


1.  the maximum power draw of the XO-1.75 at full utilisation is a
long way below that of the XO-1.5.  In a scenario where the laptop is
powered on and insulated from cooling air flow, this means:

1.a. the temperature rise toward equilibrium will be slower,

1.b. the equilibrium temperature will be lower for a given level of
insulation, (stacking, or cloth covers, or both),

1.c. the insulation will have to be far greater to achieve the same
equilibrium temperature.


2.  the XO-1.75 has a thermal protection feature that forces the power
off if the temperature of the CPU exceeds 85 degrees C, rather than
slowing or stopping the CPU as on XO-1.5.  In a scenario where the
laptop is powered on and insulated from cooling air flow, this means:

2.a. the temperature rise will be interrupted by a sudden loss of
input heat, rather than be slowed by a gradual loss of input heat,

2.b. the insulation will have to be far far greater to achieve the
same equilibrium temperature.
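
Points 1.a to 1.c can be illustrated with a simple first-order thermal
model.  This model and its symbols are an assumption added for
illustration, not something measured on an XO: the laptop is treated as
one lump of heat capacity C, dissipating power P, losing heat to ambient
through a thermal resistance R set by the insulation, and starting at
ambient temperature.

```latex
% Lumped first-order model (an assumption, not from the thread):
%   P   = power dissipated as heat (W)
%   R   = thermal resistance of the insulation to ambient (K/W)
%   C   = heat capacity of the laptop (J/K)
%   T_a = ambient temperature, with T(0) = T_a
\begin{align*}
C \frac{dT}{dt} &= P - \frac{T - T_a}{R}, \\
T(t) &= T_a + P R \left(1 - e^{-t/(RC)}\right), \\
\left.\frac{dT}{dt}\right|_{t=0} &= \frac{P}{C}, \qquad
T_{\mathrm{eq}} = T_a + P R .
\end{align*}
```

Under this model a lower P gives a smaller initial rate of rise (1.a) and
a lower equilibrium temperature for the same R (1.b), and reaching the
same equilibrium temperature at a lower power P' needs R' = R P / P',
that is, proportionally more insulation (1.c).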


In this scenario, the heat spreader has very little bearing on the
matter.  The heat spreader relies on cooling air flow to the top of
the case.  If there is no air flow, the heat spreader is ineffective.

The new thermal protection feature isn't a perfect protection; the
battery charge circuit remains powered.  So a laptop held between very
good insulation (e.g. thick polystyrene with sealed edges) with a flat
battery will still heat up, but not nearly as much as one with an
active CPU.

(Please, test this yourselves with an IR thermometer.  If you don't
have one, the closest in Sydney to you would be at the Jaycar store
at 127 York St.)

-- 
James Cameron
http://quozl.linux.org.au/


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-21 Thread James Cameron
On Thu, Jun 21, 2012 at 02:37:35PM +1000, Sridhar Dhanapalan wrote:
 On 16 June 2012 17:08, James Cameron qu...@laptop.org wrote:
  That means the hang should not exceed 15 seconds.  Is this what you
  find?  If not, then this casts doubt on your solution.
 
 I'm going to propose something extremely hackish: [...]

Just to remind you that I'm still interested to know if the hang you
observe exceeds 15 seconds or not.  I've not had the time to reproduce
this hang yet.  Building a mental model of the problem is important to
me, because I can sometimes resolve a problem if I have a good model.

The reason the 15 second threshold is important is that
/etc/init.d/functions is designed to finish the unmounting by then.

If it is not finishing, then this hang is at root a kernel problem.

 [...] can we have the XO perform a hard power-off if the software
 shutdown sequence does not complete within 30 seconds?

Yes.

However, the time would likely be better spent by a developer in
understanding what is happening.  Without that, there's a strong risk
that the hack may be ineffective, because whatever is stopping the
shutdown might also stop the hack.  It isn't about elegance, it's
about effectiveness.

Hack type 1: in /etc/init.d/halt fork a process that sleeps for 30
seconds and then forces a power down:

(sleep 30 ; /sbin/halt -f -d -p) &

Hack type 2: in /etc/init.d/halt fork a process that sleeps for 30
seconds and then sends a power down command to the embedded
controller:

(sleep 30 ; echo 55:0 > /sys/power/ec) &

But both these approaches don't work for me.  I presume it is because
the forked process is killed by /etc/init.d/halt

One might add code to /etc/init.d/halt to check for elapsed time and
force a power off, but this would be blocked if a command hangs.
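
A sketch of the elapsed-time check mentioned above, with the same
limitation: it only runs between commands, so it cannot help if a command
such as umount never returns.  The function name deadline_exceeded and the
variable shutdown_started are hypothetical, and the power-off itself is
replaced by an echo so the sketch is safe to run.

```shell
# Hypothetical helper: succeed when more than $2 seconds have passed
# since the epoch time given in $1.
deadline_exceeded() {
    start=$1
    limit=$2
    now=$(date +%s)
    [ $(( now - start )) -gt "$limit" ]
}

# Record the time when the shutdown script starts.
shutdown_started=$(date +%s)

# ... shutdown steps would run here; the check can only happen
# after each step returns ...

if deadline_exceeded "$shutdown_started" 30; then
    # Placeholder; a real script might call /sbin/halt -f -d -p here.
    echo "deadline passed, would force power off"
fi
```

The 30 second limit is the figure proposed earlier in the thread.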

The XO-1.5 embedded controller firmware might also be modified, to
provide a watchdog, but my guess is that will take a lot of
engineering effort.  Again, that effort, if it is to be spent, would
be better spent in diagnosis and debugging.

 Ideally this would be managed by some kind of hardware watchdog, but
 maybe there's a cheap-and-nasty version we can implement in
 software.

The XO-1.75 CPU has a hardware watchdog that could be used for this,
but you aren't likely to ever have a heat problem with XO-1.75.

I don't know if the XO-1.5 CPU has a hardware watchdog.

-- 
James Cameron
http://quozl.linux.org.au/


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-20 Thread Sridhar Dhanapalan
On 16 June 2012 17:08, James Cameron qu...@laptop.org wrote:
 That means the hang should not exceed 15 seconds.  Is this what you
 find?  If not, then this casts doubt on your solution.

I'm going to propose something extremely hackish: can we have the XO
perform a hard power-off if the software shutdown sequence does not
complete within 30 seconds? Ideally this would be managed by some kind
of hardware watchdog, but maybe there's a cheap-and-nasty version we
can implement in software.

The problem we want to eliminate is that XOs are being told to
shut down and are then closed and placed in an XOP charging rack. If an
XO does not actually turn off and remains on while in the rack and
charging, it has the potential to overheat. We have seen cases where
XOs get so hot that the plastic on the touchpad and even on the outer
casing becomes warped. If a problem like that becomes widespread, it
can be *major* for us.

I understand that it's not the most elegant solution, but from a
deployment perspective we need a failsafe to protect the hardware.

Sridhar


Sridhar Dhanapalan
Engineering Manager
One Laptop per Child Australia


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-16 Thread James Cameron
G'day Anish,

I don't think you should conflate the shutdown issue with slower than
claimed microSD cards.  The shutdown issue may be a symptom, combining
Fedora's assumptions about how quickly the kernel will finish writing
data, with the microSD cards being much slower than the hard disks the
halt script was written for.

Your fix of adding delay was based on the assumption that more time
was needed for processes to be killed.  I disagree.  I think the delay
was reducing the probability of dirty blocks in the cache, and you
would have observed an improvement because of that alone.

Your fix of adding a sync before umount might work.  I'm interested to
know how successful that is.

Another thing you could do is reduce the retry timers and counters in
__umount_loop so that it abandons the wait sooner, resulting in the
laptop powering down with the filesystem still mounted.  A better
scenario than staying powered.

__umount_loop tries an umount.  In your photograph [12] that first
umount failed with "umount: /home: device is busy".  __umount_loop
then counts the number of filesystems yet to be unmounted, allowing
two seconds to elapse before it sends another signal to each process
that has references to the filesystem.  It then sleeps for three
seconds before retrying up to 3 times.  __umount_loop is then
abandoned.

That means the hang should not exceed 15 seconds.  Is this what you
find?  If not, then this casts doubt on your solution.
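
The 15 second figure follows from the timers described above.  As a rough
worked example, using only the numbers in this message (the variable names
are illustrative and are not taken from /etc/init.d/functions):

```shell
# Worst-case wait implied by the description of __umount_loop above.
signal_delay=2   # seconds before signalling processes holding the filesystem
retry_sleep=3    # seconds slept before the next umount attempt
max_retries=3    # retries before the loop is abandoned

worst_case=$(( max_retries * (signal_delay + retry_sleep) ))
echo "worst-case wait: ${worst_case} seconds"
```

Reducing any of the three values shortens the wait, which is the
adjustment suggested above for abandoning the loop sooner.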

The "umount2: Device or resource busy" is interesting.  I don't see
this if I try to umount a device that is busy on Fedora 14.  It may
suggest that the umount is failing for a reason other than filesystems
with remaining references or dirty blocks.

I really doubt that fixing the microSD card write performance will
properly fix this hang problem.  Fixing the microSD card write
performance, if it is below a specification, should be done anyway.
It may well reduce the frequency of the hang.  But as far as I can
see, it isn't the only contributor to the hang.

You seem to have settled on a myth.  You seem to believe, based on
selected evidence, that the problem is entirely to do with microSD
cards.  You continue to seek such evidence, but I think you should
seek other evidence.

-- 
James Cameron
http://quozl.linux.org.au/


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-16 Thread John Gilmore
I doubt that this issue is your problem.  But in response to one remark:

  On the theory that these writes may
 be stalling due to the block number, (and we haven't seen any evidence
 yet of this), you can test for that by repeating the writes...

There *is* evidence that accesses to some block numbers in MLC flash
chips are much faster or slower than others (like 5x slower).  They
seem to be designed with fast blocks and slow blocks, though this
is undocumented.  There is no interface for telling the software
which is which (except by actual measurement of the responsiveness of
the chip -- and in microSD cards, accesses are mediated by a Flash
Translation Layer of unknown characteristics).  See:

  Characterizing Flash Memory: Anomalies, Observations, and Applications
  Laura Grupp, Adrian Caulfield, Joel Coburn, Steven Swanson, Eitan Yaakobi
  and Paul Siegel
  UCSD Tech Report CS2009-0946
  August 19, 2009

Unfortunately the amazing people at UCSD fail to put up their archival
tech reports in readily accessible PDFs.  (It seems to be some sort of
half-assed DRM system, since they yammer about copyrights on the same
page.)  They do have a mangled (OCR'd!) abstract here:

  http://csetechrep.ucsd.edu/Dienst/UI/2.0/Describe/ncstrl.ucsd_cse/CS2009-0946

and a mangled 18MB PostScript version available here:

  
  http://csetechrep.ucsd.edu/Dienst/Repository/2.0/Body/ncstrl.ucsd_cse/CS2009-0946/postscript

The Wayback Machine failed to capture it while it was there.  But I
got the PDF from them when they had published it in 2009.  I have put
up the 1.5MB PDF temporarily here for research purposes:

  http://www.toad.com/TEMP-Grupp-2009-TR-FTest.pdf

with the slides here:

  http://www.toad.com/TEMP-Grupp-2009-FMS-FTest.pdf
  
John

  

  


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-15 Thread Anish Mangal
Hi Martin, James et al.

It seems that the microSD card was definitely one of the main reasons
why the hang on shutdown was happening [1]

[1] http://dev.laptop.org.au/issues/1323

Cheers,
Anish


On Fri, Jun 15, 2012 at 7:45 AM, Anish Mangal an...@activitycentral.com wrote:
 On Fri, Jun 15, 2012 at 4:09 AM, James Cameron qu...@laptop.org wrote:
 On Thu, Jun 14, 2012 at 06:51:58PM +0530, Anish Mangal wrote:
 * Insert bad microSD. Flash the new build (using fs-update)
 * Test

 * Insert good microSD. Flash the new build (using fs-update)
 * Test

 In your testing, please also control for the version of Open Firmware
 used at the fs-update step.


 Good point. Deepak, please take note of it in your testing. The
 firmware version while testing with the old and new microSD cards
 should be same.

 --
 James Cameron
 http://quozl.linux.org.au/


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-15 Thread James Cameron
I do not wish to constrain your investigation at all, please continue
investigating, but I do have some comments and speculations.

Yes, the microSD card cannot be excluded as a cause.  But unless there
are other symptoms associated with the microSD card, effort should be
concentrated on finding the root cause of the hang, using Linux
debugging techniques.

The transactions that are given to the microSD card during shutdown
should be normal block read and writes, as the filesystem is prepared
for unmounting.  There's nothing unusual about these transactions,
except that some of them may be located in a particular block range.

So it is unlikely that this will be a cause of the hang.

But you may want to exclude it.  On the theory that these writes may
be stalling due to the block number, (and we haven't seen any evidence
yet of this), you can test for that by repeating the writes in a
controlled fashion, such as by booting from an external SD card or USB
drive, and using Linux to mount and umount the internal microSD card
partitions.  If you find this unreliable, then it is a critical
finding.  If you find this reliable, then you can exclude the theory
of writes stalling due to block number.
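
A sketch of such a controlled test, assuming the laptop has been booted
from external media and that the internal partition and mount point are
passed in by the tester.  The cycle_mount name and the MOUNT and UMOUNT
variables are hypothetical; the variables allow a dry run without root.

```shell
# Repeatedly mount and unmount a partition and report how long each
# umount takes.  Device and mount point are supplied by the caller.
MOUNT=${MOUNT:-mount}
UMOUNT=${UMOUNT:-umount}

cycle_mount() {
    dev=$1      # e.g. the internal microSD partition (an assumption)
    mnt=$2      # an existing, empty mount point
    count=$3    # number of mount/umount passes
    i=1
    while [ "$i" -le "$count" ]; do
        "$MOUNT" "$dev" "$mnt" || { echo "mount failed on pass $i"; return 1; }
        start=$(date +%s)
        "$UMOUNT" "$mnt" || { echo "umount failed on pass $i"; return 1; }
        end=$(date +%s)
        echo "pass $i: umount took $(( end - start ))s"
        i=$(( i + 1 ))
    done
}
```

To exercise the write path as well, files could be written to the mount
point between the mount and the umount on each pass.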

There is a possibility that the contributed behaviour is tied to a
model of microSD card, rather than a specific microSD card.  We use
multiple qualified sources in manufacturing.  You might identify the
manufacturer's identity and configuration of the microSD card.  You
can do this in Open Firmware using:

ok select int
ok show-cid

There are, no doubt, ways to do this in Linux as well, but I do not
recall the details.
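
One way that should work on Linux is to read the card identification
attributes that the MMC subsystem exposes in sysfs.  This is a sketch
only: the device name mmcblk0 is an assumption and may differ for the
internal card on a given build, and show_card_id is a hypothetical helper.

```shell
# Print identification attributes of an SD/MMC card from sysfs.
show_card_id() {
    dev=${1:-mmcblk0}       # block device name (assumption)
    base=${2:-/sys/block}   # sysfs root, overridable for testing
    for attr in cid manfid oemid name date; do
        f="$base/$dev/device/$attr"
        if [ -r "$f" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$f")"
        fi
    done
}

# Usage:
#   show_card_id mmcblk0
```

Comparing manfid, oemid and name between a good and a bad card would show
whether they are the same model from the same source.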

I look forward to hearing what your Linux debugging techniques
uncover.  Ask yourself this question; what is preventing the power off
command from being delivered to the embedded controller by the kernel?
Is it because it was not sent?  If so, why?  Is it because it was sent
(per serial port evidence) but not obeyed?  If so, why is it that the
power button responds?  Is there any serial port evidence of the power
button being detected by the kernel at the point of the hang?  And so
on.

-- 
James Cameron
http://quozl.linux.org.au/


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-15 Thread Anish Mangal
Hi James,

Please have a look at https://dev.laptop.org.au/issues/1033#note-64
which was the email sent some time ago internally when the workaround
for this was just found.

* It is currently difficult for us to follow Linux debugging
techniques as the broken laptop and good/bad microSD cards are with
someone with only basic Linux knowledge (and this was a restriction
during the initial debugging too).

* We think the problem was with the SD cards (perhaps a specific batch
of them), and I think our findings establish that with some confidence.

* The reason the laptop was not shutting down was a race
condition. The halt script was expecting the processes to get killed
within a certain amount of time, which they weren't. Just delaying
that expected time point (by which the processes should be killed)
worked for us.

* As for further debugging, I could have the SD card shipped to me, or
anybody looking to spend time on it, but we must be confident enough
that the problem lies there (which I think it does).

On Saturday 16 June 2012 05:46 AM, James Cameron wrote:
 I do not wish to constrain your investigation at all, please
 continue investigating, but I do have some comments and
 speculations.
 
 Yes, the microSD card cannot be excluded as a cause.  But unless
 there are other symptoms associated with the microSD card, effort
 should be concentrated on finding the root cause of the hang, using
 Linux debugging techniques.
 
 The transactions that are given to the microSD card during
 shutdown should be normal block read and writes, as the filesystem
 is prepared for unmounting.  There's nothing unusual about these
 transactions, except that some of them may be located in a
 particular block range.
 
 So it is unlikely that this will be a cause of the hang.
 
 But you may want to exclude it.  On the theory that these writes
 may be stalling due to the block number, (and we haven't seen any
 evidence yet of this), you can test for that by repeating the
 writes in a controlled fashion, such as by booting from an external
 SD card or USB drive, and using Linux to mount and umount the
 internal microSD card partitions.  If you find this unreliable,
 then it is a critical finding.  If you find this reliable, then you
 can exclude the theory of writes stalling due to block number.
 
 There is a possibility that the contributed behaviour is tied to a 
 model of microSD card, rather than a specific microSD card.  We
 use multiple qualified sources in manufacturing.  You might
 identify the manufacturer's identity and configuration of the
 microSD card.  You can do this in Open Firmware using:
 
 ok select int
 ok show-cid
 
 There are, no doubt, ways to do this in Linux as well, but I do
 not recall the details.
 
 I look forward to hearing what your Linux debugging techniques 
 uncover.  Ask yourself this question; what is preventing the power
 off command from being delivered to the embedded controller by the
 kernel? Is it because it was not sent?  If so, why?  Is it because
 it was sent (per serial port evidence) but not obeyed?  If so, why
 is it that the power button responds?  Is there any serial port
 evidence of the power button being detected by the kernel at the
 point of the hang?  And so on.
 


-- 
Anish Mangal
Dextrose Project Manager
Activity Central


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-14 Thread Anish Mangal
FYI, the testing efforts (with old vs. new SD cards) are being recorded
at http://dev.laptop.org.au/issues/1323

On Wed, Jun 13, 2012 at 12:57 AM, Jerry Vonau jvo...@shaw.ca wrote:
 On Mon, 2012-06-11 at 20:06 +0530, Anish Mangal wrote:
 On Tue, Jun 5, 2012 at 8:11 PM, Martin Langhoff mar...@laptop.org wrote:
  On Tue, Jun 5, 2012 at 10:37 AM, Martin Langhoff mar...@laptop.org wrote:
   - Seems to be related to umount of /home failing. Adding
  "sync ; sleep 2;" before umount seems to cure it; that's their
  current workaround.
 
  Cutting the CC list down to only devel@ for debugging --
 
  Anish,
 
  thanks for reporting this. Couple of questions/requests:
 
   - can you give us the exact patch showing the workaround you are applying?
 

 Jerry, can you pls provide the same?


 Sorry not patching on the fly with this one, just copying a revised file
 in place via OOB. You could do diff against a stock file and what we're
 using from:
 https://dev.laptop.org.au/projects/xo-au/repository/revisions/dex3/raw/olpc-os-builder/sub-files/halt

 Jerry


   - very interested in the microSD swap between good and bad units. Let
  us know how it goes.
 

 We just shipped a good SD card to the person with the 'failing'
 laptop. Expect to hear back very soon.

  On 12.1.0 the switch to systemd completely reworks the shutdown /
  umount process; so if it affects Fedora or OLPC releases, the scope is
  11.3.x / F14. Very unlikely that we see it, at least in this
  particular incarnation, on 12.1.0.
 
  cheers,
 
 
  m
  --
   mar...@laptop.org -- Software Architect - OLPC
   - ask interesting questions
   - don't get distracted with shiny stuff  - working code first
   - http://wiki.laptop.org/go/User:Martinlanghoff


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-14 Thread James Cameron
On Thu, Jun 14, 2012 at 12:03:39PM +0530, Anish Mangal wrote:
 FYI, the testing efforts (with old vs. new SD cards) are being recorded
 at http://dev.laptop.org.au/issues/1323

I've found microSD card performance can change slightly as a result of
a reflash.  On #1323 it seems an fs-update was done prior to the test.

If you wish to keep analysing it to look at the differences between
microSD cards, then:

- make the same number of shutdown tests for both the original microSD
  card and the different microSD card, so that the difference can be
  established statistically.

- restrict the testing to microSD cards from OLPC that we have
  qualified.

- widen the testing to microSD cards from OLPC that have had little use.

- look for difference in behaviour with the microSD card written to in
  one laptop and used in another ... 'cause I'd hate to find that this
  was due to fs-update.
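
One way to make "established statistically" concrete is Fisher's exact test on a 2x2 table of hang/no-hang counts per card. The choice of test is an assumption made for illustration, not something proposed in this thread, and the counts below are hypothetical, not results from #1323.

```python
from math import comb

def fisher_exact_two_sided(hang_a, ok_a, hang_b, ok_b):
    """Two-sided Fisher exact test p-value for a 2x2 table of shutdown outcomes."""
    n_a = hang_a + ok_a
    n_b = hang_b + ok_b
    hangs = hang_a + hang_b
    total = n_a + n_b

    def prob(k):
        # Hypergeometric probability that card A accounts for k of the hangs.
        return comb(n_a, k) * comb(n_b, hangs - k) / comb(total, hangs)

    observed = prob(hang_a)
    lowest = max(0, hangs - n_b)
    highest = min(n_a, hangs)
    # Sum the probabilities of every table at least as unlikely as the observed one.
    return sum(p for p in (prob(k) for k in range(lowest, highest + 1))
               if p <= observed * (1 + 1e-9))

# Hypothetical counts: the same number of shutdowns (20) on each card.
p_value = fisher_exact_two_sided(hang_a=8, ok_a=12, hang_b=0, ok_b=20)
print(f"p = {p_value:.4f}")
```

A small p-value would suggest the hang rate really differs between the two cards; with unequal or very small run counts the comparison becomes much less informative, which is the point of testing both cards the same number of times.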

I predict that this is a race condition during shutdown, which may
yield better to analysis with a serial port attached.  The 11.3.x builds
maintain a getty and shell on the serial port, if I recall correctly,
and this may still be responsive at the time of the hang.  Using that
shell it may be possible to find which processes are still running.  If
that shell isn't available, try adding it.
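
A minimal sketch of what could be run from such a shell to spot processes stuck in uninterruptible sleep (state D), which is where a hung umount would be expected to show up. It assumes Python is available on the image and that /proc is still mounted at the time of the hang; the function names are hypothetical.

```python
import os

def parse_stat(stat_line):
    """Return (pid, command, state) from one /proc/<pid>/stat line."""
    # The command sits in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = stat_line.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid), comm, state

def blocked_processes(proc_root="/proc"):
    """List (pid, command) for processes in uninterruptible sleep (state D)."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process exited while we were looking
        if state == "D":
            found.append((pid, comm))
    return found

if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```

If umount, or a process still holding files open on /home, appears in this list during the hang, that would support the race-condition model.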

As to what is causing the different timing between different laptops,
I predict that this is dependent on microSD card performance
variation.  These cards contain a FLASH translation layer that
processes SD commands and manages the remapping from virtual blocks to
physical cells.  Their performance can vary.

I did consider the possibility of power cycling timing during
fs-update, but in the XO-1.5 units you have, the microSD power is
managed by the embedded controller.  The SD card is power cycled by
Open Firmware, but not the microSD card.

Another thing you might try is run a microSD card in an SD card
adapter, see if there is a difference.

#1323 needs a pointer to your earlier work on #1033.  I found it in
mail, but I shouldn't have had to.  ;-}

-- 
James Cameron
http://quozl.linux.org.au/


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-14 Thread Anish Mangal
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Thursday 14 June 2012 01:16 PM, James Cameron wrote:
 On Thu, Jun 14, 2012 at 12:03:39PM +0530, Anish Mangal wrote:
 FYI, the testing efforts (with old vs. new SD cards) are being
 recorded at http://dev.laptop.org.au/issues/1323
 
 I've found microSD card performance can change slightly as a result
 of a reflash.  On #1323 it seems an fs-update was done prior to the
 test.
 
 If you wish to keep analysing it to look at the differences
 between microSD cards, then:
 
 - make the same number of shutdown tests for both the original
 microSD card and the different microSD card, so that the difference
 can be established statistically.
 
 - restrict the testing to microSD cards from OLPC that we have 
 qualified.
 

See http://dev.laptop.org.au/issues/1323#note-3 for the above two
points. We're testing with microSD cards that came with the OLPC
laptops (i.e.  OLPC approved/validated)

 - widen the testing to microSD cards from OLPC that have had little
 use.
 
 - look for difference in behaviour with the microSD card written to
 in one laptop and used in another ... 'cause I'd hate to find that
 this was due to fs-update.
 

Perhaps this could be one next step. Right now we have one XO-1.5 and
two microSD cards (one probably 'good', the other 'bad', both OLPC
approved), and we're doing the following:

* Insert bad microSD. Flash the new build (using fs-update)
* Test

* Insert good microSD. Flash the new build (using fs-update)
* Test

 I predict that this is a race condition during shutdown, which may
 yield better to analysis with a serial port attached.  The 11.3.x
 builds maintain a getty and shell on the serial port, if I recall
 correctly, and this may still be responsive at the time of the
 hang.  Using that shell it may be possible to find which processes
 are still running.  If that shell isn't available, try adding it.
 

That's what we seem to have established in our initial debug (I think
it should be present somewhere in the thread history or the ticket).
Let me know if you can't find it.

 As to what is causing the different timing between different
 laptops, I predict that this is dependent on microSD card
 performance variation.  These cards contain a FLASH translation
 layer that processes SD commands and manages the remapping from
 virtual blocks to physical cells.  Their performance can vary.
 
 I did consider the possibility of power cycling timing during
 fs-update, but in the XO-1.5 units you have, the microSD power is
 managed by the embedded controller.  The SD card is power cycled
 by Open Firmware, but not the microSD card.
 

The first thing we're trying to establish is that the problem happens
primarily due to a microSD card. Once we verify that it is the correct
direction, we'll go deeper into debug.

 Another thing you might try is run a microSD card in an SD card 
 adapter, see if there is a difference.
 
 #1323 needs a pointer to your earlier work on #1033.  I found it
 in mail, but I shouldn't have had to.  ;-}
 

It's already present in the 'Related Tickets' section on the same page.

- -- 
Anish

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.12 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJP2eV2AAoJEBoxUdDHDZVpx08H/jY46mT9vRI5Zn530BoeU4GJ
0LD2Ycf/0XWLjVPx7oS6cuySiOV2qWnNFUbh0iRVXOSHjafnm2Xsx2tMZX9t3CRe
JR3Yuzz1ymBPaYTK405+Kf5BadIHp6i0cJAG1jMtEvO7VvgQ8AxQCfiHOyjgxYe8
skI7w9xjSpIjB0nT76ePNaGG5FtLKQXMIixhcEbJt8pRoiBOLKYo2N6mEXgfniEU
k/UyvIbuMShdQzFJAIcQm8uEw8kZCHp4bQzjh+XMOL/H3eaL6JQT8K1bep9Ps+V6
YnC1D4ggFIm2uFUXy/aag0yaFepDoBvT3g0e66awPT/ZcT3p3+TwF8AlilwVOkw=
=GF4u
-END PGP SIGNATURE-


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-14 Thread Anish Mangal
On Fri, Jun 15, 2012 at 4:09 AM, James Cameron qu...@laptop.org wrote:
 On Thu, Jun 14, 2012 at 06:51:58PM +0530, Anish Mangal wrote:
 * Insert bad microSD. Flash the new build (using fs-update)
 * Test

 * Insert good microSD. Flash the new build (using fs-update)
 * Test

 In your testing, please also control for the version of Open Firmware
 used at the fs-update step.


Good point. Deepak, please take note of it in your testing. The
firmware version while testing with the old and new microSD cards
should be the same.
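
A small sketch of how the test runs could be logged so that the firmware version is checked before the cards are compared. The record layout, field names, and the firmware labels in the example are all hypothetical; nothing here comes from the actual test notes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ShutdownTest:
    card: str       # label for the microSD card, e.g. "good" or "bad"
    firmware: str   # Open Firmware version used at the fs-update step
    hung: bool      # True if the shutdown hung

def summarise(tests):
    """Return {card: (hangs, total)}, refusing to mix firmware versions."""
    versions = {t.firmware for t in tests}
    if len(versions) > 1:
        raise ValueError(f"firmware not controlled: {sorted(versions)}")
    summary = {}
    for t in tests:
        hangs, total = summary.get(t.card, (0, 0))
        summary[t.card] = (hangs + t.hung, total + 1)
    return summary

# Hypothetical log entries; "fw-A" is a placeholder, not a real version string.
log = [
    ShutdownTest("bad", "fw-A", True),
    ShutdownTest("bad", "fw-A", False),
    ShutdownTest("good", "fw-A", False),
]
print(summarise(log))
```

Raising an error on mixed firmware versions is a deliberate choice here: it forces the mismatch to be resolved before any good-versus-bad comparison is read off the log.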

 --
 James Cameron
 http://quozl.linux.org.au/


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-12 Thread Jerry Vonau
On Mon, 2012-06-11 at 20:06 +0530, Anish Mangal wrote:
 On Tue, Jun 5, 2012 at 8:11 PM, Martin Langhoff mar...@laptop.org wrote:
  On Tue, Jun 5, 2012 at 10:37 AM, Martin Langhoff mar...@laptop.org wrote:
   - Seems to be related to umount of /home failing. Adding sync ; sleep
  2; before umount seems to cure it; that's their current workaround.
 
  Cutting the CC list down to only devel@ for debugging --
 
  Anish,
 
  thanks for reporting this. Couple of questions/requests:
 
   - can you give us the exact patch showing the workaround you are applying?
 
 
 Jerry, can you pls provide the same?
 

Sorry, we're not patching on the fly with this one, just copying a revised
file into place via OOB. You could diff a stock file against the one we're
using, from:
https://dev.laptop.org.au/projects/xo-au/repository/revisions/dex3/raw/olpc-os-builder/sub-files/halt

Jerry


   - very interested in the microSD swap between good and bad units. Let
  us know how it goes.
 
 
 We just shipped a good SD card to the person with the 'failing'
 laptop. Expect to hear back very soon.
 
  On 12.1.0 the switch to systemd completely reworks the shutdown /
  umount process; so if it affects Fedora or OLPC releases, the scope is
  11.3.x / F14. Very unlikely that we see it, at least in this
  particular incarnation, on 12.1.0.
 
  cheers,
 
 
  m
  --
   mar...@laptop.org -- Software Architect - OLPC
   - ask interesting questions
   - don't get distracted with shiny stuff  - working code first
   - http://wiki.laptop.org/go/User:Martinlanghoff


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-11 Thread Anish Mangal
On Tue, Jun 5, 2012 at 8:11 PM, Martin Langhoff mar...@laptop.org wrote:
 On Tue, Jun 5, 2012 at 10:37 AM, Martin Langhoff mar...@laptop.org wrote:
  - Seems to be related to umount of /home failing. Adding sync ; sleep
 2; before umount seems to cure it; that's their current workaround.

 Cutting the CC list down to only devel@ for debugging --

 Anish,

 thanks for reporting this. Couple of questions/requests:

  - can you give us the exact patch showing the workaround you are applying?


Jerry, can you pls provide the same?

  - very interested in the microSD swap between good and bad units. Let
 us know how it goes.


We just shipped a good SD card to the person with the 'failing'
laptop. Expect to hear back very soon.

 On 12.1.0 the switch to systemd completely reworks the shutdown /
 umount process; so if it affects Fedora or OLPC releases, the scope is
 11.3.x / F14. Very unlikely that we see it, at least in this
 particular incarnation, on 12.1.0.

 cheers,


 m
 --
  mar...@laptop.org -- Software Architect - OLPC
  - ask interesting questions
  - don't get distracted with shiny stuff  - working code first
  - http://wiki.laptop.org/go/User:Martinlanghoff


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-10 Thread Kevin Gordon
On Fri, Jun 8, 2012 at 6:26 PM, Anish Mangal an...@activitycentral.org wrote:

 On Fri, Jun 8, 2012 at 5:45 PM, Kevin Gordon kgordon...@gmail.com wrote:
  Sitting in a lab here in Kenya with about 60 XO 1.5's all at standard 885
 
  Did shutdowns on all boxes 3 times (180 shutdowns) - all with right-click
  on centre icon and choose shutdown - after they were up for 5 mins or
 more.
 
  Results:
 
  160 normal shutdowns
 
  20 hang on warnings page
 
  No machines hung 3 times
  4 machines hung twice
  12 machines hung once.
 
  tap on power button shut hung machines off.
 

 Was it a tap, or an extended press (longer duration, until the machine
 powered off)


It is the extended press and hold, not like the tap twice to power off
timing, sorry for the poor choice of words :-)


 In any case, can you note down the Serial Nos. of the machines that hung?


They are all out in the schools now, but I can state that the serial
numbers are all in the range:  SHC13100D00 to SHC13100DFF, as all of the
machines in this batch are from a single order of 100 machines.


 Thanks!

  Cheers
 
  KG
 
 
 
 
 
  On Tue, Jun 5, 2012 at 11:55 PM, Martin Langhoff mar...@laptop.org
 wrote:
 
  On Tue, Jun 5, 2012 at 4:43 PM, Tom Parker t...@carrott.org wrote:
   We'll do some explicit testing of shutdown on Saturday.
 
  Fantastic, thanks!
 
 
 
  m
  --
   mar...@laptop.org -- Software Architect - OLPC
   - ask interesting questions
   - don't get distracted with shiny stuff  - working code first
   - http://wiki.laptop.org/go/User:Martinlanghoff


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-09 Thread Peter Robinson
On Sat, Jun 9, 2012 at 3:43 AM, Anish Mangal an...@sugarlabs.org wrote:
 On Sat, Jun 9, 2012 at 2:22 AM, Walter Bender walter.ben...@gmail.com wrote:
 On Tue, Jun 5, 2012 at 10:37 AM, Martin Langhoff mar...@laptop.org wrote:
 Hi folks,

 the Dextrose team has been hunting a bug on XO-1.5, on their variant
 of 11.3.x (Dextrose 3 or DX3), and they are pointing out that it
 could be a latent or unreported problem in 11.3.x series. It could
 also be a problem in their modifications.

 While they continue to investigate, it is important to hear whether
 anyone has seen XO-1.5s hanging during shutdown, on the Warnings
 screen.

  - It seems to affect some units, not all. So far it has been seen on
 3 units with SN starting with SHC037 .

  - Units affected show the symptoms 2 out of 5 boots.

  - Seems to be related to umount of /home failing. Adding sync ; sleep
 2; before umount seems to cure it; that's their current workaround.

  - Their debugging adventures are documented at
 http://dev.laptop.org.au/issues/1033

  - They will be checking whether the symptoms follow the microSD card
 or the motherboard, swapping microSD with a good unit.

 Why is this important? When the unit hangs during shutdown, it is left
 in a condition where it can overheat, potentially damaging the unit.

 The bottom line: have you seen this issue on XO-1.5 + 11.3.x? Even if
 occasionally? Let us know, and join the bug-hunting party.

 cheers,



 m
 --
  mar...@laptop.org -- Software Architect - OLPC
  - ask interesting questions
  - don't get distracted with shiny stuff  - working code first
  - http://wiki.laptop.org/go/User:Martinlanghoff

 While I haven't seen the exact symptoms described here, I have seen
 some problems here in Chachapoyas. XO 1.5s running 11.3.1. It seems
 that machines that are running for quite some time with very active
 use eventually hang. I grabbed some logs from a Browse session, which
 seems to be the activity running when the crashes happen, but looking
 at the logs, it seems to be something failing at a lower level. See
 http://bugs.sugarlabs.org/ticket/3678

 Also, at times, Restart fails to restart.


 In our specific case (of this problem on an XO-1.5 running dx3), the
 hang was occurring even when almost no activity was done, i.e. start
 the XO, wait for sugar boot, shutdown the XO.

Does dx3 have a different partition for /home as part of the standard
dx3 build? If so, I vaguely remember a hang on shutdown on generic F-14
with a separate /home. F-14 was a while ago, so I'm not sure, but it
might be worthwhile searching Google and Bugzilla for mainline Fedora
bugs, lists, and wiki pages to see whether it was a general problem
that was fixed upstream.

Peter


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-08 Thread Kevin Gordon
Sitting in a lab here in Kenya with about 60 XO 1.5's all at standard 885

Did shutdowns on all boxes 3 times (180 shutdowns) - all with right-click
on centre icon and choose shutdown - after they were up for 5 mins or more.

Results:

160 normal shutdowns

20 hang on warnings page

No machines hung 3 times
4 machines hung twice
12 machines hung once.

tap on power button shut hung machines off.
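
The per-machine counts agree with the totals: 12 machines hanging once plus 4 hanging twice gives 12 + 8 = 20 hangs in 180 shutdowns, roughly 11%. The sketch below redoes that arithmetic and, as a rough check only, compares the per-machine counts with what a simple binomial model would predict if every shutdown hung independently at the same rate. It assumes exactly 60 machines (the report says "about 60"), and the independence model is an assumption for illustration, not a claim about the cause.

```python
from math import comb

machines = 60            # "about 60" in the report; 180 shutdowns / 3 each
runs_per_machine = 3
hung_once, hung_twice, hung_thrice = 12, 4, 0

total_runs = machines * runs_per_machine
total_hangs = hung_once * 1 + hung_twice * 2 + hung_thrice * 3
rate = total_hangs / total_runs

print(f"{total_hangs} hangs in {total_runs} shutdowns ({rate:.1%})")

# Expected number of machines with k hangs if every shutdown hung
# independently with the same probability (binomial model).
observed = {0: machines - hung_once - hung_twice - hung_thrice,
            1: hung_once, 2: hung_twice, 3: hung_thrice}
for k in range(runs_per_machine + 1):
    p_k = comb(runs_per_machine, k) * rate**k * (1 - rate)**(runs_per_machine - k)
    print(f"{k} hangs: observed {observed[k]:2d}, expected {machines * p_k:5.1f}")
```

If the observed counts sat far from the binomial expectation, with hangs concentrated on a few machines, that would point towards a per-unit or per-card cause; with only three runs per machine the comparison is weak either way.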

Cheers

KG




On Tue, Jun 5, 2012 at 11:55 PM, Martin Langhoff mar...@laptop.org wrote:

 On Tue, Jun 5, 2012 at 4:43 PM, Tom Parker t...@carrott.org wrote:
  We'll do some explicit testing of shutdown on Saturday.

 Fantastic, thanks!



 m
 --
  mar...@laptop.org -- Software Architect - OLPC
  - ask interesting questions
  - don't get distracted with shiny stuff  - working code first
  - http://wiki.laptop.org/go/User:Martinlanghoff


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-08 Thread Anish Mangal
On Fri, Jun 8, 2012 at 5:45 PM, Kevin Gordon kgordon...@gmail.com wrote:
 Sitting in a lab here in Kenya with about 60 XO 1.5's all at standard 885

 Did shutdowns on all boxes 3 times (180 shutdowns) - all with right-click
 on centre icon and choose shutdown - after they were up for 5 mins or more.

 Results:

 160 normal shutdowns

 20 hang on warnings page

 No machines hung 3 times
 4 machines hung twice
 12 machines hung once.

 tap on power button shut hung machines off.


Was it a tap, or an extended press (longer duration, until the machine
powered off)

In any case, can you note down the Serial Nos. of the machines that hung?

Thanks!

 Cheers

 KG





 On Tue, Jun 5, 2012 at 11:55 PM, Martin Langhoff mar...@laptop.org wrote:

 On Tue, Jun 5, 2012 at 4:43 PM, Tom Parker t...@carrott.org wrote:
  We'll do some explicit testing of shutdown on Saturday.

 Fantastic, thanks!



 m
 --
  mar...@laptop.org -- Software Architect - OLPC
  - ask interesting questions
  - don't get distracted with shiny stuff  - working code first
  - http://wiki.laptop.org/go/User:Martinlanghoff


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-08 Thread Walter Bender
On Tue, Jun 5, 2012 at 10:37 AM, Martin Langhoff mar...@laptop.org wrote:
 Hi folks,

 the Dextrose team has been hunting a bug on XO-1.5, on their variant
 of 11.3.x (Dextrose 3 or DX3), and they are pointing out that it
 could be a latent or unreported problem in 11.3.x series. It could
 also be a problem in their modifications.

 While they continue to investigate, it is important to hear whether
 anyone has seen XO-1.5s hanging during shutdown, on the Warnings
 screen.

  - It seems to affect some units, not all. So far it has been seen on
 3 units with SN starting with SHC037 .

  - Units affected show the symptoms 2 out of 5 boots.

  - Seems to be related to umount of /home failing. Adding sync ; sleep
 2; before umount seems to cure it; that's their current workaround.

  - Their debugging adventures are documented at
 http://dev.laptop.org.au/issues/1033

  - They will be checking whether the symptoms follow the microSD card
 or the motherboard, swapping microSD with a good unit.

 Why is this important? When the unit hangs during shutdown, it is left
 in a condition where it can overheat, potentially damaging the unit.

 The bottom line: have you seen this issue on XO-1.5 + 11.3.x? Even if
 occasionally? Let us know, and join the bug-hunting party.

 cheers,



 m
 --
  mar...@laptop.org -- Software Architect - OLPC
  - ask interesting questions
  - don't get distracted with shiny stuff  - working code first
  - http://wiki.laptop.org/go/User:Martinlanghoff

While I haven't seen the exact symptoms described here, I have seen
some problems here in Chachapoyas. XO 1.5s running 11.3.1. It seems
that machines that are running for quite some time with very active
use eventually hang. I grabbed some logs from a Browse session, which
seems to be the activity running when the crashes happen, but looking
at the logs, it seems to be something failing at a lower level. See
http://bugs.sugarlabs.org/ticket/3678

Also, at times, Restart fails to restart.

-walter

-- 
Walter Bender
Sugar Labs
http://www.sugarlabs.org


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-08 Thread Anish Mangal
On Sat, Jun 9, 2012 at 2:22 AM, Walter Bender walter.ben...@gmail.com wrote:
 On Tue, Jun 5, 2012 at 10:37 AM, Martin Langhoff mar...@laptop.org wrote:
 Hi folks,

 the Dextrose team has been hunting a bug on XO-1.5, on their variant
 of 11.3.x (Dextrose 3 or DX3), and they are pointing out that it
 could be a latent or unreported problem in 11.3.x series. It could
 also be a problem in their modifications.

 While they continue to investigate, it is important to hear whether
 anyone has seen XO-1.5s hanging during shutdown, on the Warnings
 screen.

  - It seems to affect some units, not all. So far it has been seen on
 3 units with SN starting with SHC037 .

  - Units affected show the symptoms 2 out of 5 boots.

  - Seems to be related to umount of /home failing. Adding sync ; sleep
 2; before umount seems to cure it; that's their current workaround.

  - Their debugging adventures are documented at
 http://dev.laptop.org.au/issues/1033

  - They will be checking whether the symptoms follow the microSD card
 or the motherboard, swapping microSD with a good unit.

 Why is this important? When the unit hangs during shutdown, it is left
 in a condition where it can overheat, potentially damaging the unit.

 The bottom line: have you seen this issue on XO-1.5 + 11.3.x? Even if
 occasionally? Let us know, and join the bug-hunting party.

 cheers,



 m
 --
  mar...@laptop.org -- Software Architect - OLPC
  - ask interesting questions
  - don't get distracted with shiny stuff  - working code first
  - http://wiki.laptop.org/go/User:Martinlanghoff

 While I haven't seen the exact symptoms described here, I have seen
 some problems here in Chachapoyas. XO 1.5s running 11.3.1. It seems
 that machines that are running for quite some time with very active
 use eventually hang. I grabbed some logs from a Browse session, which
 seems to be the activity running when the crashes happen, but looking
 at the logs, it seems to be something failing at a lower level. See
 http://bugs.sugarlabs.org/ticket/3678

 Also, at times, Restart fails to restart.


In our specific case (of this problem on an XO-1.5 running dx3), the
hang was occurring even when almost no activity was done, i.e. start
the XO, wait for sugar boot, shutdown the XO.

 -walter

 --
 Walter Bender
 Sugar Labs
 http://www.sugarlabs.org



-- 
Anish | an...@sugarlabs.org


On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-05 Thread Martin Langhoff
Hi folks,

the Dextrose team has been hunting a bug on XO-1.5, on their variant
of 11.3.x (Dextrose 3 or DX3), and they are pointing out that it
could be a latent or unreported problem in 11.3.x series. It could
also be a problem in their modifications.

While they continue to investigate, it is important to hear whether
anyone has seen XO-1.5s hanging during shutdown, on the Warnings
screen.

 - It seems to affect some units, not all. So far it has been seen on
3 units with SN starting with SHC037 .

 - Units affected show the symptoms 2 out of 5 boots.

 - Seems to be related to umount of /home failing. Adding
'sync; sleep 2;' before umount seems to cure it; that's their current
workaround.

 - Their debugging adventures are documented at
http://dev.laptop.org.au/issues/1033

 - They will be checking whether the symptoms follow the microSD card
or the motherboard, swapping microSD with a good unit.
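
The workaround described in the list above (sync, a two-second pause, then umount) can be sketched as follows. The actual change is made in a shell script that is not reproduced in this message; this Python version is only an illustration, and the function name and its parameters are hypothetical.

```python
import os
import subprocess
import time

def settle_then_umount(mountpoint="/home", settle_seconds=2,
                       sync=os.sync, sleep=time.sleep, run=subprocess.run):
    """Flush dirty buffers, wait briefly, then unmount; return umount's exit code."""
    sync()                 # equivalent of `sync`
    sleep(settle_seconds)  # equivalent of `sleep 2`
    # check=False: report failure through the return code instead of raising.
    result = run(["umount", mountpoint], check=False)
    return result.returncode
```

The sync, sleep, and run callables are passed in as parameters only so the ordering can be exercised without root access or a real mount. A fixed delay like this masks a timing problem instead of explaining it, which is why the exact patch and the microSD swap results matter.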

Why is this important? When the unit hangs during shutdown, it is left
in a condition where it can overheat, potentially damaging the unit.

The bottom line: have you seen this issue on XO-1.5 + 11.3.x? Even if
occasionally? Let us know, and join the bug-hunting party.

cheers,



m
-- 
 mar...@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-05 Thread Martin Langhoff
On Tue, Jun 5, 2012 at 10:37 AM, Martin Langhoff mar...@laptop.org wrote:
  - Seems to be related to umount of /home failing. Adding sync ; sleep
 2; before umount seems to cure it; that's their current workaround.

Cutting the CC list down to only devel@ for debugging --

Anish,

thanks for reporting this. Couple of questions/requests:

 - can you give us the exact patch showing the workaround you are applying?

 - very interested in the microSD swap between good and bad units. Let
us know how it goes.

On 12.1.0 the switch to systemd completely reworks the shutdown /
umount process; so if it affects Fedora or OLPC releases, the scope is
11.3.x / F14. Very unlikely that we see it, at least in this
particular incarnation, on 12.1.0.

cheers,


m
-- 
 mar...@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-05 Thread Tom Parker

On 06/06/12 02:37, Martin Langhoff wrote:


The bottom line: have you seen this issue on XO-1.5 + 11.3.x? Even if
occasionally? Let us know, and join the bug-hunting party.


We have seen this sort of problem in Auckland recently. However I can't 
really be sure if what we see is related. We test XO-1.5s, XO-1.75s, 
11.3.1, olpc-au's dextrose release and 12.1.0 and I don't recall which 
laptops and which releases have hung. We have seen a wide variety of 
shutdown misbehaviour, ranging from shutdown doesn't do anything at all 
through to everything appears to be off except the power light is on solid.


We'll do some explicit testing of shutdown on Saturday.


Re: On XO-1.5 with 11.3.0/11.3.1 -- hang during shutdown?

2012-06-05 Thread Martin Langhoff
On Tue, Jun 5, 2012 at 4:43 PM, Tom Parker t...@carrott.org wrote:
 We'll do some explicit testing of shutdown on Saturday.

Fantastic, thanks!



m
-- 
 mar...@laptop.org -- Software Architect - OLPC
 - ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff