Re: Run OFW heat spreader test
On 24 January 2012 02:36, Martin Langhoff martin.langh...@gmail.com wrote:
> On Mon, Jan 23, 2012 at 7:54 AM, Richard A. Smith rich...@laptop.org wrote:
>> Hmmm... Something else is the problem here. You can't damage the processor via thermal overload because it has an automatic clock back-off. If you have motherboards that are failing, it's not due to a bad heat spreader. At worst all you would get would be hangs.
>
> Agreed with Richard -- Sridhar, if you are seeing permanent mb failures, let's get SNs of those motherboards into Reuben's hands for more in-depth diagnostics.

I've addressed this matter off-list.

We are thinking of forcing a heat spreader test based on the XO's manufacturing date: https://dev.laptop.org.au/issues/1026

Sridhar
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel
Re: Run OFW heat spreader test
On 01/23/2012 01:32 AM, Sridhar Dhanapalan wrote:
> On 23 January 2012 17:20, James Cameron qu...@laptop.org wrote:
>> I thought you were doing this test to detect early units that may have a failed heat spreader, and you were doing it at the time of reflashing because that's when you had some control.
>
> Yes, that's the primary reason. Our initial batch of XO-1.5s has an inefficient heat spreader. They've been burning out, and replacing the motherboards is getting expensive and time consuming. We'd like to detect potentially faulty units early, and recommend a heat spreader change for them.

Hmmm... Something else is the problem here. You can't damage the processor via thermal overload because it has an automatic clock back-off. If you have motherboards that are failing, it's not due to a bad heat spreader. At worst all you would get would be hangs.

Can you acquire the serial numbers of the failed motherboards, or is that lost?

-- 
Richard A. Smith rich...@laptop.org
One Laptop per Child
Run OFW heat spreader test
We are using a custom olpc.fth to present a boot menu so that users can easily flash their XOs. As a precaution, we run a lid switches test before the OS installation begins. Some of our XOs have older, less effective heat spreaders, and we would like to catch these before they get burnt out by the flashing process.

The automatic lid switches test is confusing some teachers. Ideally we only want to run the heat spreader test part of it, so that the test is transparent and the user doesn't need to close the lid. Is this possible?

Further to this, is it possible to reliably parse the result and halt the OS flashing if the test fails?

Sridhar Dhanapalan
Engineering Manager
One Laptop per Child Australia
M: +61 425 239 701
E: srid...@laptop.org.au
A: G.P.O. Box 731 Sydney, NSW 2001
W: www.laptop.org.au
Re: Run OFW heat spreader test
On 01/22/2012 09:00 PM, Sridhar Dhanapalan wrote:
> We are using a custom olpc.fth to present a boot menu so that users can easily flash their XOs. As a precaution, we run a lid switches test before the OS installation begins. Some of our XOs have older, less effective heat spreaders, and we would like to catch these before they get burnt out by the flashing process.

The CPU has an internal thermal shutdown. It won't burn out. I run 1.5's without heat spreaders all the time. The reason you want to catch them is that fs-update will hang.

> The automatic lid switches test is confusing some teachers. Ideally we only want to run the heat spreader test part of it, so that the test is transparent and the user doesn't need to close the lid. Is this possible?

Yes. I don't have a 1.5 with me at the moment, but from looking at the OFW source I think you want '.temp-rise'.

   ok .temp-rise

That should run the thermal test and print a pass/fail message. It also leaves a true or false flag on the stack indicating whether the test passed or failed.

-- 
Richard A. Smith rich...@laptop.org
One Laptop per Child
Re: Run OFW heat spreader test
On Mon, 2012-01-23 at 13:00 +1100, Sridhar Dhanapalan wrote:
> We are using a custom olpc.fth to present a boot menu so that users can easily flash their XOs. As a precaution, we run a lid switches test before the OS installation begins. Some of our XOs have older, less effective heat spreaders, and we would like to catch these before they get burnt out by the flashing process.
>
> The automatic lid switches test is confusing some teachers. Ideally we only want to run the heat spreader test part of it, so that the test is transparent and the user doesn't need to close the lid. Is this possible?

I thought that with the physical layout of the motherboard, with the CPU towards the outside of the screen lid, e-book mode would be the hardest on the XO in terms of heat dissipation. Given there is no active cooling, is the heat spreader test not measuring the difference in temperature between the two modes of operation?

Jerry
Re: Run OFW heat spreader test
On Sun, 2012-01-22 at 23:16 -0600, Jerry Vonau wrote:
> On Mon, 2012-01-23 at 00:02 -0500, Richard A. Smith wrote:
>> On 01/22/2012 09:00 PM, Sridhar Dhanapalan wrote:
>>> We are using a custom olpc.fth to present a boot menu so that users can easily flash their XOs. As a precaution, we run a lid switches test before the OS installation begins. Some of our XOs have older, less effective heat spreaders, and we would like to catch these before they get burnt out by the flashing process.
>>
>> The CPU has an internal thermal shutdown. It won't burn out. I run 1.5's without heat spreaders all the time. The reason you want to catch them is that fs-update will hang.
>>
>>> The automatic lid switches test is confusing some teachers. Ideally we only want to run the heat spreader test part of it, so that the test is transparent and the user doesn't need to close the lid. Is this possible?
>>
>> Yes. I don't have a 1.5 with me at the moment, but from looking at the OFW source I think you want '.temp-rise'.
>>
>>    ok .temp-rise
>>
>> That should run the thermal test and print a pass/fail message. It also leaves a true or false flag on the stack indicating whether the test passed or failed.
>
> Thanks, looking into that part of OFW code.

.temp-rise returns a ? from the OK prompt.

Jerry
Re: Run OFW heat spreader test
On 01/23/2012 12:23 AM, Jerry Vonau wrote:
> Thanks, looking into that part of OFW code. .temp-rise returns a ? from the OK prompt.

Ah... I see it's part of the /switches node. Try this:

   ok select /switches
   ok .temp-rise

-- 
Richard A. Smith rich...@laptop.org
One Laptop per Child
Re: Run OFW heat spreader test
On 01/23/2012 12:17 AM, Jerry Vonau wrote:
>> Is this possible?
>
> I thought that with the physical layout of the motherboard, with the CPU towards the outside of the screen lid, e-book mode would be the hardest on the XO in terms of heat dissipation. Given there is no active cooling, is the heat spreader test not measuring the difference in temperature between the two modes of operation?

No. The heat spreader test runs the CPU in a tight loop and watches the rate of change in CPU temperature. If it rises too quickly, then the heat spreader isn't making good contact.

-- 
Richard A. Smith rich...@laptop.org
One Laptop per Child
Re: Run OFW heat spreader test
On 01/23/2012 12:41 AM, Richard A. Smith wrote:
> On 01/23/2012 12:23 AM, Jerry Vonau wrote:
>> Thanks, looking into that part of OFW code. .temp-rise returns a ? from the OK prompt.
>
> Ah... I see it's part of the /switches node. Try this:

Oops, I forgot to get back out of that device node instance. That should be:

   ok select /switches
   ok .temp-rise
   ok unselect

-- 
Richard A. Smith rich...@laptop.org
One Laptop per Child
Re: Run OFW heat spreader test
The heat spreader test is always run in e-book mode, because that's what the immediately preceding test does. I imagine you would get different results if you didn't run it in e-book mode. I imagine that over a large sample, the results would be considerably different.

For the user training issue, stop using the manufacturing test prompts, and replace them with something localised.

   dev /switches
   : new-wait-lid  ( -- )
      ." Thermal test step 1, close and then re-open the laptop lid." cr
      begin  ?key-abort  lid?  until
   ;
   : new-wait-ebook  ( -- )
      ." Thermal test step 2, rotate the top part and lay it down face up." cr
      begin  ?key-abort  ebook?  until
      ." Thermal test step 3, please wait a few seconds." cr
   ;
   patch new-wait-lid wait-lid all-switch-states
   patch new-wait-ebook wait-ebook all-switch-states

-- 
James Cameron
http://quozl.linux.org.au/
Re: Run OFW heat spreader test
On Mon, Jan 23, 2012 at 04:59:55PM +1100, Sridhar Dhanapalan wrote:
> How can we parse the output? Our attempts so far have been unreliable - Jerry has details.

You should find it very reliable when used correctly.

Don't parse the output. Instead, call temp-rise and use the value on the stack, comparing it to temperature-threshold, in the same way that .temp-rise does.

You may want a localised temperature-threshold as well, based on a survey of rise values in the field.

-- 
James Cameron
http://quozl.linux.org.au/
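A minimal sketch of the approach described above, written in Forth to match the other snippets in this thread. It assumes that temp-rise leaves the measured rise on the stack, that temperature-threshold leaves the limit, that both live in the /switches node, and that a rise above the limit means failure. None of this has been checked against the OFW source, and heat-spreader-ok? is a hypothetical name.

```forth
\ Sketch only. Assumed, not checked against the OFW source:
\   temp-rise              ( -- rise )   measured temperature rise
\   temperature-threshold  ( -- limit )  largest rise that still passes
dev /switches
: heat-spreader-ok?  ( -- flag )   \ hypothetical name
   temp-rise temperature-threshold <=
;
device-end
```

If those assumptions hold, a script could select /switches, call heat-spreader-ok?, unselect, and then refuse to start fs-update when the flag is false. A localised limit would replace temperature-threshold in the comparison.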
Re: Run OFW heat spreader test
On Mon, Jan 23, 2012 at 05:03:29PM +1100, Sridhar Dhanapalan wrote:
> On 23 January 2012 16:56, James Cameron qu...@laptop.org wrote:
>> The heat spreader test is always run in e-book mode, because that's what the immediately preceding test does. I imagine you would get different results if you didn't run it in e-book mode. I imagine that over a large sample, the results would be considerably different.
>
> For the purposes of flashing the XOs, does it matter?

We haven't measured how many units pass the test upright versus pass the test in e-book mode; we only test in e-book mode. If you think it matters, you will have to characterise the result. The manual handling of the upper section can also change the test result when a heat spreader is in a marginal condition. I have results that show this.

I thought you were doing this test to detect early units that may have a failed heat spreader, and you were doing it at the time of reflashing because that's when you had some control.

I don't think it is worth doing this test for the purposes of flashing the XOs. The built-in throttling will work fine. If the heat spreader is dodgy, you'll either get a hang during fs-update or it will take much longer.

Of course, make sure the units are not racked, stacked, or placed under a towel.

>> For the user training issue, stop using the manufacturing test prompts, and replace them with something localised.
>>
>>    dev /switches
>>    : new-wait-lid  ( -- )
>>       ." Thermal test step 1, close and then re-open the laptop lid." cr
>>       begin  ?key-abort  lid?  until
>>    ;
>>    : new-wait-ebook  ( -- )
>>       ." Thermal test step 2, rotate the top part and lay it down face up." cr
>>       begin  ?key-abort  ebook?  until
>>       ." Thermal test step 3, please wait a few seconds." cr
>>    ;
>>    patch new-wait-lid wait-lid all-switch-states
>>    patch new-wait-ebook wait-ebook all-switch-states
>
> The problem is that this is very tedious for the user. Imagine you're flashing 100 XOs together. Having to close and open the lid twice per XO will make the entire process much longer.

Yes, as you can see above it really wasn't clear to me why you were running the test.

Looking at your original post on the thread, I don't think heat spreaders will be burnt out by the flashing process.

-- 
James Cameron
http://quozl.linux.org.au/
Re: Run OFW heat spreader test
On 23 January 2012 17:20, James Cameron qu...@laptop.org wrote:
> I thought you were doing this test to detect early units that may have a failed heat spreader, and you were doing it at the time of reflashing because that's when you had some control.

Yes, that's the primary reason. Our initial batch of XO-1.5s has an inefficient heat spreader. They've been burning out, and replacing the motherboards is getting expensive and time consuming. We'd like to detect potentially faulty units early, and recommend a heat spreader change for them.

As a thought - maybe we should be identifying the serial number as well?

> I don't think it is worth doing this test for the purposes of flashing the XOs. The built-in throttling will work fine. If the heat spreader is dodgy, you'll either get a hang during fs-update or it will take much longer.

Hangs are annoying and don't provide any useful feedback to the user. It might be true that the XO can't get damaged from flashing, but the symptoms at runtime are random and are difficult to diagnose. I think that forcing a heat spreader test can provide a warning to the user and allow them to do something before any damage or annoying behaviour can begin.
Re: Run OFW heat spreader test
On Mon, Jan 23, 2012 at 05:32:05PM +1100, Sridhar Dhanapalan wrote:
> On 23 January 2012 17:20, James Cameron qu...@laptop.org wrote:
>> I thought you were doing this test to detect early units that may have a failed heat spreader, and you were doing it at the time of reflashing because that's when you had some control.
>
> Yes, that's the primary reason. Our initial batch of XO-1.5s has an inefficient heat spreader. They've been burning out, and replacing the motherboards is getting expensive and time consuming. We'd like to detect potentially faulty units early, and recommend a heat spreader change for them.

I don't think burning out is the right wording. Perhaps you mean they have been losing contact with the CPU. This would be a gradual process, and would be encouraged mostly by handling, including compression of the case and opening and closing the lid. I'm certain fs-update would not affect this process.

> As a thought - maybe we should be identifying the serial number as well?

Yes, you should be able to skip the test if the serial number is not in the range of your initial batch of XO-1.5s.

>> I don't think it is worth doing this test for the purposes of flashing the XOs. The built-in throttling will work fine. If the heat spreader is dodgy, you'll either get a hang during fs-update or it will take much longer.
>
> Hangs are annoying and don't provide any useful feedback to the user. It might be true that the XO can't get damaged from flashing, but the symptoms at runtime are random and are difficult to diagnose. I think that forcing a heat spreader test can provide a warning to the user and allow them to do something before any damage or annoying behaviour can begin.

We have long since fixed the problem that led to random symptoms after boot following an incomplete reflash ... I trust you are using the code that writes the zero block last in the .zd file? So, unlike in your previous experience, I don't expect any random and difficult-to-diagnose symptoms.

Perhaps you should test what the result is. The unit should fail to boot. You can simulate a hang during fs-update by forcing the power off. The result should be identical.

-- 
James Cameron
http://quozl.linux.org.au/