Re: [maemo-developers] RE: defective memory?
Kimmo Hämäläinen wrote:
> On Wed, 2006-09-20 at 23:17, ext Frantisek Dufka wrote:
> ...
>> So it really seems related to wi-fi.
>
> Thank you for the information. I'm trying to keep the internal investigation ongoing (and not just assuming that the HW is broken). These kinds of hints should help to nail it down.
>
> BR, Kimmo
>
>> Frantisek

Hmm ... this time I spent much more time on this and I have done a lot of tests. I tried to use a scientific approach, so that I could say "yes, the problem happens 5/100 times if condition x ...".

The results, "for my device", are that the problem does not seem to be related to:
- empty battery
- wifi
- temperature
- high power requirements
- bad memory region
or any combination of these parameters. You can have the illusion that there is a correlation with this or that, but no, the stats say that it's just a coincidence.

I have noticed that when memtester finds a bad address line, the probability of finding other bad addresses in the same run is very high.

So after all that I REALLY don't know what the problem is :-( I need some new ideas ... or an oscilloscope ... to go deeper.

Regards,
Olivier ROLAND
___
maemo-developers mailing list
maemo-developers@maemo.org
https://maemo.org/mailman/listinfo/maemo-developers
Re: [maemo-developers] RE: defective memory?
On Wed, 2006-09-20 at 23:17, ext Frantisek Dufka wrote:
> ...
> So it really seems related to wi-fi.

Thank you for the information. I'm trying to keep the internal investigation ongoing (and not just assuming that the HW is broken). These kinds of hints should help to nail it down.

BR, Kimmo

> Frantisek
Re: [maemo-developers] RE: defective memory?
After the previous test I put the device with the empty battery on the charger, and it also happens when connected to the charger (over wi-fi). Once with the WLAN power setting at 100 mW, and later also when reduced to 10 mW.

Nokia770-26:~# ./memtester 40 1
memtester version 4.0.5 (32-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xf000
want 40MB (41943040 bytes)
got 40MB (41943040 bytes), virtual address=0x41131000, trying mlock ...locked.
Loop 1/1:
Stuck Address : testing 0FAILURE: possible bad address line at offset 0x005031a5 (page offset 1a5).
Skipping to next test...
Random Value: FAILURE: 0xbf3777dc != 0xbf37 at offset 0x31a5 (page offset 1a5).
FAILURE: 0x454d954f != 0x454d at offset 0x31a5 (page offset 1a5).
Compare XOR : FAILURE: 0x8e5146b4 != 0x8e50b165 at offset 0x31a5 (page offset 1a5).
Compare SUB : FAILURE: 0x06bfe708 != 0x9f20 at offset 0x31a5 (page offset 1a5).
Compare MUL :
Compare DIV : ok
Compare OR : ok
FAILURE: 0x7b69b068 != 0x7b69 at offset 0x31a5 (page offset 1a5).
Compare AND :
Sequential Increment: ok
Solid Bits : testing 1FAILURE: 0x != 0x at offset 0x31a5 (page offset 1a5).
Block Sequential: testing 1FAILURE: 0x01010101 != 0x0101 at offset 0x31a5 (page offset 1a5).
Checkerboard: testing 0FAILURE: 0x != 0x at offset 0x31a5 (page offset 1a5).
Bit Spread : testing 0FAILURE: 0xfffa != 0x at offset 0x31a5 (page offset 1a5).
Bit Flip: testing 0FAILURE: 0x0001 != 0x at offset 0x31a5 (page offset 1a5).
Walking Ones: testing 0FAILURE: 0xfffe != 0x at offset 0x31a5 (page offset 1a5).
Walking Zeroes : testing 0FAILURE: 0x0001 != 0x at offset 0x31a5 (page offset 1a5).

Done.
Nokia770-26:~#

Then I closed WLAN and connected via Bluetooth, and there were no errors on the charger. Then I disconnected the charger, and again no errors on battery (grey icon with 1 stripe again). Then I ran it again, still in the same shell via ssh over Bluetooth, but also connected to WLAN (left idle). And it happened again!

Then I just disconnected WLAN, ran it again in the same shell, and it was OK again. So it really seems related to wi-fi.

Frantisek
Re: [maemo-developers] RE: defective memory?
Olivier ROLAND wrote:
> Siarhei Siamashka wrote:
>> On 9/19/06, Kimmo Hämäläinen <[EMAIL PROTECTED]> wrote:
>>> Yes, it would need to be reproducible in several different devices. The guy here that tried to reproduce it currently thinks that Siarhei's unit is broken.
>
> If your device is broken then mine is also.

And mine too.

Nokia770-26:~# ./memtester 40 1
memtester version 4.0.5 (32-bit)
Copyright (C) 2005 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xf000
want 40MB (41943040 bytes)
got 40MB (41943040 bytes), virtual address=0x41131000, trying mlock ...locked.
Loop 1/1:
Stuck Address : testing 0FAILURE: possible bad address line at offset 0x0029b1a5 (page offset 1a5).
Skipping to next test...
Random Value: FAILURE: 0x5df9 != 0x5df900cb at offset 0x0029b1a5 (page offset 1a5).
FAILURE: 0xa783 != 0xa783e258 at offset 0x0029b1a5 (page offset 1a5).
Compare XOR : FAILURE: 0xf086b165 != 0xf08793bd at offset 0x0029b1a5 (page offset 1a5).
Compare SUB : FAILURE: 0x333c != 0xfd3b5e62 at offset 0x0029b1a5 (page offset 1a5).
Compare MUL :
Compare DIV : ok
FAILURE: 0x7feb != 0x7febf0e8 at offset 0x0029b1a5 (page offset 1a5).
Compare OR : FAILURE: 0x7b69 != 0x7b69b068 at offset 0x0029b1a5 (page offset 1a5).
Compare AND : FAILURE: 0xfdcc != 0xfdccec72 at offset 0x0029b1a5 (page offset 1a5).
Sequential Increment:
Solid Bits : testing 1FAILURE: 0x != 0x at offset 0x0029b1a5 (page offset 1a5).
Block Sequential: testing 1FAILURE: 0x0101 != 0x01010101 at offset 0x0029b1a5 (page offset 1a5).
Checkerboard: testing 0FAILURE: 0x != 0x at offset 0x0029b1a5 (page offset 1a5).
Bit Spread : testing 0FAILURE: 0x != 0xfffa at offset 0x0029b1a5 (page offset 1a5).
Bit Flip: testing 0FAILURE: 0x != 0x0001 at offset 0x0029b1a5 (page offset 1a5).
Walking Ones: testing 0FAILURE: 0x != 0xfffe at offset 0x0029b1a5 (page offset 1a5).
Walking Zeroes : testing 0Killed
Nokia770-26:~# Connection to n770 closed by remote host.
Connection to n770 closed.

This was done via ssh over wi-fi when the battery icon was already red. A few tens of seconds later the device powered down due to the empty battery. I also ran it ~30 minutes earlier over Bluetooth PAN, when the battery meter was still grey, and the test went fine. Looks like a combination with wi-fi.

Frantisek
Re: [maemo-developers] RE: defective memory?
On Wednesday 20 September 2006 01:12, Olivier ROLAND wrote:
> If your device is broken then mine is also.
> I don't think at all that we're talking about a (small) fraction, because the majority of users won't even notice the problem.
> My device seemed stable until I stressed it. And stressing it is not a sufficient condition to make the problem happen.

That's exactly the point. The device is quite usable, and most users will not notice any difference in the most common operations. It is a very good sign, as it looks like in order to get rock-solid stability we only need to allocate and lock the problematic memory page early at boot time and not let any application use it.

> When I have time, I will make extensive tests on my device to check exactly when the problem occurs.

Please do it. Now, with the latest version of the tester and a 40MB tested block, the coverage is almost 2/3 of physical memory. If it is a certain location in memory, the chances that it can be easily detected are quite high. Please verify that the offset of the faulty address within the 1KB page is always reported to be the same between different runs (it is equal to 1a5 for me).

I'm trying to find a way to get the full physical address of that page. In my last tests I managed to mmap '/dev/mem' (just using the 'read' function segfaults), but I did not have enough time to experiment with it much yet.

> My doubts about the "small fraction" are probably driven by the fact that I was "hit" by the 'white screen of death' 4 weeks after buying the device.
> So I guess that during the repair my 770 was checked (again) by the conventional Nokia diagnostic.
> I conclude that the conventional Nokia diagnostic doesn't detect the problem.
>
> To make things clear, I don't want to create negative publicity at all. I enjoy this device a lot, and I've ported Streamtuner to it with lots of great feedback from users.
>
> My 2 cents.

I don't want to create negative publicity either. My only goal now is to find a reliable technical solution for both diagnosing and working around such problems. After all, I have a good motivation for that :) I'm grateful to Nokia, as they are also trying to investigate the problem. I'm quite confident that we can come up with a solution, and it will have a positive effect on the Nokia 770 community as a result. This is a new device; software and tools for it are still being developed. We are all learning and getting more experience.

> PS: I don't know what "the conventional Nokia diagnostic" is, but as far as I know there is always a "conventional XXX diagnostic" in repair centers.

By the way, when looking for additional information I found a Sharp Zaurus community forum and asked what they use for hardware diagnostics, in the hope that I could use the same tools. Somebody replied that hardware diagnostics tools are built into the Zaurus firmware and are accessible from the boot menu.
Re: [maemo-developers] RE: defective memory?
Siarhei Siamashka wrote:
> On 9/19/06, Kimmo Hämäläinen <[EMAIL PROTECTED]> wrote:
>> Yes, it would need to be reproducible in several different devices. The guy here that tried to reproduce it currently thinks that Siarhei's unit is broken.
>
> Yes, I also think that the probability of my device being broken is quite high. A certain (small) fraction of other Nokia 770 owners probably have the same problem. Does it make the device completely useless? Of course not: my device works almost fine, it only crashes and reboots sometimes. I have also had filesystem corruption several times (I have now even switched the MMC filesystem to ext3; I don't know if it will help much though). So the device can surely be used as a book reader or internet browser, and serve other tasks. Another (small) fraction of users, who got the 'white screen of death', were surely less lucky.
>
> What can be done about this if the defective memory problem gets confirmed? I see three possible ways:
>
> 1. 'Ignorance is bliss': just do nothing; those who don't know about the problem will not worry about it :) The device will just crash or reboot occasionally, and some more unlucky users with more annoying crashes will complain in the forums, providing some bad PR.
>
> 2. Distribute some diagnostics software that will help to identify memory problems, and repair/replace defective units. That will have some expenses, but will improve overall reliability and reduce negative publicity.
>
> 3. Add some (un)official support for working around bad memory regions using something similar to BadRAM. In this case most such units will be completely usable.
>
> In general, bad memory is quite a common problem on x86 PCs, but there is an excellent tool for memory diagnostics: memtest86. It has helped me quite a number of times, and I always advise everyone having stability issues to run it first. I don't know how the reliability of memory chips used in embedded devices compares to the reliability of memory in normal desktop computers, but bad memory seems to be one of the most frequently encountered hardware problems.

If your device is broken then mine is also. I don't think at all that we're talking about a (small) fraction, because the majority of users won't even notice the problem. My device seemed stable until I stressed it. And stressing it is not a sufficient condition to make the problem happen.

When I have time, I will make extensive tests on my device to check exactly when the problem occurs.

My doubts about the "small fraction" are probably driven by the fact that I was "hit" by the 'white screen of death' 4 weeks after buying the device. So I guess that during the repair my 770 was checked (again) by the conventional Nokia diagnostic. I conclude that the conventional Nokia diagnostic doesn't detect the problem.

To make things clear, I don't want to create negative publicity at all. I enjoy this device a lot, and I've ported Streamtuner to it with lots of great feedback from users.

My 2 cents.

PS: I don't know what "the conventional Nokia diagnostic" is, but as far as I know there is always a "conventional XXX diagnostic" in repair centers.

Olivier ROLAND
[maemo-developers] RE: defective memory?
On 9/19/06, Kimmo Hämäläinen <[EMAIL PROTECTED]> wrote:
> Yes, it would need to be reproducible in several different devices. The guy here that tried to reproduce it currently thinks that Siarhei's unit is broken.

Yes, I also think that the probability of my device being broken is quite high. A certain (small) fraction of other Nokia 770 owners probably have the same problem. Does it make the device completely useless? Of course not: my device works almost fine, it only crashes and reboots sometimes. I have also had filesystem corruption several times (I have now even switched the MMC filesystem to ext3; I don't know if it will help much though). So the device can surely be used as a book reader or internet browser, and serve other tasks. Another (small) fraction of users, who got the 'white screen of death', were surely less lucky.

What can be done about this if the defective memory problem gets confirmed? I see three possible ways:

1. 'Ignorance is bliss': just do nothing; those who don't know about the problem will not worry about it :) The device will just crash or reboot occasionally, and some more unlucky users with more annoying crashes will complain in the forums, providing some bad PR.

2. Distribute some diagnostics software that will help to identify memory problems, and repair/replace defective units. That will have some expenses, but will improve overall reliability and reduce negative publicity.

3. Add some (un)official support for working around bad memory regions using something similar to BadRAM. In this case most such units will be completely usable.

In general, bad memory is quite a common problem on x86 PCs, but there is an excellent tool for memory diagnostics: memtest86. It has helped me quite a number of times, and I always advise everyone having stability issues to run it first. I don't know how the reliability of memory chips used in embedded devices compares to the reliability of memory in normal desktop computers, but bad memory seems to be one of the most frequently encountered hardware problems.