Hi Greg and all,

good and bad news from my side: the issue below has been solved, many thanks for the support. It was caused by a bad/open solder joint on a small pull-up resistor network, which was generating spurious interrupts.
Now the board is much more stable (it used to crash within 1 hour), so I left the shell (serial port) open, but after 1 day of uptime I got this strange lock-up:

~ # ls
bin    etc    lib    mnt    root   usr
dev    home   media  proc   tmp    var
~ # ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒

- no keys are accepted from the console
- it is not possible to connect with telnet
- the web server was still responding somehow, but slower than normal: I could load the homepage 2 times, then it seemed to lock up completely
- the ftp server (inetd) doesn't respond either, same as telnet

Debug symbols are already enabled in the kernel, but there isn't any useful information.

Any help is appreciated,
regards,
angelo

On 12/08/2011 10:55, angelo wrote:
> Hi Greg, many thanks for your reply,
>
> have you used mcf5307 boards, and did you find them reliable? I am asking
> because the errata applies to this chip only.
>
> The issue described in the errata was happening in u-boot: I was getting
> the trap below just after the "rte" execution inside the interrupt handler.
> There, too, the PC shown was different from 0xffffffff, but setting the
> "C/I" bit as the errata says solved the problem.
>
> *** Unexpected exception ***
> Vector Number: 3 Format: 04 Fault Status: 4
>
> PC: 00fe910a SR: 00002000 SP: 00ed8af0
> D0: 00002c1b D1: 0000001b D2: 00400000 D3: 00ee8b76
> D4: ffc12d08 D5: ffffffff D6: 00ffad57 D7: 00ee8b76
> A0: 00ee8b76 A1: 00fe9604 A2: 00ee8bc6 A3: 00ffd400
> A4: 00ff8167 A5: 00ffbb00 A6: 00ed8b48
>
> *** Please Reset Board! ***
>
> Looking more closely now, the uClinux trap is different: it is vector
> 4 (illegal instruction), not 3, but it always happens inside the
> interrupt handler, and this is probably not a coincidence.
>
> About power, I recently changed the power supply circuit on my custom
> board, using a more reliable 3.3V switching regulator (2A max; total
> board consumption is about 400mA). I will check for noise and see if
> there are some stability issues.
>
> About SDRAM, do you know a method to be sure it's working correctly?
> Is there some specific test inside Linux?
> I am thinking of running a continuous loop of the u-boot RAM test,
> to rule out uClinux.
>
> Here is some information about the board:
>
> /proc # cat cpuinfo
> CPU: COLDFIRE(m5307)
> MMU: none
> FPU: none
> Clocking: 88.4MHz
> BogoMips: 58.98
> Calibration: 29491200 loops
> /proc #
>
> ~ # cat /proc/version
> uClinux version 2.6.36.2 (angelo@angel7) (gcc version 4.2.4) #134 Wed Aug 10 16:01:21 CEST 2011
>
> ~ # cat /proc/meminfo
> MemTotal: 13864 kB
> MemFree: 7164 kB
> Buffers: 16 kB
> Cached: 124 kB
> SwapCached: 0 kB
> Active: 68 kB
> Inactive: 72 kB
> Active(anon): 0 kB
> Inactive(anon): 0 kB
> Active(file): 68 kB
> Inactive(file): 72 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> MmapCopy: 552 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
> Dirty: 0 kB
> Writeback: 0 kB
> AnonPages: 0 kB
> Mapped: 0 kB
> Shmem: 0 kB
> Slab: 1208 kB
> SReclaimable: 60 kB
> SUnreclaim: 1148 kB
> KernelStack: 100 kB
> PageTables: 0 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 6932 kB
> Committed_AS: 0 kB
> VmallocTotal: 0 kB
> VmallocUsed: 0 kB
> VmallocChunk: 0 kB
> ~ #
>
> regards,
> angelo
>
> On 12/08/2011 05:28, Greg Ungerer wrote:
>> Hi Angelo,
>>
>> On 11/08/11 19:52, angelo wrote:
>>> while working on a port of u-boot for the mcf5307 I have found a major
>>> issue due to an erratum of this chip:
>>>
>>> from the MCF5307ER pdf:
>>>
>>> 35 Corrupted Return PC in Exception Stack Frame
>>>
>>> 35.1 Description
>>> When processing an autovectored interrupt an error can occur that
>>> causes 0xFFFFFFFF to be written as the return PC value in the
>>> exception stack frame. The problem is caused by a conflict between an
>>> internal autovector access and a chip select mapped to the IACK
>>> address space (0xFFFFXXXX).
>>>
>>> 35.2 Workaround
>>> • Set the C/I bit in the chip select mask register (CSMR) for the
>>>   chip select that is mapped to 0xFFFFXXXX.
>>>   This will prevent the chip select from asserting for IACK
>>>   accesses.
>>> • Remap the chip select to a different address range.
>>> • Use external logic to provide external vectors for all interrupts
>>>   instead of autovectoring.
>>> MASKS: 0H55J, 1H55J, 1J20C, 2J20C
>>> 01/22/04
>>>
>>> From time to time, on my mcf5307 board (uClinux + mainline kernel),
>>> I get the following trap exception. Since the call trace always
>>> passes through an interrupt, and I am getting the same trap I was
>>> getting in u-boot, I suspect that the issue is the same.
>>>
>>> ~ # *** ILLEGAL INSTRUCTION *** FORMAT=4
>>> Current process id is 0
>>> BAD KERNEL TRAP: 00000000
>>> PC: [<0002e500>]
>>
>> But your trap PC is not 0xFFFFFFFF as per the errata above?
>> I would not think you are seeing this problem.
>>
>> Over the years there are probably 2 main culprits I have seen for
>> sporadic, hard to explain, traps on ColdFire boards. Since you say
>> you have problems with both u-boot and uClinux it may be time to
>> check these:
>>
>> 1. bad power. Boards being run from wall wart type power supplies
>>    that just don't deliver good clean power
>> 2. bad DRAM timing. Can be very subtle, and hard to diagnose.
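Greg's second suspect (DRAM timing) and Angelo's question about how to be sure the SDRAM works both come down to running a pattern test over RAM, repeatedly. The C below is a minimal sketch of two classic checks, not the code behind u-boot's own RAM test: a walking-ones test on one word (catches data lines that are stuck or shorted together) and an address-in-address test with an inverted second pass (catches aliasing from faulty address lines and many data errors). The function names `test_data_bus`, `test_address_in_address` and `memtest_pass`, and the idea of calling them on an unused SDRAM range from a bare-metal loop, are illustrative assumptions. The sketch also assumes the tested region is not cached; otherwise the read-back may be served by the data cache instead of the SDRAM.

```c
#include <stddef.h>
#include <stdint.h>

/* Walking-ones test on a single word: write each single-bit pattern
 * and read it back. Returns 0 if all 32 patterns read back correctly,
 * otherwise the first pattern that failed. */
uint32_t test_data_bus(volatile uint32_t *addr)
{
    for (uint32_t pattern = 1; pattern != 0; pattern <<= 1) {
        *addr = pattern;
        if (*addr != pattern)
            return pattern;
    }
    return 0;
}

/* Address-in-address test: fill each word with its own index, verify,
 * then repeat with the bitwise complement so every bit of every word is
 * checked both as 0 and as 1. If two addresses alias (stuck or shorted
 * address line), the later write overwrites the earlier one and the
 * read-back fails. Returns the number of mismatching reads. */
size_t test_address_in_address(volatile uint32_t *base, size_t nwords)
{
    size_t errors = 0;

    for (size_t i = 0; i < nwords; i++)
        base[i] = (uint32_t)i;
    for (size_t i = 0; i < nwords; i++)
        if (base[i] != (uint32_t)i)
            errors++;

    for (size_t i = 0; i < nwords; i++)
        base[i] = ~(uint32_t)i;
    for (size_t i = 0; i < nwords; i++) {
        uint32_t expected = ~(uint32_t)i;   /* complement of the index */
        if (base[i] != expected)
            errors++;
    }

    return errors;
}

/* One pass of both tests over a region. A soak test would call this in
 * an endless loop and report any non-zero result. */
size_t memtest_pass(volatile uint32_t *base, size_t nwords)
{
    size_t errors = test_address_in_address(base, nwords);
    if (nwords > 0 && test_data_bus(base) != 0)
        errors++;
    return errors;
}
```

On the board, the range passed to `memtest_pass()` must not overlap the memory the test code and its stack are running from. A clean run over a day is good evidence but not proof: marginal timing can depend on temperature and on access patterns (for example burst accesses with the cache enabled), which a simple sequential test like this may not reproduce.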
>>
>>> SR: 2714 SP: 001bdf40 a2: 0016148c
>>> d0: 00000000 d1: 00e80000 d2: 0000001e d3: 00000000
>>> d4: 00000000 d5: 00ffd440 a0: 001bd000 a1: 001ab0e8
>>> Process swapper (pid: 0, stackpage=001ac0e8)
>>> Stack from 001bdf74:
>>> 000207a0 00ffdb94 00000000 00022ec6 0000001e 001bdf8c 00e80000 00ffdb94
>>> 00000000 00000000 00ffd440 001bd008 001ab0e8 0016148c 00000000 ffffffff
>>> 00000000 40782000 000208d8 00020a08 00020a0e 001608e4 001ab0e8 001ca5f8
>>> 001ca708 001be944 00000d6d 00000d6d 001cc000 00ffdb08 00ed8abc 00ffbf00
>>> 001ca86c 00ed8708 000200d8
>>> Call Trace with CONFIG_FRAME_POINTER disabled:
>>> [000207a0] do_IRQ+0x1e/0x5a
>>> [00022ec6] inthandler+0x6a/0x74
>>> [0016148c] schedule+0x0/0x30e
>>> [000208d8] default_idle+0x22/0x40
>>> [00020a08] cpu_idle+0x1a/0x20
>>> [00020a0e] kernel_thread+0x0/0x3c
>>> [001608e4] rest_init+0x6c/0x72
>>> [001ca5f8] _einittext+0x0/0x0
>>> [001be944] start_kernel+0x274/0x280
>>> [000200d8] _exit+0x0/0x8
>>>
>>> Disabling lock debugging due to kernel taint
>>> Kernel panic - not syncing: Attempted to kill the idle task!
>>> Stack from 001bdddc:
>>> 001bdf34 00029ae0 001961e6 001cc297 001cc297 00000400 001963cd 001bde24
>>> 00000001 001ab0e8 0000000b 00000000 001bd000 001bdf40 000d1d10 001bde24
>>> 0002ca90 001963cd 00000004 00000000 00000000 00ffd440 00000000 00ee8b76
>>> 0002a0a2 001bdf40 000d1d10 0002a0a2 001bdf34 00196170 00000004 00022656
>>> 0000000b 00000007 00000000 001bdf74 0019546c 001ab2ac 00000000 001ac0e8
>>> 001bdf40 0002a0a2 000226fe 001954f3 001bdf40 00000000 001954d6 00000000
>>> Call Trace with CONFIG_FRAME_POINTER disabled:
>>> [00029ae0] panic+0x60/0x1be
>>> [001961e6] __func__.34039+0x201d2/0x34c70
>>> [001963cd] __func__.34039+0x203b9/0x34c70
>>> [000d1d10] strlen+0x0/0x1a
>>> [0002ca90] do_exit+0x648/0x6cc
>>> [001963cd] __func__.34039+0x203b9/0x3
>>> [0002a0a2] printk+0x0/0x1c
>>> [000d1d10] strlen+0x0/0x1a
>>> [0002a0a2] printk+0x0/0x1c
>>> [00196170] __func__.34039+0x2015c/0x34c70
>>> [00022656] die_if_kernel+0xd4/0xda
>>> [0019546c] __func__.34039+0x1f458/0x34c70
>>> [0002a0a2] printk+0x0/0x1c
>>> [000226fe] bad_super_trap+0xa2/0xb0
>>> [001954f3] __func__.34039+0x1f4df/0x34c70
>>> [001954d6] __func__.34039+0x1f4c2/0x34c70
>>> [0016148c] schedule+0x0/0x30e
>>> [0002278a] trap_c+0x30/0x3da
>>> [00026c88] wake_up_process+0x0/0x16
>>> [0002a0a2] printk+0x0/0x1c
>>> [0002a0a2] printk+0x0/0x1c
>>> [00026c88] wake_up_process+0x0/0x16
>>> [00032d4a] update_process_times+0x40/0x4a
>>> [0003277c] run_timer_softirq+0x14/0x234
>>> [0004b00a] rcu_bh_qs+0x0/0x18
>>> [00020598] trap+0x5c/0x64
>>> [0016148c] schedule+0x0/0x30e
>>> [0002e500] irq_enter+0x2e/0x3a
>>> [000207a0] do_IRQ+0x1e/0x5a
>>> [00022ec6] inthandler+0x6a/0x74
>>> [0016148c] schedule+0x0/0x30e
>>> [000208d8] default_idle+0x22/0x40
>>> [00020a08] cpu_idle+0x1a/0x20
>>> [00020a0e] kernel_thread+0x0/0x3c
>>> [001608e4] rest_init+0x6c/0x72
>>> [001ca5f8] _einittext+0x0/0x0
>>> [001be944] start_kernel+0x274/0x280
>>> [000200d8] _exit+0x0/0x8
>>>
>>> Has anyone experienced something like this?
>>> Any help is appreciated; meanwhile I am continuing to investigate
>>> inside the kernel and will let you know.
>>
>> I don't use 5307 parts too much these days, but a few years back I
>> used them a lot. And in general uClinux is very reliable on them.
>> What kernel version are you running?
>>
>> Regards
>> Greg
>>
>> ------------------------------------------------------------------------
>> Greg Ungerer  --  Principal Engineer        EMAIL:     g...@snapgear.com
>> SnapGear Group, McAfee                      PHONE:       +61 7 3435 2888
>> 8 Gardner Close                             FAX:         +61 7 3217 5323
>> Milton, QLD, 4064, Australia                WEB: http://www.SnapGear.com
>
> --
> .:.:.SYSAM.:.:.
> di Angelo Dureghello
> via San Nazario 149
> 34151, Trieste, Italy
> ++39 340 7631990
> www.sysam.it

_______________________________________________
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev