Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
I tried the latest build (18-sep-2007) of the d-i and started it with -p; it crashes with the same error that Bernd Zeimetz reported:

ERROR(0): Cheetah error trap taken afsr[1000] afar[040001c0] TL1(0)
ERROR(0): TPC[4377f4] TNPC[4377f8] O7[4379d0] TSTATE[80001606]
ERROR(0): TPC<interpret_one_decode_reg+0x0/0xfc>
ERROR(0): M_SYND(0), E_SYND(0)
ERROR(0): Highest priority error (1000) Unmapped error from system bus
ERROR(0): D-cache idx[0] tag[] utag[] stag[]
ERROR(0): D-cache data0[] data1[] data2[] data3[]
ERROR(0): I-cache idx[0] tag[] utag[] stag[] u[] l[]
ERROR(0): I-cache INSN0[] INSN1[] INSN2[] INSN3[]
ERROR(0): I-cache INSN4[] INSN5[] INSN6[] INSN7[]
ERROR(0): E-cache idx[0] tag[]
ERROR(0): E-cache data0[] data1[] data2[] data3[]
Kernel panic - not syncing: Irrecoverable deferred error trap.
Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
{I'm adding the former CCs to the mail again; this info should be included in the bug report and on -boot}

Hi,

>> Also I'm pretty sure that non-smp kernels just don't work on the machine. As far as I understand, the way those machines work is that at least two CPUs have to be in an operating state, as they share one CPU data switch (if you have a look into such a machine you see that there are always 2 CPUs + their memory sitting in a CPU bay), and as far as I know you can't run the system with an odd number of CPUs.
>
> I have 3 identical Sun Blades with 16 used CPU slots each, where each has two CPUs and shared memory between them, and this machine does not boot with a non-SMP kernel, even if I remove 15 CPU cards.

Out of curiosity: are those machines using UltraSPARC III CPUs?

This sounds like another reason to have an SMP sparc installer. What I'm still wondering about is whether sparc SMP kernels are supposed to run on all single-CPU machines. I know one single-CPU machine where running an SMP kernel results in an OOPS.

>> All processors share the same physical memory address space and use - depending on the number of CPUs - different cache coherence protocols, so as far as I understand it such a system can't work at all without having all CPUs properly initialized. But probably somebody with better knowledge of this architecture can give us some insight on that, therefore I'm forwarding the message to debian-sparc, too.
>
> I do not know how many CPU slots you have, but have you tried to run the machine with ONLY ONE CPU card?

One CPU card contains 2 CPUs, and as far as I know you're not supposed to run the machine with an odd number of CPUs. But I didn't find any really proper information about that - I didn't spend much time on reading books, though.

The machine has 4 slots, 2 CPUs per slot :) But removing CPUs is nothing one can suggest to people who want to install a machine, especially when you can't be sure that it'll work at all.

Best regards,
Bernd

-- 
Bernd Zeimetz [EMAIL PROTECTED]
http://bzed.de/
Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
reassign 440720 linux-2.6
found 440720 2.6.21-6
thanks

> On the SunFire v880 all SMP kernels >= 2.6.21 booted, with the only problem that the qla2xxx module of 2.6.21 had hiccups with the FC controller in the machine, but I guess the kernel would have worked otherwise. The installer from lenny, which ships with a non-SMP kernel, crashed badly as you can see in the bug report above.

Hi kernel team,

Looks like we need to discuss the available options for this problem.

Another issue is to know whether current snapshots of 2.6.23 work on this hardware. Bernd, can you test the latest 2.6.23 snapshot and see if it works? Check http://wiki.debian.org/DebianKernel for more information on where to find them.

I've reassigned it back to linux-2.6, since adding another kernel to the installer would be a workaround and not a proper fix for it.

Does anyone have a comment on all this?

-- 
O T A V I O    S A L V A D O R
E-mail: [EMAIL PROTECTED]
UIN: 5906116
GNU/Linux User: 239058
GPG ID: 49A5F855
Home Page: http://otavio.ossystems.com.br
Microsoft sells you Windows ... Linux gives you the whole house.
Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
Hi,

> Looks like we need to discuss the available options for this problem.
>
> Another issue is to know whether current snapshots of 2.6.23 work on this hardware. Bernd, can you test the latest 2.6.23 snapshot and see if it works? Check http://wiki.debian.org/DebianKernel for more information on where to find them.

I've installed the machine using a 2.6.23-rc5 _smp_ kernel; all older non-smp kernels failed to boot, and as I had to build my own installer and kernel anyway, I decided to use an smp version.

As it takes a _long_ time to reboot a machine with a crashed CPU (which was the result of all non-SMP kernel tests so far), and the machine should be in production already, I would like to avoid trying a non-smp kernel on this machine. I've looked through the sparc64 commits (about 80) between 2.6.22 and 2.6.23-rc6 and there was none which looked like it would address such an issue. If you can point me to such a change I'll give a non-smp kernel a try.

Also I'm pretty sure that non-smp kernels just don't work on the machine. As far as I understand, the way those machines work is that at least two CPUs have to be in an operating state, as they share one CPU data switch (if you have a look into such a machine you see that there are always 2 CPUs + their memory sitting in a CPU bay), and as far as I know you can't run the system with an odd number of CPUs.

All processors share the same physical memory address space and use - depending on the number of CPUs - different cache coherence protocols, so as far as I understand it such a system can't work at all without having all CPUs properly initialized. But probably somebody with better knowledge of this architecture can give us some insight on that, therefore I'm forwarding the message to debian-sparc, too.

Probably interesting for you to read:
http://www.sun.com/processors/manuals/USIIIv2.pdf
http://docs-pdf.sun.com/806-6592-11/806-6592-11.pdf

Cheers,
Bernd

-- 
Bernd Zeimetz [EMAIL PROTECTED]
http://bzed.de/
Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
Bernd Zeimetz [EMAIL PROTECTED] writes:

> Hi,
>
>> Looks like we need to discuss the available options for this problem.
>>
>> Another issue is to know whether current snapshots of 2.6.23 work on this hardware. Bernd, can you test the latest 2.6.23 snapshot and see if it works? Check http://wiki.debian.org/DebianKernel for more information on where to find them.
>
> I've installed the machine using a 2.6.23-rc5 _smp_ kernel; all older non-smp kernels failed to boot, and as I had to build my own installer and kernel anyway, I decided to use an smp version.

Have you tested it with a non-smp kernel?

> As it takes a _long_ time to reboot a machine with a crashed CPU (which was the result of all non-SMP kernel tests so far), and the machine should be in production already, I would like to avoid trying a non-smp kernel on this machine.

Right. That makes it difficult for us to know whether it has been fixed since your last try.

> I've looked through the sparc64 commits (about 80) between 2.6.22 and 2.6.23-rc6 and there was none which looked like it would address such an issue. If you can point me to such a change I'll give a non-smp kernel a try.

Right :(

> Also I'm pretty sure that non-smp kernels just don't work on the machine. As far as I understand, the way those machines work is that at least two CPUs have to be in an operating state, as they share one CPU data switch (if you have a look into such a machine you see that there are always 2 CPUs + their memory sitting in a CPU bay), and as far as I know you can't run the system with an odd number of CPUs.
>
> All processors share the same physical memory address space and use - depending on the number of CPUs - different cache coherence protocols, so as far as I understand it such a system can't work at all without having all CPUs properly initialized. But probably somebody with better knowledge of this architecture can give us some insight on that, therefore I'm forwarding the message to debian-sparc, too.
>
> Probably interesting for you to read:
> http://www.sun.com/processors/manuals/USIIIv2.pdf
> http://docs-pdf.sun.com/806-6592-11/806-6592-11.pdf

If that's true then we might have two options:

- add a subarch for sparc with the needed kernel changes for it;
- change the default kernel on the installer to be smp.

What do others think about that? (added debian-boot on cc because of this)

-- 
O T A V I O    S A L V A D O R
E-mail: [EMAIL PROTECTED]
UIN: 5906116
GNU/Linux User: 239058
GPG ID: 49A5F855
Home Page: http://otavio.ossystems.com.br
Microsoft sells you Windows ... Linux gives you the whole house.
Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
Hi,

>>> Bernd, can you test the latest 2.6.23 snapshot and see if it works? Check http://wiki.debian.org/DebianKernel for more information on where to find them.
>>
>> I've installed the machine using a 2.6.23-rc5 _smp_ kernel; all older non-smp kernels failed to boot, and as I had to build my own installer and kernel anyway, I decided to use an smp version.
>
> Have you tested it with a non-smp kernel?

Well, I tried to boot a non-SMP kernel (2.6.23-rc5) a few minutes ago (... and I had to power-cycle it, it is running selftests now), and the machine froze after

Remapping the kernel... done.
OF stdout device is: /[EMAIL PROTECTED],70/[EMAIL PROTECTED]/[EMAIL PROTECTED],40:a
Booting Linux...

which is just the same state as described in http://lists.debian.org/debian-sparc/2007/09/msg00045.html, which talks about a SunFire v240 that uses 2 US III CPUs. Please note that the last non-SMP kernels seem (as in: according to Google) to boot well on the SunFire v210, which is a 1U (1 HE) machine with ONE CPU only.

>> As it takes a _long_ time to reboot a machine with a crashed CPU (which was the result of all non-SMP kernel tests so far), and the machine should be in production already, I would like to avoid trying a non-smp kernel on this machine.
>
> Right. That makes it difficult for us to know whether it has been fixed since your last try.

As I said before, I doubt that this is fixable - you have to use an SMP kernel.

> - add a subarch for sparc with the needed kernel changes for it;

As this seems to affect all machines running more than one US III CPU, this would probably be a good thing. Those machines are starting to be available on eBay for not-so-much money and are still sold, and the UltraSPARC IV should have the same features (issues in this case), so it would make more than sense to allow running Debian on serious sparc hardware.

> - change default kernel on installer to be smp;

I've given that idea a try and tried to boot an SMP kernel on a single-CPU (US IIi) machine, and the machine froze in the same state as the v880 above. So I guess it's not possible to run SMP kernels on all sparc machines.

Cheers,
Bernd

-- 
Bernd Zeimetz [EMAIL PROTECTED]
http://bzed.de/
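
The boot results mentioned in this thread can be collected into a small table, which makes the conflict between the two proposed options easier to see. The sketch below is only a restatement of what the messages report (some of it second-hand, via list posts and search results); the tuple layout and the function name `outcomes_by_class` are illustrative choices, not anything used by the installer or the kernel team.

```python
from collections import defaultdict

# Observations as stated in this thread; nothing here was re-tested.
# Fields: (machine, more_than_one_cpu, kernel_flavour, booted)
REPORTS = [
    ("SunFire v880", True, "non-smp", False),   # Cheetah error trap, panic
    ("SunFire v880", True, "smp", True),        # installed with 2.6.23-rc5 smp
    ("SunFire v240", True, "non-smp", False),   # per the linked debian-sparc post
    ("SunFire v210", False, "non-smp", True),   # per reports found via search
    ("single-CPU US IIi machine", False, "smp", False),  # froze at "Booting Linux..."
]


def outcomes_by_class(reports):
    """Group boot outcomes by (more_than_one_cpu, kernel_flavour)."""
    table = defaultdict(set)
    for _machine, multi_cpu, flavour, booted in reports:
        table[(multi_cpu, flavour)].add(booted)
    return dict(table)


if __name__ == "__main__":
    for (multi_cpu, flavour), results in sorted(outcomes_by_class(REPORTS).items()):
        cpus = "multi-CPU" if multi_cpu else "single-CPU"
        print(f"{cpus:10} {flavour:8} booted: {sorted(results)}")
```

Read this way, neither flavour has a reported success on both classes of machine, which is the argument for shipping both installer variants or a separate subarch.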
Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
On Thu, 2007-09-13 at 22:30 +0200, Bernd Zeimetz wrote:
> Well, I tried to boot a non-SMP kernel (2.6.23-rc5) a few minutes ago (... and I had to power-cycle it, it is running selftests now), and the machine froze after
>
> Remapping the kernel... done.
> OF stdout device is: /[EMAIL PROTECTED],70/[EMAIL PROTECTED]/[EMAIL PROTECTED],40:a
> Booting Linux...

Try adding -p to the silo command line. That will send all early log output directly to the console, and you should see the kernel panic that is undoubtedly happening before the kernel is fully initialised.

Richard

-- 
Richard Mortimer [EMAIL PROTECTED]
Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
Richard Mortimer wrote:
> On Thu, 2007-09-13 at 22:30 +0200, Bernd Zeimetz wrote:
>> Well, I tried to boot a non-SMP kernel (2.6.23-rc5) a few minutes ago (... and I had to power-cycle it, it is running selftests now), and the machine froze after
>>
>> Remapping the kernel... done.
>> OF stdout device is: /[EMAIL PROTECTED],70/[EMAIL PROTECTED]/[EMAIL PROTECTED],40:a
>> Booting Linux...
>
> Try adding -p to the silo command line. That will send all early log output directly to the console, and you should see the kernel panic that is undoubtedly happening before the kernel is fully initialised.

Here is the interesting part:

checking if image is initramfs... it is
Freeing initrd memory: 6214k freed
Mini RTC Driver
/[EMAIL PROTECTED],40: US3 memory controller at 0440 [ACTIVE]
/[EMAIL PROTECTED],40: US3 memory controller at 04c0 [ACTIVE]
/[EMAIL PROTECTED],40: US3 memory controller at 04000140 [ACTIVE]
ERROR(0): Cheetah error trap taken afsr[1000] afar[040001c0] TL1(0)
ERROR(0): TPC[4377f4] TNPC[4377f8] O7[4379d0] TSTATE[80001606]
ERROR(0): TPC<interpret_one_decode_reg+0x0/0xfc>
ERROR(0): M_SYND(0), E_SYND(0)
ERROR(0): Highest priority error (1000) Unmapped error from system bus
ERROR(0): D-cache idx[0] tag[] utag[] stag[]
ERROR(0): D-cache data0[] data1[] data2[] data3[]
ERROR(0): I-cache idx[0] tag[] utag[] stag[] u[] l[]
ERROR(0): I-cache INSN0[] INSN1[] INSN2[] INSN3[]
ERROR(0): I-cache INSN4[] INSN5[] INSN6[] INSN7[]
ERROR(0): E-cache idx[0] tag[]
ERROR(0): E-cache data0[] data1[] data2[] data3[]
Kernel panic - not syncing: Irrecoverable deferred error trap.

If you want the full output, please let me know. There was nothing unusual so far, though.

Bernd

-- 
Bernd Zeimetz [EMAIL PROTECTED]
http://bzed.de/
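
For anyone comparing several such panics, the register dump above can be pulled apart mechanically. The following is a minimal sketch, not a kernel tool: the function name `parse_cheetah_log` is hypothetical, the number in `ERROR(n)` is assumed to be the CPU id, and the values are taken exactly as they appear in this archived copy, which may have lost digits.

```python
import re

# Sample lines copied from the panic output quoted in this thread.
SAMPLE = """\
ERROR(0): Cheetah error trap taken afsr[1000] afar[040001c0] TL1(0)
ERROR(0): TPC[4377f4] TNPC[4377f8] O7[4379d0] TSTATE[80001606]
ERROR(0): Highest priority error (1000) Unmapped error from system bus
Kernel panic - not syncing: Irrecoverable deferred error trap.
"""

# Matches NAME[hexvalue]; the value may be empty, as in "tag[]".
FIELD = re.compile(r"([A-Za-z0-9_]+)\[([0-9a-fA-F]*)\]")
ERROR_LINE = re.compile(r"ERROR\((\d+)\):\s*(.*)")


def parse_cheetah_log(text):
    """Return {cpu: {field_name: int}} for every 'ERROR(n):' line in text.

    Fields with an empty value are skipped; names are lower-cased.
    """
    cpus = {}
    for line in text.splitlines():
        match = ERROR_LINE.match(line)
        if not match:
            continue  # not part of the error-trap dump
        cpu = int(match.group(1))  # assumption: n in ERROR(n) is the CPU id
        fields = cpus.setdefault(cpu, {})
        for name, value in FIELD.findall(match.group(2)):
            if value:
                fields[name.lower()] = int(value, 16)
    return cpus


if __name__ == "__main__":
    for cpu, fields in parse_cheetah_log(SAMPLE).items():
        for name, value in fields.items():
            print(f"cpu {cpu}: {name} = {value:#x}")
```

The AFAR value is the address associated with the fault, so diffing it across reports from different machines is one way to check whether they hit the same access.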
Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
reassign 440720 debian-installer
retitle 440720 [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
thanks

After a lot of testing I didn't manage to find a single non-SMP kernel which would boot on a SunFire v880. Google finds several reports of failed installs on SunFire machines with >=2 UltraSPARC III CPUs, while it seems to work well on machines with one CPU only (http://www.nabble.com/SunFire-v240---Debian---Getting-Closer-t1103286.html for example, read the lower part - Google finds more reports like that, including reports that Gentoo installed fine while Debian failed to install). I'd therefore like to suggest using the SMP kernel as the default for the installer on sparc or, if that's a problem for the single-CPU machines, providing both versions of the installer.

On the SunFire v880 all SMP kernels >= 2.6.21 booted, with the only problem that the qla2xxx module of 2.6.21 had hiccups with the FC controller in the machine, but I guess the kernel would have worked otherwise. The installer from lenny, which ships with a non-SMP kernel, crashed badly as you can see in the bug report above.

Cheers,
Bernd

-- 
Bernd Zeimetz [EMAIL PROTECTED]
http://bzed.de/