Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs

2007-09-18 Thread Jonathan Rivera
I tried the latest build (18-Sep-2007) of d-i and started it with -p;
it crashes with the same error that Bernd Zeimetz reported.


ERROR(0): Cheetah error trap taken afsr[1000] afar[040001c0] TL1(0)
ERROR(0): TPC[4377f4] TNPC[4377f8] O7[4379d0] TSTATE[80001606]
ERROR(0): TPCinterpret_one_decode_reg+0x0/0xfc
ERROR(0): M_SYND(0),  E_SYND(0)
ERROR(0): Highest priority error (1000) Unmapped error from system bus
ERROR(0): D-cache idx[0] tag[] utag[] stag[]
ERROR(0): D-cache data0[] data1[] data2[] data3[]
ERROR(0): I-cache idx[0] tag[] utag[] stag[] u[] l[]
ERROR(0): I-cache INSN0[] INSN1[] INSN2[] INSN3[]
ERROR(0): I-cache INSN4[] INSN5[] INSN6[] INSN7[]
ERROR(0): E-cache idx[0] tag[]
ERROR(0): E-cache data0[] data1[] data2[] data3[]
Kernel panic - not syncing: Irrecoverable deferred error trap.



-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs

2007-09-17 Thread Bernd Zeimetz
{I'm adding the former CCs to the mail again, this info should be
included in the bug report and on -boot}

Hi,

 Also I'm pretty sure that non-smp kernels just don't work on the
 machine. As far as I understand, the way those machines work is that at
 least two CPUs have to be in an operating state as they share one CPU
 Data switch (if you have a look into such a machine you see that always
 2 CPUs + their memory are sitting in a CPU bay), and as far as I know you
 can't run the system with an odd number of CPUs.
 
 I have 3 identical Sun Blades with 16 used CPU slots each, where each slot
 can hold two CPUs with shared memory between them, and such a machine
 does not boot with a non-SMP kernel, even if I remove 15 CPU cards.


Out of curiosity: Are those machines using UltraSPARC III CPUs?

This sounds like another reason to have an SMP sparc installer.
What I'm still wondering about is whether sparc SMP kernels are supposed to
run on all single-CPU machines. I know one single-CPU machine where
running an SMP kernel results in an oops.


 All processors share the same physical memory address space and use -
 depending on the number of CPUs - different cache coherence protocols,
 so as far as I understand it such a system can't work at all without
 having all CPUs properly initialized. But probably somebody with a
 better knowledge about this architecture can give us some insight on
 that, therefore I'm forwarding the message to debian-sparc, too.
 
 I do not know how many CPU slots you have, but have you tried
 to run the machine with ONLY ONE CPU card?

One CPU card contains 2 CPUs, and as far as I know you're not supposed
to run the machine with an odd number of CPUs. But I didn't find any
really solid information about that - I didn't spend much time reading
books, though. The machine has 4 slots, 2 CPUs per slot :)

But removing CPUs is not something one can suggest to people who want
to install a machine, especially when you can't be sure that it'll work
at all.


Best regards,

Bernd

-- 
Bernd Zeimetz
[EMAIL PROTECTED] http://bzed.de/






Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs

2007-09-13 Thread Otavio Salvador
reassign 440720 linux-2.6
found 440720 2.6.21-6
thanks

 On the SunFire v880 all SMP kernels >= 2.6.21 booted, with the only
 problem that the qla2xxx module of 2.6.21 had hiccups with the FC
 controller in the machine, but I guess the kernel would have worked
 otherwise. The installer from lenny, which ships with a non-SMP kernel,
 crashed badly, as you can see in the bug report above.

Hi kernel team,

Looks like we need to discuss the available options for this
problem. Another question is whether current snapshots of 2.6.23
work on this hardware.

Bernd, can you test the latest 2.6.23 snapshot and see if it works? See
http://wiki.debian.org/DebianKernel for more information on where to
find them.

I've reassigned it back to linux-2.6 since adding another kernel to the
installer would be a workaround and not a proper fix. Does
anyone have comments on all this?

-- 
O T A V I O    S A L V A D O R
-
 E-mail: [EMAIL PROTECTED]  UIN: 5906116
 GNU/Linux User: 239058 GPG ID: 49A5F855
 Home Page: http://otavio.ossystems.com.br
-
Microsoft sells you Windows ... Linux gives
 you the whole house.






Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs

2007-09-13 Thread Bernd Zeimetz
Hi,

 Looks like we need to discuss the available options for this
 problem. Another question is whether current snapshots of 2.6.23
 work on this hardware.
 
 Bernd, can you test the latest 2.6.23 snapshot and see if it works? See
 http://wiki.debian.org/DebianKernel for more information on where to
 find them.

I've installed the machine using a 2.6.23-rc5 _smp_ kernel; all older
non-smp kernels failed to boot, and as I had to build my own installer
and kernel anyway I decided to use an smp version.

As it takes a _long_ time to reboot a machine with a crashed CPU
(which was the result of all non-SMP kernel tests so far), and the
machine should already be in production, I would like to avoid trying a
non-smp kernel on this machine.
I've looked through the sparc64 commits (about 80) between 2.6.22 and
2.6.23-rc6 and there was none which looked like it would address such an
issue. If you can point me to such a change I'll give a non-smp kernel a
try.
Also I'm pretty sure that non-smp kernels just don't work on the
machine. As far as I understand, the way those machines work is that at
least two CPUs have to be in an operating state as they share one CPU
Data switch (if you have a look into such a machine you see that always
2 CPUs + their memory are sitting in a CPU bay), and as far as I know you
can't run the system with an odd number of CPUs.
All processors share the same physical memory address space and use -
depending on the number of CPUs - different cache coherence protocols,
so as far as I understand it such a system can't work at all without
having all CPUs properly initialized. But probably somebody with a
better knowledge about this architecture can give us some insight on
that, therefore I'm forwarding the message to debian-sparc, too.

Probably interesting to read for you:
http://www.sun.com/processors/manuals/USIIIv2.pdf
http://docs-pdf.sun.com/806-6592-11/806-6592-11.pdf



Cheers,

Bernd

-- 
Bernd Zeimetz
[EMAIL PROTECTED] http://bzed.de/






Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs

2007-09-13 Thread Otavio Salvador
Bernd Zeimetz [EMAIL PROTECTED] writes:

 Hi,

 Looks like we need to discuss the available options for this
 problem. Another question is whether current snapshots of 2.6.23
 work on this hardware.
 
 Bernd, can you test the latest 2.6.23 snapshot and see if it works? See
 http://wiki.debian.org/DebianKernel for more information on where to
 find them.

 I've installed the machine using a 2.6.23-rc5 _smp_ kernel; all older
 non-smp kernels failed to boot, and as I had to build my own installer
 and kernel anyway I decided to use an smp version.

Have you tested it with a non-smp kernel?

 As it takes a _long_ time to reboot a machine with a crashed CPU
 (which was the result of all non-SMP kernel tests so far), and the
 machine should already be in production, I would like to avoid trying a
 non-smp kernel on this machine.

Right. That makes it difficult for us to know whether it has been fixed
since your last try.

 I've looked through the sparc64 commits (about 80) between 2.6.22 and
 2.6.23-rc6 and there was none which looked like it would address such an
 issue. If you can point me to such a change I'll give a non-smp kernel a
 try.

Right :(

 Also I'm pretty sure that non-smp kernels just don't work on the
 machine. As far as I understand, the way those machines work is that at
 least two CPUs have to be in an operating state as they share one CPU
 Data switch (if you have a look into such a machine you see that always
 2 CPUs + their memory are sitting in a CPU bay), and as far as I know you
 can't run the system with an odd number of CPUs.
 All processors share the same physical memory address space and use -
 depending on the number of CPUs - different cache coherence protocols,
 so as far as I understand it such a system can't work at all without
 having all CPUs properly initialized. But probably somebody with a
 better knowledge about this architecture can give us some insight on
 that, therefore I'm forwarding the message to debian-sparc, too.

 Probably interesting to read for you:
 http://www.sun.com/processors/manuals/USIIIv2.pdf
 http://docs-pdf.sun.com/806-6592-11/806-6592-11.pdf

If that's true then we might have two options:

 - add a subarch for sparc with the needed kernel changes for it;
 - change the default kernel on the installer to be smp.

What do others think about that? (I've added debian-boot to CC because of this.)

-- 
O T A V I O    S A L V A D O R
-
 E-mail: [EMAIL PROTECTED]  UIN: 5906116
 GNU/Linux User: 239058 GPG ID: 49A5F855
 Home Page: http://otavio.ossystems.com.br
-
Microsoft sells you Windows ... Linux gives
 you the whole house.






Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs

2007-09-13 Thread Bernd Zeimetz
Hi,


 Bernd, can you test the latest 2.6.23 snapshot and see if it works? See
 http://wiki.debian.org/DebianKernel for more information on where to
 find them.
 I've installed the machine using a 2.6.23-rc5 _smp_ kernel; all older
 non-smp kernels failed to boot, and as I had to build my own installer
 and kernel anyway I decided to use an smp version.
 
 Have you tested it with a non-smp kernel?

Well, I've tried to boot a non-SMP kernel (2.6.23-rc5) a few minutes ago
(... and I had to power-cycle it, it is running selftests now), and
the machine froze after

Remapping the kernel... done.
OF stdout device is: /[EMAIL PROTECTED],70/[EMAIL PROTECTED]/[EMAIL PROTECTED],40:a
Booting Linux...


Which is just the same state as described in

http://lists.debian.org/debian-sparc/2007/09/msg00045.html
which talks about a SunFire v240 which uses 2 US III CPUs. Please note
that the latest non-SMP kernels seem (as in: according to Google) to boot
well on the SunFire v210, which is a 1U machine with ONE CPU only.

 
 As it takes a _long_ time to reboot a machine with a crashed CPU
 (which was the result of all non-SMP kernel tests so far), and the
 machine should already be in production, I would like to avoid trying a
 non-smp kernel on this machine.
 
 Right. That makes it difficult for us to know whether it has been fixed
 since your last try.

As I said before, I doubt that this is fixable - you have to use an SMP kernel.

  - add a subarch for sparc with the needed kernel changes for it;

As this seems to affect all machines running more than one US III CPU,
this would probably be a good thing. Those machines are starting to
become available on eBay for not-so-much money and are still being sold,
and the UltraSPARC IV should have the same features (issues, in this case),
so it would make more than sense to allow running Debian on serious sparc
hardware.

  - change default kernel on installer to be smp;

I've given that idea a try and tried to boot an SMP kernel on a single-CPU
(US IIi) machine, and the machine froze in the same state as the v880
above. So I guess it's not possible to run SMP kernels on all sparc
machines.
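
To make the two options easier to weigh, the boot results reported in this thread can be written down as data. This is only a minimal sketch: the names OBSERVATIONS, flavours_that_always_boot and suggested_flavour are made up for illustration, the v210 and v240 entries are second-hand reports, and the ">= 2 CPUs means SMP" rule is merely what the reports suggest, not a verified policy.

```python
# Boot results as reported in this thread:
# (machine, number of CPUs, kernel flavour, booted?)
OBSERVATIONS = [
    ("SunFire v880 (US III)", 8, "smp", True),       # SMP kernels >= 2.6.21
    ("SunFire v880 (US III)", 8, "non-smp", False),  # Cheetah error trap panic
    ("SunFire v240 (US III)", 2, "non-smp", False),  # per the linked list post
    ("SunFire v210", 1, "non-smp", True),            # per reports found via Google
    ("US IIi machine", 1, "smp", False),             # froze at "Booting Linux..."
]


def flavours_that_always_boot(observations):
    """Return the kernel flavours that booted in every reported test."""
    flavours = {flavour for _, _, flavour, _ in observations}
    return sorted(
        f for f in flavours
        if all(ok for _, _, fl, ok in observations if fl == f)
    )


def suggested_flavour(cpus):
    """Rule of thumb implied by the reports above (an assumption)."""
    return "smp" if cpus >= 2 else "non-smp"


if __name__ == "__main__":
    # An empty list means no single flavour covers all reported machines.
    print(flavours_that_always_boot(OBSERVATIONS))
    for machine, cpus, _, _ in OBSERVATIONS:
        print(machine, "->", suggested_flavour(cpus))
```

With these reports the first function returns an empty list, which is the reason a single default kernel does not seem to be enough.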


Cheers,

Bernd

-- 
Bernd Zeimetz
[EMAIL PROTECTED] http://bzed.de/






Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs

2007-09-13 Thread Richard Mortimer
On Thu, 2007-09-13 at 22:30 +0200, Bernd Zeimetz wrote:
 Well, I've tried to boot a non-SMP kernel (2.6.23-rc5) a few minutes ago
 (... and I had to power-cycle it, it is running selftests now), and
 the machine froze after
 
 Remapping the kernel... done.
 OF stdout device is: /[EMAIL PROTECTED],70/[EMAIL PROTECTED]/[EMAIL PROTECTED],40:a
 Booting Linux...

Try adding   -p   to the silo commandline. That will send all early
log output directly to the console and you should see the kernel panic
that is undoubtedly happening before the kernel is fully initialised.

Richard

-- 
Richard Mortimer [EMAIL PROTECTED]







Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs

2007-09-13 Thread Bernd Zeimetz
Richard Mortimer wrote:
 On Thu, 2007-09-13 at 22:30 +0200, Bernd Zeimetz wrote:
 Well, I've tried to boot a non-SMP kernel (2.6.23-rc5) a few minutes ago
 (... and I had to power-cycle it, it is running selftests now), and
 the machine froze after

 Remapping the kernel... done.
 OF stdout device is: /[EMAIL PROTECTED],70/[EMAIL PROTECTED]/[EMAIL PROTECTED],40:a
 Booting Linux...
 
 Try adding   -p   to the silo commandline. That will send all early
 log output directly to the console and you should see the kernel panic
 that is undoubtedly happening before the kernel is fully initialised.

Here is the interesting part:

checking if image is initramfs... it is
Freeing initrd memory: 6214k freed
Mini RTC Driver
/[EMAIL PROTECTED],40: US3 memory controller at 0440 [ACTIVE]
/[EMAIL PROTECTED],40: US3 memory controller at 04c0 [ACTIVE]
/[EMAIL PROTECTED],40: US3 memory controller at 04000140 [ACTIVE]
ERROR(0): Cheetah error trap taken afsr[1000] afar[040001c0] TL1(0)
ERROR(0): TPC[4377f4] TNPC[4377f8] O7[4379d0] TSTATE[80001606]
ERROR(0): TPCinterpret_one_decode_reg+0x0/0xfc
ERROR(0): M_SYND(0),  E_SYND(0)
ERROR(0): Highest priority error (1000) Unmapped error from system bus
ERROR(0): D-cache idx[0] tag[] utag[] stag[]
ERROR(0): D-cache data0[] data1[] data2[] data3[]
ERROR(0): I-cache idx[0] tag[] utag[] stag[] u[] l[]
ERROR(0): I-cache INSN0[] INSN1[] INSN2[] INSN3[]
ERROR(0): I-cache INSN4[] INSN5[] INSN6[] INSN7[]
ERROR(0): E-cache idx[0] tag[]
ERROR(0): E-cache data0[] data1[] data2[] data3[]
Kernel panic - not syncing: Irrecoverable deferred error trap.


If you want the full output, please let me know. There was nothing unusual so
far, though.
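
For comparing this trace with the ones in other reports, the key registers can be pulled out of the pasted text mechanically. What follows is a minimal sketch, assuming the trace has the NAME[hex] shape shown above; parse_cheetah_trap and FIELDS are hypothetical names for illustration and not part of any kernel tooling.

```python
import re

# Register names as they appear in the trap header lines above.
FIELDS = ("afsr", "afar", "TPC", "TNPC", "O7", "TSTATE")


def parse_cheetah_trap(log):
    """Extract NAME[hex] fields and the panic message from a pasted trace."""
    # Collapse whitespace so fields split by mail line-wrapping are still found.
    text = " ".join(log.split())
    result = {}
    for name in FIELDS:
        match = re.search(r"\b" + re.escape(name) + r"\[([0-9a-fA-F]+)\]", text)
        if match:
            result[name] = match.group(1)
    # The panic message is taken from the unmodified log, up to end of line.
    panic = re.search(r"Kernel panic - not syncing: ([^\n]+)", log)
    if panic:
        result["panic"] = panic.group(1).strip()
    return result


SAMPLE = """ERROR(0): Cheetah error trap taken afsr[1000]
afar[040001c0] TL1(0)
ERROR(0): TPC[4377f4] TNPC[4377f8] O7[4379d0] TSTATE[80001606]
Kernel panic - not syncing: Irrecoverable deferred error trap.
"""

if __name__ == "__main__":
    for key, value in parse_cheetah_trap(SAMPLE).items():
        print(key, value)
```

Two reports that show the same afsr, afar and TPC values are very likely hitting the same fault, which is how the trace from the 18-Sep installer build can be matched against this one.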


Bernd

-- 
Bernd Zeimetz
[EMAIL PROTECTED] http://bzed.de/






Bug#440720: [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs

2007-09-12 Thread Bernd Zeimetz
reassign 440720 debian-installer
retitle 440720 [SPARC]: non-SMP kernel fail on SunFIres with >= 2 CPUs
thanks

After a lot of testing I didn't manage to find a single non-SMP
kernel which would boot on a SunFire v880.
Google finds several reports of failed installs on SunFire machines
with >=2 UltraSPARC III CPUs, while it seems to work well on machines with
only one CPU
(http://www.nabble.com/SunFire-v240---Debian---Getting-Closer-t1103286.html
for example, read the lower part - Google finds more reports like that,
including reports that Gentoo installed fine while Debian failed to install).
I'd therefore like to suggest using the SMP kernel as the default for the
sparc installer or, if that's a problem for the single-CPU machines, providing
both versions of the installer.

On the SunFire v880 all SMP kernels >= 2.6.21 booted, with the only
problem that the qla2xxx module of 2.6.21 had hiccups with the FC
controller in the machine, but I guess the kernel would have worked
otherwise. The installer from lenny, which ships with a non-SMP kernel,
crashed badly, as you can see in the bug report above.

Cheers,

Bernd

-- 
Bernd Zeimetz
[EMAIL PROTECTED] http://bzed.de/


