Re: Linux 2.6.22-rc2

2007-05-24 Thread Mike Houston
On Wed, 23 May 2007 10:46:05 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Wed, 23 May 2007, Mike Houston wrote:
> > 
> > I still happen to have a Windows Vista install kicking around, so
> > to make sure we're not flogging a dead horse I booted that and
> > let it set up the yukon2 chip and I tested it. (more to make sure
> > that eeprom update didn't break it). I used it for a bit and
> > successfully transferred some large files from box running Samba.
> > MS must be using some specific workaround or something.
> 
> I think there is some lspci-like thing for windows too. 
> 
> Can you do the equivalent of "lspci -vvxxx" on that box under both
> Linux and Windows? _If_ it's some PCI config space thing (which is
> not at all guaranteed - it could be about setup in random MMIO
> ranges) it might give us some clues.
>

This is the sky2 issue with Gigabyte 88E8056 onboard LAN.

I've had no luck getting pciutils compiled for win32, but I found a
utility that gives similar output called Craig Hart's PCI bus sniffer
(pci32.exe).

Here is the output of pci32 with hex dump from within Windows Vista:
http://www.mikeserv.org/files/pci32_info.txt

Here is the output of lspci -vvxxx from within Linux:
http://www.mikeserv.org/files/lspci.txt
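
A byte-level comparison of two such dumps can be scripted. What follows is a minimal Python sketch, not something posted in this thread. It assumes both dumps have already been brought into the `lspci -x` row layout (hex offset, colon, then byte values); the pci32 output may need manual cleanup before it matches. The names `parse_hex_dump` and `diff_config_space` and the sample bytes are made up for illustration.

```python
import re

# One row of an `lspci -x` style dump: hex offset, colon, then byte values.
ROW = re.compile(r"^([0-9a-fA-F]{2,3}):((?:\s+[0-9a-fA-F]{2})+)\s*$")


def parse_hex_dump(text):
    """Return {offset: byte_value} for every hex row found in the text."""
    space = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # decoded (non-hex) lines are skipped
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            space[base + i] = int(byte, 16)
    return space


def diff_config_space(linux_text, windows_text):
    """List (offset, linux_byte, windows_byte) where the dumps disagree."""
    a = parse_hex_dump(linux_text)
    b = parse_hex_dump(windows_text)
    common = sorted(a.keys() & b.keys())
    return [(off, a[off], b[off]) for off in common if a[off] != b[off]]


if __name__ == "__main__":
    # Made-up sample rows, not taken from the dumps linked above.
    linux = "00: ab 11 64 43 07 00 10 00\n"
    windows = "00: ab 11 64 43 07 04 10 00\n"
    for off, lin, win in diff_config_space(linux, windows):
        print(f"offset 0x{off:02x}: linux={lin:02x} windows={win:02x}")
```

Only offsets present in both dumps are compared, so a dump that stops at 0x40 and one that covers the full config space can still be diffed over their common range.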

I hope this is helpful,

Mike Houston
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Linux 2.6.22-rc2

2007-05-23 Thread Stephen Hemminger
On Wed, 23 May 2007 10:46:05 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Wed, 23 May 2007, Mike Houston wrote:
> > 
> > I still happen to have a Windows Vista install kicking around, so to
> > make sure we're not flogging a dead horse I booted that and let it
> > set up the yukon2 chip and I tested it. (more to make sure that
> > eeprom update didn't break it). I used it for a bit and successfully
> > transferred some large files from box running Samba. MS must be using
> > some specific workaround or something.
> 
> I think there is some lspci-like thing for windows too. 
> 
> Can you do the equivalent of "lspci -vvxxx" on that box under both Linux 
> and Windows? _If_ it's some PCI config space thing (which is not at all 
> guaranteed - it could be about setup in random MMIO ranges) it might give 
> us some clues.
> 
>   Linus

lspci will work in windows, it is probably part of cygwin.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Linux 2.6.22-rc2

2007-05-23 Thread Linus Torvalds


On Wed, 23 May 2007, Mike Houston wrote:
> 
> I still happen to have a Windows Vista install kicking around, so to
> make sure we're not flogging a dead horse I booted that and let it
> set up the yukon2 chip and I tested it. (more to make sure that
> eeprom update didn't break it). I used it for a bit and successfully
> transferred some large files from box running Samba. MS must be using
> some specific workaround or something.

I think there is some lspci-like thing for windows too. 

Can you do the equivalent of "lspci -vvxxx" on that box under both Linux 
and Windows? _If_ it's some PCI config space thing (which is not at all 
guaranteed - it could be about setup in random MMIO ranges) it might give 
us some clues.

Linus


Re: Linux 2.6.22-rc2

2007-05-23 Thread Mike Houston
On Tue, 22 May 2007 17:00:18 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> and the load off "sk->sk_prot->ioctl" oopses, because "sk->sk_prot"
> is corrupt and contains 0x8e3cad42, which is not a valid kernel
> pointer.
> 
> The other oops is even worse. 
> 
> I also think it meshes with
> 
>   sky2 eth0: descriptor error q=0x280 get=285
> [800042375e2e5e] put=285
> 
> and I suspect your memory got corrupted by sky2 reading the wrong 
> descriptors, and overwriting kernel memory.
> 
> So it's almost certainly some DMA problem. Now, _why_ you have DMA 
> problems, I have no idea. But can you try:
>  - disable CONFIG_PREEMPT
>  - disable CONFIG_HIGHMEM if you have it on
>  - just in general see if you can disable any kernel config options
> that might be unnecessary.
> to see if it changes the situation at all..

Thanks for looking at this. After further posts in the discussion I
wasn't sure if you still wanted me to try this, but I thought it
might be useful to see if (particularly) highmem support might change
the behaviour, or the messages in any way that might lead to a clue.
There was no change to the behaviour.

I have a Core 2 duo, and 2 Gb of RAM, but I built a uniprocessor
kernel (with apic), without highmem support, with no PREEMPT and
without other unnecessary stuff. If by chance I got it working, my
plan was to enable things one at a time.

I won't get that oops on this setup though (never have, anyways...
it was just the PCLinux install on that other hard disk which has
now been returned to use elsewhere), but the messages on trying to
transfer data are the same:

First try (instant failure on trying to ssh):

May 23 12:51:14 cramit kernel: sky2 eth0: enabling interface
May 23 12:51:14 cramit kernel: sky2 eth0: ram buffer 0K
May 23 12:51:16 cramit kernel: sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
May 23 12:51:34 cramit kernel: sky2 :04:00.0: error interrupt status=0x1
May 23 12:51:34 cramit kernel: sky2 eth0: descriptor error q=0x280 get=7 [0] put=7

Second try after cold boot (failure on trying to transfer file):

May 23 12:52:59 cramit kernel: sky2 eth0: enabling interface
May 23 12:52:59 cramit kernel: sky2 eth0: ram buffer 0K
May 23 12:53:01 cramit kernel: sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
May 23 12:55:40 cramit kernel: sky2 :04:00.0: error interrupt status=0x8000
May 23 12:55:40 cramit kernel: sky2 eth0: hw error interrupt status 0x8
May 23 12:55:40 cramit kernel: sky2 eth0: MAC parity error

This is exactly the behaviour I've been seeing.
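
To compare these messages across boots, the descriptor-error lines can be pulled out of syslog mechanically. This is a rough Python sketch under two assumptions: each log entry sits on a single line (wrapped entries joined back together first), and the message layout matches the lines shown in this thread. The function name `descriptor_errors` is hypothetical.

```python
import re

# Layout taken from the log lines in this thread, e.g.
#   sky2 eth0: descriptor error q=0x280 get=7 [0] put=7
DESC_ERR = re.compile(
    r"sky2 (?P<dev>\S+): descriptor error "
    r"q=(?P<queue>0x[0-9a-fA-F]+) get=(?P<get>\d+) "
    r"\[(?P<desc>[0-9a-fA-F]+)\] put=(?P<put>\d+)"
)


def descriptor_errors(log_text):
    """Return one dict per sky2 descriptor-error line in the log text."""
    events = []
    for line in log_text.splitlines():
        m = DESC_ERR.search(line)
        if m:
            events.append({
                "dev": m.group("dev"),
                "queue": int(m.group("queue"), 16),
                "get": int(m.group("get")),
                "put": int(m.group("put")),
                "desc": m.group("desc"),  # bracketed value kept as raw hex text
            })
    return events


if __name__ == "__main__":
    sample = ("May 23 12:51:34 cramit kernel: sky2 eth0: "
              "descriptor error q=0x280 get=7 [0] put=7\n")
    for event in descriptor_errors(sample):
        print(event)
```

The bracketed descriptor value is kept as a string because its width and meaning are not spelled out in the thread.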

I still happen to have a Windows Vista install kicking around, so to
make sure we're not flogging a dead horse I booted that and let it
set up the yukon2 chip and I tested it. (more to make sure that
eeprom update didn't break it). I used it for a bit and successfully
transferred some large files from box running Samba. MS must be using
some specific workaround or something.

Mike Houston


Re: Linux 2.6.22-rc2

2007-05-23 Thread Stephen Hemminger
On Tue, 22 May 2007 18:53:33 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Tue, 22 May 2007, Stephen Hemminger wrote:
> > 
> > It looks like the chip reads the wrong memory sometimes. The problem happens
> > only on the on-board NIC's and only on this kind of motherboard.
> 
> Do you know if it happens for particular addresses? (Ie, can you tell what 
> the physical address of the descriptor is for the errors?)

I'll look but there didn't seem to be an obvious pattern when I last looked.


> 
> > For testing, I have put code in to check that the receive data actually
> > arrived before the IRQ, it triggered on my Gigabyte 925 motherboard. It
> > appears that DMA access is messed up.
> 
> Yes, that certainly would also explain memory corruption. Either because 
> writes went to the wrong address, or because writes went to the right 
> address, but because an earlier IO descriptor read had gotten corrupted, 
> the "right address" was in fact the wrong one ;)
> 
> The reason I ask whether you have some way of telling the pattern for the 
> physical address is that one traditional cause of DMA errors is due to 
> broken RAM remapping setup.
> 
> As an example of that - imagine that you have 1GB of RAM in the machine, 
> and realize that the memory behind the 640kB -> 1MB area isn't accessible, 
> because it's taken up by the legacy ISA region.
> 
> You have two possible outcomes: either (a) the memory is just "gone", and 
> you lost it, or (b) there is some RAM remapping in the core chipset that 
> makes the lost 384kB show up _above_ the 1GB mark instead.
> 
> The same "legacy ISA" hole situation happens for the "legacy PCI" hole, 
> which is why if you have 4GB of RAM in the machine, usually you'll see 
> 3GB at addresses 0-3GB (roughly), and then you'll see the rest at above 
> the 4GB mark, in order to have a nice PCI hole in the 32-bit access range.
> 
> There's also the "legacy 286" hole at the 15-16MB mark (which nobody uses 
> any more, but chipsets still inexplicably support), and the SMM remapping. 
> 
> Anyway, core chipsets generally do CPU memory accesses _differently_ from 
> DMA accesses from the PCI bus (at a minimum, SMM is something that only 
> the CPU can do), so I could see a situation where the remapping was set up 
> correctly for the CPU (and perhaps for "core chipset" devices like the 
> integrated southbridge), but devices that do DMA from the outside get 
> screwed over.
>

This board doesn't have any onboard video so that helps. I am running
with 2GB of memory.

I can put a card with similar chip in an X1 slot, and there are no
problems.  Same driver, but different bridges, and slightly different
Marvell chip.
 
> But it might not happen for all addresses. Non-remapped stuff might work 
> well, so if there is some way of figuring out what the bad DMA address was 
> for an erroneous access, that might offer some clues.
> 
> > This board has lots of "overclocker" friendly stuff; maybe the BIOS 
> > never really sets up the PCI bridges and clocks properly.
> 
> It's hard to set up a normal PCI-PCI bridge subtly incorrectly. But 
> special RAM timing or remapping stuff for the host bridge - sure.
> 
> > It doesn't seem like a software or driver problem. I have tried tweaking PCI
> > registers but nothing worked in this case.
> 
> Yeah, the PCI registers that would affect things like this tend to be in 
> the host bridge, not on the normal device.
> 
> That said, Intel doesn't generally do the really insane things. And a lot 
> of the old remapping stuff is simply not done any more. For example, I 
> doubt that the 925 chipset even supports remapping the 640k-1M range any 
> more: 384kB just isn't worth it when people talk about gigs of RAM, the 
> way it was when 16MB was considered a lot.
> 
> And looking quickly at the Intel 925X MCH (memory controller hub) 
> registers, nothing jumps out as a good candidate for some obvious bug. 
> 
>   Linus

Here is the PCI controller chain to the device:

00:1c.0 PCI bridge: Intel Corporation 82801H (ICH8 Family) PCI Express Port 1 (rev 02) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 32 bytes
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
I/O behind bridge: 5000-5fff
Memory behind bridge: fff0-000f
Prefetchable memory behind bridge: fff0-000f
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: [40] Express Root Port (Slot+) IRQ 0
Device: Supported: MaxPayload 128 bytes, PhantFunc 0, ExtTag-
Device: Latency L0s unlimited, L1 unlimited
Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
Device: RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
Device: MaxPayload 128 bytes, MaxReadReq 128 bytes
Link: Supported Speed 2.5Gb/s, Width x1, ASPM L0s, Port 1
Link: Latency L0s <1us, L1 <4us
Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x0


Re: Linux 2.6.22-rc2

2007-05-22 Thread Linus Torvalds


On Tue, 22 May 2007, Stephen Hemminger wrote:
> 
> It looks like the chip reads the wrong memory sometimes. The problem happens
> only on the on-board NIC's and only on this kind of motherboard.

Do you know if it happens for particular addresses? (Ie, can you tell what 
the physical address of the descriptor is for the errors?)

> For testing, I have put code in to check that the receive data actually
> arrived before the IRQ, it triggered on my Gigabyte 925 motherboard. It
> appears that DMA access is messed up.

Yes, that certainly would also explain memory corruption. Either because 
writes went to the wrong address, or because writes went to the right 
address, but because an earlier IO descriptor read had gotten corrupted, 
the "right address" was in fact the wrong one ;)

The reason I ask whether you have some way of telling the pattern for the 
physical address is that one traditional cause of DMA errors is due to 
broken RAM remapping setup.

As an example of that - imagine that you have 1GB of RAM in the machine, 
and realize that the memory behind the 640kB -> 1MB area isn't accessible, 
because it's taken up by the legacy ISA region.

You have two possible outcomes: either (a) the memory is just "gone", and 
you lost it, or (b) there is some RAM remapping in the core chipset that 
makes the lost 384kB show up _above_ the 1GB mark instead.

The same "legacy ISA" hole situation happens for the "legacy PCI" hole, 
which is why if you have 4GB of RAM in the machine, usually you'll see 
3GB at addresses 0-3GB (roughly), and then you'll see the rest at above 
the 4GB mark, in order to have a nice PCI hole in the 32-bit access range.

There's also the "legacy 286" hole at the 15-16MB mark (which nobody uses 
any more, but chipsets still inexplicably support), and the SMM remapping. 

Anyway, core chipsets generally do CPU memory accesses _differently_ from 
DMA accesses from the PCI bus (at a minimum, SMM is something that only 
the CPU can do), so I could see a situation where the remapping was set up 
correctly for the CPU (and perhaps for "core chipset" devices like the 
integrated southbridge), but devices that do DMA from the outside get 
screwed over.

But it might not happen for all addresses. Non-remapped stuff might work 
well, so if there is some way of figuring out what the bad DMA address was 
for an erroneous access, that might offer some clues.
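
The hole-and-remap arithmetic described above can be written down as a toy model. This is only a sketch in Python: it assumes a single hole from 3GB to 4GB with everything past the hole start relocated directly above 4GB, which is a simplification of what any real chipset does. The names `ram_to_physical` and `is_remapped` are invented for the example.

```python
GIB = 1 << 30


def ram_to_physical(ram_offset, ram_size, hole_start=3 * GIB, hole_end=4 * GIB):
    """Map an offset into installed RAM to a physical address.

    Toy model: RAM below hole_start is identity-mapped; RAM that would
    land inside the hole is relocated to start at hole_end instead.
    """
    if not 0 <= ram_offset < ram_size:
        raise ValueError("offset outside installed RAM")
    if ram_offset < hole_start:
        return ram_offset
    return hole_end + (ram_offset - hole_start)


def is_remapped(phys_addr, hole_end=4 * GIB):
    """True if the physical address lies in the relocated region."""
    return phys_addr >= hole_end


if __name__ == "__main__":
    for size_gib, off in [(4, 3 * GIB + 4096), (2, 2 * GIB - 4096)]:
        phys = ram_to_physical(off, size_gib * GIB)
        print(f"{size_gib}GB RAM, offset {off:#x} -> phys {phys:#x}, "
              f"remapped={is_remapped(phys)}")
```

Under this model a 2GB machine never touches the relocated region at all, so a remapping bug of this particular kind would need some other hole (or a different chipset behaviour) to bite there.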

> This board has lots of "overclocker" friendly stuff; maybe the BIOS 
> never really sets up the PCI bridges and clocks properly.

It's hard to set up a normal PCI-PCI bridge subtly incorrectly. But 
special RAM timing or remapping stuff for the host bridge - sure.

> It doesn't seem like a software or driver problem. I have tried tweaking PCI
> registers but nothing worked in this case.

Yeah, the PCI registers that would affect things like this tend to be in 
the host bridge, not on the normal device.

That said, Intel doesn't generally do the really insane things. And a lot 
of the old remapping stuff is simply not done any more. For example, I 
doubt that the 925 chipset even supports remapping the 640k-1M range any 
more: 384kB just isn't worth it when people talk about gigs of RAM, the 
way it was when 16MB was considered a lot.

And looking quickly at the Intel 925X MCH (memory controller hub) 
registers, nothing jumps out as a good candidate for some obvious bug. 

Linus


Re: Linux 2.6.22-rc2

2007-05-22 Thread Stephen Hemminger

Linus Torvalds wrote:
> On Tue, 22 May 2007, Mike Houston wrote:
> > 
> > In this case I actually had the kernel crash. First time for me ever
> > having a kernel oops! System locked up with keyboard LED's blinking.
> > 
> > Not sure if anyone wants to see all of it (maybe some screwy
> > userland stuff involved), so I won't include that mess in the
> > message. It's here:
> > http://www.mikeserv.org/files/kernelcrash.txt
> 
> I think you have major memory corruption. That first oops disassembles to
> 
>     mov    0x10(%eax),%esi
>     mov    $0xfdfd,%eax
>     test   %esi,%esi
>     je     after_call
>     mov    %edx,%ecx
>     mov    %edi,%eax
>     mov    %ebx,%edx
>     call   *%esi
> after_call:
> 
> which is (from net/ipv4/af_inet.c, inet_ioctl()):
> 
>     default:
>         if (sk->sk_prot->ioctl)
>             err = sk->sk_prot->ioctl(sk, cmd, arg);
>         else
>             err = -ENOIOCTLCMD;
>         break;
> 
> and the load off "sk->sk_prot->ioctl" oopses, because "sk->sk_prot" is 
> corrupt and contains 0x8e3cad42, which is not a valid kernel pointer.
> 
> The other oops is even worse. 
> 
> I also think it meshes with
> 
> sky2 eth0: descriptor error q=0x280 get=285 [800042375e2e5e] put=285

  
Descriptor error means the driver told it to do something but the
OWNER bit wasn't set.

Only ever saw this on the Gigabyte motherboard.

It looks like the chip reads the wrong memory sometimes. The problem
happens only on the on-board NICs and only on this kind of
motherboard. For testing, I have put code in to check that the receive
data actually arrived before the IRQ; it triggered on my Gigabyte 925
motherboard. It appears that DMA access is messed up. This board has
lots of "overclocker" friendly stuff; maybe the BIOS never really sets
up the PCI bridges and clocks properly.

It doesn't seem like a software or driver problem. I have tried
tweaking PCI registers but nothing worked in this case.


Re: Linux 2.6.22-rc2

2007-05-22 Thread Linus Torvalds


On Tue, 22 May 2007, Mike Houston wrote:
> 
> In this case I actually had the kernel crash. First time for me ever
> having a kernel oops! System locked up with keyboard LED's blinking.
> 
> Not sure if anyone wants to see all of it (maybe some screwy
> userland stuff involved), so I won't include that mess in the
> message. It's here:
> http://www.mikeserv.org/files/kernelcrash.txt

I think you have major memory corruption. That first oops disassembles to

    mov    0x10(%eax),%esi
    mov    $0xfdfd,%eax
    test   %esi,%esi
    je     after_call
    mov    %edx,%ecx
    mov    %edi,%eax
    mov    %ebx,%edx
    call   *%esi
after_call:

which is (from net/ipv4/af_inet.c, inet_ioctl()):

    default:
        if (sk->sk_prot->ioctl)
            err = sk->sk_prot->ioctl(sk, cmd, arg);
        else
            err = -ENOIOCTLCMD;
        break;

and the load off "sk->sk_prot->ioctl" oopses, because "sk->sk_prot" is 
corrupt and contains 0x8e3cad42, which is not a valid kernel pointer.
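
As a quick sanity check on that reading, the two constants can be examined in a few lines of Python. This is a sketch only. It assumes the default 3G/1G address split on 32-bit x86 (kernel addresses starting at 0xC0000000), which is not stated in the thread, and it uses 515 for ENOIOCTLCMD, the kernel-internal errno value. The helper names are made up.

```python
PAGE_OFFSET = 0xC0000000  # assumption: default 3G/1G split on 32-bit x86
ENOIOCTLCMD = 515         # kernel-internal "no ioctl command" errno


def looks_like_kernel_pointer(value, page_offset=PAGE_OFFSET):
    """True if a 32-bit value falls in the kernel part of the address space."""
    return page_offset <= value <= 0xFFFFFFFF


def as_u32(value):
    """Two's-complement view of a (possibly negative) value in 32 bits."""
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(0x8e3cad42), looks_like_kernel_pointer(0x8e3cad42))
    print(hex(as_u32(-ENOIOCTLCMD)))
```

Under that assumption 0x8e3cad42 lands below the kernel range. -ENOIOCTLCMD comes out as 0xfffffdfd, whose low 16 bits match the 0xfdfd immediate in the disassembly as it appears in this archive, so that `mov` is plausibly the error-code load.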

The other oops is even worse. 

I also think it meshes with

sky2 eth0: descriptor error q=0x280 get=285 [800042375e2e5e] put=285

and I suspect your memory got corrupted by sky2 reading the wrong 
descriptors, and overwriting kernel memory.

So it's almost certainly some DMA problem. Now, _why_ you have DMA 
problems, I have no idea. But can you try:
 - disable CONFIG_PREEMPT
 - disable CONFIG_HIGHMEM if you have it on
 - just in general see if you can disable any kernel config options that 
   might be unnecessary.
to see if it changes the situation at all..
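
Whether a given build really has those options off can be checked against the kernel `.config`. A small Python sketch follows, assuming the usual `.config` text layout (`CONFIG_FOO=y` for enabled options, `# CONFIG_FOO is not set` for disabled ones). The function name `config_state` and the sample text are illustrative, and the exact option names may differ between kernel versions and architectures.

```python
import re

OPTION_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")


def config_state(config_text, options):
    """Return {option: value} from .config text; None means not set."""
    state = {opt: None for opt in options}
    for line in config_text.splitlines():
        m = OPTION_LINE.match(line.strip())
        if m and m.group(1) in state:
            state[m.group(1)] = m.group(2)
    return state


if __name__ == "__main__":
    # Made-up fragment; a real check would read the build's own .config.
    sample = (
        "CONFIG_PREEMPT=y\n"
        "# CONFIG_HIGHMEM is not set\n"
        "CONFIG_SMP=y\n"
    )
    for opt, value in config_state(sample, ["CONFIG_PREEMPT", "CONFIG_HIGHMEM"]).items():
        print(f"{opt}: {'not set' if value is None else value}")
```

Commented-out "is not set" lines and options missing from the file both report as not set, which is the distinction that matters for this test.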

Linus


Re: Linux 2.6.22-rc2

2007-05-22 Thread Mike Houston
On Mon, 21 May 2007 21:31:46 -0700
Stephen Hemminger <[EMAIL PROTECTED]> wrote:

> There may be some hardware level interaction with SATA controller.
> I saw no failures running off i386 kernel of PATA drive and quickly
> see errors with SATA/AHCI and x86_64.

AHCI SATA on i386, but I'm not sure that has anything to do with the
problem after what follows below.

I did another test here today. I disconnected my SATA hard disks and
installed a regular PATA drive. The only PATA port I have, though, is
on the JMicron 363 controller, so I enabled that controller in the
BIOS (I keep it disabled because I have no use for it) and installed a
distro on the drive: PCLinuxOS TR4, which probably isn't the best test
system to use (and is not for me), but it's the only one I had on hand
whose distro kernel recognized IDE disks on the JMicron 363 controller.

After the install was done, I disconnected the SATA CD drive so there
would be no SATA devices. Nothing was on the ICH8 controller, which
I had put in IDE mode (there is no BIOS setting to disable it entirely).

I compiled 2.6.22-rc2 without libata/SATA support and only enabled the
old IDE subsystem with the jmicron 36x driver.

The 2.6.22-rc2 kernel booted and ran fine, and I brought up the sky2 eth0
interface without trouble. As is the case most of the time (but not
always), I was able to do light stuff with it for a short time (e.g.
ssh in to another box, transfer a small text file, etc.), but as soon
as I tried to move any serious data the same or a similar problem
occurred.

The only device using MSI at the time was the sky2, if that's
relevant. There were no other ethernet cards installed at the
time either.

In this case I actually had the kernel crash. First time for me ever
having a kernel oops! The system locked up with the keyboard LEDs blinking.

Not sure if anyone wants to see all of it (maybe some screwy
userland stuff involved), so I won't include that mess in the
message. It's here:
http://www.mikeserv.org/files/kernelcrash.txt

But in there we get this, a somewhat similar message:

May 22 16:16:45 localhost kernel: sky2 :04:00.0: error interrupt status=0x1
May 22 16:16:45 localhost kernel: sky2 eth0: descriptor error q=0x280 get=285 [800042375e2e5e] put=285

I hard booted and tried again a second time, and this time the kernel
didn't oops but I got this:

May 22 16:34:09 testinstall kernel: sky2 :04:00.0: error interrupt status=0x1
May 22 16:34:09 testinstall kernel: sky2 eth0: descriptor error q=0x280 get=497 [800042367dde5e] put=497
May 22 16:34:09 testinstall kernel: sky2 :04:00.0: error interrupt status=0x8000
May 22 16:34:09 testinstall kernel: sky2 eth0: hw error interrupt status 0x8
May 22 16:34:09 testinstall kernel: sky2 eth0: MAC parity error
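
In both reports the get and put indices are equal (285/285 and 497/497). For anyone collecting these lines from several boots, a small helper along the following lines could pull the fields out for comparison. This is only a sketch: the function and struct names are made up here, and the format string is inferred from the log lines as they appear in this message, not taken from the driver source.

```c
#include <stdio.h>
#include <string.h>

/* Fields of interest from a "descriptor error" log line. */
struct desc_err {
	unsigned int q;    /* queue register offset, printed in hex */
	unsigned int get;  /* get index */
	unsigned int put;  /* put index */
};

/* Returns 1 and fills *e if the line contains a descriptor error, else 0. */
int parse_desc_err(const char *line, struct desc_err *e)
{
	const char *p = strstr(line, "descriptor error q=");

	if (!p)
		return 0;
	/* The bracketed status word is skipped with %*[^]]. */
	return sscanf(p, "descriptor error q=%x get=%u [%*[^]]] put=%u",
		      &e->q, &e->get, &e->put) == 3;
}
```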

So it's the same problem. On halting, I quickly saw what looked like
a kernel oops but nothing was logged at that stage.

The third try produced the kernel oops again as soon as I attempted to
transfer a file.

By the way, last night I did grab the dmesg output from the last
attempt to use sky2 on my normal (from scratch) system in case it
would be useful. This is not to be confused with the PATA experiment
above:
http://www.mikeserv.org/files/dmesg-2.6.22-rc2.txt

Mike Houston


Re: Linux 2.6.22-rc2

2007-05-22 Thread H. Peter Anvin
Linus Torvalds wrote:
> 
> I can't really see that being a real problem in this day and age of PCI-X 
> etc, but it _used_ to be a possible issue a decade ago. Maybe you've found 
> a case where it matters even on modern hardware? We occasionally used to 
> set the PCI latency timer to make people happy.
> 
> (Not that I'm convinced it even has any semantic meaning on a modern PCI 
> system..)
> 

The PCI latency counters matter as long as you're talking about a PCI or
PCI-X bus.  They matter not one iota on anything that pretends to be a PCI
bus but isn't, e.g. PCI Express or HyperTransport.
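
Both facts can be read out of a raw configuration-space dump such as the hex output of "lspci -xxx". The sketch below assumes the dump has already been loaded into a byte array; the function names are invented for illustration, while the register offsets and the capability ID are the standard PCI configuration header values.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS           0x06  /* 16-bit status register (low byte here) */
#define PCI_STATUS_CAP_LIST  0x10  /* status bit: capability list present */
#define PCI_LATENCY_TIMER    0x0d  /* 8-bit latency timer */
#define PCI_CAPABILITY_LIST  0x34  /* offset of the first capability */
#define PCI_CAP_ID_EXP       0x10  /* capability ID for PCI Express */

/* Latency timer byte from a config-space dump (needs at least 64 bytes). */
uint8_t cfg_latency_timer(const uint8_t *cfg)
{
	return cfg[PCI_LATENCY_TIMER];
}

/* Walk the capability list; nonzero if a PCI Express capability is found. */
int cfg_is_pcie(const uint8_t *cfg, size_t len)
{
	size_t pos;
	int guard = 48;  /* bound the walk in case the dump is corrupt */

	if (len < 64 || !(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
		return 0;
	pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;
	while (pos >= 0x40 && pos + 1 < len && guard-- > 0) {
		if (cfg[pos] == PCI_CAP_ID_EXP)
			return 1;
		pos = cfg[pos + 1] & 0xfc;  /* next-capability pointer */
	}
	return 0;
}
```

If cfg_is_pcie() reports a PCI Express capability, whatever cfg_latency_timer() returns is of no practical interest for that device.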

-hpa


Re: Linux 2.6.22-rc2

2007-05-22 Thread Stephen Hemminger
On Mon, 21 May 2007 22:04:26 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:

> 
> 
> On Mon, 21 May 2007, Stephen Hemminger wrote:
> >
> > AHCI on this motherboard doesn't seem to use MSI. The problems occur
> > even if I boot with nomsi.
> 
> Have you tried playing with PCI latency counters etc? 
> 
> Maybe the SATA/AHCI thing is better at saturating the bus, and the sky2 
> hardware gets upset if it has overlong DMA access latencies due to some 
> other controller keeping the bus busy with a long burst access?
> 
> I can't really see that being a real problem in this day and age of PCI-X 
> etc, but it _used_ to be a possible issue a decade ago. Maybe you've found 
> a case where it matters even on modern hardware? We occasionally used to 
> set the PCI latency timer to make people happy.
> 
> (Not that I'm convinced it even has any semantic meaning on a modern PCI 
> system..)
> 
>   Linus

The device in question is PCI Express, and the latency timer has no meaning
(at least according to the vendor spec).

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Linux 2.6.22-rc2

2007-05-22 Thread Stephen Hemminger

Linus Torvalds wrote:
> On Tue, 22 May 2007, Mike Houston wrote:
> > In this case I actually had the kernel crash. First time for me ever
> > having a kernel oops! System locked up with keyboard LED's blinking.
> >
> > Not sure if anyone wants to see all of it (maybe some screwy
> > userland stuff involved), so I won't include that mess in the
> > message. It's here:
> > http://www.mikeserv.org/files/kernelcrash.txt
>
> I think you have major memory corruption. That first oops disassembles to
>
> 	mov    0x10(%eax),%esi
> 	mov    $0xfdfd,%eax
> 	test   %esi,%esi
> 	je     after_call
> 	mov    %edx,%ecx
> 	mov    %edi,%eax
> 	mov    %ebx,%edx
> 	call   *%esi
>    after_call:
>
> which is (from net/ipv4/af_inet.c, inet_ioctl()):
>
> 	default:
> 		if (sk->sk_prot->ioctl)
> 			err = sk->sk_prot->ioctl(sk, cmd, arg);
> 		else
> 			err = -ENOIOCTLCMD;
> 		break;
>
> and the load off "sk->sk_prot->ioctl" oopses, because "sk->sk_prot" is
> corrupt and contains 0x8e3cad42, which is not a valid kernel pointer.
>
> The other oops is even worse.
>
> I also think it meshes with
>
> 	sky2 eth0: descriptor error q=0x280 get=285 [800042375e2e5e] put=285

A descriptor error means the driver told the chip to do something but the
OWNER bit wasn't set.
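
To make the ownership handshake concrete, here is a minimal sketch of a descriptor ring with an ownership bit. The layout, bit value, and function names are hypothetical and are NOT the real sky2/Yukon-2 descriptor format; the point is only the rule that the hardware refuses an entry the driver has not handed over. If the chip fetched a descriptor from the wrong address, it would most likely find an entry without the bit set, which would look exactly like this error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ownership flag: set means the hardware may use the entry. */
#define DESC_HW_OWNED 0x80u

/* Hypothetical descriptor layout, for illustration only. */
struct desc {
	uint64_t addr;    /* DMA address of the buffer */
	uint16_t length;  /* buffer length in bytes */
	uint8_t  ctrl;    /* ownership bit plus other flags */
	uint8_t  opcode;  /* what the hardware should do */
};

/* Driver side: fill in an entry, then hand it to the hardware. */
void desc_give_to_hw(struct desc *d, uint64_t addr, uint16_t length)
{
	d->addr = addr;
	d->length = length;
	/* A real driver needs a write barrier here so the bit lands last. */
	d->ctrl |= DESC_HW_OWNED;
}

/*
 * Hardware side, modelled in software: returns 0 if the entry at index
 * "get" may be processed, -1 for what the log calls a descriptor error.
 */
int desc_check(const struct desc *ring, size_t get)
{
	return (ring[get].ctrl & DESC_HW_OWNED) ? 0 : -1;
}
```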

I have only ever seen this on the Gigabyte motherboard.

It looks like the chip reads the wrong memory sometimes. The problem
happens only on the on-board NICs and only on this kind of motherboard.
For testing, I put code in to check that the receive data had actually
arrived before the IRQ; it triggered on my Gigabyte 925 motherboard. It
appears that DMA access is messed up. This board has lots of
overclocker-friendly stuff; maybe the BIOS never really sets up the PCI
bridges and clocks properly.

It doesn't seem like a software or driver problem. I have tried tweaking
PCI registers, but nothing worked in this case.


Re: Linux 2.6.22-rc2

2007-05-22 Thread Linus Torvalds


On Tue, 22 May 2007, Stephen Hemminger wrote:
 
> It looks like the chip reads the wrong memory sometimes. The problem happens
> only on the on-board NIC's and only on this kind of motherboard.

Do you know if it happens for particular addresses? (Ie, can you tell what 
the physical address of the descriptor is for the errors?)

> For testing, I have put code in to check that the receive data actually
> arrived before the IRQ, it triggered on my Gigabyte 925 motherboard. It
> appears that DMA access is messed up.

Yes, that certainly would also explain memory corruption. Either because 
writes went to the wrong address, or because writes went to the right 
address, but because an earlier IO descriptor read had gotten corrupted, 
the right address was in fact the wrong one ;)

The reason I ask whether you have some way of telling the pattern for the 
physical address is that one traditional cause of DMA errors is due to 
broken RAM remapping setup.

As an example of that - imagine that you have 1GB of RAM in the machine, 
and realize that the memory behind the 640kB - 1MB area isn't accessible, 
because it's taken up by the legacy ISA region.

You have two possible outcomes: either (a) the memory is just gone, and 
you lost it, or (b) there is some RAM remapping in the core chipset that 
makes the lost 384kB show up _above_ the 1GB mark instead.

The same legacy ISA hole situation happens for the legacy PCI hole, 
which is why if you have 4GB of RAM in the machine, usually you'll see 
3GB at addresses 0-3GB (roughly), and then you'll see the rest at above 
the 4GB mark, in order to have a nice PCI hole in the 32-bit access range.
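
As a worked illustration of the PCI-hole case, the following sketch maps an offset into installed RAM to the physical address it would appear at under a simplified remapping model. The model, the function names, and the 3GB hole start used in the example are assumptions chosen to match the rough numbers above; real chipsets differ in the details.

```c
#include <stdint.h>

#define GiB (1024ULL * 1024 * 1024)

/*
 * Hypothetical remapping model: RAM below hole_start is mapped 1:1,
 * and RAM that would otherwise land in [hole_start, 4GB) is relocated
 * to start at the 4GB mark.
 */
uint64_t ram_offset_to_phys(uint64_t ram_offset, uint64_t hole_start)
{
	if (ram_offset < hole_start)
		return ram_offset;
	return 4 * GiB + (ram_offset - hole_start);
}

/* Nonzero if a physical address belongs to the relocated part of RAM. */
int phys_is_remapped(uint64_t phys)
{
	return phys >= 4 * GiB;
}
```

With 4GB installed and the hole starting at 3GB, the last gigabyte of RAM shows up at physical addresses 4GB to 5GB. Sorting failing DMA addresses with something like phys_is_remapped() is one crude way to look for a pattern.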

There's also the legacy 286 hole at the 15-16MB mark (which nobody uses 
any more, but chipsets still inexplicably support), and the SMM remapping. 

Anyway, core chipsets generally do CPU memory accesses _differently_ from 
DMA accesses from the PCI bus (at a minimum, SMM is something that only 
the CPU can do), so I could see a situation where the remapping was set up 
correctly for the CPU (and perhaps for core chipset devices like the 
integrated southbridge), but devices that do DMA from the outside get 
screwed over.

But it might not happen for all addresses. Non-remapped stuff might work 
well, so if there is some way of figuring out what the bad DMA address was 
for an erroneous access, that might offer some clues.

> This board has lots of overclocker friendly stuff; maybe the BIOS 
> never really sets up the PCI bridges and clocks properly.

It's hard to set up a normal PCI-PCI bridge subtly incorrectly. But 
special RAM timing or remapping stuff for the host bridge - sure.

> It doesn't seem like a software or driver problem. I have tried tweaking PCI
> registers but nothing worked in this case.

Yeah, the PCI registers that would affect things like this tend to be in 
the host bridge, not on the normal device.

That said, Intel doesn't generally do the really insane things. And a lot 
of the old remapping stuff is simply not done any more. For example, I 
doubt that the 925 chipset even supports remapping the 640k-1M range any 
more: 384kB just isn't worth it when people talk about gigs of RAM, the 
way it was when 16MB was considered a lot.

And looking quickly at the Intel 925X MCH (memory controller hub) 
registers, nothing jumps out as a good candidate for some obvious bug. 

Linus


Re: Linux 2.6.22-rc2

2007-05-21 Thread Linus Torvalds


On Mon, 21 May 2007, Stephen Hemminger wrote:
>
> AHCI on this motherboard doesn't seem to use MSI. The problems occur
> even if I boot with nomsi.

Have you tried playing with PCI latency counters etc? 

Maybe the SATA/AHCI thing is better at saturating the bus, and the sky2 
hardware gets upset if it has overlong DMA access latencies due to some 
other controller keeping the bus busy with a long burst access?

I can't really see that being a real problem in this day and age of PCI-X 
etc, but it _used_ to be a possible issue a decade ago. Maybe you've found 
a case where it matters even on modern hardware? We occasionally used to 
set the PCI latency timer to make people happy.

(Not that I'm convinced it even has any semantic meaning on a modern PCI 
system..)

Linus


Re: Linux 2.6.22-rc2

2007-05-21 Thread Stephen Hemminger
On Tue, 22 May 2007 00:36:15 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> > There maybe some hardware level interaction with SATA controller.
> > I saw no failures running off i386 kernel of PATA drive and quickly
> > see errors with SATA/AHCI and x86_64.
> 
> 
> I presume AHCI is the only other device in the system using PCI MSI, 
> when you see problems?
> 
>   Jeff
> 
> 
AHCI on this motherboard doesn't seem to use MSI. The problems occur
even if I boot with nomsi.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Linux 2.6.22-rc2

2007-05-21 Thread Jeff Garzik

Stephen Hemminger wrote:
> There may be some hardware level interaction with SATA controller.
> I saw no failures running off i386 kernel of PATA drive and quickly
> see errors with SATA/AHCI and x86_64.

I presume AHCI is the only other device in the system using PCI MSI, 
when you see problems?

	Jeff




Re: Linux 2.6.22-rc2

2007-05-21 Thread Stephen Hemminger
On Mon, 21 May 2007 22:58:06 -0400
Mike Houston <[EMAIL PROTECTED]> wrote:

> On Mon, 21 May 2007 10:37:55 -0700
> Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, 21 May 2007 13:10:55 -0400
> > Mike Houston <[EMAIL PROTECTED]> wrote:
> > 
> > > On Mon, 21 May 2007 08:45:49 -0700
> > > Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > > 
> > > > It's almost certainly a problem with the BIOS and hardware (not
> > > > a sky2) driver issue. Since there are many similar boards and
> > > > configurations, I made the decision not to enforce restrictions
> > > > in the driver.
> > > 
> > > >> May 20 15:57:48 cramit kernel: sky2 :04:00.0: v1.14 addr
> > > >> 0xf800 irq 16 Yukon-EC Ultra (0xb4) rev 2
> > > 
> > > Thank you for your answer. I was half wondering if that was the
> > > case after staring at those log messages several more times. I
> > > don't understand hardware at the low level but got thinking maybe
> > > interrupt routing issue. There's an Nvidia PCI Express card in
> > > there that gets IRQ 16, though it was not initialized by a driver
> > > at the time. (plain old VGA console after fresh cold boot... no
> > > framebuffer, no X, no nvidia module). I guess some things don't
> > > share well.
> > > 
> > > It works well in that other OS that came with the hardware, but
> > > that's beside the point.
> > 
> > It is some low level PCI Express related stuff, try latest BIOS (F9)
> > and if that doesn't help there is a EEPROM update from Gigabyte
> > for the Marvell hardware that might help.
> 
> Thanks for your suggestions, I followed through on them. It may still
> be interesting/useful to hear from me that it didn't help. The
> problem is the same.
> 
> My motherboard is a newer revision (Gigabyte GA-965P-DS3 Rev 3.3) and
> already had the "F10" bios version, but I flashed to the latest F11
> version anyways. I also flashed with the EEPROM update from Gigabyte,
> from a FAQ entry for my motherboard revision.
> (faq_marvell_eeprom.zip). Both operations were successful. I cleared
> the CMOS and reconfigured after the bios flash too.
> 
> Incidently, it was showing IRQ 16 in that early initialization
> message, but actually getting a MSI interrupt (IRQ 219, PCI-MSI-edge)
> 
> I've disabled the onboard yukon2 adapter in bios and gone
> back to the PCI card now. I think we can consider the matter closed,
> since it's not a problem with the driver, but just so you know, I'm
> always willing to help test when it's hardware that I have.
> 
> Mike Houston

There may be some hardware-level interaction with the SATA controller.
I saw no failures running an i386 kernel off a PATA drive, but I quickly
see errors with SATA/AHCI and x86_64.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Linux 2.6.22-rc2

2007-05-21 Thread Mike Houston
On Mon, 21 May 2007 10:37:55 -0700
Stephen Hemminger <[EMAIL PROTECTED]> wrote:

> On Mon, 21 May 2007 13:10:55 -0400
> Mike Houston <[EMAIL PROTECTED]> wrote:
> 
> > On Mon, 21 May 2007 08:45:49 -0700
> > Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > 
> > > It's almost certainly a problem with the BIOS and hardware (not
> > > a sky2) driver issue. Since there are many similar boards and
> > > configurations, I made the decision not to enforce restrictions
> > > in the driver.
> > 
> > >> May 20 15:57:48 cramit kernel: sky2 :04:00.0: v1.14 addr
> > >> 0xf800 irq 16 Yukon-EC Ultra (0xb4) rev 2
> > 
> > Thank you for your answer. I was half wondering if that was the
> > case after staring at those log messages several more times. I
> > don't understand hardware at the low level but got thinking maybe
> > interrupt routing issue. There's an Nvidia PCI Express card in
> > there that gets IRQ 16, though it was not initialized by a driver
> > at the time. (plain old VGA console after fresh cold boot... no
> > framebuffer, no X, no nvidia module). I guess some things don't
> > share well.
> > 
> > It works well in that other OS that came with the hardware, but
> > that's beside the point.
> 
> It is some low level PCI Express related stuff, try latest BIOS (F9)
> and if that doesn't help there is a EEPROM update from Gigabyte
> for the Marvell hardware that might help.

Thanks for your suggestions; I followed through on them. It may still
be interesting/useful to hear from me that it didn't help. The
problem is the same.

My motherboard is a newer revision (Gigabyte GA-965P-DS3 Rev 3.3) and
already had the "F10" BIOS version, but I flashed to the latest F11
version anyway. I also flashed the EEPROM update from Gigabyte
(faq_marvell_eeprom.zip), from a FAQ entry for my motherboard revision.
Both operations were successful. I cleared the CMOS and reconfigured
after the BIOS flash too.

Incidentally, it was showing IRQ 16 in that early initialization
message, but actually getting an MSI interrupt (IRQ 219, PCI-MSI-edge).

I've disabled the onboard yukon2 adapter in the BIOS and gone
back to the PCI card now. I think we can consider the matter closed,
since it's not a problem with the driver, but just so you know, I'm
always willing to help test when it's hardware that I have.

Mike Houston


Re: Linux 2.6.22-rc2

2007-05-21 Thread Stephen Hemminger
On Mon, 21 May 2007 13:10:55 -0400
Mike Houston <[EMAIL PROTECTED]> wrote:

> On Mon, 21 May 2007 08:45:49 -0700
> Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> 
> > It's almost certainly a problem with the BIOS and hardware (not a
> > sky2) driver issue. Since there are many similar boards and
> > configurations, I made the decision not to enforce restrictions in
> > the driver.
> 
> >> May 20 15:57:48 cramit kernel: sky2 :04:00.0: v1.14 addr
> >> 0xf800 irq 16 Yukon-EC Ultra (0xb4) rev 2
> 
> Thank you for your answer. I was half wondering if that was the case
> after staring at those log messages several more times. I don't
> understand hardware at the low level but got thinking maybe interrupt
> routing issue. There's an Nvidia PCI Express card in there that gets
> IRQ 16, though it was not initialized by a driver at the time. (plain
> old VGA console after fresh cold boot... no framebuffer, no X, no
> nvidia module). I guess some things don't share well.
> 
> It works well in that other OS that came with the hardware, but
> that's beside the point.

It is some low-level PCI Express related stuff. Try the latest BIOS (F9),
and if that doesn't help, there is an EEPROM update from Gigabyte
for the Marvell hardware that might help.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Linux 2.6.22-rc2

2007-05-21 Thread Mike Houston
On Mon, 21 May 2007 08:45:49 -0700
Stephen Hemminger <[EMAIL PROTECTED]> wrote:

> It's almost certainly a problem with the BIOS and hardware (not a
> sky2) driver issue. Since there are many similar boards and
> configurations, I made the decision not to enforce restrictions in
> the driver.

>> May 20 15:57:48 cramit kernel: sky2 :04:00.0: v1.14 addr
>> 0xf800 irq 16 Yukon-EC Ultra (0xb4) rev 2

Thank you for your answer. I was half wondering if that was the case
after staring at those log messages several more times. I don't
understand hardware at the low level, but I got to thinking it might be
an interrupt routing issue. There's an Nvidia PCI Express card in there
that gets IRQ 16, though it was not initialized by a driver at the time
(plain old VGA console after a fresh cold boot: no framebuffer, no X, no
nvidia module). I guess some things don't share well.

It works well in that other OS that came with the hardware, but
that's beside the point.

Mike Houston


Re: Linux 2.6.22-rc2

2007-05-21 Thread Stephen Hemminger
On Sun, 20 May 2007 17:05:06 -0400
Mike Houston <[EMAIL PROTECTED]> wrote:

> On Fri, 18 May 2007 22:17:14 -0700 (PDT)
> Linus Torvalds <[EMAIL PROTECTED]> wrote:
> 
> 
> > Stephen Hemminger (7):
> >   [TCP] slow start: Make comments and code logic clearer.
> >   *** sky2: remove Gigabyte 88e8056 restriction ***
> >   sky2: PHY register settings
> >   sky2: keep track of receive alloc failures
> >   sky2: MIB counter overflow handling
> >   sky2: remove dual port workaround
> >   sky2: memory barriers change
> >
> 
> I tested this and it's still horribly broken for me with Gigabyte
> 88E8056 onboard LAN. Same symptom as before, it works for several
> seconds and then dies.

It's almost certainly a problem with the BIOS and hardware (not a sky2
driver issue). Since there are many similar boards and configurations, I made
the decision not to enforce restrictions in the driver.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: [RFC] linux-2.6.22-rc2. SLUB report[kzalloc(0)]

2007-05-21 Thread Jiri Slaby
Dan Kruchinin wrote:
> Hi.
> 
> There is a BUG message from SLUB during boot process:
> --
> May 21 05:39:10 midgard kernel: [   31.177484] BUG: at include/linux/slub_def.h:77 kmalloc_index()
> May 21 05:39:10 midgard kernel: [   31.178355]  [<c01062ca>] show_trace_log_lvl+0x1a/0x30
> May 21 05:39:10 midgard kernel: [   31.179263]  [<c0106e72>] show_trace+0x12/0x20
> May 21 05:39:10 midgard kernel: [   31.180177]  [<c0106ee6>] dump_stack+0x16/0x20
> May 21 05:39:10 midgard kernel: [   31.181094]  [<c018440d>] get_slab+0x1cd/0x260
> May 21 05:39:10 midgard kernel: [   31.182024]  [<c0185809>] __kmalloc_track_caller+0x19/0xa0
> May 21 05:39:10 midgard kernel: [   31.183019]  [<c0170059>] __kzalloc+0x19/0x50
> May 21 05:39:10 midgard kernel: [   31.184012]  [<df05586e>] usb_get_configuration+0x9de/0x11c0 [usbcore]
> May 21 05:39:10 midgard kernel: [   31.185115]  [<df04cd07>] usb_new_device+0x17/0x190 [usbcore]
> May 21 05:39:10 midgard kernel: [   31.186181]  [<df04e68a>] hub_thread+0x79a/0xfd0 [usbcore]
> May 21 05:39:10 midgard kernel: [   31.187185]  [] kthread+0x42/0x70
> May 21 05:39:10 midgard kernel: [   31.188190]  [] kernel_thread_helper+0x7/0x10
> --
> 
> kzalloc(0, GFP_KERNEL) occurs in drivers/usb/core/config.c, in the
> usb_get_configuration function. I already wrote about this SLUB bug
> report and offered a patch
> (http://www.uwsg.iu.edu/hypermail/linux/kernel/0705.1/0154.html). I
> don't know, maybe it is not a major thing; if so, just ignore it. I
> just think that
> --
>   length = ncfg * sizeof(struct usb_host_config);
>   dev->config = kzalloc(length, GFP_KERNEL);
> --
> isn't clear, because ncfg (in my case) is 0, and I suppose that the
> size of the smallest slab cache could in future become smaller than
> sizeof(struct usb_host_config).

There is yet another patch for this:
http://lkml.org/lkml/2007/5/19/171
sitting here:
http://kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-03-usb/usb-don-t-try-to-kzalloc-0-bytes.patch

regards,
-- 
http://www.fi.muni.cz/~xslaby/            Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8  22A0 32CC 55C3 39D4 7A7E


Re: Linux 2.6.22-rc2

2007-05-21 Thread Mike Houston
On Mon, 21 May 2007 08:45:49 -0700
Stephen Hemminger <[EMAIL PROTECTED]> wrote:

> It's almost certainly a problem with the BIOS and hardware (not a
> sky2 driver issue). Since there are many similar boards and
> configurations, I made the decision not to enforce restrictions in
> the driver.

> May 20 15:57:48 cramit kernel: sky2 :04:00.0: v1.14 addr
> 0xf800 irq 16 Yukon-EC Ultra (0xb4) rev 2

Thank you for your answer. I was half wondering if that was the case
after staring at those log messages several more times. I don't
understand hardware at the low level but got thinking maybe interrupt
routing issue. There's an Nvidia PCI Express card in there that gets
IRQ 16, though it was not initialized by a driver at the time. (plain
old VGA console after fresh cold boot... no framebuffer, no X, no
nvidia module). I guess some things don't share well.

It works well in that other OS that came with the hardware, but
that's beside the point.

Mike Houston


Re: Linux 2.6.22-rc2

2007-05-21 Thread Stephen Hemminger
On Mon, 21 May 2007 13:10:55 -0400
Mike Houston <[EMAIL PROTECTED]> wrote:

> On Mon, 21 May 2007 08:45:49 -0700
> Stephen Hemminger <[EMAIL PROTECTED]> wrote:
>
> > It's almost certainly a problem with the BIOS and hardware (not a
> > sky2 driver issue). Since there are many similar boards and
> > configurations, I made the decision not to enforce restrictions in
> > the driver.
>
> > May 20 15:57:48 cramit kernel: sky2 :04:00.0: v1.14 addr
> > 0xf800 irq 16 Yukon-EC Ultra (0xb4) rev 2
>
> Thank you for your answer. I was half wondering if that was the case
> after staring at those log messages several more times. I don't
> understand hardware at the low level but got thinking maybe interrupt
> routing issue. There's an Nvidia PCI Express card in there that gets
> IRQ 16, though it was not initialized by a driver at the time. (plain
> old VGA console after fresh cold boot... no framebuffer, no X, no
> nvidia module). I guess some things don't share well.
>
> It works well in that other OS that came with the hardware, but
> that's beside the point.

It is some low-level PCI Express related stuff. Try the latest BIOS (F9),
and if that doesn't help, there is an EEPROM update from Gigabyte
for the Marvell hardware that might help.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Linux 2.6.22-rc2

2007-05-21 Thread Mike Houston
On Mon, 21 May 2007 10:37:55 -0700
Stephen Hemminger <[EMAIL PROTECTED]> wrote:

> On Mon, 21 May 2007 13:10:55 -0400
> Mike Houston <[EMAIL PROTECTED]> wrote:
>
> > On Mon, 21 May 2007 08:45:49 -0700
> > Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> >
> > > It's almost certainly a problem with the BIOS and hardware (not
> > > a sky2 driver issue). Since there are many similar boards and
> > > configurations, I made the decision not to enforce restrictions
> > > in the driver.
> >
> > > May 20 15:57:48 cramit kernel: sky2 :04:00.0: v1.14 addr
> > > 0xf800 irq 16 Yukon-EC Ultra (0xb4) rev 2
> >
> > Thank you for your answer. I was half wondering if that was the
> > case after staring at those log messages several more times. I
> > don't understand hardware at the low level but got thinking maybe
> > interrupt routing issue. There's an Nvidia PCI Express card in
> > there that gets IRQ 16, though it was not initialized by a driver
> > at the time. (plain old VGA console after fresh cold boot... no
> > framebuffer, no X, no nvidia module). I guess some things don't
> > share well.
> >
> > It works well in that other OS that came with the hardware, but
> > that's beside the point.
>
> It is some low-level PCI Express related stuff. Try the latest BIOS (F9),
> and if that doesn't help, there is an EEPROM update from Gigabyte
> for the Marvell hardware that might help.

Thanks for your suggestions, I followed through on them. It may still
be interesting/useful to hear from me that it didn't help. The
problem is the same.

My motherboard is a newer revision (Gigabyte GA-965P-DS3 Rev 3.3) and
already had the F10 bios version, but I flashed to the latest F11
version anyways. I also flashed with the EEPROM update from Gigabyte,
from a FAQ entry for my motherboard revision.
(faq_marvell_eeprom.zip). Both operations were successful. I cleared
the CMOS and reconfigured after the bios flash too.

Incidentally, it was showing IRQ 16 in that early initialization
message, but actually getting an MSI interrupt (IRQ 219, PCI-MSI-edge).

I've disabled the onboard yukon2 adapter in bios and gone
back to the PCI card now. I think we can consider the matter closed,
since it's not a problem with the driver, but just so you know, I'm
always willing to help test when it's hardware that I have.

Mike Houston


Re: Linux 2.6.22-rc2

2007-05-21 Thread Stephen Hemminger
On Mon, 21 May 2007 22:58:06 -0400
Mike Houston <[EMAIL PROTECTED]> wrote:

> On Mon, 21 May 2007 10:37:55 -0700
> Stephen Hemminger <[EMAIL PROTECTED]> wrote:
>
> > On Mon, 21 May 2007 13:10:55 -0400
> > Mike Houston <[EMAIL PROTECTED]> wrote:
> >
> > > On Mon, 21 May 2007 08:45:49 -0700
> > > Stephen Hemminger <[EMAIL PROTECTED]> wrote:
> > >
> > > > It's almost certainly a problem with the BIOS and hardware (not
> > > > a sky2 driver issue). Since there are many similar boards and
> > > > configurations, I made the decision not to enforce restrictions
> > > > in the driver.
> > >
> > > > May 20 15:57:48 cramit kernel: sky2 :04:00.0: v1.14 addr
> > > > 0xf800 irq 16 Yukon-EC Ultra (0xb4) rev 2
> > >
> > > Thank you for your answer. I was half wondering if that was the
> > > case after staring at those log messages several more times. I
> > > don't understand hardware at the low level but got thinking maybe
> > > interrupt routing issue. There's an Nvidia PCI Express card in
> > > there that gets IRQ 16, though it was not initialized by a driver
> > > at the time. (plain old VGA console after fresh cold boot... no
> > > framebuffer, no X, no nvidia module). I guess some things don't
> > > share well.
> > >
> > > It works well in that other OS that came with the hardware, but
> > > that's beside the point.
> >
> > It is some low-level PCI Express related stuff. Try the latest BIOS (F9),
> > and if that doesn't help, there is an EEPROM update from Gigabyte
> > for the Marvell hardware that might help.
>
> Thanks for your suggestions, I followed through on them. It may still
> be interesting/useful to hear from me that it didn't help. The
> problem is the same.
>
> My motherboard is a newer revision (Gigabyte GA-965P-DS3 Rev 3.3) and
> already had the F10 bios version, but I flashed to the latest F11
> version anyways. I also flashed with the EEPROM update from Gigabyte,
> from a FAQ entry for my motherboard revision.
> (faq_marvell_eeprom.zip). Both operations were successful. I cleared
> the CMOS and reconfigured after the bios flash too.
>
> Incidentally, it was showing IRQ 16 in that early initialization
> message, but actually getting an MSI interrupt (IRQ 219, PCI-MSI-edge).
>
> I've disabled the onboard yukon2 adapter in bios and gone
> back to the PCI card now. I think we can consider the matter closed,
> since it's not a problem with the driver, but just so you know, I'm
> always willing to help test when it's hardware that I have.
>
> Mike Houston

There may be some hardware-level interaction with the SATA controller.
I saw no failures running an i386 kernel off a PATA drive, and quickly
see errors with SATA/AHCI and x86_64.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Linux 2.6.22-rc2

2007-05-21 Thread Jeff Garzik

Stephen Hemminger wrote:
> There may be some hardware-level interaction with the SATA controller.
> I saw no failures running an i386 kernel off a PATA drive, and quickly
> see errors with SATA/AHCI and x86_64.

I presume AHCI is the only other device in the system using PCI MSI,
when you see problems?

	Jeff


Re: Linux 2.6.22-rc2

2007-05-21 Thread Stephen Hemminger
On Tue, 22 May 2007 00:36:15 -0400
Jeff Garzik <[EMAIL PROTECTED]> wrote:

> Stephen Hemminger wrote:
> > There may be some hardware-level interaction with the SATA controller.
> > I saw no failures running an i386 kernel off a PATA drive, and quickly
> > see errors with SATA/AHCI and x86_64.
>
> I presume AHCI is the only other device in the system using PCI MSI,
> when you see problems?
>
> 	Jeff

AHCI on this motherboard doesn't seem to use MSI. The problems occur
even if I boot with nomsi.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>


Re: Linux 2.6.22-rc2

2007-05-21 Thread Linus Torvalds


On Mon, 21 May 2007, Stephen Hemminger wrote:

> AHCI on this motherboard doesn't seem to use MSI. The problems occur
> even if I boot with nomsi.

Have you tried playing with PCI latency counters etc? 

Maybe the SATA/AHCI thing is better at saturating the bus, and the sky2 
hardware gets upset if it has overlong DMA access latencies due to some 
other controller keeping the bus busy with a long burst access?

I can't really see that being a real problem in this day and age of PCI-X 
etc, but it _used_ to be a possible issue a decade ago. Maybe you've found 
a case where it matters even on modern hardware? We occasionally used to 
set the PCI latency timer to make people happy.

(Not that I'm convinced it even has any semantic meaning on a modern PCI 
system..)

Linus
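
[Editorial sketch, not part of the original message.] For anyone wanting
to look at the latency timer mentioned above: in the standard PCI
configuration header it is the single byte at offset 0x0d, and Linux
exposes a device's raw configuration space through sysfs as a file named
config under /sys/bus/pci/devices/<domain:bus:dev.fn>/. The C below reads
that byte. The function names and the macro name are assumptions made for
illustration; nothing here comes from the sky2 driver.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Byte offset of the latency timer in the standard PCI config header. */
#define LATENCY_TIMER_OFFSET 0x0dU

/*
 * Hypothetical helper: extract the latency timer from a raw
 * config-space image. Returns -1 if the image is missing or too short.
 */
static int latency_from_config(const uint8_t *cfg, size_t len)
{
    if (cfg == NULL || len <= LATENCY_TIMER_OFFSET)
        return -1;
    return cfg[LATENCY_TIMER_OFFSET];
}

/*
 * Hypothetical helper: read the first 64 bytes of a sysfs "config"
 * file and return the latency timer, or -1 on any error.
 */
static int read_latency_timer(const char *config_path)
{
    uint8_t header[64];
    size_t n;
    FILE *f = fopen(config_path, "rb");

    if (f == NULL)
        return -1;
    n = fread(header, 1, sizeof(header), f);
    fclose(f);
    return latency_from_config(header, n);
}
```

On PCI Express devices this register is typically read as 0, which would
be consistent with the doubt expressed above about whether it still has
any meaning on a modern system.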


[RFC] linux-2.6.22-rc2. SLUB report[kzalloc(0)]

2007-05-20 Thread Dan Kruchinin
Hi.

There is a BUG message from SLUB during boot process:
--
May 21 05:39:10 midgard kernel: [   31.177484] BUG: at include/linux/slub_def.h:77 kmalloc_index()
May 21 05:39:10 midgard kernel: [   31.178355]  [<c01062ca>] show_trace_log_lvl+0x1a/0x30
May 21 05:39:10 midgard kernel: [   31.179263]  [<c0106e72>] show_trace+0x12/0x20
May 21 05:39:10 midgard kernel: [   31.180177]  [<c0106ee6>] dump_stack+0x16/0x20
May 21 05:39:10 midgard kernel: [   31.181094]  [<c018440d>] get_slab+0x1cd/0x260
May 21 05:39:10 midgard kernel: [   31.182024]  [<c0185809>] __kmalloc_track_caller+0x19/0xa0
May 21 05:39:10 midgard kernel: [   31.183019]  [<c0170059>] __kzalloc+0x19/0x50
May 21 05:39:10 midgard kernel: [   31.184012]  [<df05586e>] usb_get_configuration+0x9de/0x11c0 [usbcore]
May 21 05:39:10 midgard kernel: [   31.185115]  [<df04cd07>] usb_new_device+0x17/0x190 [usbcore]
May 21 05:39:10 midgard kernel: [   31.186181]  [<df04e68a>] hub_thread+0x79a/0xfd0 [usbcore]
May 21 05:39:10 midgard kernel: [   31.187185]  [<c013b8e2>] kthread+0x42/0x70
May 21 05:39:10 midgard kernel: [   31.188190]  [<c0105ea7>] kernel_thread_helper+0x7/0x10
--

kzalloc(0, GFP_KERNEL) occurs in usb_get_configuration() in
drivers/usb/core/config.c. I already wrote about this SLUB bug
report and offered a patch
(http://www.uwsg.iu.edu/hypermail/linux/kernel/0705.1/0154.html).
Maybe it is not a major issue; if so, just ignore it. I just think
that
--
length = ncfg * sizeof(struct usb_host_config);
dev->config = kzalloc(length, GFP_KERNEL);
--
isn't clear, because ncfg, in my case, is 0, and I suppose that the
size of the smallest slab cache could (in future) become smaller
than sizeof(struct usb_host_config).

Thanks for your attention.

-- 
Dan Kruchinin <[EMAIL PROTECTED]>
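
[Editorial sketch, not part of the original message.] One way to avoid
asking the allocator for zero bytes is to check the element count before
computing the length. The userspace C below only illustrates that guard.
struct host_config, alloc_configs(), and the use of calloc() in place of
kzalloc() are assumptions made for illustration, and this is not the
patch referenced in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct usb_host_config; fields are illustrative. */
struct host_config {
    unsigned char descriptor[9];
    void *interfaces;
};

/*
 * Allocate a zeroed array of ncfg entries.
 * Returns NULL without calling the allocator when ncfg is 0, or when
 * ncfg * sizeof(struct host_config) would overflow size_t.
 */
static struct host_config *alloc_configs(size_t ncfg)
{
    if (ncfg == 0)
        return NULL;    /* nothing to allocate */
    if (ncfg > SIZE_MAX / sizeof(struct host_config))
        return NULL;    /* requested length would overflow */
    return calloc(ncfg, sizeof(struct host_config));
}
```

Because NULL is returned both for a zero count and for allocation
failure, a caller would check the count first if it needs to tell the two
apart. How the USB core should react to a device reporting zero
configurations is not something this sketch decides.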



Re: Linux 2.6.22-rc2

2007-05-20 Thread Mike Houston
On Fri, 18 May 2007 22:17:14 -0700 (PDT)
Linus Torvalds <[EMAIL PROTECTED]> wrote:


> Stephen Hemminger (7):
>   [TCP] slow start: Make comments and code logic clearer.
>   *** sky2: remove Gigabyte 88e8056 restriction ***
>   sky2: PHY register settings
>   sky2: keep track of receive alloc failures
>   sky2: MIB counter overflow handling
>   sky2: remove dual port workaround
>   sky2: memory barriers change
>

I tested this and it's still horribly broken for me with Gigabyte
88E8056 onboard LAN. Same symptom as before, it works for several
seconds and then dies.

Relevant portion of logs:

May 20 15:57:48 cramit kernel: sky2 :04:00.0: v1.14 addr
0xf800 irq 16 Yukon-EC Ultra (0xb4) rev 2
May 20 15:57:48 cramit kernel: sky2 eth0: addr 00:16:e6:da:f3:b5

May 20 15:57:48 cramit kernel: sky2 eth0: enabling interface
May 20 15:57:48 cramit kernel: sky2 eth0: ram buffer 0K
May 20 15:57:48 cramit kernel: ACPI: PCI Interrupt :00:1b.0[A] ->
GSI 22 (level, low) -> IRQ 18
May 20 15:57:48 cramit kernel: PCI: Setting latency timer of device
:00:1b.0 to 64
May 20 15:57:50 cramit kernel: sky2 eth0: Link is up at 100 Mbps,
full duplex, flow control both

Attempt to ftp a file to another box on LAN and about 1.5
megabytes into the transfer:

May 20 16:01:43 cramit kernel: sky2 eth0: hw error interrupt status
0x8
May 20 16:01:43 cramit kernel: sky2 eth0: MAC parity error
May 20 16:01:43 cramit kernel: sky2 :04:00.0: error interrupt
status=0x8000
May 20 16:01:43 cramit kernel: sky2 eth0: hw error interrupt status
0x8
May 20 16:01:43 cramit kernel: sky2 eth0: MAC parity error

Transfer stalls and that's all she wrote.

If interested in seeing kernel config:
http://www.mikeserv.org/files/config-2.6.22-rc2

Oh well, back to trusty rtl8139 based PCI card for now.

Thanks for working on this stuff,

Mike Houston


Re: Linux 2.6.22-rc2: make -j makes it unresponsive

2007-05-20 Thread Rafael J. Wysocki
On Sunday, 20 May 2007 15:01, Krzysztof Halasa wrote:
> "Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:
> 
> > Running 'make -j' kernel compilation on my test box (Athlon64 X2, 2 SATA
> > drives
> > with 6 software RAID1 ext3 and reiserfs partitions, 2 GB of RAM) makes it
> > completely unresponsive.  I can't even move the mouse pointer when it's
> > running, I can't log to the box from the network etc.
> 
> How many processes does it spawn? Try some sane limit.

Do you think it works as a fork bomb?  Well, it didn't work like that before,
AFAIR, but then 2.6.21 also does it with the same settings, so sorry for the
noise.

Greetings,
Rafael


Re: Linux 2.6.22-rc2: make -j makes it unresponsive

2007-05-20 Thread Krzysztof Halasa
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> Running 'make -j' kernel compilation on my test box (Athlon64 X2, 2 SATA
> drives
> with 6 software RAID1 ext3 and reiserfs partitions, 2 GB of RAM) makes it
> completely unresponsive.  I can't even move the mouse pointer when it's
> running, I can't log to the box from the network etc.

How many processes does it spawn? Try some sane limit.
-- 
Krzysztof Halasa
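
[Editorial sketch, not part of the original message.] With no number
after it, GNU make's -j option places no limit on the number of
simultaneous jobs, which is what "some sane limit" refers to. The C below
is a minimal sketch of picking a bound from the number of online CPUs.
The "CPUs plus one" rule and both function names are assumptions made for
illustration, not a recommendation from the thread.

```c
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical heuristic: one job per online CPU, plus one. */
static long bounded_jobs(long online_cpus)
{
    if (online_cpus < 1)
        online_cpus = 1;    /* fall back to one CPU if the count is unknown */
    return online_cpus + 1;
}

/*
 * Write a bounded make command line such as "make -j3" into buf.
 * Returns the value returned by snprintf().
 */
static int format_make_command(char *buf, size_t size)
{
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);

    return snprintf(buf, size, "make -j%ld", bounded_jobs(cpus));
}
```

On a two-core box such as the Athlon64 X2 described in the thread, this
heuristic would produce "make -j3".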


Re: Linux 2.6.22-rc2: make -j makes it unresponsive

2007-05-20 Thread Rafael J. Wysocki
Hi,

On Saturday, 19 May 2007 07:17, Linus Torvalds wrote:
> 
> It's out there, both patches/tarballs and git trees are updated (although 
> mirroring might still be ongoing)
> 
> Various random fixes all over - the shortlog (appended) is fairly 
> readable. The most notable ones are probably more SLUB fixes, and the 
> epoll optimizations and cleanups.
> 
> But there's stuff in architectures (ia64, SH, AVR32, POWER), libata, 
> network drivers, sound.. Give it a try.
> 
> I've been telling some people off on merging stuff, and I'll get even more 
> hard-nosed about it after -rc2, so please don't even try to send anything 
> but real fixes.
> 
> I think the current situation looks reasonably good for 2.6.22, but I hope 
> everybody will take a good look at the regression lists (whether they 
> _think_ they are affected or not), and spend some time wondering "was that 
> anything I did, or is it something I can look at". Ok?

Running 'make -j' kernel compilation on my test box (Athlon64 X2, 2 SATA drives
with 6 software RAID1 ext3 and reiserfs partitions, 2 GB of RAM) makes it
completely unresponsive.  I can't even move the mouse pointer when it's
running, I can't log to the box from the network etc.

The anticipatory IO scheduler is used.

Greetings,
Rafael


Re: Linux 2.6.22-rc2

2007-05-19 Thread Andrey Borzenkov
Linus Torvalds wrote:

> 
> It's out there, both patches/tarballs and git trees are updated (although
> mirroring might still be ongoing)
> 

trivia

make: Entering directory `/home/bor/src/linux-git'
  GEN /home/bor/build/linux-2.6.22/Makefile
scripts/kconfig/conf -s arch/i386/Kconfig
drivers/macintosh/Kconfig:116:warning: 'select' used by config
symbol 'PMAC_APM_EMU' refers to undefined
symbol 'SYS_SUPPORTS_APM_EMULATION'
drivers/net/Kconfig:2283:warning: 'select' used by config symbol 'UCC_GETH'
refers to undefined symbol 'UCC_FAST'
drivers/input/keyboard/Kconfig:170:warning: 'select' used by config
symbol 'KEYBOARD_ATARI' refers to undefined symbol 'ATARI_KBD_CORE'
drivers/input/mouse/Kconfig:182:warning: 'select' used by config
symbol 'MOUSE_ATARI' refers to undefined symbol 'ATARI_KBD_CORE'





Linux 2.6.22-rc2

2007-05-18 Thread Linus Torvalds

It's out there, both patches/tarballs and git trees are updated (although 
mirroring might still be ongoing)

Various random fixes all over - the shortlog (appended) is fairly 
readable. The most notable ones are probably more SLUB fixes, and the 
epoll optimizations and cleanups.

But there's stuff in architectures (ia64, SH, AVR32, POWER), libata, 
network drivers, sound.. Give it a try.

I've been telling some people off on merging stuff, and I'll get even more 
hard-nosed about it after -rc2, so please don't even try to send anything 
but real fixes.

I think the current situation looks reasonably good for 2.6.22, but I hope 
everybody will take a good look at the regression lists (whether they 
_think_ they are affected or not), and spend some time wondering "was that 
anything I did, or is it something I can look at". Ok?

Linus

---

Aaron Durbin (1):
  acpi: fix potential call to a freed memory section.

Al Viro (11):
  fix deadlock in loop.c
  missing mm.h in fw-ohci
  missing dependencies for USB drivers in input
  missing includes in mlx4
  em28xx and ivtv should depend on PCI
  rpadlpar breakage - fallout of struct subsystem removal
  m32r: __xchg() should be always_inline
  audit_match_signal() and friends are used only if CONFIG_AUDITSYSCALL is set
  fix uml-x86_64
  arm: walk_stacktrace() needs to be exported
  pata_scc had been missed by ata_std_prereset() switch

Alan Cox (1):
  sl82c105: Switch to ref counting API

Andrew Morton (2):
  parport_pc needs dma-mapping.h
  slub: fix handling of oversized slabs

Arthur Jones (1):
  IB/ipath: Shadow the gpio_mask register

Auke Kok (2):
  ixgb: don't print error if pci_enable_msi() fails, cleanup minor leak
  e1000: Fix msi enable leak on error, don't print error message, cleanup

Bartlomiej Zolnierkiewicz (13):
  pdc202xx_old: rewrite mode programming code (v2)
  serverworks: PIO mode setup fixes
  sis5513: PIO mode setup fixes
  alim15x3: use ide_tune_dma()
  pdc202xx_new: use ide_tune_dma()
  ide: always disable DMA before tuning it
  cs5530/sc1200: add ->udma_filter methods
  ide: use ide_tune_dma() part #2
  cs5530/sc1200: DMA support cleanup
  cs5530/sc1200: add ->speedproc support
  ide: remove ide_dma_enable()
  ide: add missing validity checks for identify words 62 and 63
  ide: remove ide_use_dma()

Becky Bruce (1):
  [POWERPC] Change include protections to ASM_POWERPC

Benjamin Herrenschmidt (4):
  [POWERPC] Add spinlock to request_phb_iospace()
  [POWERPC] Fix IO space on PCI buses created from of_platform
  [POWERPC] Make sure device node type/name is not NULL on hot-added nodes
  Make __vunmap static

Bernhard Walle (1):
  i386/x86-64: fix section mismatch

Christian Krafft (1):
  [POWERPC] cell_defconfig: Disable cpufreq and pmi

Christoph Hellwig (7):
  [AVR32] optimize pagefault path
  SUNRPC: remove dead variable 'rpciod_running'
  [IA64] optimize pagefaults a little
  [POWERPC] viopath: Use completion
  [POWERPC] viopath: Use a completion in some more places
  small netdevices.txt fix
  spidernet: node-aware skbuff allocation

Christoph Lameter (15):
  SLUB: CONFIG_LARGE_ALLOCS must consider MAX_ORDER limit
  SLUB: It is legit to allocate a slab of the maximum permitted size
  Fix: find_or_create_page skips cpuset memory spreading.
  Slab allocators: Drop support for destructors
  SLUB: Remove depends on EXPERIMENTAL and !ARCH_USES_SLAB_PAGE_STRUCT
  SLAB: Move two remaining SLAB specific definitions to slab_def.h
  SLUB: Define functions for cpu slab handling instead of using PageActive
  slab: warn on zero-length allocations
  SLUB: slabinfo fixes
  SLUB: Do our own flags based on PG_active and PG_error
  Remove SLAB_CTOR_CONSTRUCTOR
  SLUB: Simplify debug code
  Slab allocators: define common size limitations
  Fix page allocation flags in grow_dev_page()
  slub: another slabinfo fix

Corey Mutter (1):
  [IPV6]: Reverse sense of promisc tests in ip6_mc_input

Dan Aloni (1):
  make sysctl/kernel/core_pattern and fs/exec.c agree on maximum core filename size

Daniel Drake (2):
  [CPUFREQ] powernow-k7: fix MHz rounding issue with perflib
  [ALSA] usb-audio: another Logitech QuickCam ID

Daniel T Chen (1):
  [ALSA] Include quirks from Ubuntu Dapper/Edgy/Feisty

Dave Jiang (2):
  [POWERPC] Fix comment in booke_wdt
  [POWERPC] 85xx: Add device nodes for error reporting devices used by EDAC

Dave Jones (4):
  [CPUFREQ] Support rev H AMD64s in powernow-k8
  MAINTAINERS update.
  [CPUFREQ] Correct revision mask for powernow-k8
  [IPV4]: Correct rp_filter help text.

David Brownell (3):
  gpio interface loosens call restrictions
  rtc-omap build fix
  rtc kconfig clarification

David Gibson (4):
  [POWERPC] Remove 
