Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, 12 Oct 2000, Keith Owens wrote:
> On Thu, 12 Oct 2000 12:56:09 +0100 (BST),
> Tigran Aivazian <[EMAIL PROTECTED]> wrote:
> >one correction -- it was "down and up the interface" that did the trick
> >and not deleting the 64M mtrr entry. I.e. the eepro100 problem is better
> >formulated as "when highmem is enabled one or both eepro100 interfaces
> >sometimes do not work from boot but downing/upping the interface usually
> >helps". When highmem is disabled, so far, _both_ eepro100 interfaces
> >_always_ work on boot.
>
> That may only be coincidence. We have intermittent problems with
> eepro100 under 2.4.0-testx, both ix86 and ia64. The symptoms are "card
> reports no resources" messages; down and up the interface and it
> usually works.

Might be related or not, but I've had nothing but problems with eepro100 until it's forced to use 100/FD. Symptoms include: either no network at all (driver complaining "card reports no resources") or impossibly slow and erratic network connections (like "ypcat foo" hanging for a second a few times in between)

- Panu -

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/
Re: test10-pre1 problems on 4-way SuperServer8050
On Fri, 13 Oct 2000, Richard Guenther wrote:
> On Fri, 13 Oct 2000, Rik van Riel wrote:
> > On Thu, 12 Oct 2000, Richard Guenther wrote:
> >
> > > I reported this BUG a few days ago but got no response - happens
> > > on UP with only 32M ram, too. (see below). Also note the second
> > > BUG at vmscan.c:538 which I believe I never saw reported again.
> > >
> > > Oct 11 16:05:26 hilbert36 kernel: kernel BUG at page_alloc.c:221!
> > > [snipped]
> >
> > Did you get the bug with or without VMware ?
> > [it seems vmware is doing something strange ;)]
>
> I don't have VMware - at least it would be no fun on 32M and an
> old P100 I suspect... :)

OK, I'll look into this a bit more... [but I'm leaving for a conference in Miami in 3 hours, so there's little chance of hearing anything back from me in the next few days.]

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
               -- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/        http://www.surriel.com/
Re: test10-pre1 problems on 4-way SuperServer8050
On Fri, 13 Oct 2000, Rik van Riel wrote:
> On Thu, 12 Oct 2000, Richard Guenther wrote:
>
> > I reported this BUG a few days ago but got no response - happens
> > on UP with only 32M ram, too. (see below). Also note the second
> > BUG at vmscan.c:538 which I believe I never saw reported again.
> >
> > Oct 11 16:05:26 hilbert36 kernel: kernel BUG at page_alloc.c:221!
> > [snipped]
>
> Did you get the bug with or without VMware ?
> [it seems vmware is doing something strange ;)]

I don't have VMware - at least it would be no fun on 32M and an old P100 I suspect... :)

> The second bug is almost certainly a direct
> consequence of the kernel continuing after
> the first one happened...

Yeah, that was my thought, too - but who knows...

Richard.

--
Richard Guenther <[EMAIL PROTECTED]>
WWW: http://www.anatom.uni-tuebingen.de/~richi/
The GLAME Project: http://www.glame.de/
Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, 12 Oct 2000, Richard Guenther wrote:
> I reported this BUG a few days ago but got no response - happens
> on UP with only 32M ram, too. (see below). Also note the second
> BUG at vmscan.c:538 which I believe I never saw reported again.
>
> Oct 11 16:05:26 hilbert36 kernel: kernel BUG at page_alloc.c:221!
> [snipped]

Did you get the bug with or without VMware ?
[it seems vmware is doing something strange ;)]

The second bug is almost certainly a direct
consequence of the kernel continuing after
the first one happened...

regards,

Rik
eepro100 problem [was: Re: test10-pre1 problems on 4-way SuperServer8050]
Hi,

On Thu, Oct 12, 2000 at 02:19:27PM +0100, Tigran Aivazian wrote:
> Having done a few more reboots I got more info -- one of the eepro100
> interfaces is dead only in 4 out of 5 cases. So, sometimes, doing ifdown eth0
> ; ifup eth0 does help.

Tigran, please check if you have any driver messages, in particular, "card reports no resources". There is a known problem which fits the symptoms you described. Dragan Stancevic <[EMAIL PROTECTED]> was going to look at Intel's errata about this matter.

>
> So, the latest status: all 6G of RAM work fast but the onboard eepro100
> interface, often, doesn't work. This starts to look like eepro100-driver
> related so I copied Andrey Savochkin. Btw, one of my colleagues also
> reported a similar situation on his quad Xeon with 6G RAM whereby one of
> the eepro100 interfaces was dead until one restarts it.
>
> Starting to fiddle with eepro100.c now...

Best regards
        Andrey
Re: test10-pre1 problems on 4-way SuperServer8050
} Hi,
}
} > How? If you compile with egcs-2.91.66 without frame pointers on ix86 then
} > __builtin_return_address() yields garbage. Does anybody have a generic
} > solution to this problem, other than "compile with frame pointers"? Or is
} > it fixed in newer versions of gcc?
}
} Are you sure? I just tried it with 2.91.66 and it works. With
} -fomit-frame-pointer only __builtin_return_address(0) works, but that is
} true for any version.

I've found, with several versions of gcc, that leaf functions will give bad results (sometimes resulting in a bad access fault) with calls to __builtin_return_address(0). The workaround in the kernel and RTLinux is to make a call so your function isn't a leaf.
Re: test10-pre1 problems on 4-way SuperServer8050
Hi,

> How? If you compile with egcs-2.91.66 without frame pointers on ix86 then
> __builtin_return_address() yields garbage. Does anybody have a generic
> solution to this problem, other than "compile with frame pointers"? Or is
> it fixed in newer versions of gcc?

Are you sure? I just tried it with 2.91.66 and it works. With -fomit-frame-pointer only __builtin_return_address(0) works, but that is true for any version.

bye, Roman
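Here is a minimal sketch of Tigran's wish, assuming GCC or Clang on a current toolchain. It is a registry that records who called each add, using only `__builtin_return_address(0)`, the one level Roman says works even with -fomit-frame-pointer. The `struct entry`, `add_entry()` and `dump_entries()` names are hypothetical and are not mtrr.c's interface. The leaf-function trouble mentioned in this thread concerned egcs-era compilers and is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical registry entry: remembers which call site created it. */
struct entry {
    unsigned long base;
    unsigned long size;
    void *caller;            /* return address captured at add time */
};

#define MAX_ENTRIES 8
static struct entry table[MAX_ENTRIES];
static int nr_entries;

/* noinline keeps this a real call, so level 0 is the caller's call site
 * rather than a location inside whatever function it was inlined into. */
__attribute__((noinline))
int add_entry(unsigned long base, unsigned long size)
{
    assert(size != 0);
    if (nr_entries == MAX_ENTRIES)
        return -1;
    table[nr_entries].base = base;
    table[nr_entries].size = size;
    /* Only level 0 is dependable without frame pointers. */
    table[nr_entries].caller = __builtin_return_address(0);
    return nr_entries++;
}

void dump_entries(void)
{
    for (int i = 0; i < nr_entries; i++)
        printf("reg%02d: base=0x%lx size=0x%lx caller=%p\n",
               i, table[i].base, table[i].size, table[i].caller);
}
```

If `add_entry()` is called from two different functions, `dump_entries()` should show two different caller addresses. These can then be mapped back to function names with addr2line or System.map.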
Re: IRQ affinity vs. MTRRs, was Re: 36 bit MTRRs, Re: test10-pre1 problems on 4-way SuperServer8050
Boszormenyi Zoltan <[EMAIL PROTECTED]> writes:
> The idea is that when it is sure that _only one_ (or some) CPU will access
> a PCI card's mmio area then only that CPU's (those CPUs') MTRRs need to
> contain an entry for that area.
>
> Although there are (must be) common MTRR entries for the main memory
> and the commonly accessed mmio register areas.
>
> The idea came because fiddling with MTRRs quickly revealed that
> only 8 variable ones exist.

I see.

I think there is a more straightforward solution: PAT does the same thing as MTRRs, but has no such "number of ranges" limitation --- it lets you set the memory type on a page-by-page basis. If the number of MTRRs becomes a problem (anyone know how many the P4 has?), then the real solution is to implement PAT support.

IIRC, only the PPro, the first PII model (Klamath?), and the first Celeron model have MTRR but not PAT (Athlon has PAT, but /proc/cpuinfo misreports it as "fcmov", at least in 2.2.14; Xeons always had PAT).

David
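This is a minimal sketch of the per-page selection David describes, based on the IA-32 manuals rather than on any Linux code. Three bits of a 4 KB page-table entry (PWT, PCD and PAT) form a 3-bit index into the eight 8-bit fields of the IA32_PAT MSR, and the selected field gives the page's memory type. `pat_type()` and `pat_set_entry()` are hypothetical helpers. The sketch ignores how the PAT type combines with an overlapping MTRR type. It also assumes 4 KB pages; in large-page entries the PAT bit is bit 12 instead.

```c
#include <assert.h>
#include <stdint.h>

/* Memory-type encodings used in PAT fields (IA-32 manuals). */
enum mem_type { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6, MT_UC_MINUS = 7 };

/* Power-on value of IA32_PAT: PA0=WB, PA1=WT, PA2=UC-, PA3=UC,
 * and PA4..PA7 repeat the same four types. */
#define PAT_DEFAULT 0x0007040600070406ULL

/* PTE bits of a 4 KB page that together select a PAT field. */
#define PTE_PWT (1u << 3)
#define PTE_PCD (1u << 4)
#define PTE_PAT (1u << 7)

/* Memory type a 4 KB page gets from PAT, given its PTE flags. */
unsigned pat_type(uint64_t pat_msr, uint32_t pte_flags)
{
    unsigned idx = ((pte_flags & PTE_PAT) ? 4u : 0u) |
                   ((pte_flags & PTE_PCD) ? 2u : 0u) |
                   ((pte_flags & PTE_PWT) ? 1u : 0u);
    return (unsigned)((pat_msr >> (idx * 8)) & 0x7);
}

/* Return a copy of pat_msr with field idx reprogrammed to the given type. */
uint64_t pat_set_entry(uint64_t pat_msr, unsigned idx, enum mem_type type)
{
    assert(idx < 8);
    pat_msr &= ~(0xffULL << (idx * 8));
    return pat_msr | ((uint64_t)type << (idx * 8));
}
```

Suppose field 1 is reprogrammed from WT to WC, as `pat_set_entry(PAT_DEFAULT, 1, MT_WC)` does. Any page could then be made write-combining just by setting PWT in its PTE, without using up a range register.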
[success!] Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, 12 Oct 2000, Tigran Aivazian wrote:
> On 12 Oct 2000, David Wragg wrote:
> > Ok. I'll wait for feedback from Tigran, and if I don't get anything
> > negative I'll submit to Linus. The 2.2 version of my patch fixes
> > problems for other people (VA Linux have included it in their kernel
> > for a while with no problems that have been reported back to me), and
> > it's silly that it isn't in 2.4testX. I should have addressed this a
> > while ago, but I have my own distractions from kernel hacking.
> >
> > Later on, you can send a mtrr.c maintenance patch, if you like.
> >
> > I've just caught up on this whole thread, and I don't have any
> > objections in principle to Zoltan's patch being used instead of mine,
> > though I'd like to take a look at it first.
>
> David, sorry I didn't know that your patch is fundamentally different from
> Zoltan's. I will now re-test with your patch and see if it makes my
> eepro100 "instabilities" go away.
>
> The performance problems went away as I said earlier, by fiddling with
> cache settings in the BIOS. (with and without Zoltan's patch my machine is
> now as fast as it can be)
>

hmmm, very interesting... It looks like your patch fixed all the remaining problems. I.e. not only is my 6G now fast (it was without your patch) but all the eepro100 interfaces now _always_ (tried 4 reboots) come up functioning.

Your patch is now a permanent part of my tree, thank you! :)

Thanks,
Tigran
Re: IRQ affinity vs. MTRRs, was Re: 36 bit MTRRs, Re: test10-pre1 problems on 4-way SuperServer8050
Boszormenyi Zoltan <[EMAIL PROTECTED]> writes:
> I came up with an idea. The MTRRs are per-cpu things.
> Ingo Molnar's IRQ affinity code helps binding certain
> IRQ sources to certain CPUs.

They are implemented as per-cpu things but the Intel manuals say that all cpus should have the same MTRR settings. They also give pseudo-code for how to update them on an SMP system, which mtrr.c follows. If the BIOS has set them up differently at boot time, mtrr.c will complain and copy the MTRR settings of CPU0 to the others.

Regards,
David Wragg
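The policy David describes can be modelled as plain data: every CPU must end up with CPU0's settings. This is a sketch under stated assumptions. `struct mtrr_state` and `sync_mtrrs_to_cpu0()` are hypothetical, and the warning text is made up. The real mtrr.c also has to rewrite the MSRs on each CPU following Intel's update procedure, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NR_VAR_MTRRS 8   /* variable-range MTRRs on P6-family CPUs */

/* Hypothetical snapshot of one CPU's variable MTRRs (base/mask MSR pairs). */
struct mtrr_state {
    uint64_t base[NR_VAR_MTRRS];
    uint64_t mask[NR_VAR_MTRRS];
};

/* Compare every CPU with CPU0 and overwrite mismatches with CPU0's values.
 * Returns the number of CPUs that had to be fixed up. */
int sync_mtrrs_to_cpu0(struct mtrr_state *cpus, int nr_cpus)
{
    int fixed = 0;

    assert(nr_cpus >= 1);
    for (int i = 1; i < nr_cpus; i++) {
        if (memcmp(&cpus[i], &cpus[0], sizeof(cpus[0])) != 0) {
            fprintf(stderr, "CPU%d MTRRs differ from CPU0, copying\n", i);
            cpus[i] = cpus[0];
            fixed++;
        }
    }
    return fixed;
}
```

Running it a second time should report zero fix-ups, because all CPUs now hold identical state.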
Re: test10-pre1 problems on 4-way SuperServer8050
On 12 Oct 2000, David Wragg wrote:
> Ok. I'll wait for feedback from Tigran, and if I don't get anything
> negative I'll submit to Linus. The 2.2 version of my patch fixes
> problems for other people (VA Linux have included it in their kernel
> for a while with no problems that have been reported back to me), and
> it's silly that it isn't in 2.4testX. I should have addressed this a
> while ago, but I have my own distractions from kernel hacking.
>
> Later on, you can send a mtrr.c maintenance patch, if you like.
>
> I've just caught up on this whole thread, and I don't have any
> objections in principle to Zoltan's patch being used instead of mine,
> though I'd like to take a look at it first.

David, sorry I didn't know that your patch is fundamentally different from Zoltan's. I will now re-test with your patch and see if it makes my eepro100 "instabilities" go away.

The performance problems went away as I said earlier, by fiddling with cache settings in the BIOS. (with and without Zoltan's patch my machine is now as fast as it can be)

Regards,
Tigran
Re: test10-pre1 problems on 4-way SuperServer8050
Richard Gooch <[EMAIL PROTECTED]> writes:
> David Wragg writes:
> > mtrr.c is broken for machines with >=4GB of memory (or less than 4GB,
> > if the chipset reserves an address range below 4GB for PCI).
> >
> > The patch against 2.4.0-test9 to fix this is below.
> >
> > Richard: Is there a reason you haven't passed this on to Linus, or do
> > you want me to do it?
>
> Partly because I haven't had time to look at it, partly because I'm
> not sure if it's needed (why, exactly?)

Because mtrr.c throws away the top 4 bits of 36-bit physical addresses, it gives misleading /proc/mtrr output on machines with >=4GB of memory, which I think requires a fix on its own. But worse, if it tries to make MTRR changes on such a machine, you can get bogus MTRR settings. This can ruin a machine's performance (if real memory ends up write combined or uncached) or give hardware instabilities (if a device's MMIO area gets the wrong memory type).

So far, this probably hasn't bitten too many people, since relatively few Linux x86 users have >=4GB memory, and /proc/mtrr hasn't usually been altered without explicit intervention. But with XFree86-4 finally "out there" and more kernel drivers using MTRRs, this can only get worse.

(Whether Tigran's performance problems are actually down to the mtrr.c issue, I don't know. It's not worth hypothesizing until we have accurate /proc/mtrr output.)

When I checked the 2.2 version of my patch, it didn't involve a significant increase in code size.

> and partly because I've
> recently moved house and (STILL!) don't have IP access at home (not
> even dialup) so I can't really look at stuff yet

Ok. I'll wait for feedback from Tigran, and if I don't get anything negative I'll submit to Linus. The 2.2 version of my patch fixes problems for other people (VA Linux have included it in their kernel for a while with no problems that have been reported back to me), and it's silly that it isn't in 2.4testX. I should have addressed this a while ago, but I have my own distractions from kernel hacking.

Later on, you can send a mtrr.c maintenance patch, if you like.

I've just caught up on this whole thread, and I don't have any objections in principle to Zoltan's patch being used instead of mine, though I'd like to take a look at it first.

Regards,
David
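The 4096MB alias is easy to reproduce, assuming only what David states: bits 32..35 are dropped somewhere along the way. An address at 4 GB stored in a 32-bit field reads back as 0. One common remedy is to carry the address as a 4 KB page frame number, which fits a 36-bit address into 32 bits with room to spare. That remedy is shown here as an assumption, not as what David's patch does. `truncate_to_32()`, `to_pfn()` and `from_pfn()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Storing a 36-bit physical address in 32 bits silently drops bits 32..35. */
static uint32_t truncate_to_32(uint64_t phys)
{
    return (uint32_t)phys;
}

/* A 4 KB page frame number needs only 24 bits for a 36-bit address,
 * so it round-trips through a 32-bit field without loss. */
static uint32_t to_pfn(uint64_t phys)
{
    assert(phys < (1ULL << 36));
    assert((phys & 0xfff) == 0);      /* MTRR bases are 4 KB aligned */
    return (uint32_t)(phys >> 12);
}

static uint64_t from_pfn(uint32_t pfn)
{
    return (uint64_t)pfn << 12;
}
```

With the plain 32-bit field, a write-back range at 4096MB becomes a range at 0. That is exactly the kind of bogus setting David warns can end up covering real memory or an MMIO area.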
[fixed (well, it works)]Re: test10-pre1 problems on 4-way SuperServer8050
Hello,

Ok, I despaired a bit about mtrrs on the Linux side and went into BIOS and started playing with the cache settings there. The change that fixed the problem was to disable all "area CXXX-> : cached".

Now, I have a really fast quad Xeon with 6G RAM and a consistently failing eepro100 interface. Downing/upping the interface does not help. I suppose in this state it is easier to debug because everything else is fully functional -- let's just find out why this particular eepro100 doesn't work.

Kernel compiles in 54-60 seconds -- very impressive (I am talking about full make -j4 bzImage after make clean)

Now, this is with and without Zoltan's big-mtrr patch, just verified a minute ago.

Regards,
Tigran
Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, 12 Oct 2000 12:56:09 +0100 (BST),
Tigran Aivazian <[EMAIL PROTECTED]> wrote:
>one correction -- it was "down and up the interface" that did the trick
>and not deleting the 64M mtrr entry. I.e. the eepro100 problem is better
>formulated as "when highmem is enabled one or both eepro100 interfaces
>sometimes do not work from boot but downing/upping the interface usually
>helps". When highmem is disabled, so far, _both_ eepro100 interfaces
>_always_ work on boot.

That may only be coincidence. We have intermittent problems with eepro100 under 2.4.0-testx, both ix86 and ia64. The symptoms are "card reports no resources" messages; down and up the interface and it usually works.
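For reference, "down and up the interface" amounts to clearing and then setting IFF_UP through the SIOCGIFFLAGS/SIOCSIFFLAGS ioctls. On Linux this makes the kernel call the driver's close and open routines. Below is a minimal user-space sketch. `bounce_interface()` is a hypothetical helper and needs CAP_NET_ADMIN. It is only the workaround discussed here and does not address the cause of the "card reports no resources" condition.

```c
#define _DEFAULT_SOURCE          /* exposes struct ifreq in glibc's <net/if.h> */
#include <assert.h>
#include <net/if.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Clear IFF_UP, then set it again. Returns 0 on success, -1 on any failure. */
int bounce_interface(const char *ifname)
{
    struct ifreq ifr;
    int fd, ret = -1;

    if (strlen(ifname) >= IFNAMSIZ)
        return -1;
    fd = socket(AF_INET, SOCK_DGRAM, 0);   /* any socket will do for these ioctls */
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof ifr);
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0)  /* fetch current flags */
        goto out;

    ifr.ifr_flags &= ~IFF_UP;               /* down */
    if (ioctl(fd, SIOCSIFFLAGS, &ifr) < 0)
        goto out;

    ifr.ifr_flags |= IFF_UP;                /* and up again */
    if (ioctl(fd, SIOCSIFFLAGS, &ifr) < 0)
        goto out;
    ret = 0;
out:
    close(fd);
    return ret;
}
```

`bounce_interface("eth0")` does roughly what `ifconfig eth0 down; ifconfig eth0 up` does.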
Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, 12 Oct 2000, Tigran Aivazian wrote:
> On Thu, 12 Oct 2000, Tigran Aivazian wrote:
>
> > On Wed, 11 Oct 2000, Linus Torvalds wrote:
> > > What happens if MTRR support is entirely disabled?
> >
> > If MTRR support is disabled then both eepro100 interfaces work fine but
> > the system is still 40x slower. This is the entire bootlog of
> > 2.4.0-test10-pre1 + lspci-vvx + /proc/interrupts + /proc/iomem + ifconfig
> > output
>
> one more finding -- deleting the strange 64M mtrr entry enabled the second
> eepro100 interface!
>
> # cat /proc/mtrr
> reg00: base=0x001 (4096MB), size=2048MB: write-combining, count=1
> reg02: base=0xfc00 (4032MB), size= 64MB: uncachable, count=1
> #
> # echo "disable=2" > /proc/mtrr
> # cat /proc/mtrr
> reg00: base=0x001 (4096MB), size=2048MB: write-combining, count=1
>
> (now down and up the interface and it works. Both eepro100 work)

one correction -- it was "down and up the interface" that did the trick and not deleting the 64M mtrr entry. I.e. the eepro100 problem is better formulated as "when highmem is enabled one or both eepro100 interfaces sometimes do not work from boot but downing/upping the interface usually helps". When highmem is disabled, so far, _both_ eepro100 interfaces _always_ work on boot.

Regards,
Tigran
Re: 36 bit MTRRs, Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, 12 Oct 2000, Boszormenyi Zoltan wrote:
> On Thu, 12 Oct 2000, Boszormenyi Zoltan wrote:
>
> > echo "base=0 size=0x1 type=write-back" >/proc/mtrr
> > echo "base=0x1 size=0x8000 type=write-back" >/proc/mtrr
> > echo "base=0xfe00 size=0x80 type=write-combining" >/proc/mtrr
> > echo "base=0xfde0 size=0x10 type=uncached" >/proc/mtrr
> > echo "base=0xfe80 size=0x10 type=uncached" >/proc/mtrr
> > echo "base=0xfe9ed000 size=0x1000 type=uncached" >/proc/mtrr
> > echo "base=0xfe9ee000 size=0x2000 type=uncached" >/proc/mtrr
> > echo "base=0xfeafe000 size=0x2000 type=uncached" >/proc/mtrr
>
> Sorry, use 'uncachable' instead of 'uncached'. :-(

ok, doing it from the bottom up was fine (didn't lock up) but reaching the last (first in your list) entry was refused by mtrr:

mtrr: 0x0,0x1 overlaps existing 0xfeafe000,0x2000

# cat /proc/mtrr
reg00: base=0xfeafe000 (4074MB), size= 0kB: uncachable, count=1
reg01: base=0xfe9ee000 (4073MB), size= 0kB: uncachable, count=1
reg02: base=0xfe9ed000 (4073MB), size= 0kB: uncachable, count=1
reg03: base=0xfe80 (4072MB), size= 1MB: uncachable, count=1
reg04: base=0xfde0 (4062MB), size= 1MB: uncachable, count=1
reg05: base=0xfe00 (4064MB), size= 8MB: write-combining, count=1
reg06: base=0x001 (4096MB), size=2048MB: write-back, count=1

and the machine is still slow. So, what is the correct way to cover the 6G by some mtrrs? I will now try to disable or change the strategy of L2 caching in the BIOS and see if it makes things worse.

Regards,
Tigran
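Two checks explain what happened here. First, "uncached" is not a type name that appears in /proc/mtrr output, while "uncachable" is. Second, the error message shows that the range starting at 0 overlaps an entry that was already added. Below is a sketch of both checks. It assumes, without verifying against the 2.4 source, that the five names listed are the full set accepted on write. `mtrr_type_valid()` and `ranges_overlap()` are hypothetical. The CPU itself defines precedence for overlapping variable ranges (UC wins over any other type), so refusing the overlap is a software policy of the driver, not a hardware limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Type names as they appear in /proc/mtrr output (assumed complete). */
static const char *const mtrr_type_names[] = {
    "uncachable", "write-combining", "write-through", "write-protect", "write-back",
};

bool mtrr_type_valid(const char *name)
{
    for (size_t i = 0; i < sizeof mtrr_type_names / sizeof mtrr_type_names[0]; i++)
        if (strcmp(name, mtrr_type_names[i]) == 0)
            return true;
    return false;
}

/* Half-open ranges [b, b+s) overlap iff each one starts before the other ends. */
bool ranges_overlap(uint64_t b1, uint64_t s1, uint64_t b2, uint64_t s2)
{
    assert(s1 > 0 && s2 > 0);
    return b1 < b2 + s2 && b2 < b1 + s1;
}
```

A 4 GB range at 0 (an illustrative size, not read from this machine) overlaps the 8 KB uncachable entry at 0xfeafe000. A range starting at 4 GB does not.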
Re: IRQ affinity vs. MTRRs, was Re: 36 bit MTRRs, Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, Oct 12, 2000 at 12:12:19PM +0200, Boszormenyi Zoltan wrote:
> I came up with an idea. The MTRRs are per-cpu things.
> Ingo Molnar's IRQ affinity code helps binding certain
> IRQ sources to certain CPUs.
>
> What if the MTRR driver allows per-CPU settings, maybe only on
> uncached areas? Of course the real memory should be cached in
> every CPU to avoid slowdowns. So that if you set that eth0's
> IRQ will be handled by CPU1, the MTRRs of CPU1 will be set
> accordingly, and the other CPUs will not care about eth0,
> so they do not need eth0's MTRR settings.

A little question. Why do we want to bind the irq of eth0 to a single CPU? imho it will cause slowdowns in some situations. Why don't we leave it to the scheduler to select the CPU for processing the IRQ?

- Gabor
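For the binding part of the question: with Ingo Molnar's IRQ affinity code, an IRQ is pinned by writing a hexadecimal CPU bitmask to /proc/irq/<irq>/smp_affinity. Below is a sketch that assumes that interface and at most 32 CPUs. `format_affinity()` and `set_irq_affinity()` are hypothetical helpers, and writing the file needs root. Leaving the mask with all CPUs set keeps the default Gabor prefers, where any CPU may service the interrupt.

```c
#include <assert.h>
#include <stdio.h>

/* Hex mask with only `cpu` set, newline-terminated as echo would write it. */
int format_affinity(char *buf, size_t len, unsigned cpu)
{
    assert(cpu < 32);
    return snprintf(buf, len, "%x\n", 1u << cpu);
}

/* Pin `irq` to `cpu`. Returns 0 on success, -1 if the file can't be written. */
int set_irq_affinity(unsigned irq, unsigned cpu)
{
    char path[64], mask[16];
    FILE *f;

    snprintf(path, sizeof path, "/proc/irq/%u/smp_affinity", irq);
    format_affinity(mask, sizeof mask, cpu);
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(mask, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Under Zoltan's proposal, `set_irq_affinity(eth0_irq, 1)` would be the step that decides which CPU's MTRRs need eth0's entry.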
Re: 36 bit MTRRs, Re: test10-pre1 problems on 4-way SuperServer8050
> On Thu, 12 Oct 2000, Boszormenyi Zoltan wrote:
>
> > echo "base=0 size=0x1 type=write-back" >/proc/mtrr

this line immediately locks up the machine. But I want to understand: where did you get base=0 and size=0x1 from? Shouldn't it be base=0x10 and size=0xfccf according to this entry from e820:

BIOS-e820: fccf @ 0010 (usable)

Regards,
Tigran
Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, 12 Oct 2000 10:45:11 +0100 (BST),
Tigran Aivazian <[EMAIL PROTECTED]> wrote:
>It would be nice if /proc/mtrr showed eip of
>the caller who set up the entry :)

How? If you compile with egcs-2.91.66 without frame pointers on ix86 then __builtin_return_address() yields garbage. Does anybody have a generic solution to this problem, other than "compile with frame pointers"? Or is it fixed in newer versions of gcc?
Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, 12 Oct 2000, Tigran Aivazian wrote:
> On Wed, 11 Oct 2000, Linus Torvalds wrote:
> > What happens if MTRR support is entirely disabled?
>
> If MTRR support is disabled then both eepro100 interfaces work fine but
> the system is still 40x slower. This is the entire bootlog of
> 2.4.0-test10-pre1 + lspci-vvx + /proc/interrupts + /proc/iomem + ifconfig
> output

one more finding -- deleting the strange 64M mtrr entry enabled the second eepro100 interface!

# cat /proc/mtrr
reg00: base=0x001 (4096MB), size=2048MB: write-combining, count=1
reg02: base=0xfc00 (4032MB), size= 64MB: uncachable, count=1
#
# echo "disable=2" > /proc/mtrr
# cat /proc/mtrr
reg00: base=0x001 (4096MB), size=2048MB: write-combining, count=1

(now down and up the interface and it works. Both eepro100 work)

but the machine is still intolerably slow. Where did this 64M entry come from? (I don't have agp or drm support enabled or anything like that, I don't even have an agp bus!) It would be nice if /proc/mtrr showed eip of the caller who set up the entry :)

Regards,
Tigran
Re: 36 bit MTRRs, Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, 12 Oct 2000, Boszormenyi Zoltan wrote:
> Look at the e820 map in the boot log, mark those areas
> as write-back and tell me what happens.

Here is the e820 map:

BIOS-e820: 0009fc00 @ (usable)
BIOS-e820: 0400 @ 0009fc00 (reserved)
BIOS-e820: 0002 @ 000e (reserved)
BIOS-e820: fccf @ 0010 (usable)
BIOS-e820: f000 @ fcdf (ACPI data)
BIOS-e820: 1000 @ fcdff000 (ACPI NVS)
BIOS-e820: 1000 @ fec0 (reserved)
BIOS-e820: 1000 @ fee0 (reserved)
BIOS-e820: 0008 @ fff8 (reserved)
BIOS-e820: 8000 @ 0001 (usable)

I can easily set up the mtrr entry for the top 2G:

BIOS-e820: 8000 @ 0001 (usable)

# cat /proc/mtrr
reg00: base=0x001 (4096MB), size=2048MB: write-combining, count=1
reg02: base=0xfc00 (4032MB), size= 64MB: uncachable, count=1

but trying to do the same for the low 4G:

BIOS-e820: fccf @ 0010 (usable)

mtrr complains:

# echo "base=0x10 size=0xfccf type=write-combining" > /proc/mtrr
mtrr: base(0x10) is not aligned on a size(0xfccf) boundary

suggestions?

Regards,
Tigran
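The second request fails because a variable-range MTRR matches addresses with a base/mask pair. Its size must therefore be a power of two of at least 4 KB, and its base a multiple of that size; this is the rule the mtrr.c message enforces. An arbitrary usable region has to be either split into aligned power-of-two pieces, or covered by one larger aligned range with holes carved out as uncachable, as Zoltan's list does. Below is a sketch of the first approach using a hypothetical `split_for_mtrr()` helper. The sizes are illustrative, not taken from this machine. With only 8 variable MTRRs, an odd-sized region can easily need more pieces than are available.

```c
#include <assert.h>
#include <stdint.h>

struct mtrr_range { uint64_t base, size; };

/* Greedily split [base, base+len) into naturally aligned power-of-two
 * blocks: at each step take the largest block that is aligned at the
 * current address and still fits. Every block is then a legal
 * variable-MTRR request. Returns the block count, or -1 if more than
 * `max` blocks would be needed. base and len must be 4 KB multiples. */
int split_for_mtrr(uint64_t base, uint64_t len, struct mtrr_range *out, int max)
{
    int n = 0;

    assert(base % 4096 == 0 && len % 4096 == 0);
    while (len > 0) {
        /* Largest power of two dividing base; 0 is aligned to anything. */
        uint64_t size = base ? (base & -base) : (1ULL << 63);

        while (size > len)
            size >>= 1;
        if (n == max)
            return -1;
        out[n].base = base;
        out[n].size = size;
        n++;
        base += size;
        len -= size;
    }
    return n;
}
```

For example, 6 GB starting at 0 splits cleanly into 4 GB at 0 plus 2 GB at 4 GB. A region that starts at 1 MB already costs one extra entry before the larger aligned blocks can begin.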
Re: test10-pre1 problems on 4-way SuperServer8050
Hi,

has someone looked at the XEON errata already? Perhaps one can find the problem there, just in case. G16 seems to have something to do with it ... But there are others also. I'll boot linux and look into the sources ...

Cheers
Markus

Tigran Aivazian wrote:
> On Wed, 11 Oct 2000, Linus Torvalds wrote:
> > What happens if MTRR support is entirely disabled?
>
> If MTRR support is disabled then both eepro100 interfaces work fine but
> the system is still 40x slower. This is the entire bootlog of
> 2.4.0-test10-pre1 + lspci-vvx + /proc/interrupts + /proc/iomem + ifconfig
> output
>
> Two currently active ideas (from Mark, Linus and Zoltan):
>
> a) one needs to use big-mtrr patch from Zoltan, look at e820 map and
> manually set up mtrrs to cover all 6G.
>
> b) this is an L2 cache-tag issue and there is just not enough bits in the
> tag to cover such high addresses so nothing will help, save removing the
> extra 2G or so out of the machine (or using them as MTD devices : I
> hope this is _not_ the case...
>
> another idea (in parallel) is that eepro100 stops working because its PCI
> memory space is marked as cacheable.
>
> All should become clear soon -- I will spend the whole day on this, slowly
> trying to understand what's going on.
>
> Regards,
> Tigran
>
> Linux version 2.4.0-test9 (root@hilbert) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #15 SMP Wed Oct 11 19:23:15 BST 2000
> BIOS-provided physical RAM map:
> BIOS-e820: 0009fc00 @ (usable)
> BIOS-e820: 0400 @ 0009fc00 (reserved)
> BIOS-e820: 0002 @ 000e (reserved)
> BIOS-e820: fccf @ 0010 (usable)
> BIOS-e820: f000 @ fcdf (ACPI data)
> BIOS-e820: 1000 @ fcdff000 (ACPI NVS)
> BIOS-e820: 1000 @ fec0 (reserved)
> BIOS-e820: 1000 @ fee0 (reserved)
> BIOS-e820: 0008 @ fff8 (reserved)
> BIOS-e820: 8000 @ 0001 (usable)
> 5248MB HIGHMEM available.
> Scan SMP from c000 for 1024 bytes.
> Scan SMP from c009fc00 for 1024 bytes.
> Scan SMP from c00f for 65536 bytes.
> found SMP MP-table at 000fb4d0
> hm, page 000fb000 reserved twice.
> hm, page 000fc000 reserved twice.
> hm, page 000f5000 reserved twice.
> hm, page 000f6000 reserved twice.
> On node 0 totalpages: 1572864
> zone(0): 4096 pages.
> zone(1): 225280 pages.
> zone(2): 1343488 pages.
> Intel MultiProcessor Specification v1.1
> Virtual Wire compatibility mode.
> OEM ID: AMI Product ID: CNB20HE APIC at: 0xFEE0
> Processor #0 Pentium(tm) Pro APIC version 17
> Floating point unit present.
> Machine Exception supported.
> 64 bit compare & exchange supported.
> Internal APIC present.
> Bootup CPU
> Processor #1 Pentium(tm) Pro APIC version 17
> Floating point unit present.
> Machine Exception supported.
> 64 bit compare & exchange supported.
> Internal APIC present.
> Processor #2 Pentium(tm) Pro APIC version 17
> Floating point unit present.
> Machine Exception supported.
> 64 bit compare & exchange supported.
> Internal APIC present.
> Processor #3 Pentium(tm) Pro APIC version 17
> Floating point unit present.
> Machine Exception supported.
> 64 bit compare & exchange supported.
> Internal APIC present.
> Bus #0 is PCI
> Bus #1 is PCI
> Bus #2 is PCI
> Bus #3 is ISA
> I/O APIC #4 Version 17 at 0xFEC0.
> I/O APIC #5 Version 17 at 0xFEC01000.
> Int: type 0, pol 3, trig 3, bus 0, IRQ 04, APIC ID 5, APIC INT 0a
> Int: type 0, pol 3, trig 3, bus 0, IRQ 08, APIC ID 5, APIC INT 0b
> Int: type 0, pol 3, trig 3, bus 0, IRQ 0c, APIC ID 5, APIC INT 0f
> Int: type 0, pol 3, trig 3, bus 0, IRQ 3c, APIC ID 4, APIC INT 0a
> Int: type 0, pol 3, trig 3, bus 1, IRQ 15, APIC ID 5, APIC INT 01
> Int: type 0, pol 3, trig 3, bus 1, IRQ 14, APIC ID 5, APIC INT 00
> Int: type 3, pol 1, trig 1, bus 3, IRQ 00, APIC ID 4, APIC INT 00
> Int: type 0, pol 1, trig 1, bus 3, IRQ 01, APIC ID 4, APIC INT 01
> Int: type 0, pol 1, trig 1, bus 3, IRQ 00, APIC ID 4, APIC INT 02
> Int: type 0, pol 1, trig 1, bus 3, IRQ 03, APIC ID 4, APIC INT 03
> Int: type 0, pol 1, trig 1, bus 3, IRQ 04, APIC ID 4, APIC INT 04
> Int: type 0, pol 1, trig 1, bus 3, IRQ 06, APIC ID 4, APIC INT 06
> Int: type 0, pol 1, trig 1, bus 3, IRQ 07, APIC ID 4, APIC INT 07
> Int: type 0, pol 1, trig 1, bus 3, IRQ 08, APIC ID 4, APIC INT 08
> Int: type 0, pol 1, trig 1, bus 3, IRQ 0c, APIC ID 4, APIC INT 0c
> Int: type 0, pol 1, trig 1, bus 3, IRQ 0d, APIC ID 4, APIC INT 0d
> Int: type 0, pol 1, trig 1, bus 3, IRQ 0e, APIC ID 4, APIC INT 0e
> Int: type 0, pol 1, trig 1, bus 3, IRQ 0f, APIC ID 4, APIC INT 0f
> Lint: type 3, pol 1, trig 1, bus 3, IRQ 00, APIC ID ff, APIC LINT 00
> Lint: type 1, pol 1, trig 1, bus 0, IRQ 00, APIC ID ff, APIC LINT 01
> Processors: 4
> mapped APIC to e000 (fee0)
> mapped IOAPI
Re: test10-pre1 problems on 4-way SuperServer8050
Hi!

I reported this BUG a few days ago but got no response - happens on UP with only 32M ram, too. (see below). Also note the second BUG at vmscan.c:538 which I believe I never saw reported again.

Richard.

On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> On Wed, 11 Oct 2000, Rik van Riel wrote:
> > Could you send me the backtrace of one of the cases where
> > you hit the bug ?
>
> here you are:
>
> Oct 11 16:05:26 hilbert36 kernel: kernel BUG at page_alloc.c:221!
[snipped]

Oct 7 11:50:47 localhost kernel: kernel BUG at page_alloc.c:91!
Oct 7 11:50:47 localhost kernel: invalid operand:
Oct 7 11:50:47 localhost kernel: CPU:0
Oct 7 11:50:47 localhost kernel: EIP:0010:[__free_pages_ok+73/892]
Oct 7 11:50:47 localhost kernel: EFLAGS: 00010286
Oct 7 11:50:47 localhost kernel: eax: 001f ebx: c1002a90 ecx: c10a4000 edx:
Oct 7 11:50:47 localhost kernel: esi: c1002aac edi: ebp: 002c esp: c10a5f64
Oct 7 11:50:47 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 7 11:50:47 localhost kernel: Process kswapd (pid: 2, stackpage=c10a5000)
Oct 7 11:50:47 localhost kernel: Stack: c01d4877 c01d4a65 005b c1002a90 c1002aac 00ce 002c 00ce
Oct 7 11:50:47 localhost kernel:002b 0003 c0126042 c01278cb c0126229 0004
Oct 7 11:50:47 localhost kernel: 0004 c0126870 0004
Oct 7 11:50:47 localhost kernel: Call Trace: [tvecs+8671/55752] [tvecs+9165/55752] [page_launder+674/1888] [__free_pages+19/20] [page_launder+1161/1888] [do_try_to_free_pages+52/128] [tvecs+7999/55752]
Oct 7 11:50:47 localhost kernel:[kswapd+115/288] [kernel_thread+40/56]
Oct 7 11:50:47 localhost kernel: Code: 0f 0b 83 c4 0c 89 f6 89 da 2b 15 f8 89 26 c0 89 d0 c1 e0 04
Oct 7 11:50:51 localhost kernel: kernel BUG at vmscan.c:538!
Oct 7 11:50:51 localhost kernel: invalid operand:
Oct 7 11:50:51 localhost kernel: CPU:0
Oct 7 11:50:51 localhost kernel: EIP:0010:[reclaim_page+897/980]
Oct 7 11:50:51 localhost kernel: EFLAGS: 00010282
Oct 7 11:50:51 localhost kernel: eax: 001c ebx: c1002aac ecx: c1636000 edx: 0010
Oct 7 11:50:51 localhost kernel: esi: c1002a90 edi: ebp: 0040 esp: c1637e3c
Oct 7 11:50:51 localhost kernel: ds: 0018 es: 0018 ss: 0018
Oct 7 11:50:51 localhost kernel: Process cc1 (pid: 2614, stackpage=c1637000)
Oct 7 11:50:51 localhost kernel: Stack: c01d4277 c01d4456 021a c020bb20 c020bdb4 c0127548
Oct 7 11:50:51 localhost kernel:c020bb20 c020bdb8 0001 c0127702 c020bdac
Oct 7 11:50:51 localhost kernel: 0001 1000 c03a7d60 0001 c04fe080 0007a746 0005
Oct 7 11:50:51 localhost kernel: Call Trace: [tvecs+7135/55752] [tvecs+7614/55752] [__alloc_pages_limit+124/172] [__alloc_pages+394/756] [do_anonymous_page+57/160] [do_no_page+48/192] [handle_mm_fault+232/340]
Oct 7 11:50:51 localhost kernel:[do_page_fault+299/976] [merge_segments+324/364] [do_brk+267/316] [sys_brk+180/216] [error_code+44/64]
Oct 7 11:50:51 localhost kernel: Code: 0f 0b 83 c4 0c 31 c0 0f b3 46 18 8d 4e 28 8d 46 2c 39 46 2c
Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, 12 Oct 2000, Matti Aarnio wrote:
> > CPU0: Intel Pentium III (Cascades) stepping 01
> > CPU1: Intel Pentium III (Cascades) stepping 01
> > CPU2: Intel Pentium III (Cascades) stepping 01
> > CPU3: Intel Pentium III (Cascades) stepping 01
> > Total of 4 processors activated (5606.60 BogoMIPS).
>
> Hmm.. More marketing names, what is "Cascades" in the scale
> of "cheap bastards" versus "all bells and whistles" ?
> (Celeron vs. XEON, that is.)

It is a Xeon 700MHz with 1M cache, at least we paid for it as such! :)
Here is a sample from /proc/cpuinfo:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 10
model name      : Pentium III (Cascades)
stepping        : 1
cpu MHz         : 701.000611
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
sep_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr xmm
bogomips        : 1399.19

The other 3 look the same.

Tigran
Re: test10-pre1 problems on 4-way SuperServer8050
On Thu, Oct 12, 2000 at 09:21:00AM +0100, Tigran Aivazian wrote:
> If MTRR support is disabled then both eepro100 interfaces work fine but
> the system is still 40x slower. This is the entire bootlog of
> 2.4.0-test10-pre1 + lspci-vvx + /proc/interrupts + /proc/iomem + ifconfig
> output
>
> Two currently active ideas (from Mark, Linus and Zoltan):
>
> b) this is an L2 cache-tag issue and there are just not enough bits in the
> tag to cover such high addresses, so nothing will help, save removing the
> extra 2G or so out of the machine (or using them as MTD devices : I
> hope this is _not_ the case...

Reminds me of the difference between the Celeron and XEON variants of
Pentium II -- Celerons can cache only the low 4 GB of address space,
XEONs can cache the whole 36 bits. (Probably other differences exist
too, but that is the primary one concerning memory cacheability -->
apparent speed.)

> CPU0: Intel Pentium III (Cascades) stepping 01
> CPU1: Intel Pentium III (Cascades) stepping 01
> CPU2: Intel Pentium III (Cascades) stepping 01
> CPU3: Intel Pentium III (Cascades) stepping 01
> Total of 4 processors activated (5606.60 BogoMIPS).

Hmm.. More marketing names, what is "Cascades" in the scale of
"cheap bastards" versus "all bells and whistles" ?
(Celeron vs. XEON, that is.)

/Matti Aarnio
Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Linus Torvalds wrote:
> What happens if MTRR support is entirely disabled?

If MTRR support is disabled then both eepro100 interfaces work fine but
the system is still 40x slower. This is the entire bootlog of
2.4.0-test10-pre1 + lspci-vvx + /proc/interrupts + /proc/iomem + ifconfig
output.

Two currently active ideas (from Mark, Linus and Zoltan):

a) one needs to use the big-mtrr patch from Zoltan, look at the e820 map
and manually set up mtrrs to cover all 6G.

b) this is an L2 cache-tag issue and there are just not enough bits in the
tag to cover such high addresses, so nothing will help, save removing the
extra 2G or so out of the machine (or using them as MTD devices : I
hope this is _not_ the case...

Another idea (in parallel) is that eepro100 stops working because its PCI
memory space is marked as cacheable.

All should become clear soon -- I will spend the whole day on this, slowly
trying to understand what's going on.

Regards,
Tigran

Linux version 2.4.0-test9 (root@hilbert) (gcc version egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #15 SMP Wed Oct 11 19:23:15 BST 2000
BIOS-provided physical RAM map:
 BIOS-e820: 0009fc00 @ (usable)
 BIOS-e820: 0400 @ 0009fc00 (reserved)
 BIOS-e820: 0002 @ 000e (reserved)
 BIOS-e820: fccf @ 0010 (usable)
 BIOS-e820: f000 @ fcdf (ACPI data)
 BIOS-e820: 1000 @ fcdff000 (ACPI NVS)
 BIOS-e820: 1000 @ fec0 (reserved)
 BIOS-e820: 1000 @ fee0 (reserved)
 BIOS-e820: 0008 @ fff8 (reserved)
 BIOS-e820: 8000 @ 0001 (usable)
5248MB HIGHMEM available.
Scan SMP from c000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f for 65536 bytes.
found SMP MP-table at 000fb4d0
hm, page 000fb000 reserved twice.
hm, page 000fc000 reserved twice.
hm, page 000f5000 reserved twice.
hm, page 000f6000 reserved twice.
On node 0 totalpages: 1572864
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 1343488 pages.
Intel MultiProcessor Specification v1.1
    Virtual Wire compatibility mode.
OEM ID: AMI Product ID: CNB20HE APIC at: 0xFEE0
Processor #0 Pentium(tm) Pro APIC version 17
    Floating point unit present.
    Machine Exception supported.
    64 bit compare & exchange supported.
    Internal APIC present.
    Bootup CPU
Processor #1 Pentium(tm) Pro APIC version 17
    Floating point unit present.
    Machine Exception supported.
    64 bit compare & exchange supported.
    Internal APIC present.
Processor #2 Pentium(tm) Pro APIC version 17
    Floating point unit present.
    Machine Exception supported.
    64 bit compare & exchange supported.
    Internal APIC present.
Processor #3 Pentium(tm) Pro APIC version 17
    Floating point unit present.
    Machine Exception supported.
    64 bit compare & exchange supported.
    Internal APIC present.
Bus #0 is PCI
Bus #1 is PCI
Bus #2 is PCI
Bus #3 is ISA
I/O APIC #4 Version 17 at 0xFEC0.
I/O APIC #5 Version 17 at 0xFEC01000.
Int: type 0, pol 3, trig 3, bus 0, IRQ 04, APIC ID 5, APIC INT 0a
Int: type 0, pol 3, trig 3, bus 0, IRQ 08, APIC ID 5, APIC INT 0b
Int: type 0, pol 3, trig 3, bus 0, IRQ 0c, APIC ID 5, APIC INT 0f
Int: type 0, pol 3, trig 3, bus 0, IRQ 3c, APIC ID 4, APIC INT 0a
Int: type 0, pol 3, trig 3, bus 1, IRQ 15, APIC ID 5, APIC INT 01
Int: type 0, pol 3, trig 3, bus 1, IRQ 14, APIC ID 5, APIC INT 00
Int: type 3, pol 1, trig 1, bus 3, IRQ 00, APIC ID 4, APIC INT 00
Int: type 0, pol 1, trig 1, bus 3, IRQ 01, APIC ID 4, APIC INT 01
Int: type 0, pol 1, trig 1, bus 3, IRQ 00, APIC ID 4, APIC INT 02
Int: type 0, pol 1, trig 1, bus 3, IRQ 03, APIC ID 4, APIC INT 03
Int: type 0, pol 1, trig 1, bus 3, IRQ 04, APIC ID 4, APIC INT 04
Int: type 0, pol 1, trig 1, bus 3, IRQ 06, APIC ID 4, APIC INT 06
Int: type 0, pol 1, trig 1, bus 3, IRQ 07, APIC ID 4, APIC INT 07
Int: type 0, pol 1, trig 1, bus 3, IRQ 08, APIC ID 4, APIC INT 08
Int: type 0, pol 1, trig 1, bus 3, IRQ 0c, APIC ID 4, APIC INT 0c
Int: type 0, pol 1, trig 1, bus 3, IRQ 0d, APIC ID 4, APIC INT 0d
Int: type 0, pol 1, trig 1, bus 3, IRQ 0e, APIC ID 4, APIC INT 0e
Int: type 0, pol 1, trig 1, bus 3, IRQ 0f, APIC ID 4, APIC INT 0f
Lint: type 3, pol 1, trig 1, bus 3, IRQ 00, APIC ID ff, APIC LINT 00
Lint: type 1, pol 1, trig 1, bus 0, IRQ 00, APIC ID ff, APIC LINT 01
Processors: 4
mapped APIC to e000 (fee0)
mapped IOAPIC to d000 (fec0)
mapped IOAPIC to c000 (fec01000)
Kernel command line: auto BOOT_IMAGE=240-test10 ro root=805 BOOT_FILE=/boot/vmlinuz-2.4.0-test10 console=ttyS0,9600 console=tty0
Initializing CPU#0
Detected 701.611 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 1399.19 BogoMIPS
Memory: 6132848k/6291456k available (1531k kernel code, 106956k reserved, 88k data, 188k init, 5322688k highmem)
Dentry-cache hash table ent
Re: test10-pre1 problems on 4-way SuperServer8050
David Wragg writes:
> Tigran Aivazian <[EMAIL PROTECTED]> writes:
> > b) it detects all memory correctly but creates a write-back mtrr only for
> > the first 2G, is this normal?
>
> mtrr.c is broken for machines with >=4GB of memory (or less than 4GB,
> if the chipset reserves an address range below 4GB for PCI).
>
> The patch against 2.4.0-test9 to fix this is below.
>
> Richard: Is there a reason you haven't passed this on to Linus, or do
> you want me to do it?

Partly because I haven't had time to look at it, partly because I'm
not sure if it's needed (why, exactly?), and partly because I've
recently moved house and (STILL!) don't have IP access at home (not
even dialup), so I can't really look at stuff yet :-( :-( :-(

BTW: I'm away at conferences for the next two weeks, so don't expect
fast responses.

Regards,

Richard
Permanent: [EMAIL PROTECTED]
Current: [EMAIL PROTECTED]
Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Rik van Riel wrote:
> On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> > > On Wed, 11 Oct 2000, Rik van Riel wrote:
> > > > Could you send me the backtrace of one of the cases where
> > > > you hit the bug ?
> >
> > just to add -- I was following Alan Cox's suggestion of
> > incrementing "mem=N" and finding the value where the system
> > stops working normally. It was ok as high as "mem=3096M" but
> > then I realized that I was also using Zoltan's big-mtrr patch at
> > the same time so I will retest the whole thing without it...
> > tomorrow.
> >
> > Just to clarify - the problem _does_ show up without Zoltan's
> > patch but my "mem=N" tests were done with it so those findings
> > are not really proving much. I need to redo them with vanilla
> > kernel.
>
> Interesting, so up to 3GB works just fine with the new
> VM and above that you can trigger all kinds of funny
> errors ?

I bet that the performance thing at least is due to MTRR issues.

Basically, if Tigran ends up using memory that is non-cached, a 30-40
times performance degradation is not just explainable, it's expected.

Also, the eepro100 will not work correctly if its PCI space is set to
be cacheable.

What happens if MTRR support is entirely disabled? Make it print out
what the BIOS set up, nothing more.

		Linus
Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> it works fine then. Kernel compiles in 68 seconds as it should. Shall I
> keep incrementing mem= to see what happens next...

I suspect fixing the mtrrs on the machine will fix this problem, as a
38-40 times slowdown on a machine that isn't swapping is most likely a
lack of memory caching (as Rik pointed out, 38-40 times is right on the
nose for the difference in speed between the cache and main memory).

		-ben
Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> > On Wed, 11 Oct 2000, Rik van Riel wrote:
> > > Could you send me the backtrace of one of the cases where
> > > you hit the bug ?
>
> just to add -- I was following Alan Cox's suggestion of
> incrementing "mem=N" and finding the value where the system
> stops working normally. It was ok as high as "mem=3096M" but
> then I realized that I was also using Zoltan's big-mtrr patch at
> the same time so I will retest the whole thing without it...
> tomorrow.
>
> Just to clarify - the problem _does_ show up without Zoltan's
> patch but my "mem=N" tests were done with it so those findings
> are not really proving much. I need to redo them with vanilla
> kernel.

Interesting, so up to 3GB works just fine with the new
VM and above that you can trigger all kinds of funny
errors ?

Could it be that the kernel fills most of low memory with
kernel data structures to manage high memory, so that it
doesn't have enough low memory left to do the bookkeeping
for the eepro card, etc???

regards,

Rik
Re: test10-pre1 problems on 4-way SuperServer8050
Tigran Aivazian <[EMAIL PROTECTED]> writes:
> b) it detects all memory correctly but creates a write-back mtrr only for
> the first 2G, is this normal?

mtrr.c is broken for machines with >=4GB of memory (or less than 4GB,
if the chipset reserves an address range below 4GB for PCI).

The patch against 2.4.0-test9 to fix this is below.

Richard: Is there a reason you haven't passed this on to Linus, or do
you want me to do it?

Dave

diff -rua linux-2.4.0test9/arch/i386/kernel/mtrr.c linux-2.4.0test9.mod/arch/i386/kernel/mtrr.c
--- linux-2.4.0test9/arch/i386/kernel/mtrr.c	Wed Oct 11 19:54:56 2000
+++ linux-2.4.0test9.mod/arch/i386/kernel/mtrr.c	Wed Oct 11 20:48:26 2000
@@ -503,9 +503,9 @@
 static void intel_get_mtrr (unsigned int reg, unsigned long *base,
                             unsigned long *size, mtrr_type *type)
 {
-    unsigned long dummy, mask_lo, base_lo;
+    unsigned long mask_lo, mask_hi, base_lo, base_hi;
 
-    rdmsr (MTRRphysMask_MSR(reg), mask_lo, dummy);
+    rdmsr (MTRRphysMask_MSR(reg), mask_lo, mask_hi);
     if ( (mask_lo & 0x800) == 0 ) {
         /* Invalid (i.e. free) range */
@@ -515,20 +515,17 @@
         return;
     }
 
-    rdmsr(MTRRphysBase_MSR(reg), base_lo, dummy);
+    rdmsr(MTRRphysBase_MSR(reg), base_lo, base_hi);
 
-    /* We ignore the extra address bits (32-35). If someone wants to
-       run x86 Linux on a machine with >4GB memory, this will be the
-       least of their problems. */
+    /* Work out the shifted address mask. */
+    mask_lo = 0xff000000 | mask_hi << (32 - PAGE_SHIFT)
+              | mask_lo >> PAGE_SHIFT;
 
-    /* Clean up mask_lo so it gives the real address mask. */
-    mask_lo = (mask_lo & 0xfffff000UL);
     /* This works correctly if size is a power of two, i.e. a
        contiguous range. */
-    *size = ~(mask_lo - 1);
-
-    *base = (base_lo & 0xfffff000UL);
-    *type = (base_lo & 0xff);
+    *size = -mask_lo;
+    *base = base_hi << (32 - PAGE_SHIFT) | base_lo >> PAGE_SHIFT;
+    *type = base_lo & 0xff;
 } /* End Function intel_get_mtrr */
 
 static void cyrix_get_arr (unsigned int reg, unsigned long *base,
@@ -553,13 +550,13 @@
     /* Enable interrupts if it was enabled previously */
     __restore_flags (flags);
     shift = ((unsigned char *) base)[1] & 0x0f;
-    *base &= 0xfffff000UL;
+    *base >>= PAGE_SHIFT;
 
     /* Power of two, at least 4K on ARR0-ARR6, 256K on ARR7
      * Note: shift==0xf means 4G, this is unsupported.
      */
     if (shift)
-      *size = (reg < 7 ? 0x800UL : 0x20000UL) << shift;
+      *size = (reg < 7 ? 0x1UL : 0x40UL) << (shift - 1);
     else
       *size = 0;
 
@@ -596,7 +593,7 @@
     /* Upper dword is region 1, lower is region 0 */
     if (reg == 1) low = high;
     /* The base masks off on the right alignment */
-    *base = low & 0xFFFE0000;
+    *base = (low & 0xFFFE0000) >> PAGE_SHIFT;
     *type = 0;
     if (low & 1) *type = MTRR_TYPE_UNCACHABLE;
     if (low & 2) *type = MTRR_TYPE_WRCOMB;
@@ -621,7 +618,7 @@
      *  *128K ...
      */
     low = (~low) & 0x1FFFC;
-    *size = (low + 4) << 15;
+    *size = (low + 4) << (15 - PAGE_SHIFT);
     return;
 } /* End Function amd_get_mtrr */
 
@@ -634,8 +631,8 @@
 static void centaur_get_mcr (unsigned int reg, unsigned long *base,
                              unsigned long *size, mtrr_type *type)
 {
-    *base = centaur_mcr[reg].high & 0xfffff000;
-    *size = (~(centaur_mcr[reg].low & 0xfffff000))+1;
+    *base = centaur_mcr[reg].high >> PAGE_SHIFT;
+    *size = -(centaur_mcr[reg].low & 0xfffff000) >> PAGE_SHIFT;
     *type = MTRR_TYPE_WRCOMB; /* If it is there, it is write-combining */
 } /* End Function centaur_get_mcr */
 
@@ -665,8 +662,10 @@
     }
     else
     {
-        wrmsr (MTRRphysBase_MSR (reg), base | type, 0);
-        wrmsr (MTRRphysMask_MSR (reg), ~(size - 1) | 0x800, 0);
+        wrmsr (MTRRphysBase_MSR (reg), base << PAGE_SHIFT | type,
+               (base & 0xf00000) >> (32 - PAGE_SHIFT));
+        wrmsr (MTRRphysMask_MSR (reg), -size << PAGE_SHIFT | 0x800,
+               (-size & 0xf00000) >> (32 - PAGE_SHIFT));
     }
     if (do_safe) set_mtrr_done (&ctxt);
 } /* End Function intel_set_mtrr_up */
@@ -680,7 +679,9 @@
     arr = CX86_ARR_BASE + (reg << 1) + reg; /* avoid multiplication by 3 */
 
     /* count down from 32M (ARR0-ARR6) or from 2G (ARR7) */
-    size >>= (reg < 7 ? 12 : 18);
+    if (reg >= 7)
+        size >>= 6;
+
     size &= 0x7fff; /* make sure arr_size <= 14 */
     for(arr_size = 0; size; arr_size++, size >>= 1);
 
@@ -705,6 +706,7 @@
     }
 
     if (do_safe) set_mtrr_prepare (&ctxt);
+    base <<= PAGE_SHIFT;
     setCx86(arr,    ((unsigned char *) &base)[3]);
     setCx86(arr+1,  ((unsigned char *) &base)[2]);
     setCx86(arr+2, (((unsigned char *) &base)[1]) | arr_size);
@@ -724,34 +726,36 @@
     [RETURNS] Nothing.
 */
 {
-    u32 low, high;
+    u32 regs[2];
     struct set_mtrr_context ctxt;
 
     if (do_safe) set_mtrr_prepare (&ctxt);
     /*
Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> On Wed, 11 Oct 2000, Rik van Riel wrote:
> > Could you send me the backtrace of one of the cases where
> > you hit the bug ?
>
> here you are:
> Oct 11 16:05:26 hilbert36 kernel: kernel BUG at page_alloc.c:221!
> Oct 11 16:05:27 hilbert36 kernel: Call Trace: [tvecs+9181/112440]
> [tvecs+9707/112440] [__alloc_pages+225/740] [filemap_nopage+240/1120]
> [do_no_page+93/440] [] []
> Oct 11 16:05:27 hilbert36 kernel: [] [] []
> [handle_mm_fault+944/1388] [do_page_fault+0/1008] [unmap_fixup+99/316]
> [do_page_fault+323/1008] [do_page_fault+0/1008]
> Oct 11 16:05:27 hilbert36 kernel: [timer_bh+56/700] [bh_action+78/176]
> [tasklet_hi_action+81/124] [do_softirq+90/136] [do_IRQ+218/236] [error_code+52/60]

Ughhh, of course ... this comes into __alloc_pages(), which
finds a page with some flags set on the free list ...

this backtrace - of course - isn't helpful in debugging the
thing, sorry for wasting your time...

[off to find other ways of finding this problem ... note that
__free_pages_ok() does the SAME BUG() check before putting
the page on the free list]

regards,

Rik
Re: test10-pre1 problems on 4-way SuperServer8050
> On Wed, 11 Oct 2000, Rik van Riel wrote:
> > Could you send me the backtrace of one of the cases where
> > you hit the bug ?

just to add -- I was following Alan Cox's suggestion of incrementing
"mem=N" and finding the value where the system stops working normally. It
was ok as high as "mem=3096M" but then I realized that I was also using
Zoltan's big-mtrr patch at the same time so I will retest the whole thing
without it... tomorrow.

Just to clarify - the problem _does_ show up without Zoltan's patch but my
"mem=N" tests were done with it so those findings are not really proving
much. I need to redo them with vanilla kernel.

Regards,
Tigran
Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Rik van Riel wrote:
> Could you send me the backtrace of one of the cases where
> you hit the bug ?

here you are:

Oct 11 16:05:26 hilbert36 kernel: kernel BUG at page_alloc.c:221!
Oct 11 16:05:26 hilbert36 kernel: invalid operand:
Oct 11 16:05:26 hilbert36 kernel: CPU:2
Oct 11 16:05:26 hilbert36 kernel: EIP:0010:[rmqueue+590/636]
Oct 11 16:05:26 hilbert36 kernel: EFLAGS: 00010292
Oct 11 16:05:26 hilbert36 kernel: eax: 0020 ebx: c75e0a4c ecx: c027ff68 edx: 0026
Oct 11 16:05:26 hilbert36 kernel: esi: 0001 edi: c02811d0 ebp: esp: f7205d84
Oct 11 16:05:26 hilbert36 kernel: ds: 0018 es: 0018 ss: 0018
Oct 11 16:05:26 hilbert36 kernel: Process head (pid: 582, stackpage=f7205000)
Oct 11 16:05:26 hilbert36 kernel: Stack: c023a365 c023a573 00dd c02811d0 0112 c0281430 c02811f8
Oct 11 16:05:26 hilbert36 kernel: 0014789c 0014789f 0286 c02811d0 c0133309 c532a204 0112
Oct 11 16:05:26 hilbert36 kernel: 0001 f7562500 c75da270 0015 0001 c028142c c012a944 f7554460
Oct 11 16:05:27 hilbert36 kernel: Call Trace: [tvecs+9181/112440] [tvecs+9707/112440] [__alloc_pages+225/740] [filemap_nopage+240/1120] [do_no_page+93/440] [] []
Oct 11 16:05:27 hilbert36 kernel: [] [] [] [handle_mm_fault+944/1388] [do_page_fault+0/1008] [unmap_fixup+99/316] [do_page_fault+323/1008] [do_page_fault+0/1008]
Oct 11 16:05:27 hilbert36 kernel: [timer_bh+56/700] [bh_action+78/176] [tasklet_hi_action+81/124] [do_softirq+90/136] [do_IRQ+218/236] [error_code+52/60]
Oct 11 16:05:27 hilbert36 kernel: Code: 0f 0b 83 c4 0c 90 89 d8 eb 1c 45 83 c6 0c 83 fd 09 0f 86 c6
Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> On Wed, 11 Oct 2000, Rik van Riel wrote:
> > On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> > > On Wed, 11 Oct 2000, Mark Hemment wrote:
> > > > On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> > > > >
> > > > > a) one of the eepro100 interfaces (the onboard one on the S2QR6 mb) is
> > > > > malfunctioning, interrupts are generated but no traffic gets through (YES,
> > > > > I did plug it in correctly, this time, and I repeat 2.2.16 works!)
> > > >
> > > > I saw this the other week on our two-way Dell under a reasonably heavy
> > > > load - but with 3c59x.c driver, the eepro100s survived!
> > > > Either NIC (had two Tornados) could go this way after anything from 1
> > > > to 36 hours of load. They would end up running in "poll" mode off the
> > > > transmit watchdog timer.
> > > > Swapped them for a dual-port eepro100 and no more problems.
> > >
> > > I disabled eepro100 support completely and the problem is still
> > > there. What I also noticed is that with highmem-PAE enabled I
> > > get BUG in page_alloc.c at line 221 so it is probably a VM
> > > problem recently introduced (hence cc'd Rik).
> >
> > Can you trigger this bug /without/ PAE ?
>
> no, I can't.

I wonder if PAE somehow messes with the locking semantics of
the page table things, because the test on line 221 of
page_alloc.c depends on the fact that locking works.

[in fact, that test is there exactly to verify that nothing
went wrong with the locking and we're not re-using a page
that's already in use]

Could you send me the backtrace of one of the cases where
you hit the bug ?

> > > I will continue to narrow down by removing some things (like
> > > mtrr) from the equation. Rik, the problem is that when one
> > > enables PAE (or just highmem-4G) support, a 4-way 6G RAM
> > > machine becomes 38-40 times slower.
> >
> > 38-40 times slower in what kind of benchmark ?
>
> compiling the kernel, specifically. But even a simple thing like
> "time ps" shows about 0.9 seconds real time when it should show
> something like 0.021 seconds. Everything becomes unbearably
> slow, make xconfig takes 4-5 minutes to start up, the shutdown
> becomes impossible so I do sysrq-B after a few syncs etc. Also,
> as I said, one of the eepro100 interfaces becomes dead. I
> believe this _is_ the same problem even if it really seems it is
> not.

OUCH ... this shouldn't happen and to be honest I don't have
an explanation for how this /could/ happen (or even what it
would have to do with the new VM)...

regards,

Rik
Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Rik van Riel wrote:
> On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> > On Wed, 11 Oct 2000, Mark Hemment wrote:
> > > On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> > > >
> > > > a) one of the eepro100 interfaces (the onboard one on the S2QR6 mb) is
> > > > malfunctioning, interrupts are generated but no traffic gets through (YES,
> > > > I did plug it in correctly, this time, and I repeat 2.2.16 works!)
> > >
> > > I saw this the other week on our two-way Dell under a reasonably heavy
> > > load - but with 3c59x.c driver, the eepro100s survived!
> > > Either NIC (had two Tornados) could go this way after anything from 1
> > > to 36 hours of load. They would end up running in "poll" mode off the
> > > transmit watchdog timer.
> > > Swapped them for a dual-port eepro100 and no more problems.
> >
> > I disabled eepro100 support completely and the problem is still
> > there. What I also noticed is that with highmem-PAE enabled I
> > get BUG in page_alloc.c at line 221 so it is probably a VM
> > problem recently introduced (hence cc'd Rik).
>
> Can you trigger this bug /without/ PAE ?

no, I can't.

> I've been stress-testing my dual-cpu test machines (one with
> 64MB and one with 1GB ram) very very heavily for the last 4
> days and haven't encountered any bug whatsoever ...
>
> Btw, what compiler are you using ?

kgcc on red hat 6.9 which is really egcs-2.91.66

> > I will continue to narrow down by removing some things (like
> > mtrr) from the equation. Rik, the problem is that when one
> > enables PAE (or just highmem-4G) support, a 4-way 6G RAM
> > machine becomes 38-40 times slower.
>
> 38-40 times slower in what kind of benchmark ?

compiling the kernel, specifically. But even a simple thing like "time ps"
shows about 0.9 seconds real time when it should show something like 0.021
seconds. Everything becomes unbearably slow, make xconfig takes 4-5 minutes
to start up, the shutdown becomes impossible so I do sysrq-B after a few
syncs etc. Also, as I said, one of the eepro100 interfaces becomes dead. I
believe this _is_ the same problem even if it really seems it is not.

Regards,
Tigran
Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> On Wed, 11 Oct 2000, Mark Hemment wrote:
> > On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> > >
> > > a) one of the eepro100 interfaces (the onboard one on the S2QR6 mb) is
> > > malfunctioning, interrupts are generated but no traffic gets through (YES,
> > > I did plug it in correctly, this time, and I repeat 2.2.16 works!)
> >
> > I saw this the other week on our two-way Dell under a reasonably heavy
> > load - but with 3c59x.c driver, the eepro100s survived!
> > Either NIC (had two Tornados) could go this way after anything from 1
> > to 36 hours of load. They would end up running in "poll" mode off the
> > transmit watchdog timer.
> > Swapped them for a dual-port eepro100 and no more problems.
>
> I disabled eepro100 support completely and the problem is still
> there. What I also noticed is that with highmem-PAE enabled I
> get BUG in page_alloc.c at line 221 so it is probably a VM
> problem recently introduced (hence cc'd Rik).

Can you trigger this bug /without/ PAE ?

I've been stress-testing my dual-cpu test machines (one with
64MB and one with 1GB ram) very very heavily for the last 4
days and haven't encountered any bug whatsoever ...

Btw, what compiler are you using ?

> I will continue to narrow down by removing some things (like
> mtrr) from the equation. Rik, the problem is that when one
> enables PAE (or just highmem-4G) support, a 4-way 6G RAM
> machine becomes 38-40 times slower.

38-40 times slower in what kind of benchmark ?

regards,

Rik
Re: test10-pre1 problems on 4-way SuperServer8050
> On Wed, 11 Oct 2000, Alan Cox wrote:
> > > I will continue to narrow down by removing some things (like mtrr) from
> > > the equation. Rik, the problem is that when one enables PAE (or just
> > > highmem-4G) support, a 4-way 6G RAM machine becomes 38-40 times slower.
> >
> > What happens if you boot a PAE kernel with mem=512M on that box ?
>
> it works fine then. Kernel compiles in 68 seconds as it should. Shall I
> keep incrementing mem= to see what happens next...

The threshold is probably >1Gig, but see where it actually is; that's the
point where the other zones and memory juggling kick in.
Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Alan Cox wrote:
> > I will continue to narrow down by removing some things (like mtrr) from
> > the equation. Rik, the problem is that when one enables PAE (or just
> > highmem-4G) support, a 4-way 6G RAM machine becomes 38-40 times slower.
>
> What happens if you boot a PAE kernel with mem=512M on that box ?

it works fine then. Kernel compiles in 68 seconds as it should. Shall I
keep incrementing mem= to see what happens next...

Regards,
Tigran
Re: test10-pre1 problems on 4-way SuperServer8050
> I will continue to narrow down by removing some things (like mtrr) from
> the equation. Rik, the problem is that when one enables PAE (or just
> highmem-4G) support, a 4-way 6G RAM machine becomes 38-40 times slower.

What happens if you boot a PAE kernel with mem=512M on that box ?
Re: 36 bit MTRRs, Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Boszormenyi Zoltan wrote:
> On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> > I will continue to narrow down by removing some things (like mtrr) from
> > the equation. Rik, the problem is that when one enables PAE (or just
> > highmem-4G) support, a 4-way 6G RAM machine becomes 38-40 times slower.
>
> Will you please try this patch? This is almost the same as I
> sent to you before, it is just against 2.4.0-test10-pre1 and
> it lacks the corrections to e.g. the frame buffer drivers.

Hi Zoltan,

I have tried your patch and although it works:

# cat /proc/mtrr
reg00: base=0x ( 0MB), size=4096MB: write-back, count=1
reg01: base=0x001 (4096MB), size=2048MB: write-back, count=1
reg02: base=0xfc00 (4032MB), size= 64MB: uncachable, count=1

unfortunately, it doesn't solve the problem. The machine is still
unbearably slow (up to 40x slower!) and one of the eepro100 interfaces is
still not working.

Another interesting idea was suggested by Mark Hemment: to switch
memlist_add_head() -> memlist_add_tail() in expand()/__free_pages_ok()
(see mm/page_alloc.c). It did make a difference -- both eepro100
interfaces started to work fine -- but the machine remained just as slow
as before.

So, the problem is complex, but I am told that Rik and others are aware
that at present there is no working support for highmem. Strangely, my
desktop dual PIII550 with 1G RAM works just fine with highmem... nice and
fast, no problems whatsoever, and it is filled up with all kinds of
devices, from soundcard to bttv848, 3dfx, eepro100, ne2k, 8139 etc etc.
Everything just works.

Regards,
Tigran
36 bit MTRRs, Re: test10-pre1 problems on 4-way SuperServer8050
On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> I will continue to narrow down by removing some things (like mtrr) from
> the equation. Rik, the problem is that when one enables PAE (or just
> highmem-4G) support, a 4-way 6G RAM machine becomes 38-40 times slower.

Will you please try this patch? This is almost the same as I sent to you
before, it is just against 2.4.0-test10-pre1 and it lacks the corrections
to e.g. the frame buffer drivers.

I am now running test10-pre1 with this patch and:

-
[root@localhost /root]# cd /proc
[root@localhost /proc]# cat mtrr
reg00: base=0x ( 0MB), size= 128MB: write-back, count=1
reg05: base=0xe200 (3616MB), size= 32MB: write-combining, count=1
[root@localhost /proc]# echo "base=0x2 size=0x1 type=write-combining" >mtrr
[root@localhost /proc]# cat mtrr
reg00: base=0x ( 0MB), size= 128MB: write-back, count=1
reg01: base=0x002 (8192MB), size=4096MB: write-combining, count=1
reg05: base=0xe200 (3616MB), size= 32MB: write-combining, count=1
--

This is on a dual P-III machine with 128 MB memory.

If it causes problems on Athlons then change the line

#define AMD_OR_MASK (0xf000UL)

to

#define AMD_OR_MASK (INTEL_OR_MASK)

and recompile and tell me what happens.

Also, I would like to hear reports from non-Intel (Cyrix, etc.)
and older AMD machines. I do not have my Cyrix 6x86MX anymore
but this scheme worked on that.

Regards,
Zoltan Boszormenyi

mtrrpage-new.diff.gz
Re: test10-pre1 problems on 4-way SuperServer8050
Hi Mark,

On Wed, 11 Oct 2000, Mark Hemment wrote:
> Hi Tigran,
>
> On Wed, 11 Oct 2000, Tigran Aivazian wrote:
>
> > a) one of the eepro100 interfaces (the onboard one on the S2QR6 mb) is
> > malfunctioning, interrupts are generated but no traffic gets through (YES,
> > I did plug it in correctly, this time, and I repeat 2.2.16 works!)
>
> I saw this the other week on our two-way Dell under a reasonably heavy
> load - but with the 3c59x.c driver; the eepro100s survived!
> Either NIC (had two Tornados) could go this way after anything from 1
> to 36 hours of load. They would end up running in "poll" mode off the
> transmit watchdog timer.
> Swapped them for a dual-port eepro100 and no more problems.

I disabled eepro100 support completely and the problem is still there.
What I also noticed is that with highmem-PAE enabled I get a BUG in
page_alloc.c at line 221, so it is probably a recently introduced VM
problem (hence cc'd Rik).

I will continue to narrow down by removing some things (like mtrr) from
the equation. Rik, the problem is that when one enables PAE (or just
highmem-4G) support, a 4-way 6G RAM machine becomes 38-40 times slower.

Regards,
Tigran
Re: [more findings!] Re: test10-pre1 problems on 4-way SuperServer8050
ok, confirmed -- it is _not_ PAE-related. Just enabling plain highmem (4G)
support causes all these problems -- the machine becomes 38-40 times slower
overall and one of the eepro100 cards stops working. I will try Zoltan's
ideas on 64-bit mtrrs, but any more ideas are welcome...

Thanks,
Tigran

On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> Amazing, disabling highmem altogether (not just PAE), i.e. being able to
> use only the low 896M of RAM, got rid of _both_ the eepro100 and slowness
> problems!
>
> The system is now very fast (kernel compile in 61 seconds!) and all
> eepro100 interfaces work fine. I will now test with plain highmem (4G) but
> no PAE... and see what happens.
>
> On Wed, 11 Oct 2000, Tigran Aivazian wrote:
>
> > Hi,
> >
> > I have installed 2.4.0-test10-pre1 on a 4-way Xeon 700MHz 6G RAM machine
> > and observe various problems, not present in
> > 2.2.16-(redhat69's-number-17).
> >
> > a) one of the eepro100 interfaces (the onboard one on the S2QR6 mb) is
> > malfunctioning, interrupts are generated but no traffic gets through (YES,
> > I did plug it in correctly, this time, and I repeat 2.2.16 works!)
> >
> > b) it detects all memory correctly but creates a write-back mtrr only for
> > the first 2G, is this normal?
> >
> > # cat /proc/meminfo /proc/mtrr
> > total: used: free: shared: buffers: cached:
> > Mem: 1985175552 107397120 1877778432 0 6864896 60833792
> > Swap: 1891770368 0 1891770368
> > MemTotal: 6132952 kB
> > MemFree: 6028072 kB
> > MemShared: 0 kB
> > Buffers: 6704 kB
> > Cached: 59408 kB
> > Active: 9884 kB
> > Inact_dirty: 56228 kB
> > Inact_clean: 0 kB
> > Inact_target: 96 kB
> > HighTotal: 5322688 kB
> > HighFree: 5247736 kB
> > LowTotal: 810264 kB
> > LowFree: 780336 kB
> > SwapTotal: 1847432 kB
> > SwapFree: 1847432 kB
> > reg01: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> > reg02: base=0xfc000000 (4032MB), size=  64MB: uncachable, count=1
> >
> > c) /proc/meminfo shows the number of bytes incorrectly. The B() macro of
> > fs/proc/proc_misc.c looks fine but perhaps the %8lu format specifier
> > should be extended to %16lu? (we should care about correctness more than
> > about binary compatibility with apps that may parse /proc/meminfo file)
> >
> > d) the system is incredibly slow. It took only 1 minute 20 seconds to
> > compile the kernel (make -j4 bzImage, with mem=512M because the e820
> > (or whatever is in 2.2.x, I don't care) algorithm didn't work so I
> > gave it at least "some memory" to work with) on 2.2.16 and it took about
> > an hour to compile on 2.4.0-test10. I expected 50 seconds or so. Must
> > be something to do with caching? I enabled PAE of course. It is probably
> > something simple to fix as I expect this machine to be the fastest in the
> > world (for this price :)
> >
> > I will slowly go through all of these problems, starting with the simplest
> > c)
> >
> > Regards,
> > Tigran
Re: test10-pre1 problems on 4-way SuperServer8050
Hi Tigran,

On Wed, 11 Oct 2000, Tigran Aivazian wrote:

> a) one of the eepro100 interfaces (the onboard one on the S2QR6 mb) is
> malfunctioning, interrupts are generated but no traffic gets through (YES,
> I did plug it in correctly, this time, and I repeat 2.2.16 works!)

I saw this the other week on our two-way Dell under a reasonably heavy
load - but with the 3c59x.c driver; the eepro100s survived!
Either NIC (had two Tornados) could go this way after anything from 1
to 36 hours of load. They would end up running in "poll" mode off the
transmit watchdog timer.
Swapped them for a dual-port eepro100 and no more problems.

Mark
[more findings!] Re: test10-pre1 problems on 4-way SuperServer8050
Amazing, disabling highmem altogether (not just PAE), i.e. being able to
use only the low 896M of RAM, got rid of _both_ the eepro100 and slowness
problems!

The system is now very fast (kernel compile in 61 seconds!) and all
eepro100 interfaces work fine. I will now test with plain highmem (4G) but
no PAE... and see what happens.

On Wed, 11 Oct 2000, Tigran Aivazian wrote:
> Hi,
>
> I have installed 2.4.0-test10-pre1 on a 4-way Xeon 700MHz 6G RAM machine
> and observe various problems, not present in
> 2.2.16-(redhat69's-number-17).
>
> a) one of the eepro100 interfaces (the onboard one on the S2QR6 mb) is
> malfunctioning, interrupts are generated but no traffic gets through (YES,
> I did plug it in correctly, this time, and I repeat 2.2.16 works!)
>
> b) it detects all memory correctly but creates a write-back mtrr only for
> the first 2G, is this normal?
>
> # cat /proc/meminfo /proc/mtrr
> total: used: free: shared: buffers: cached:
> Mem: 1985175552 107397120 1877778432 0 6864896 60833792
> Swap: 1891770368 0 1891770368
> MemTotal: 6132952 kB
> MemFree: 6028072 kB
> MemShared: 0 kB
> Buffers: 6704 kB
> Cached: 59408 kB
> Active: 9884 kB
> Inact_dirty: 56228 kB
> Inact_clean: 0 kB
> Inact_target: 96 kB
> HighTotal: 5322688 kB
> HighFree: 5247736 kB
> LowTotal: 810264 kB
> LowFree: 780336 kB
> SwapTotal: 1847432 kB
> SwapFree: 1847432 kB
> reg01: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
> reg02: base=0xfc000000 (4032MB), size=  64MB: uncachable, count=1
>
> c) /proc/meminfo shows the number of bytes incorrectly. The B() macro of
> fs/proc/proc_misc.c looks fine but perhaps the %8lu format specifier
> should be extended to %16lu? (we should care about correctness more than
> about binary compatibility with apps that may parse /proc/meminfo file)
>
> d) the system is incredibly slow. It took only 1 minute 20 seconds to
> compile the kernel (make -j4 bzImage, with mem=512M because the e820
> (or whatever is in 2.2.x, I don't care) algorithm didn't work so I
> gave it at least "some memory" to work with) on 2.2.16 and it took about
> an hour to compile on 2.4.0-test10. I expected 50 seconds or so. Must
> be something to do with caching? I enabled PAE of course. It is probably
> something simple to fix as I expect this machine to be the fastest in the
> world (for this price :)
>
> I will slowly go through all of these problems, starting with the simplest
> c)
>
> Regards,
> Tigran
test10-pre1 problems on 4-way SuperServer8050
Hi,

I have installed 2.4.0-test10-pre1 on a 4-way Xeon 700MHz 6G RAM machine
and observe various problems, not present in
2.2.16-(redhat69's-number-17).

a) one of the eepro100 interfaces (the onboard one on the S2QR6 mb) is
malfunctioning, interrupts are generated but no traffic gets through (YES,
I did plug it in correctly, this time, and I repeat 2.2.16 works!)

b) it detects all memory correctly but creates a write-back mtrr only for
the first 2G, is this normal?

# cat /proc/meminfo /proc/mtrr
total: used: free: shared: buffers: cached:
Mem: 1985175552 107397120 1877778432 0 6864896 60833792
Swap: 1891770368 0 1891770368
MemTotal: 6132952 kB
MemFree: 6028072 kB
MemShared: 0 kB
Buffers: 6704 kB
Cached: 59408 kB
Active: 9884 kB
Inact_dirty: 56228 kB
Inact_clean: 0 kB
Inact_target: 96 kB
HighTotal: 5322688 kB
HighFree: 5247736 kB
LowTotal: 810264 kB
LowFree: 780336 kB
SwapTotal: 1847432 kB
SwapFree: 1847432 kB
reg01: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg02: base=0xfc000000 (4032MB), size=  64MB: uncachable, count=1

c) /proc/meminfo shows the number of bytes incorrectly. The B() macro of
fs/proc/proc_misc.c looks fine but perhaps the %8lu format specifier
should be extended to %16lu? (we should care about correctness more than
about binary compatibility with apps that may parse /proc/meminfo file)

d) the system is incredibly slow. It took only 1 minute 20 seconds to
compile the kernel (make -j4 bzImage, with mem=512M because the e820
(or whatever is in 2.2.x, I don't care) algorithm didn't work so I
gave it at least "some memory" to work with) on 2.2.16 and it took about
an hour to compile on 2.4.0-test10. I expected 50 seconds or so. Must
be something to do with caching? I enabled PAE of course. It is probably
something simple to fix as I expect this machine to be the fastest in the
world (for this price :)

I will slowly go through all of these problems, starting with the simplest
c)

Regards,
Tigran