Please help with following NUMA-related questions
I will greatly appreciate any help with the following matters:

(1) How can I tell whether my machine is NUMA-aware or not?

(2) What is the difference between memory bank interleaving and node interleaving?

(3) When the BIOS asks me to set bank interleaving to AUTO, it says that AUTO allows memory access to spread out over banks on the same node or across nodes, decreasing memory access contention. However, I have no idea when memory access is spread over banks on the same node and when it is spread across nodes. I also do not know how to tell the machine to access memory across the nodes or on the same node, and I have no idea how the AUTO choice affects NUMA-awareness.

(4) The BIOS also tells me that I can set bank interleaving to DISABLED, but I do not know what the implications are for NUMA-awareness.

Here are other relevant details. I have a dual Opteron 250 (2.4 GHz) system on a Tyan Thunder 2885 K8W with AMIBIOS version 2.05. When I bought it last year, the machine was running SuSE 9.1 Pro and the Linux kernel was 2.6.5-7.108-smp. At that time both the Hardware Info from YaST and /var/log/messages explicitly mentioned NUMA:

  Scanning NUMA topology in Northbridge 24
  <6>Number of nodes 2 (10010)
  <6>Node 0 MemBase Limit 7fff
  <6>Node 1 MemBase 8000 Limit cbff
  <6>Using node hash shift of 24

These messages indicated that node interleaving was off and the machine was NUMA-aware.

Then, after a few months, the motherboard failed and the machine was sent to the vendor for repair. It came back with SuSE 9.3 and the Linux kernel version 2.6.11.4-21.7-smp ([EMAIL PROTECTED]) (gcc version 3.3.5 20050117 (prerelease) (SUSE Linux)) #1 SMP Thu Jun 2 14:23:14 UTC 2005. Now neither the Hardware Info from YaST nor /var/log/messages mentions NUMA anywhere, and I do not have any way to check whether node interleaving is OFF or ON.

My difficulties are compounded because I do not know how to interpret the chipset-related settings in the BIOS. Currently, in the BIOS setup (Chipset -> Memory Config -> Bank Interleaving), I am asked to choose between AUTO and DISABLED. No choice is offered for node interleaving. The only guidance for the choice is that interleaving allows memory access to spread out over banks on the same node or across nodes, decreasing memory access contention. Nothing is mentioned about what happens when interleaving is disabled. Furthermore, if I choose AUTO, I do not know when the memory is spread out over banks on the same node and when across nodes.

Any help will be greatly appreciated. Thanking you in advance.

--
Best regards.
Sheo
(Sheo S. Prasad)
Creative Research Enterprises
6354 Camino del Lago
Pleasanton, CA 94566, USA
Voice Phone: (+1) 925 426-9341
Fax Phone: (+1) 925 426-9417
e-mail: [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
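On question (1), one low-risk check is to count the per-node directories the kernel exposes under /sys/devices/system/node. The following is only a minimal sketch: the function names are made up for illustration, and it assumes sysfs is mounted at /sys. Seeing more than one node suggests the kernel has a NUMA view of the machine; a single node may mean either a non-NUMA configuration or that the BIOS is interleaving memory across nodes.

```python
import os
import re


def list_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted node numbers found under sysfs_root (empty if none)."""
    try:
        entries = os.listdir(sysfs_root)
    except OSError:
        # Directory missing or unreadable: treat as "no node information".
        return []
    nodes = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            nodes.append(int(match.group(1)))
    return sorted(nodes)


def describe(nodes):
    """Turn a list of node numbers into a one-line summary (hypothetical helper)."""
    if len(nodes) > 1:
        return "kernel reports %d NUMA nodes: %s" % (len(nodes), nodes)
    if len(nodes) == 1:
        return "kernel reports a single node (no NUMA split visible)"
    return "no node directories found (sysfs not mounted or kernel without NUMA support)"


print(describe(list_numa_nodes()))
```

On the machine described above, two entries (node0 and node1) would be the expected result if the kernel still sees one node per Opteron.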
/proc/mtrr & BIOS-provided RAM map
I am having a horrible problem of wild and random variation in the execution time of a benchmark test program from one fresh reboot to another (and also from one try to another within any one boot).

On the advice of a few colleagues at linux-kernel@vger.kernel.org and at [EMAIL PROTECTED] (who very kindly spared their valuable time to give me some advice), I tried booting with mem=4000M, mem=3264M, mem=3000M and mem=2000M, i.e., values that are less than the actual RAM under the control of the SuSE 9.3 operating system (4096M of PC3200 in 4 DIMMs, with 2 DIMMs on each CPU, each an Opteron 250 at 2.4 GHz). None of the settings has helped. I continue to get the same wild and random variation in the execution time of the test program under identical conditions (the test program is the only program running, no Internet etc.).

Also, no matter what value is set with the mem= option, the contents of /proc/mtrr and the BIOS-provided RAM map remain exactly the same (although dmesg shows the amount of memory changing with the mem= value). Is this normal? Does this show any BIOS problem? What are these telling me?

I will greatly appreciate any advice that anyone may have for me on the above questions. The following are the contents of /proc/mtrr and the BIOS-provided RAM map.

(1) Contents of /proc/mtrr:

  reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
  reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
  reg02: base=0xc0000000 (3072MB), size= 128MB: write-back, count=1
  reg03: base=0xc8000000 (3200MB), size=  64MB: write-back, count=1
  reg04: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=2

(2) BIOS-provided RAM map:

  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
  BIOS-e820: 0000000000100000 - 00000000cbff0000 (usable)
  BIOS-e820: 00000000cbff0000 - 00000000cbfff000 (ACPI data)
  BIOS-e820: 00000000cbfff000 - 00000000cc000000 (ACPI NVS)
  BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)

Thanks for your help.

--
Best regards.
Sheo
(Sheo S. Prasad)
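To make the /proc/mtrr listing in the message above easier to reason about, here is a small sketch that parses such lines and reports how far the contiguous write-back ranges extend from address 0. It is only an illustration: the function names are hypothetical, the parser assumes sizes are given in MB as in the posted output, and the sample base addresses are written out in full, inferred from the MB offsets shown in parentheses.

```python
import re

CACHE_TYPES = r"(write-back|write-combining|write-through|write-protect|uncachable)"


def parse_mtrr(text):
    """Return a list of (base_mb, size_mb, cache_type) tuples, one per register."""
    ranges = []
    for line in text.splitlines():
        base = re.search(r"\(\s*(\d+)MB\)", line)    # e.g. "(2048MB)"
        size = re.search(r"size=\s*(\d+)MB", line)   # e.g. "size= 128MB"
        kind = re.search(CACHE_TYPES, line)
        if base and size and kind:
            ranges.append((int(base.group(1)), int(size.group(1)), kind.group(1)))
    return ranges


def write_back_top_mb(ranges):
    """Top (in MB) of the write-back region that is contiguous from address 0."""
    top = 0
    for base, size, kind in sorted(ranges):
        if kind == "write-back" and base == top:
            top = base + size
    return top


# Register values as posted in the message; bases expanded from the MB offsets.
SAMPLE = """\
reg00: base=0x00000000 (   0MB), size=2048MB: write-back, count=1
reg01: base=0x80000000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc0000000 (3072MB), size= 128MB: write-back, count=1
reg03: base=0xc8000000 (3200MB), size=  64MB: write-back, count=1
reg04: base=0xd0000000 (3328MB), size= 256MB: write-combining, count=2
"""

print("write-back cached RAM reaches", write_back_top_mb(parse_mtrr(SAMPLE)), "MB")
```

For the posted registers, the four write-back ranges add up to 3264 MB, the same figure as one of the mem= values tried above. The MTRRs are programmed by the BIOS before the kernel starts, so it seems plausible that they would not change with mem=, but that reading is an interpretation rather than something the thread confirms.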
Re: Disturbing wide variation in execution time
On Thursday July 7 2005 11:46 am, Philippe Troin wrote:
> Sheo Shanker Prasad <[EMAIL PROTECTED]> writes:
> > I will appreciate your help in eliminating a disturbingly wide
> > variation (by factors of 2 to 2.5) in the execution time of a test
> > (execution benchmark) program under identical conditions even when
> > the machine is freshly started (rebooted) and no other user program
> > is running (not even e-mail or Internet browser).
> >
> > I have a dual Opteron 250 (2.4 GHz) running SuSE 9.3 Pro & Linux
> > version 2.6.11.4-21.7-smp ([EMAIL PROTECTED]) (gcc version 3.3.5
> > 20050117 (prerelease) (SUSE Linux)) #1 SMP Thu Jun 2 14:23:14 UTC
> > 2005. The motherboard is Tyan Thunder K8W (S2885 ANRF) with AMI BIOS
> >
> > The machine has 4GB of PC3200 DDR RAM, two DIMMs on each CPU.
> >
> > The original machine was bought from a vendor about 6 months ago. At
> > that time it was running SuSE 9.1 Pro and the execution time for the
> > same test program was consistently the same (around 2m 37s +/- a few
> > %). Then the motherboard failed and the machine went totally
> > dead. The vendor then replaced the failed motherboard with a new
> > Tyan Thunder K8W and installed SuSE 9.3. I am not sure whether
> > or not the AMI BIOS was also replaced.
> >
> > When the repaired machine was started, I began to notice the
> > disturbingly wide variation and the frequent significant slowdown of
> > the machine, as exhibited by the factor of 2 to 2.5 increase in the
> > execution time of the test program as described above. Sometimes it
> > would be quite fast (executing at the original 2m 40s), sometimes
> > a factor of 2.5 slower, and sometimes in between.
>
> 8< snip >8

Thanks very much for taking the time to think about my problem. Here are answers to your questions.

>
> 1. Are you running an i386 kernel or an x86_64 kernel?

I think I am running an x86_64 kernel. I think so because I had asked the vendor of the machine to install x86_64, and because the file System.map-2.6.11.4-21.7-smp in the /boot directory has an entry:

  804f T x86_64_start_kernel

and that directory also contains the gzipped file:

  symvers-2.6.11.4-21.7-x86_64-smp.gz

The operating system is Linux version 2.6.11.4-21.7-smp ([EMAIL PROTECTED]) (gcc version 3.3.5 20050117 (prerelease) (SUSE Linux)) #1 SMP Thu Jun 2 14:23:14 UTC 2005

>
> 2. Which BIOS version?

The BIOS is AMIBIOS version 08.00.10 with the build date of 02/11/05 09:44:04, and it has the ID: 0001.

>
> 3. Is node interleaving enabled in the BIOS?

When I go through the BIOS setup, I do not see any choice for node interleaving ON or OFF. However, I think that the two CPUs (as node0 and node1) are made NUMA-aware by default, but I could be quite wrong. Since I do not know for sure, the following are the contents of /sys/devices/system/node/node0/numastat:

  numa_hit 3620274
  numa_miss 0
  numa_foreign 0
  interleave_hit 21903
  local_node 3610298
  other_node 9976

Similarly, the following are the contents of /sys/devices/system/node/node1/numastat:

  numa_hit 3089426
  numa_miss 0
  numa_foreign 0
  interleave_hit 38355
  local_node 3072605
  other_node 16821

>
> Phil.

Thanks again, Phil, and I hope to hear from you soon.

--
Best regards.
Sheo
(Sheo S. Prasad)
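The per-node counters quoted in the message above can be summarized with a few lines of code. This is a rough sketch with hypothetical function names. It assumes the usual reading of the numastat fields, in which local_node counts pages handed to tasks running on that node's own CPU and other_node counts pages handed to tasks running on another node's CPU.

```python
def parse_numastat(text):
    """Parse whitespace-separated 'name value' pairs into a dict of ints."""
    tokens = text.split()
    return {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens) - 1, 2)}


def local_fraction(stats):
    """Fraction of this node's allocations that went to tasks on its own CPU."""
    total = stats["local_node"] + stats["other_node"]
    return stats["local_node"] / total if total else 0.0


# Counter values copied from the message above.
NODE0 = """numa_hit 3620274 numa_miss 0 numa_foreign 0
interleave_hit 21903 local_node 3610298 other_node 9976"""
NODE1 = """numa_hit 3089426 numa_miss 0 numa_foreign 0
interleave_hit 38355 local_node 3072605 other_node 16821"""

for name, text in (("node0", NODE0), ("node1", NODE1)):
    stats = parse_numastat(text)
    print("%s: %.2f%% local, numa_miss=%d, numa_foreign=%d"
          % (name, 100 * local_fraction(stats),
             stats["numa_miss"], stats["numa_foreign"]))
```

The existence of separate node0 and node1 entries already suggests that the kernel sees two nodes. With numa_miss and numa_foreign both at zero, these counters on their own do not show allocations falling back to the wrong node, although they say nothing about where a particular process's memory ended up relative to the CPU it ran on.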
Re: Disturbing wide variation in execution time
Dear David,

The program is an atmospheric chemistry-transport modeling code that computes the distributions of atmospheric species (e.g., ozone) as a function of latitude and altitude, and how they change with time.

Thanks for taking the time to think about my problem. I greatly appreciate it. I hope to hear from you soon.

Regards.
Sheo

On Thursday July 7 2005 12:10 am, you wrote:
> From: Sheo Shanker Prasad <[EMAIL PROTECTED]>
> Date: Wed, 6 Jul 2005 23:44:53 -0700
>
> > I will appreciate your help in eliminating a disturbingly wide variation
> > (by factors of 2 to 2.5) in the execution time of a test (execution
> > benchmark) program under identical conditions even when the machine is
> > freshly started (rebooted) and no other user program is running (not even
> > e-mail or Internet browser).
>
> You haven't told us exactly what your test program does.

--
Best regards.
Sheo
(Sheo S. Prasad)
Disturbing wide variation in execution time
I will appreciate your help in eliminating a disturbingly wide variation (by factors of 2 to 2.5) in the execution time of a test (execution benchmark) program under identical conditions, even when the machine is freshly started (rebooted) and no other user program is running (not even e-mail or an Internet browser).

I have a dual Opteron 250 (2.4 GHz) running SuSE 9.3 Pro & Linux version 2.6.11.4-21.7-smp ([EMAIL PROTECTED]) (gcc version 3.3.5 20050117 (prerelease) (SUSE Linux)) #1 SMP Thu Jun 2 14:23:14 UTC 2005. The motherboard is Tyan Thunder K8W (S2885 ANRF) with AMI BIOS.

The machine has 4GB of PC3200 DDR RAM, two DIMMs on each CPU.

The original machine was bought from a vendor about 6 months ago. At that time it was running SuSE 9.1 Pro and the execution time for the same test program was consistently the same (around 2m 37s +/- a few %). Then the motherboard failed and the machine went totally dead. The vendor then replaced the failed motherboard with a new Tyan Thunder K8W and installed SuSE 9.3. I am not sure whether or not the AMI BIOS was also replaced.

When the repaired machine was started, I began to notice the disturbingly wide variation and the frequent significant slowdown of the machine, as exhibited by the factor of 2 to 2.5 increase in the execution time of the test program as described above. Sometimes it would be quite fast (executing at the original 2m 40s), sometimes a factor of 2.5 slower, and sometimes in between.

I have already done these tests. I have tested the memory using both memtest86+ version 1.6 and memtest86-3.2. In both tests, done over 3 cycles, NO memory error was reported. I also ran the Linux version of the BYTE benchmark for memory, floating point and integer indices. The results matched those reported by others for their Opteron 250. Nevertheless, I have this wide and random variation in the execution time of a given program under identical conditions.

Guided by the comments I received from the suse-amd64 user mailing list and the advice posted on LKML.ORG (this list), I tried booting with the option "mem=3000M" (significantly less than 4000M). That does not help either.

I am now perplexed as to why the machine is behaving with such unpredictable speeds, varying by factors of 2 to 2.5. What could be the cause, and how can I get rid of it and make the machine reliable and efficient? (Also, when I boot with mem=3000M, does that mean that the remaining memory is wasted? What is the significance of putting that limit on the memory?)

Your help will be greatly appreciated.

Best regards.
Sheo
(Sheo S. Prasad)
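When reporting a variation like the one described above, it can help to quantify it over several runs instead of quoting individual timings. The sketch below shows one way to do that. The function names are hypothetical, and the sample durations are illustrative values derived from the figures in the message (about 2m 37s for a fast run and a run 2.5 times slower), not measurements.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, runs=5):
    """Run cmd `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min, max, mean, max/min ratio and sample standard deviation."""
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": statistics.mean(durations),
        "ratio": max(durations) / min(durations),
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


# Illustrative durations only: 157 s is 2m 37s, 392.5 s is 2.5 times that.
example = summarize([157.0, 161.0, 392.5])
print("slowest/fastest ratio: %.2f, stdev: %.1f s"
      % (example["ratio"], example["stdev"]))
```

A call such as `time_runs(["./model"], runs=10)` would time a real program, where `./model` is a placeholder name. If the numactl utility is installed, timing the same program as `["numactl", "--cpunodebind=0", "--membind=0", "./model"]` is one way to see whether pinning the process and its memory to a single node narrows the spread; whether it does on this machine is not something the thread establishes.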