Please help with following NUMA-related questions

2005-07-25 Thread Sheo Shanker Prasad
I will greatly appreciate any help regarding the following matters:

(1) How to know whether my machine is NUMA-aware or not,

(2) Difference between memory bank interleaving and node interleaving

(3) When the BIOS asks me to set bank interleaving to AUTO, it says that 
AUTO allows memory access to spread out over banks on the same node or across 
nodes, decreasing memory access contention. However, I have no idea when the 
memory access is spread over banks on the same node and when across nodes. I 
also do not know how to tell the machine to access memory across the nodes or 
on the same node, or how the AUTO choice affects NUMA-awareness.

(4) The BIOS also tells me that I could choose bank interleaving as DISABLED, 
but I do not know what its implications are for NUMA awareness.

Here are other relevant details. I have a dual-Opteron 250 (2.4GHz) set in 
Tyan Thunder 2885 K8W with AMIBIOS version 2.05.

When I bought it last year, the machine was running SuSE 9.1 Pro and the 
Linux kernel was 2.6.5-7.108-smp. At that time both the Hardware Info from 
YAST and /var/log/messages explicitly reported the following:

Scanning NUMA topology in Northbridge 24
  <6>Number of nodes 2 (10010)
  <6>Node 0 MemBase  Limit 7fff
  <6>Node 1 MemBase 8000 Limit cbff
  <6>Using node hash shift of 24

These messages indicated that NODE interleaving was off and the machine was 
NUMA-aware.
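
One way to check the same thing on a running system is sketched below in Python. It is only a sketch under assumptions: it assumes the kernel exposes per-node directories under /sys/devices/system/node (the path quoted elsewhere in this thread), and the function names are made up for illustration. More than one node directory suggests that the kernel has set up NUMA; a single node is inconclusive, since it could mean either a non-NUMA kernel or node interleaving enabled in the BIOS.

```python
import glob
import os
import re


def online_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted node ids found as nodeN directories under sysfs_root."""
    nodes = []
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        match = re.fullmatch(r"node(\d+)", os.path.basename(path))
        if match and os.path.isdir(path):
            nodes.append(int(match.group(1)))
    return sorted(nodes)


def nodes_from_boot_log(text):
    """Extract N from a 'Number of nodes N' boot message, or None if absent."""
    match = re.search(r"Number of nodes\s+(\d+)", text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print("NUMA nodes visible in sysfs:", online_numa_nodes())
```

The second helper can be fed the text of /var/log/messages or the output of dmesg to look for the "Number of nodes" line shown above.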

Then, after a few months, the motherboard failed and the machine was sent to 
the vendor for repair. It came back with SuSE 9.3 and the Linux kernel 
version 2.6.11.4-21.7-smp ([EMAIL PROTECTED]) (gcc version 3.3.5 20050117 
(prerelease) (SUSE Linux)) #1 SMP Thu Jun 2 14:23:14 UTC 2005.


Now both the Hardware Info from YAST and /var/log/messages DO NOT mention 
NUMA anywhere, and I do not have any way to check whether 
NODE-Interleaving is OFF or ON. My difficulties are compounded because I do 
not know how to interpret the chipset-related settings in the BIOS.

Currently, in the BIOS setting (Chipset->memory config -> Bank Interleaving), 
I am asked to choose between AUTO & DISABLED. No choice is offered for Node 
Interleaving.

The only guidance for the choice is that interleaving allows memory access to 
spread out over banks on the same node or across nodes decreasing memory 
access contentions. Nothing is mentioned about what happens when Interleaving 
is disabled. Furthermore, if I choose AUTO, then I do not know when the 
memory is spread out over banks on the same node or across nodes.

Any help will be greatly appreciated.

Thanking you in advance.

-- 
Best regards.

Sheo
(Sheo S. Prasad)
Creative Research Enterprises
6354 Camino del Lago
Pleasanton, CA 94566, USA
Voice Phone: (+1) 925 426-9341
Fax   Phone: (+1) 925 426-9417
e-mail: [EMAIL PROTECTED]

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


/proc/mtrr & BIOS-provided RAM map

2005-07-09 Thread Sheo Shanker Prasad
I am having a horrible problem of wild and random variation in the execution 
time of a benchmark test program from one fresh reboot to another (and also 
from one try to another within any one reboot).

At the advice of a few colleagues at linux-kernel@vger.kernel.org and at 
[EMAIL PROTECTED] (who very kindly spared their valuable time to give me 
some advice), I tried booting with mem=4000M, mem=3264, mem=3000M and 
mem=2000M, i.e., values less than the actual RAM. The machine has 4096M of 
PC3200 in 4 DIMMs, with 2 DIMMs on each CPU (Opteron 250 at 2.4 GHz), and 
runs the SuSE 9.3 operating system.

None of these settings has helped. I continue to get the same wild and random 
variation in the execution of the test program under identical conditions 
(test program is the only program running, no Internet etc.).

Also, no matter what value N is set in mem=NM, the contents of 
/proc/mtrr and the BIOS-provided RAM map remain exactly the same 
(although dmesg shows the amount of memory changing with the value of N 
in mem=NM).

Is this normal? Does this show any BIOS problem? What are these telling me?

I will greatly appreciate any advice that anyone may have for me on the above 
questions. The following are the contents of /proc/mtrr and the BIOS-provided 
RAM map.

(1) content of /proc/mtrr :

reg00: base=0x (   0MB), size=2048MB: write-back, count=1
reg01: base=0x8000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc000 (3072MB), size= 128MB: write-back, count=1
reg03: base=0xc800 (3200MB), size=  64MB: write-back, count=1
reg04: base=0xd000 (3328MB), size= 256MB: write-combining, count=2

(2) BIOS provided RAM map:

 BIOS-e820:  - 0009fc00 (usable)
 BIOS-e820: 0009fc00 - 000a (reserved)
 BIOS-e820: 000e - 0010 (reserved)
 BIOS-e820: 0010 - cbff (usable)
 BIOS-e820: cbff - cbfff000 (ACPI data)
 BIOS-e820: cbfff000 - cc00 (ACPI NVS)
 BIOS-e820: ff78 - 0001 (reserved)
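
As a small worked example of reading the /proc/mtrr listing in (1), the Python sketch below totals the region sizes by cache type. It is a minimal sketch, not a diagnostic tool: the function name is made up for illustration, the sample lines are copied as they appear in this message (including the shortened base fields, which the code ignores), and it assumes every size is given in MB.

```python
import re

# Listing copied from the message above; only size and cache type are used.
SAMPLE = """\
reg00: base=0x (   0MB), size=2048MB: write-back, count=1
reg01: base=0x8000 (2048MB), size=1024MB: write-back, count=1
reg02: base=0xc000 (3072MB), size= 128MB: write-back, count=1
reg03: base=0xc800 (3200MB), size=  64MB: write-back, count=1
reg04: base=0xd000 (3328MB), size= 256MB: write-combining, count=2
"""

LINE_RE = re.compile(r"size=\s*(\d+)MB:\s*([a-z-]+)")


def mtrr_totals(text):
    """Sum MTRR region sizes in MB, grouped by cache type."""
    totals = {}
    for match in LINE_RE.finditer(text):
        size_mb, cache_type = int(match.group(1)), match.group(2)
        totals[cache_type] = totals.get(cache_type, 0) + size_mb
    return totals


if __name__ == "__main__":
    for cache_type, size_mb in sorted(mtrr_totals(SAMPLE).items()):
        print(f"{cache_type}: {size_mb} MB")
```

For the listing above this gives 2048 + 1024 + 128 + 64 = 3264 MB of write-back memory and 256 MB of write-combining memory. The 3264 figure is the same number used in one of the mem= trials described earlier.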

Thanks for your help.
-- 
Best regards.

Sheo
(Sheo S. Prasad)
Creative Research Enterprises
6354 Camino del Lago
Pleasanton, CA 94566, USA
Voice Phone: (+1) 925 426-9341
Fax   Phone: (+1) 925 426-9417
e-mail: [EMAIL PROTECTED]



Re: Disturbing wide variation in execution time

2005-07-08 Thread Sheo Shanker Prasad
On Thursday July 7 2005 11:46 am, Philippe Troin wrote:
> Sheo Shanker Prasad <[EMAIL PROTECTED]> writes:
> > I will appreciate your help in eliminating a disturbing wide
> > variation (by a factor of 2 to 2.5) in the execution time of a test
> > (execution benchmark) program under identical conditions even when
> > the machine is freshly started (rebooted) and no other user program
> > is running (not even e-mail or Internet browser).
> >
> > I have a dual Opteron 250 (2.4 GHz) running SuSE 9.3 Pro & Linux
> > version 2.6.11.4-21.7-smp ([EMAIL PROTECTED]) (gcc version 3.3.5
> > 20050117 (prerelease) (SUSE Linux)) #1 SMP Thu Jun 2 14:23:14 UTC
> > 2005. The motherboard is Tyan Thunder K8W (S2885 ANRF) with AMI BIOS
> >
> > The machine has 4GB of PC3200 DDR RAM, two dimms on each CPU.
> >
> > The original machine was bought from a vendor about 6 months ago. At
> > that time it was running SuSE 9.1 Pro and the execution time for the
> > same test program was consistently the same (around 2m 37s +/- a few
> > %). Then the mother board failed and the machine went totally
> > dead. The vendor then replaced the failed motherboard with a new
> > Tyan Thunder K8W and installed the SuSE 9.3. I am not sure whether
> > or not the AMI BIOS was also replaced.
> >
> > When the repaired machine was started, I began to notice the
> > disturbing wide variation and the frequent significant slowdown of
> > the machine, as exhibited by the factor of 2 to 2.5 increase in the
> > execution time of the test program as described above.  Sometimes it
> > would be quite fast (executing at the original 2m 40s), sometimes
> > a factor of 2.5 slower, and sometimes in between.
>
> 8< snip >8

Thanks very much for taking the time to think about my problem. Here are 
the answers to your questions.
>
>  1. Are you running an i386 kernel or an x86_64 kernel?

I think I am running an x86_64 kernel. I think so because I had asked the 
vendor of the machine to install x86_64 and because the file

System.map-2.6.11.4-21.7-smp 

in the /boot directory has an entry: 804f T x86_64_start_kernel

and that directory also contains the gzipped file:

 symvers-2.6.11.4-21.7-x86_64-smp.gz

The operating system is Linux version 2.6.11.4-21.7-smp ([EMAIL PROTECTED])  
(gcc version 3.3.5 20050117 (prerelease) (SUSE Linux)) #1 SMP Thu Jun 2 
14:23:14 UTC 2005
>
>  2. Which BIOS version?

The BIOS is AMIBIOS version 08.00.10 with the build date of 02/11/05 
09:44:04 and has the ID:  0001.

>
>  3. Is node interleaving enabled in the BIOS?

When I go through the BIOS setup, I do not see any choice for the node 
interleaving ON or OFF. However, I think that the two CPUs (as node0 and 
node1) are made NUMA aware by default, but I could be quite wrong. 

Because of my ignorance on this point, here are the contents of 

 /sys/devices/system/node/node0/numastat:

numa_hit 3620274
numa_miss 0
numa_foreign 0
interleave_hit 21903
local_node 3610298
other_node 9976

Similarly, the following are the contents of

  /sys/devices/system/node/node1/numastat

numa_hit 3089426
numa_miss 0
numa_foreign 0
interleave_hit 38355
local_node 3072605
other_node 16821
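
To make the two numastat listings easier to read, the Python sketch below parses them and computes what fraction of each node's allocations came from processes running on that same node (local_node versus other_node). This is only an illustrative sketch: the function names are made up, and the sample values are the ones quoted in this message. As the kernel documents these counters, numa_miss counts allocations that landed on this node although another node was preferred, and numa_foreign counts allocations intended for this node that landed elsewhere.

```python
NODE0 = """\
numa_hit 3620274
numa_miss 0
numa_foreign 0
interleave_hit 21903
local_node 3610298
other_node 9976
"""

NODE1 = """\
numa_hit 3089426
numa_miss 0
numa_foreign 0
interleave_hit 38355
local_node 3072605
other_node 16821
"""


def parse_numastat(text):
    """Turn 'name value' lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def local_fraction(stats):
    """Share of this node's allocations made by processes running on it."""
    total = stats["local_node"] + stats["other_node"]
    return stats["local_node"] / total if total else float("nan")


if __name__ == "__main__":
    for name, text in (("node0", NODE0), ("node1", NODE1)):
        stats = parse_numastat(text)
        print(name, "local fraction: %.4f" % local_fraction(stats),
              "numa_miss:", stats["numa_miss"])
```

With the values above, both nodes show a local fraction above 99% and numa_miss of 0. The existence of two separate node directories with their own counters is itself a sign that the kernel is treating the machine as two NUMA nodes.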

>
> Phil.

 Thanks again, Phil, and I hope to hear from you soon.
-- 
Best regards.

Sheo
(Sheo S. Prasad)
Creative Research Enterprises
6354 Camino del Lago
Pleasanton, CA 94566, USA
Voice Phone: (+1) 925 426-9341
Fax   Phone: (+1) 925 426-9417
e-mail: [EMAIL PROTECTED]



Re: Disturbing wide variation in execution time

2005-07-07 Thread Sheo Shanker Prasad
Dear David,

The program is an atmospheric chemistry-transport modeling code that computes 
the distributions of atmospheric species (e.g., ozone) as a function of 
latitude and altitude and how that changes with time.

Thanks for taking time to think about my problem. I greatly appreciate it.

I hope to hear from you soon.

Regards.

Sheo


On Thursday July 7 2005 12:10 am, you wrote:
> From: Sheo Shanker Prasad <[EMAIL PROTECTED]>
> Date: Wed, 6 Jul 2005 23:44:53 -0700
>
> > I will appreciate your help in eliminating a disturbing wide variation
> > (by a factor of 2 to 2.5) in the execution time of a test (execution
> > benchmark) program under identical conditions even when the machine is
> > freshly started (rebooted) and no other user program is running (not even
> > e-mail or Internet browser).
>
> You haven't told us exactly what your test program does.

-- 
Best regards.

Sheo
(Sheo S. Prasad)
Creative Research Enterprises
6354 Camino del Lago
Pleasanton, CA 94566, USA
Voice Phone: (+1) 925 426-9341
Fax   Phone: (+1) 925 426-9417
e-mail: [EMAIL PROTECTED]



Disturbing wide variation in execution time

2005-07-06 Thread Sheo Shanker Prasad
I will appreciate your help in eliminating a disturbing wide variation (by a 
factor of 2 to 2.5) in the execution time of a test (execution benchmark) 
program under identical conditions, even when the machine is freshly started 
(rebooted) and no other user program is running (not even e-mail or an 
Internet browser).

I have a dual Opteron 250 (2.4 GHz) running SuSE 9.3 Pro & Linux version 
2.6.11.4-21.7-smp ([EMAIL PROTECTED]) (gcc version 3.3.5 20050117 (prerelease) 
(SUSE Linux)) #1 SMP Thu Jun 2 14:23:14 UTC 2005. The motherboard is Tyan 
Thunder K8W (S2885 ANRF) with AMI BIOS.

The machine has 4GB of PC3200 DDR RAM, two dimms on each CPU.

The original machine was bought from a vendor about 6 months ago. At that 
time it was running SuSE 9.1 Pro and the execution time for the same test 
program was consistently the same (around 2m 37s +/- a few %). Then the 
motherboard failed and the machine went totally dead. The vendor then 
replaced the failed motherboard with a new Tyan Thunder K8W and installed 
SuSE 9.3. I am not sure whether or not the AMI BIOS was also replaced.

When the repaired machine was started, I began to notice the disturbing wide 
variation and the frequent significant slowdown of the machine, as exhibited 
by the factor of 2 to 2.5 increase in the execution time of the test program 
as described above.  Sometimes it would be quite fast (executing at the 
original 2m 40s), sometimes a factor of 2.5 slower, and sometimes in 
between.
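
The run-to-run variation described above can be put into numbers with a small timing harness. The Python sketch below is one possible way to do it and makes several assumptions: the function names are made up for illustration, the benchmark is assumed to be startable as an ordinary command line, and wall-clock time is used as the measure. It reports the fastest, slowest and median run and the slowest-to-fastest ratio.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv several times and return the wall-clock time of each run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return min, max, median and the max/min spread of the timings."""
    fastest, slowest = min(timings), max(timings)
    return {
        "min": fastest,
        "max": slowest,
        "median": statistics.median(timings),
        "spread": slowest / fastest,
    }


if __name__ == "__main__":
    # Usage (hypothetical program name): python timing.py ./my_benchmark
    if len(sys.argv) > 1:
        print(summarize(time_command(sys.argv[1:])))
```

For reference, 2m 37s is 157 seconds, so a run that is a factor of 2.5 slower would take about 392.5 seconds, and summarize([157.0, 392.5]) would report a spread of 2.5.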

I have already done the following tests. I have tested the memory using both 
memtest86+ version 1.6 and memtest86-3.2. In both tests, run over 3 cycles, NO 
memory error was reported. I also ran the Linux version of the BYTE benchmark 
for memory, floating point and integer indices. The results matched those 
reported by others for their Opteron 250. 

Nevertheless, I have this wide and random variation in the execution time of 
the given program under identical conditions. Guided by the comments I 
received from the suse-amd64 user mailing list and the advice posted on 
LKML.ORG (this list), I tried booting with the option "mem=3000M" 
(significantly less than 4000M). That does not help either.

I am now perplexed as to why the machine is behaving with such unpredictable 
speeds, varying by factors of 2 to 2.5. What could be the cause, and how can 
I get rid of it and make the machine reliable and efficient? (Also, when I 
boot with mem=3000M, does that mean that the remaining memory is wasted? 
What is the significance of putting that limit on the memory?)

Your help will be greatly appreciated.

Best regards.

Sheo
(Sheo S. Prasad)
Creative Research Enterprises
6354 Camino del Lago
Pleasanton, CA 94566, USA
Voice Phone: (+1) 925 426-9341
Fax   Phone: (+1) 925 426-9417
e-mail: [EMAIL PROTECTED]
