Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-05 Thread Terry Dontje

 On 10/05/2010 10:23 AM, Storm Zhang wrote:
Sorry, I should say one more thing about the 500-proc test. I tried
running two 500-proc jobs at the same time using SGE; they run fast and
finish at the same time as a single run. So I think Open MPI can handle
them separately very well.


Regarding bind-to-core: I ran mpirun --help but could not find any
bind-to-core information. I only see the bynode and byslot options. Are
those the same as bind-to-core? My mpirun shows version 1.3.3, but
ompi_info shows 1.4.2.


No, -bynode/-byslot is for mapping, not binding.  I cannot explain the
different release versions of ompi_info and mpirun.  Have you done a
which to see where each of them is located?  Anyway, 1.3.3 does not
have any of the -bind-to-* options.
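
A quick way to check is something along these lines (a sketch, assuming
both installations are reachable from your shell; the exact paths will
differ on your cluster):

  # Show which binaries are first on the PATH
  which mpirun
  which ompi_info
  # Each should report the release of the installation it belongs to
  mpirun --version
  ompi_info | grep "Open MPI:"

If the two come from different installation prefixes, point your PATH
(and LD_LIBRARY_PATH) at a single Open MPI installation before rerunning.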


--td

Thanks a lot.

Linbao


On Mon, Oct 4, 2010 at 9:18 PM, Eugene Loh wrote:


Storm Zhang wrote:


Here is what I meant: the 500-proc results in fact show that
with only 272-304 (< 500) real cores, the program's running
time is good, almost five times the 100-proc time. So that
case is handled very well. Therefore I guess Open MPI or the
Rocks OS does make use of hyperthreading to do the job. But
with 600 procs, the running time is more than double that of
500 procs. I don't know why; this is my problem.
BTW, how do I use -bind-to-core? I added it as an mpirun
option, but it always gives me the error "the executable
'bind-to-core' can't be found". Isn't it like:
mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core
scatttest


Thanks for sending the mpirun command and error message.  That helps.

It's not recognizing the --bind-to-core option.  (Single hyphen,
as you had, should also be okay.)  Skimming through the e-mail, it
looks like you are using OMPI 1.3.2 and 1.4.2.  Did you try
--bind-to-core with both?  If I remember my version numbers,
--bind-to-core will not be recognized with 1.3.2, but should be
with 1.4.2.  Could it be that you only tried 1.3.2?

Another option is to try "mpirun --help".  Make sure that it
reports --bind-to-core.




--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email: terry.don...@oracle.com





Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-05 Thread Storm Zhang
Sorry, I should say one more thing about the 500-proc test. I tried running
two 500-proc jobs at the same time using SGE; they run fast and finish at
the same time as a single run. So I think Open MPI can handle them
separately very well.

Regarding bind-to-core: I ran mpirun --help but could not find any
bind-to-core information. I only see the bynode and byslot options. Are those
the same as bind-to-core? My mpirun shows version 1.3.3, but ompi_info shows 1.4.2.

Thanks a lot.

Linbao


On Mon, Oct 4, 2010 at 9:18 PM, Eugene Loh  wrote:

> Storm Zhang wrote:
>
>
>> Here is what I meant: the 500-proc results in fact show that with only
>> 272-304 (< 500) real cores, the program's running time is good, almost
>> five times the 100-proc time. So that case is handled very well. Therefore
>> I guess Open MPI or the Rocks OS does make use of hyperthreading to do the
>> job. But with 600 procs, the running time is more than double that of 500
>> procs. I don't know why; this is my problem.
>> BTW, how do I use -bind-to-core? I added it as an mpirun option, but it
>> always gives me the error "the executable 'bind-to-core' can't be found".
>> Isn't it like:
>> mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core scatttest
>>
>
> Thanks for sending the mpirun command and error message.  That helps.
>
> It's not recognizing the --bind-to-core option.  (Single hyphen, as you
> had, should also be okay.)  Skimming through the e-mail, it looks like you
> are using OMPI 1.3.2 and 1.4.2.  Did you try --bind-to-core with both?  If I
> remember my version numbers, --bind-to-core will not be recognized with
> 1.3.2, but should be with 1.4.2.  Could it be that you only tried 1.3.2?
>
> Another option is to try "mpirun --help".  Make sure that it reports
> --bind-to-core.
>


Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Eugene Loh

Storm Zhang wrote:



Here is what I meant: the 500-proc results in fact show that with only
272-304 (< 500) real cores, the program's running time is good, almost
five times the 100-proc time. So that case is handled very well.
Therefore I guess Open MPI or the Rocks OS does make use of hyperthreading
to do the job. But with 600 procs, the running time is more than
double that of 500 procs. I don't know why; this is my problem.

BTW, how do I use -bind-to-core? I added it as an mpirun option, but it
always gives me the error "the executable 'bind-to-core' can't be found".
Isn't it like:

mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core scatttest


Thanks for sending the mpirun command and error message.  That helps.

It's not recognizing the --bind-to-core option.  (Single hyphen, as you 
had, should also be okay.)  Skimming through the e-mail, it looks like 
you are using OMPI 1.3.2 and 1.4.2.  Did you try --bind-to-core with 
both?  If I remember my version numbers, --bind-to-core will not be 
recognized with 1.3.2, but should be with 1.4.2.  Could it be that you 
only tried 1.3.2?


Another option is to try "mpirun --help".  Make sure that it reports 
--bind-to-core.
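
For example, a quick check from the shell (a sketch, assuming the mpirun
found first on your PATH is the one you actually launch with):

  mpirun --help | grep -i bind

With a 1.4.x mpirun this should list -bind-to-core among the binding
options; if nothing is printed, you are almost certainly picking up the
older mpirun.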


Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Storm Zhang
Here is what I meant: the 500-proc results in fact show that with only
272-304 (< 500) real cores, the program's running time is good, almost
five times the 100-proc time. So that case is handled very well. Therefore
I guess Open MPI or the Rocks OS does make use of hyperthreading to do the
job. But with 600 procs, the running time is more than double that of 500
procs. I don't know why; this is my problem.

BTW, how do I use -bind-to-core? I added it as an mpirun option, but it
always gives me the error "the executable 'bind-to-core' can't be found".
Isn't it like:
mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core scatttest

Thank you very much.

Linbao

On Mon, Oct 4, 2010 at 4:42 PM, Ralph Castain  wrote:

>
> On Oct 4, 2010, at 1:48 PM, Storm Zhang wrote:
>
> Thanks a lot, Ralph. As I said, I also tried to use SGE (which also shows
> 1024 slots available for parallel tasks); it assigns only 34-38 compute
> nodes, which have only 272-304 real cores, for the 500-proc run. The
> running time is consistent with the 100-proc case, without a lot of
> fluctuation as the number of machines changes.
>
>
> Afraid I don't understand your statement. If you have 500 procs running on
> < 500 cores, then the performance relative to a high-performance job (#procs
> <= #cores) will be worse. We deliberately dial down the performance when
> oversubscribed to ensure that procs "play nice" in situations where the node
> is oversubscribed.
>
>  So I guess it is not related to hyperthreading. Correct me if I'm wrong.
>
>
> Has nothing to do with hyperthreading - OMPI has no knowledge of
> hyperthreads at this time.
>
>
> BTW, how do I bind a proc to a core? I tried --bind-to-core and
> -bind-to-core, but neither works. Is that an OpenMP option rather than an
> Open MPI one?
>
>
> Those should work. You might try --report-bindings to see what OMPI thought
> it did.
>
>
> Linbao
>
>
> On Mon, Oct 4, 2010 at 12:27 PM, Ralph Castain  wrote:
>
>> Some of what you are seeing is the natural result of context switching.
>> Some thoughts regarding the results:
>>
>> 1. You didn't bind your procs to cores when running with #procs < #cores,
>> so your performance in those scenarios will also be less than max.
>>
>> 2. Once the number of procs exceeds the number of cores, you guarantee a
>> lot of context switching, so performance will definitely take a hit.
>>
>> 3. Sometime in the not-too-distant-future, OMPI will (hopefully) become
>> hyperthread aware. For now, we don't see them as separate processing units.
>> So as far as OMPI is concerned, you only have 512 computing units to work
>> with, not 1024.
>>
>> Bottom line is that you are running oversubscribed, so OMPI turns down
>> your performance so that the machine doesn't hemorrhage as it context
>> switches.
>>
>>
>> On Oct 4, 2010, at 11:06 AM, Doug Reeder wrote:
>>
>> In my experience hyperthreading can't really deliver two cores' worth of
>> processing simultaneously for processes expecting sole use of a core. Since
>> you really have 512 cores, I'm not surprised that you see a performance hit
>> when requesting > 512 compute units. We should really get input from a
>> hyperthreading expert, preferably from Intel.
>>
>> Doug Reeder
>> On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:
>>
>> We have 64 compute nodes with dual quad-core, hyperthreaded CPUs. So we
>> have 1024 compute units shown in the ROCKS 5.3 system. I'm trying to
>> scatter an array from the master node to the compute nodes using mpiCC
>> and mpirun with C++.
>>
>> Here is my test:
>>
>> The array size is 18KB * number of procs, and it is scattered to the
>> compute nodes 5000 times repeatedly.
>>
>> The average running time (seconds):
>>
>> 100 procs: 170,
>> 400 procs: 690,
>> 500 procs: 855,
>> 600 procs: 2550,
>> 700 procs: 2720,
>> 800 procs: 2900,
>>
>> There is a big jump in running time from 500 procs to 600 procs. I don't
>> know what the problem is.
>> I tried both OMPI 1.3.2 and OMPI 1.4.2. Running time is a little faster
>> for all the tests in 1.4.2, but the jump still exists.
>> I tried using either the Bcast function or simply Send/Recv, which give
>> very close results.
>> I tried both running it directly and using SGE, and got the same results.
>>
>> The code and ompi_info are attached to this email. The direct running
>> command is :
>> /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile
>> ../machines -np 600 scatttest
>>
>> The ifconfig of head node for eth0 is:
>> eth0  Link encap:Ethernet  HWaddr 00:26:B9:56:8B:44
>>   inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
>>   inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
>>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>   RX packets:1096060373 errors:0 dropped:2512622 overruns:0
>> frame:0
>>   TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
>>   collisions:0 txqueuelen:1000
>>   RX bytes:832328807459 (775.1 GiB)  TX 

Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Ralph Castain

On Oct 4, 2010, at 1:48 PM, Storm Zhang wrote:

> Thanks a lot, Ralph. As I said, I also tried to use SGE (which also shows
> 1024 slots available for parallel tasks); it assigns only 34-38 compute
> nodes, which have only 272-304 real cores, for the 500-proc run. The
> running time is consistent with the 100-proc case, without a lot of
> fluctuation as the number of machines changes.

Afraid I don't understand your statement. If you have 500 procs running on < 
500 cores, then the performance relative to a high-performance job (#procs <= 
#cores) will be worse. We deliberately dial down the performance when 
oversubscribed to ensure that procs "play nice" in situations where the node is 
oversubscribed.

>  So I guess it is not related to hyperthreading. Correct me if I'm wrong.

Has nothing to do with hyperthreading - OMPI has no knowledge of hyperthreads 
at this time.

> 
> BTW, how do I bind a proc to a core? I tried --bind-to-core and
> -bind-to-core, but neither works. Is that an OpenMP option rather than an
> Open MPI one?

Those should work. You might try --report-bindings to see what OMPI thought it 
did.
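
For example (a sketch, assuming the 1.4.2 installation is the one being
picked up; under 1.3.x these options simply do not exist):

  mpirun --mca btl_tcp_if_include eth0 -np 600 -bind-to-core --report-bindings scatttest

The --report-bindings output shows which cores each process was actually
bound to.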

> 
> Linbao
> 
> 
> On Mon, Oct 4, 2010 at 12:27 PM, Ralph Castain  wrote:
> Some of what you are seeing is the natural result of context switching.
> Some thoughts regarding the results:
> 
> 1. You didn't bind your procs to cores when running with #procs < #cores, so
> your performance in those scenarios will also be less than max.
> 
> 2. Once the number of procs exceeds the number of cores, you guarantee a lot 
> of context switching, so performance will definitely take a hit.
> 
> 3. Sometime in the not-too-distant-future, OMPI will (hopefully) become 
> hyperthread aware. For now, we don't see them as separate processing units. 
> So as far as OMPI is concerned, you only have 512 computing units to work 
> with, not 1024.
> 
> Bottom line is that you are running oversubscribed, so OMPI turns down your 
> performance so that the machine doesn't hemorrhage as it context switches.
> 
> 
> On Oct 4, 2010, at 11:06 AM, Doug Reeder wrote:
> 
>> In my experience hyperthreading can't really deliver two cores' worth of
>> processing simultaneously for processes expecting sole use of a core. Since
>> you really have 512 cores, I'm not surprised that you see a performance hit
>> when requesting > 512 compute units. We should really get input from a
>> hyperthreading expert, preferably from Intel.
>> 
>> Doug Reeder
>> On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:
>> 
>>> We have 64 compute nodes with dual quad-core, hyperthreaded CPUs. So we
>>> have 1024 compute units shown in the ROCKS 5.3 system. I'm trying to
>>> scatter an array from the master node to the compute nodes using mpiCC
>>> and mpirun with C++.
>>>
>>> Here is my test:
>>>
>>> The array size is 18KB * number of procs, and it is scattered to the
>>> compute nodes 5000 times repeatedly.
>>>
>>> The average running time (seconds):
>>>
>>> 100 procs: 170,
>>> 400 procs: 690,
>>> 500 procs: 855,
>>> 600 procs: 2550,
>>> 700 procs: 2720,
>>> 800 procs: 2900,
>>>
>>> There is a big jump in running time from 500 procs to 600 procs. I don't
>>> know what the problem is.
>>> I tried both OMPI 1.3.2 and OMPI 1.4.2. Running time is a little faster
>>> for all the tests in 1.4.2, but the jump still exists.
>>> I tried using either the Bcast function or simply Send/Recv, which give
>>> very close results.
>>> I tried both running it directly and using SGE, and got the same results.
>>> 
>>> The code and ompi_info are attached to this email. The direct running 
>>> command is :
>>> /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile 
>>> ../machines -np 600 scatttest
>>> 
>>> The ifconfig of head node for eth0 is:
>>> eth0  Link encap:Ethernet  HWaddr 00:26:B9:56:8B:44  
>>>   inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
>>>   inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
>>>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>   RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
>>>   TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
>>>   collisions:0 txqueuelen:1000 
>>>   RX bytes:832328807459 (775.1 GiB)  TX bytes:250824621959 (233.5 
>>> GiB)
>>>   Interrupt:106 Memory:d600-d6012800 
>>> 
>>> A typical ifconfig of a compute node is:
>>> eth0  Link encap:Ethernet  HWaddr 00:21:9B:9A:15:AC  
>>>   inet addr:192.168.1.253  Bcast:192.168.1.255  Mask:255.255.255.0
>>>   inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
>>>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>   RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
>>>   TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
>>>   collisions:0 txqueuelen:1000 
>>>   RX bytes:139699954685 (130.1 GiB)  TX bytes:338207741480 (314.9 
>>> GiB)
>>>   

Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Storm Zhang
Thanks a lot, Ralph. As I said, I also tried to use SGE (which also shows
1024 slots available for parallel tasks); it assigns only 34-38 compute
nodes, which have only 272-304 real cores, for the 500-proc run. The running
time is consistent with the 100-proc case, without a lot of fluctuation as
the number of machines changes. So I guess it is not related to
hyperthreading. Correct me if I'm wrong.

BTW, how do I bind a proc to a core? I tried --bind-to-core and
-bind-to-core, but neither works. Is that an OpenMP option rather than an
Open MPI one?

Linbao


On Mon, Oct 4, 2010 at 12:27 PM, Ralph Castain  wrote:

> Some of what you are seeing is the natural result of context switching.
> Some thoughts regarding the results:
>
> 1. You didn't bind your procs to cores when running with #procs < #cores,
> so your performance in those scenarios will also be less than max.
>
> 2. Once the number of procs exceeds the number of cores, you guarantee a
> lot of context switching, so performance will definitely take a hit.
>
> 3. Sometime in the not-too-distant-future, OMPI will (hopefully) become
> hyperthread aware. For now, we don't see them as separate processing units.
> So as far as OMPI is concerned, you only have 512 computing units to work
> with, not 1024.
>
> Bottom line is that you are running oversubscribed, so OMPI turns down your
> performance so that the machine doesn't hemorrhage as it context switches.
>
>
> On Oct 4, 2010, at 11:06 AM, Doug Reeder wrote:
>
> In my experience hyperthreading can't really deliver two cores' worth of
> processing simultaneously for processes expecting sole use of a core. Since
> you really have 512 cores, I'm not surprised that you see a performance hit
> when requesting > 512 compute units. We should really get input from a
> hyperthreading expert, preferably from Intel.
>
> Doug Reeder
> On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:
>
> We have 64 compute nodes with dual quad-core, hyperthreaded CPUs. So we
> have 1024 compute units shown in the ROCKS 5.3 system. I'm trying to
> scatter an array from the master node to the compute nodes using mpiCC
> and mpirun with C++.
>
> Here is my test:
>
> The array size is 18KB * number of procs, and it is scattered to the
> compute nodes 5000 times repeatedly.
>
> The average running time (seconds):
>
> 100 procs: 170,
> 400 procs: 690,
> 500 procs: 855,
> 600 procs: 2550,
> 700 procs: 2720,
> 800 procs: 2900,
>
> There is a big jump in running time from 500 procs to 600 procs. I don't
> know what the problem is.
> I tried both OMPI 1.3.2 and OMPI 1.4.2. Running time is a little faster
> for all the tests in 1.4.2, but the jump still exists.
> I tried using either the Bcast function or simply Send/Recv, which give
> very close results.
> I tried both running it directly and using SGE, and got the same results.
>
> The code and ompi_info are attached to this email. The direct running
> command is :
> /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile
> ../machines -np 600 scatttest
>
> The ifconfig of head node for eth0 is:
> eth0  Link encap:Ethernet  HWaddr 00:26:B9:56:8B:44
>   inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
>   inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
>   TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:832328807459 (775.1 GiB)  TX bytes:250824621959 (233.5
> GiB)
>   Interrupt:106 Memory:d600-d6012800
>
> A typical ifconfig of a compute node is:
> eth0  Link encap:Ethernet  HWaddr 00:21:9B:9A:15:AC
>   inet addr:192.168.1.253  Bcast:192.168.1.255  Mask:255.255.255.0
>   inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:139699954685 (130.1 GiB)  TX bytes:338207741480 (314.9
> GiB)
>   Interrupt:82 Memory:d600-d6012800
>
>
> Can anyone help me out with this? It bothers me a lot.
>
> Thank you very much.
>
> Linbao


Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Ralph Castain
Some of what you are seeing is the natural result of context switching. Some
thoughts regarding the results:

1. You didn't bind your procs to cores when running with #procs < #cores, so
your performance in those scenarios will also be less than max.

2. Once the number of procs exceeds the number of cores, you guarantee a lot of 
context switching, so performance will definitely take a hit.

3. Sometime in the not-too-distant-future, OMPI will (hopefully) become 
hyperthread aware. For now, we don't see them as separate processing units. So 
as far as OMPI is concerned, you only have 512 computing units to work with, 
not 1024.

Bottom line is that you are running oversubscribed, so OMPI turns down your 
performance so that the machine doesn't hemorrhage as it context switches.
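
For reference, the dial-down described here is Open MPI's yield-when-idle
(degraded) progress mode, which is selected automatically when a node is
oversubscribed. A sketch of how you could toggle it explicitly for a
comparison run, assuming the standard mpi_yield_when_idle MCA parameter
(1 = yield/degraded, 0 = aggressive polling):

  mpirun --mca mpi_yield_when_idle 1 -np 600 scatttest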


On Oct 4, 2010, at 11:06 AM, Doug Reeder wrote:

> In my experience hyperthreading can't really deliver two cores' worth of
> processing simultaneously for processes expecting sole use of a core. Since
> you really have 512 cores, I'm not surprised that you see a performance hit
> when requesting > 512 compute units. We should really get input from a
> hyperthreading expert, preferably from Intel.
> 
> Doug Reeder
> On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:
> 
>> We have 64 compute nodes with dual quad-core, hyperthreaded CPUs. So we
>> have 1024 compute units shown in the ROCKS 5.3 system. I'm trying to
>> scatter an array from the master node to the compute nodes using mpiCC
>> and mpirun with C++.
>>
>> Here is my test:
>>
>> The array size is 18KB * number of procs, and it is scattered to the
>> compute nodes 5000 times repeatedly.
>>
>> The average running time (seconds):
>>
>> 100 procs: 170,
>> 400 procs: 690,
>> 500 procs: 855,
>> 600 procs: 2550,
>> 700 procs: 2720,
>> 800 procs: 2900,
>>
>> There is a big jump in running time from 500 procs to 600 procs. I don't
>> know what the problem is.
>> I tried both OMPI 1.3.2 and OMPI 1.4.2. Running time is a little faster
>> for all the tests in 1.4.2, but the jump still exists.
>> I tried using either the Bcast function or simply Send/Recv, which give
>> very close results.
>> I tried both running it directly and using SGE, and got the same results.
>> 
>> The code and ompi_info are attached to this email. The direct running 
>> command is :
>> /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile 
>> ../machines -np 600 scatttest
>> 
>> The ifconfig of head node for eth0 is:
>> eth0  Link encap:Ethernet  HWaddr 00:26:B9:56:8B:44  
>>   inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
>>   inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
>>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>   RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
>>   TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
>>   collisions:0 txqueuelen:1000 
>>   RX bytes:832328807459 (775.1 GiB)  TX bytes:250824621959 (233.5 
>> GiB)
>>   Interrupt:106 Memory:d600-d6012800 
>> 
>> A typical ifconfig of a compute node is:
>> eth0  Link encap:Ethernet  HWaddr 00:21:9B:9A:15:AC  
>>   inet addr:192.168.1.253  Bcast:192.168.1.255  Mask:255.255.255.0
>>   inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
>>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>   RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
>>   TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
>>   collisions:0 txqueuelen:1000 
>>   RX bytes:139699954685 (130.1 GiB)  TX bytes:338207741480 (314.9 
>> GiB)
>>   Interrupt:82 Memory:d600-d6012800 
>> 
>> 
>> Can anyone help me out with this? It bothers me a lot.
>> 
>> Thank you very much.
>> 
>> Linbao



Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Storm Zhang
Thanks a lot for your reply, Doug.

There is one more thing I forgot to mention. For the 500-proc test, I observe
that if I use SGE, it runs on only about half of our cluster, around 35-38
nodes, not uniformly distributed across the whole cluster, but the running
time is still good.  So I guess it is not a hyperthreading problem.

Linbao

On Mon, Oct 4, 2010 at 12:06 PM, Doug Reeder  wrote:

> In my experience hyperthreading can't really deliver two cores' worth of
> processing simultaneously for processes expecting sole use of a core. Since
> you really have 512 cores, I'm not surprised that you see a performance hit
> when requesting > 512 compute units. We should really get input from a
> hyperthreading expert, preferably from Intel.
>
> Doug Reeder
> On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:
>
> We have 64 compute nodes with dual quad-core, hyperthreaded CPUs. So we
> have 1024 compute units shown in the ROCKS 5.3 system. I'm trying to
> scatter an array from the master node to the compute nodes using mpiCC
> and mpirun with C++.
>
> Here is my test:
>
> The array size is 18KB * number of procs, and it is scattered to the
> compute nodes 5000 times repeatedly.
>
> The average running time (seconds):
>
> 100 procs: 170,
> 400 procs: 690,
> 500 procs: 855,
> 600 procs: 2550,
> 700 procs: 2720,
> 800 procs: 2900,
>
> There is a big jump in running time from 500 procs to 600 procs. I don't
> know what the problem is.
> I tried both OMPI 1.3.2 and OMPI 1.4.2. Running time is a little faster
> for all the tests in 1.4.2, but the jump still exists.
> I tried using either the Bcast function or simply Send/Recv, which give
> very close results.
> I tried both running it directly and using SGE, and got the same results.
>
> The code and ompi_info are attached to this email. The direct running
> command is :
> /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile
> ../machines -np 600 scatttest
>
> The ifconfig of head node for eth0 is:
> eth0  Link encap:Ethernet  HWaddr 00:26:B9:56:8B:44
>   inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
>   inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
>   TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:832328807459 (775.1 GiB)  TX bytes:250824621959 (233.5
> GiB)
>   Interrupt:106 Memory:d600-d6012800
>
> A typical ifconfig of a compute node is:
> eth0  Link encap:Ethernet  HWaddr 00:21:9B:9A:15:AC
>   inet addr:192.168.1.253  Bcast:192.168.1.255  Mask:255.255.255.0
>   inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:139699954685 (130.1 GiB)  TX bytes:338207741480 (314.9
> GiB)
>   Interrupt:82 Memory:d600-d6012800
>
>
> Can anyone help me out with this? It bothers me a lot.
>
> Thank you very much.
>
> Linbao


Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Doug Reeder
In my experience hyperthreading can't really deliver two cores' worth of
processing simultaneously for processes expecting sole use of a core. Since you
really have 512 cores, I'm not surprised that you see a performance hit when
requesting > 512 compute units. We should really get input from a
hyperthreading expert, preferably from Intel.

Doug Reeder
On Oct 4, 2010, at 9:53 AM, Storm Zhang wrote:

> We have 64 compute nodes with dual quad-core, hyperthreaded CPUs. So we
> have 1024 compute units shown in the ROCKS 5.3 system. I'm trying to
> scatter an array from the master node to the compute nodes using mpiCC
> and mpirun with C++.
>
> Here is my test:
>
> The array size is 18KB * number of procs, and it is scattered to the
> compute nodes 5000 times repeatedly.
>
> The average running time (seconds):
>
> 100 procs: 170,
> 400 procs: 690,
> 500 procs: 855,
> 600 procs: 2550,
> 700 procs: 2720,
> 800 procs: 2900,
>
> There is a big jump in running time from 500 procs to 600 procs. I don't
> know what the problem is.
> I tried both OMPI 1.3.2 and OMPI 1.4.2. Running time is a little faster
> for all the tests in 1.4.2, but the jump still exists.
> I tried using either the Bcast function or simply Send/Recv, which give
> very close results.
> I tried both running it directly and using SGE, and got the same results.
> 
> The code and ompi_info are attached to this email. The direct running command 
> is :
> /opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile 
> ../machines -np 600 scatttest
> 
> The ifconfig of head node for eth0 is:
> eth0  Link encap:Ethernet  HWaddr 00:26:B9:56:8B:44  
>   inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
>   inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
>   TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000 
>   RX bytes:832328807459 (775.1 GiB)  TX bytes:250824621959 (233.5 GiB)
>   Interrupt:106 Memory:d600-d6012800 
> 
> A typical ifconfig of a compute node is:
> eth0  Link encap:Ethernet  HWaddr 00:21:9B:9A:15:AC  
>   inet addr:192.168.1.253  Bcast:192.168.1.255  Mask:255.255.255.0
>   inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000 
>   RX bytes:139699954685 (130.1 GiB)  TX bytes:338207741480 (314.9 GiB)
>   Interrupt:82 Memory:d600-d6012800 
> 
> 
> Can anyone help me out with this? It bothers me a lot.
> 
> Thank you very much.
> 
> Linbao



[OMPI users] Bad performance when scattering big size of data?

2010-10-04 Thread Storm Zhang
We have 64 compute nodes with dual quad-core, hyperthreaded CPUs. So we have
1024 compute units shown in the ROCKS 5.3 system. I'm trying to scatter an
array from the master node to the compute nodes using mpiCC and mpirun with
C++.

Here is my test:

The array size is 18KB * number of procs, and it is scattered to the compute
nodes 5000 times repeatedly.

The average running time (seconds):

100 procs: 170,
400 procs: 690,
500 procs: 855,
600 procs: 2550,
700 procs: 2720,
800 procs: 2900,

There is a big jump in running time from 500 procs to 600 procs. I don't know
what the problem is.
I tried both OMPI 1.3.2 and OMPI 1.4.2. Running time is a little faster for
all the tests in 1.4.2, but the jump still exists.
I tried using either the Bcast function or simply Send/Recv, which give very
close results.
I tried both running it directly and using SGE, and got the same results.

The code and ompi_info output are attached to this email. The direct running
command is:
/opt/openmpi/bin/mpirun --mca btl_tcp_if_include eth0 --machinefile
../machines -np 600 scatttest
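
The attachment is not reproduced here, but a minimal sketch of this kind of
scatter benchmark might look like the following (assumed details: 18 KB per
receiving rank, 5000 iterations, root = rank 0; this is not the actual
scatttest.cpp):

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, nprocs = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int chunk = 18 * 1024;   // bytes delivered to each rank
    const int iters = 5000;        // number of scatter repetitions

    std::vector<char> recvbuf(chunk);
    std::vector<char> sendbuf;
    if (rank == 0)
        sendbuf.assign(static_cast<size_t>(chunk) * nprocs, 1);  // root holds 18KB * nprocs

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    for (int i = 0; i < iters; ++i)
        MPI_Scatter(rank == 0 ? &sendbuf[0] : NULL, chunk, MPI_CHAR,
                    &recvbuf[0], chunk, MPI_CHAR, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        std::printf("np = %d, total scatter time = %.1f s\n", nprocs, t1 - t0);

    MPI_Finalize();
    return 0;
}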

The ifconfig of head node for eth0 is:
eth0  Link encap:Ethernet  HWaddr 00:26:B9:56:8B:44
  inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
  inet6 addr: fe80::226:b9ff:fe56:8b44/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:1096060373 errors:0 dropped:2512622 overruns:0 frame:0
  TX packets:513387679 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:832328807459 (775.1 GiB)  TX bytes:250824621959 (233.5
GiB)
  Interrupt:106 Memory:d600-d6012800

A typical ifconfig of a compute node is:
eth0  Link encap:Ethernet  HWaddr 00:21:9B:9A:15:AC
  inet addr:192.168.1.253  Bcast:192.168.1.255  Mask:255.255.255.0
  inet6 addr: fe80::221:9bff:fe9a:15ac/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:362716422 errors:0 dropped:0 overruns:0 frame:0
  TX packets:349967746 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:139699954685 (130.1 GiB)  TX bytes:338207741480 (314.9
GiB)
  Interrupt:82 Memory:d600-d6012800


Can anyone help me out with this? It bothers me a lot.

Thank you very much.

Linbao


scatttest.cpp
Description: Binary data


ompi_info
Description: Binary data