On Tue, 2007-12-11 at 17:11 -0500, Jie Chen wrote:
> Ingo Molnar wrote:
> > * Jie Chen <[EMAIL PROTECTED]> wrote:
> >
> >> The following is pthread_sync output for the 2.6.21.7-cfs-v24 #1 SMP
> >> kernel.
> >>
> >> 2 threads:
> >> PARALLEL time     = 11.106580 microseconds +/- 0.002460
> >> PARALLEL overhead =  0.617590 microseconds +/- 0.003409
> >>
> >> Output for kernel 2.6.24-rc4 #1 SMP:
> >> PARALLEL [...]
Ingo Molnar wrote:
* Jie Chen <[EMAIL PROTECTED]> wrote:
> Hi, Ingo:
>
> I guess it is good news. I did patch the 2.6.21.7 kernel using your cfs
> patch. The results of pthread_sync are the same as for the non-patched
> 2.6.21 kernel. This means the performance issue is not related to the
> scheduler. As for the overhead of [...]
Ingo Molnar wrote:
* Jie Chen <[EMAIL PROTECTED]> wrote:
and then you use this in the measurement loop:

   for (k=0; k<=OUTERREPS; k++){
     start = getclock();
     for (j=0; j<innerreps; j++){
#ifdef _QMT_PUBLIC
       delay((void *)0, 0);
#else
       delay(0, 0, 0, (void *)0);
#endif
     }

the problem is, this does not take the overhead of gettimeofday into
account - which overhead can easily reach 10 [...]
Ingo Molnar wrote:
* Jie Chen <[EMAIL PROTECTED]> wrote:
> I did patch the header file and recompiled the kernel. I observed no
> difference (the two-thread overhead stays too high). Thank you.

ok, i think i found it. You do this in your qmt/pthread_sync.c
test-code:

   double get_time_of_day_()
   {
   ...
     err = [...]
Ingo Molnar wrote:
* Jie Chen <[EMAIL PROTECTED]> wrote:
not "BARRIER time". I've re-read the discussion and found no hint
about how to build and run a barrier test. Either i missed it or it's
so obvious to you that you didnt mention it :-)

	Ingo

Hi, Ingo:

Did you do configure --enable-public-release? [...]
Ingo Molnar wrote:
* Jie Chen <[EMAIL PROTECTED]> wrote:
sorry to be dense, but could you give me instructions how i could
remove the affinity mask and test the "barrier overhead" myself? I
have built "pthread_sync" and it outputs numbers for me - which one
would be the barrier overhead: Reference_time_1 ? [...]

Peter Zijlstra wrote:
On Wed, 2007-11-21 at 15:34 -0500, Jie Chen wrote:
> It is clear that the synchronization overhead increases as the number
> of threads increases in the kernel 2.6.21, but the synchronization
> overhead actually decreases as the number of threads increases in the
> kernel 2.6.23.8. (We observed [...]
Ingo Molnar wrote:
* Jie Chen <[EMAIL PROTECTED]> wrote:
I just disabled the affinity mask and reran the test. There were no
significant changes for two threads (barrier overhead is around 9
microseconds). As for 8 threads, the barrier overhead actually drops a
little, which is good. Let me know whether [...]
Ingo Molnar wrote:
* Jie Chen <[EMAIL PROTECTED]> wrote:
Since I am using the affinity flag to bind each thread to a different
core, the synchronization overhead should increase as the number of
cores/threads increases. But what we observed in the new kernel is the
opposite. The barrier overhead of two [...]
Ingo Molnar wrote:
* Jie Chen <[EMAIL PROTECTED]> wrote:
the moment you saturate the system a bit more, the numbers should
improve even with such a ping-pong test.

You are right. If I manually do load balancing (bind unrelated processes
on the other cores), my test code performs as well as it did in the [...]
Ingo Molnar a écrit :
* Eric Dumazet <[EMAIL PROTECTED]> wrote:
$ gcc -O2 -o burner burner.c
$ ./burner
Time to perform the unit of work on one thread is 0.040328 s
Time to perform the unit of work on 2 threads is 0.040221 s

ok, but this actually suggests that scheduling is fine for this,
correct?
Ingo Molnar wrote:
* Jie Chen <[EMAIL PROTECTED]> wrote:
I just ran the same test on two 2.6.24-rc4 kernels: one with
CONFIG_FAIR_GROUP_SCHED on and the other with CONFIG_FAIR_GROUP_SCHED
off. The odd behavior I described in my previous e-mails was still
there for both kernels. Let me know if I can be [...]
Ingo Molnar wrote:
* Jie Chen <[EMAIL PROTECTED]> wrote:
Simon Holm Thøgersen wrote:
On Wed, 2007-11-21 at 20:52 -0500, Jie Chen wrote:

There is a backport of the CFS scheduler to 2.6.21, see
http://lkml.org/lkml/2007/11/19/127

Hi, Simon:

I will try that after the thanksgiving holiday to find out whether the
odd behavior will show up using 2.6.21 [...]
On Wed, 2007-11-21 at 20:52 -0500, Jie Chen wrote:
> Eric Dumazet wrote:
> > Jie Chen a écrit :
> >> Hi, there:
> >>
> >> We have a simple pthread program that measures the synchronization
> >> overheads for various synchronization mechanisms such as spin locks,
> >> barriers (the barrier is implemented using a queue-based barrier
> >> algorithm) and so on. We have dual quad-core AMD Opteron (Barcelona)
> >> clusters [...]