Re: scheduling oddity on 2.6.20.3 stock

2007-06-02 Thread Bill Davidsen

David Schwartz wrote:

> > > >> bunzip2 -c $file.bz2 |gzip -9 >$file.gz
>
> So here are some actual results from a dual P3-1GHz machine (2.6.21.1,
> CFSv9). First let's time each operation individually:
>
> $ time bunzip2 -k linux-2.6.21.tar.bz2
>
> real    1m5.626s
> user    1m2.240s
> sys     0m3.144s
>
> $ time gzip -9 linux-2.6.21.tar
>
> real    1m17.652s
> user    1m15.609s
> sys     0m1.912s
>
> The compress was the most complex (no surprise there) but they are close
> enough that efficient overlap will definitely affect the total wall time. If
> we can both decompress and compress in 1:17, we are optimal. First, let's
> try the normal way:
>
> $ time (bunzip2 -c linux-2.6.21.tar.bz2 | gzip -9 > test1)
>
> real    1m45.051s
> user    2m16.945s
> sys     0m2.752s
>
> 1:45, or 1/3 over optimal. Now, with a 32MB non-blocking cache between the
> two processes ('accel' creates a 32MB cache and uses 'select' to fill from
> stdin and empty to stdout without blocking either direction):
>
> $ time (bunzip2 -c linux-2.6.21.tar.bz2 | ./accel | gzip -9 > test2)
>
> real    1m18.361s
> user    2m19.589s
> sys     0m6.356s
>
> Within testing accuracy of optimal.
>
> So it's not the scheduler. It's the fact that bunzip2/gzip have inadequate
> input/output buffering. I don't think it's unreasonable to consider this a
> defect in those programs.


They are hardly designed to optimize this operation...

For a tunable buffer program that lets you set both the buffer size and
the number of buffers in the pool, see the program ptbuf at
www.tmr.com/~public/source. I wrote it as a proof of concept for a
pthreads presentation I was giving, and it happened to be useful.
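
To illustrate the general idea of a threaded buffer pool with those two tunables, here is a minimal Python sketch. It is not ptbuf, whose source is not shown in this thread; the function name `pump` and the parameters `buf_size` and `pool_size` are assumptions made up for the example.

```python
import queue
import threading

def pump(src, dst, buf_size=1 << 20, pool_size=8):
    """Copy src to dst through a bounded pool of buffers.

    buf_size and pool_size are hypothetical names for the two tunables
    (bytes per buffer, number of buffers in flight).
    """
    filled = queue.Queue(maxsize=pool_size)

    def reader():
        while True:
            chunk = src.read(buf_size)
            filled.put(chunk)      # blocks only when the pool is full
            if not chunk:          # an empty chunk marks end of input
                return

    t = threading.Thread(target=reader)
    t.start()
    total = 0
    while True:
        chunk = filled.get()       # blocks only when the pool is empty
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    t.join()
    return total
```

Placed between the two compressors, it would presumably be called with `sys.stdin.buffer` and `sys.stdout.buffer`, so the reader thread can keep accepting input while the writer thread is stalled on output.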


--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: scheduling oddity on 2.6.20.3 stock

2007-05-16 Thread David Schwartz

> > >> bunzip2 -c $file.bz2 |gzip -9 >$file.gz

So here are some actual results from a dual P3-1GHz machine (2.6.21.1,
CFSv9). First let's time each operation individually:

$ time bunzip2 -k linux-2.6.21.tar.bz2

real    1m5.626s
user    1m2.240s
sys     0m3.144s


$ time gzip -9 linux-2.6.21.tar

real    1m17.652s
user    1m15.609s
sys     0m1.912s

The compress was the most complex (no surprise there) but they are close
enough that efficient overlap will definitely affect the total wall time. If
we can both decompress and compress in 1:17, we are optimal. First, let's
try the normal way:

$ time (bunzip2 -c linux-2.6.21.tar.bz2 | gzip -9 > test1)

real    1m45.051s
user    2m16.945s
sys     0m2.752s

1:45, or 1/3 over optimal. Now, with a 32MB non-blocking cache between the
two processes ('accel' creates a 32MB cache and uses 'select' to fill from
stdin and empty to stdout without blocking either direction):

$ time (bunzip2 -c linux-2.6.21.tar.bz2 | ./accel | gzip -9 > test2)

real    1m18.361s
user    2m19.589s
sys     0m6.356s

Within testing accuracy of optimal.
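
The comparison against "optimal" can be checked with a few lines of arithmetic. This sketch only reuses the wall-clock times reported above; the variable names are made up for the example, and "optimal" is taken to mean the time of the slower of the two stages run alone.

```python
def seconds(minutes, secs):
    """Convert a 'XmY.YYYs' reading from time(1) into seconds."""
    return minutes * 60 + secs

bunzip2_alone = seconds(1, 5.626)
gzip_alone = seconds(1, 17.652)
plain_pipe = seconds(1, 45.051)
with_accel = seconds(1, 18.361)

# With perfect overlap the pipeline can finish no sooner than its slower stage.
optimal = max(bunzip2_alone, gzip_alone)

plain_overhead = plain_pipe / optimal - 1
accel_overhead = with_accel / optimal - 1
print(f"plain pipe: {plain_overhead:.1%} over optimal")
print(f"with accel: {accel_overhead:.1%} over optimal")
```

The plain pipe comes out roughly 35% over the slower stage, in line with the "1/3" figure, while the buffered run is under 1% over.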

So it's not the scheduler. It's the fact that bunzip2/gzip have inadequate
input/output buffering. I don't think it's unreasonable to consider this a
defect in those programs.
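
For readers who want to try the same experiment, here is a rough Python sketch of a select-based buffering relay of the kind described above. It is not the 'accel' program, whose source was not posted; the function name `relay` and the `capacity` and `chunk` parameters are assumptions for illustration.

```python
import os
import select

def relay(fd_in, fd_out, capacity=32 * 1024 * 1024, chunk=65536):
    """Copy fd_in to fd_out through an in-memory buffer of up to `capacity` bytes.

    Both descriptors are switched to non-blocking mode, and select() is used
    so that a stalled reader never stops input and a stalled writer never
    stops output, unless the buffer is full or empty.
    """
    os.set_blocking(fd_in, False)
    os.set_blocking(fd_out, False)
    buf = bytearray()
    eof = False
    total = 0
    while not eof or buf:
        # Ask for input only while there is room, output only while there is data.
        rlist = [fd_in] if not eof and len(buf) < capacity else []
        wlist = [fd_out] if buf else []
        readable, writable, _ = select.select(rlist, wlist, [])
        if readable:
            try:
                data = os.read(fd_in, min(chunk, capacity - len(buf)))
            except BlockingIOError:
                data = None            # spurious wakeup, try again later
            if data:
                buf += data
            elif data is not None:
                eof = True             # zero-length read means end of input
        if writable:
            try:
                n = os.write(fd_out, buf[:chunk])
            except BlockingIOError:
                n = 0
            del buf[:n]
            total += n
    return total
```

Run as a filter, it would be called as `relay(0, 1)` so that it sits between `bunzip2 -c` and `gzip -9` in the pipeline.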

DS




RE: scheduling oddity on 2.6.20.3 stock

2007-05-16 Thread David Schwartz

> On Thu, 3 May 2007, David Schwartz wrote:
>
> >> I needed to recompress some files from .bz2 to .gz so I setup a script to
> >> do
> >>
> >> bunzip2 -c $file.bz2 |gzip -9 >$file.gz
> >>
> >> I expected that the two CPU heavy processes would end up on different
> >> cpu's and spend a little time shuffling data between the two cpu's on a
> >> system (dual core opteron)
> >>
> >> however, instead what I find is that each process is getting 50% of one
> >> cpu while the other cpu is 97% idle.
> >
> > That would only be possible if the compression/decompression block size is
> > small compared to the maximum pipe buffer size. I suspect the reverse is the
> > case.
>
> I'm still running into this problem in various forms
>
> is there an easy way to change the maximum pipe buffer size? (including a
> simple change to the kernel source, I do compile my own kernels)

No. Changing the size will not do what you want it to do since that only
tells the kernel what the size is, it does not determine what it is.
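
For reference, the amount of data a pipe actually holds before the writer would block can be measured from user space. The following Python sketch only measures that capacity and does not change it; the function name `pipe_capacity` and the 4096-byte write size are arbitrary choices for the example.

```python
import os

def pipe_capacity():
    """Return how many bytes a fresh pipe accepts before a write would block."""
    r, w = os.pipe()
    try:
        os.set_blocking(w, False)      # make a full pipe raise instead of hang
        total = 0
        block = b"\0" * 4096
        while True:
            try:
                total += os.write(w, block)
            except BlockingIOError:    # pipe is full
                return total
    finally:
        os.close(r)
        os.close(w)

print(pipe_capacity())
```

Comparing the printed figure with the block sizes the two compressors work in gives a rough idea of how often each side has to stop and wait for the other.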

> > It would be interesting to write an intermediate process that basically
> > enlarged the pipe buffers and see if that changed anything. Basically, the
> > intermediate process would allocate a large buffer (16MB or so) and fill it
> > from 'bunzip2' while draining it to 'gzip' in a non-blocking way (unless the
> > buffer was full/empty, of course).

It is not particularly hard to write such a process. I have a proxy that I
can easily tweak to do this. I'm going to give it a shot and see if it
helps.

DS




RE: scheduling oddity on 2.6.20.3 stock

2007-05-16 Thread david

On Thu, 3 May 2007, David Schwartz wrote:


>> I needed to recompress some files from .bz2 to .gz so I setup a script to
>> do
>>
>> bunzip2 -c $file.bz2 |gzip -9 >$file.gz
>>
>> I expected that the two CPU heavy processes would end up on different
>> cpu's and spend a little time shuffling data between the two cpu's on a
>> system (dual core opteron)
>>
>> however, instead what I find is that each process is getting 50% of one
>> cpu while the other cpu is 97% idle.
>
> That would only be possible if the compression/decompression block size is
> small compared to the maximum pipe buffer size. I suspect the reverse is the
> case.

I'm still running into this problem in various forms

is there an easy way to change the maximum pipe buffer size? (including a
simple change to the kernel source, I do compile my own kernels)

> It would be interesting to write an intermediate process that basically
> enlarged the pipe buffers and see if that changed anything. Basically, the
> intermediate process would allocate a large buffer (16MB or so) and fill it
> from 'bunzip2' while draining it to 'gzip' in a non-blocking way (unless the
> buffer was full/empty, of course).



RE: scheduling oddity on 2.6.20.3 stock

2007-05-03 Thread david

On Thu, 3 May 2007, David Schwartz wrote:


>> I needed to recompress some files from .bz2 to .gz so I setup a script to
>> do
>>
>> bunzip2 -c $file.bz2 |gzip -9 >$file.gz
>>
>> I expected that the two CPU heavy processes would end up on different
>> cpu's and spend a little time shuffling data between the two cpu's on a
>> system (dual core opteron)
>>
>> however, instead what I find is that each process is getting 50% of one
>> cpu while the other cpu is 97% idle.
>
> That would only be possible if the compression/decompression block size is
> small compared to the maximum pipe buffer size. I suspect the reverse is the
> case.
>
> It would be interesting to write an intermediate process that basically
> enlarged the pipe buffers and see if that changed anything. Basically, the
> intermediate process would allocate a large buffer (16MB or so) and fill it
> from 'bunzip2' while draining it to 'gzip' in a non-blocking way (unless the
> buffer was full/empty, of course).


hmm, how about
bunzip2 -c $file.bz2 |dd bs=8m |gzip -9 >$file.gz
should that work?
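
One thing worth checking before relying on a large block size here: a read from a pipe returns whatever is currently available rather than waiting for the full requested amount, so a large bs= by itself need not produce large transfers. A minimal Python sketch of that behaviour follows; the 8 MiB request mirrors the bs value above and the 4096-byte write is an arbitrary illustrative amount.

```python
import os

r, w = os.pipe()
os.write(w, b"x" * 4096)              # producer has written only 4 KiB so far
got = os.read(r, 8 * 1024 * 1024)     # consumer asks for 8 MiB
short_read_len = len(got)             # only what was in the pipe comes back
os.close(r)
os.close(w)
print(short_read_len)
```

A single process that alternates between a blocking read and a blocking write also cannot accept new input while it is waiting to write, which is the part the non-blocking intermediate process is meant to solve.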

David Lang


RE: scheduling oddity on 2.6.20.3 stock

2007-05-03 Thread David Schwartz

> I needed to recompress some files from .bz2 to .gz so I setup a script to
> do
>
> bunzip2 -c $file.bz2 |gzip -9 >$file.gz
>
> I expected that the two CPU heavy processes would end up on different
> cpu's and spend a little time shuffling data between the two cpu's on a
> system (dual core opteron)
>
> however, instead what I find is that each process is getting 50% of one
> cpu while the other cpu is 97% idle.

That would only be possible if the compression/decompression block size is
small compared to the maximum pipe buffer size. I suspect the reverse is the
case.

It would be interesting to write an intermediate process that basically
enlarged the pipe buffers and see if that changed anything. Basically, the
intermediate process would allocate a large buffer (16MB or so) and fill it
from 'bunzip2' while draining it to 'gzip' in a non-blocking way (unless the
buffer was full/empty, of course).

DS



