Re: Processing a big file using more CPUs

2019-02-12 Thread Shlomi Fish
On Mon, 11 Feb 2019 23:54:43 +0100
Ole Tange  wrote:

> On Mon, Feb 4, 2019 at 10:19 PM Nio Wiklund  wrote:
> :
> > cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
> :
> > The reason why I want this is that I often create compressed images of
> > the content of a drive, /dev/sdx, and I lose approximately half the
> > compression improvement from gzip to xz, when using parallel. The
> > improvement in speed is good, 2.5 times, but I think larger blocks would
> > give xz a chance to get a compression much closer to what it can get
> > without parallel.
> >
> > Is it possible with the current code? If so, how?
> 
> Since version 2016-07-22:
> 
> parallel --pipepart -a bigfile --recend '' -k --block -1 xz > bigfile.xz
> parallel --pipepart -a /dev/sdx --recend '' -k --block -1 xz > bigfile.xz
> 
> Unfortunately the size computation of block devices only works under
> GNU/Linux.
> 
> (That said: pxz exists, and it may be more relevant to use here).
> 

Hi Ole!

I see - https://jnovy.fedorapeople.org/pxz/node2.html . Note, though, that xz now
has a -T/--threads flag of its own as well - https://linux.die.net/man/1/xz .
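For example, something along these lines should keep all cores busy (a rough,
untested sketch; -T needs xz >= 5.2):

  # -T0 lets xz pick one thread per available core; -9 for maximum compression
  xz -T0 -9 < /dev/sdx > bigfile.xz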

> 
> /Ole
> 



-- 
-
Shlomi Fish   http://www.shlomifish.org/
NSA Factoids - http://www.shlomifish.org/humour/bits/facts/NSA/

Chuck Norris’ ciphers were once broken. He responded by breaking those
individuals. (By sevvie: http://sevvie.github.io/ .)
— http://www.shlomifish.org/humour/bits/facts/Chuck-Norris/

Please reply to list if it's a mailing list post - http://shlom.in/reply .



Re: Processing a big file using more CPUs

2019-02-11 Thread Nio Wiklund

On 2019-02-11 at 23:54, Ole Tange wrote:

> On Mon, Feb 4, 2019 at 10:19 PM Nio Wiklund  wrote:
> :
>> cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
> :
>> The reason why I want this is that I often create compressed images of
>> the content of a drive, /dev/sdx, and I lose approximately half the
>> compression improvement from gzip to xz, when using parallel. The
>> improvement in speed is good, 2.5 times, but I think larger blocks would
>> give xz a chance to get a compression much closer to what it can get
>> without parallel.
>>
>> Is it possible with the current code? If so, how?
>
> Since version 2016-07-22:
>
> parallel --pipepart -a bigfile --recend '' -k --block -1 xz > bigfile.xz
> parallel --pipepart -a /dev/sdx --recend '' -k --block -1 xz > bigfile.xz
>
> Unfortunately the size computation of block devices only works under GNU/Linux.
>
> (That said: pxz exists, and it may be more relevant to use here).
>
> /Ole



Thanks for this reply, Ole,

I will test how your suggested command lines work for me, and also look 
into parallel processing within xz.


Best regards
Nio



Re: Processing a big file using more CPUs

2019-02-11 Thread Ole Tange
On Mon, Feb 4, 2019 at 10:19 PM Nio Wiklund  wrote:
:
> cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
:
> The reason why I want this is that I often create compressed images of
> the content of a drive, /dev/sdx, and I lose approximately half the
> compression improvement from gzip to xz, when using parallel. The
> improvement in speed is good, 2.5 times, but I think larger blocks would
> give xz a chance to get a compression much closer to what it can get
> without parallel.
>
> Is it possible with the current code? If so, how?

Since version 2016-07-22:

parallel --pipepart -a bigfile --recend '' -k --block -1 xz > bigfile.xz
parallel --pipepart -a /dev/sdx --recend '' -k --block -1 xz > bigfile.xz

Unfortunately the size computation of block devices only works under GNU/Linux.
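
If I read the --block docs right, --block -1 tells --pipepart to split the input
into roughly one block per jobslot, so each xz gets the largest possible chunk.
The per-block .xz streams simply get concatenated, and xz decompresses
concatenated streams by default, so reading the result back is just (sketch):

  # xz handles concatenated .xz streams, so this restores the original data
  xz -dc bigfile.xz > bigfile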

(That said: pxz exists, and it may be more relevant to use here).


/Ole



Processing a big file using more CPUs

2019-02-04 Thread Nio Wiklund

Hi parallel users,

Background

EXAMPLE: Processing a big file using more CPUs

To process a big file or some output you can use --pipe to split up the 
data into blocks and pipe the blocks into the processing program.


If the program is gzip -9, you can do:

  cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz

This will split bigfile into blocks of 1 MB and pass the blocks to gzip -9 in 
parallel. One gzip will be run per CPU. The output of gzip -9 will be 
kept in order and saved to bigfile.gz.
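
For reference, --pipe also accepts an explicit block size via --block; a sketch 
with an arbitrarily chosen size, not something I have benchmarked:

  # same as above, but ask parallel for ~100 MB blocks instead of the 1 MB default
  cat bigfile | parallel --pipe --recend '' -k --block 100M gzip -9 > bigfile.gz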


Question

I would like to create blocks of a suitable size for each CPU/thread for 
binary files, as is possible with --pipepart --block -1 for text files 
(files with lines).


I have tried, but I can only get the default block size of 1 MiB.

The reason why I want this is that I often create compressed images of 
the content of a drive, /dev/sdx, and I lose approximately half the 
compression improvement from gzip to xz, when using parallel. The 
improvement in speed is good, 2.5 times, but I think larger blocks would 
give xz a chance to get a compression much closer to what it can get 
without parallel.


Is it possible with the current code? If so, how?

Otherwise I think it would be a good idea to modify the code to make it 
possible.


Best regards
Nio