Re: Processing a big file using more CPUs
On Mon, 11 Feb 2019 23:54:43 +0100 Ole Tange wrote:
> On Mon, Feb 4, 2019 at 10:19 PM Nio Wiklund wrote:
> > cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
> >
> > The reason why I want this is that I often create compressed images of
> > the content of a drive, /dev/sdx, and I lose approximately half the
> > compression improvement from gzip to xz when using parallel. The
> > improvement in speed is good, 2.5 times, but I think larger blocks would
> > give xz a chance to get a compression much closer to what it can get
> > without parallel.
> >
> > Is it possible with the current code? In that case how?
>
> Since version 2016-07-22:
>
> parallel --pipepart -a bigfile --recend '' -k --block -1 xz > bigfile.xz
> parallel --pipepart -a /dev/sdx --recend '' -k --block -1 xz > bigfile.xz
>
> Unfortunately the size computation of block devices only works under
> GNU/Linux.
>
> (That said: pxz exists, and it may be more relevant to use here.)

Hi Ole!

https://jnovy.fedorapeople.org/pxz/node2.html - I see, but note that xz
has a -T flag now as well - https://linux.die.net/man/1/xz .

> /Ole

--
Shlomi Fish       http://www.shlomifish.org/
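As a concrete illustration of the -T flag mentioned above (a sketch
only; the device path and output name are placeholders, and threaded
compression needs xz 5.2 or newer):

  # Let xz spread the work over all available cores; it splits the
  # input into blocks internally, so the result is one ordinary .xz file.
  xz -T0 -9 < /dev/sdx > bigfile.xz

With -T0 xz uses one thread per core; a fixed number such as -T4 caps
the thread count instead.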
Re: Processing a big file using more CPUs
On 2019-02-11 at 23:54, Ole Tange wrote:
> On Mon, Feb 4, 2019 at 10:19 PM Nio Wiklund wrote:
> > cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
> >
> > The reason why I want this is that I often create compressed images of
> > the content of a drive, /dev/sdx, and I lose approximately half the
> > compression improvement from gzip to xz when using parallel. The
> > improvement in speed is good, 2.5 times, but I think larger blocks would
> > give xz a chance to get a compression much closer to what it can get
> > without parallel.
> >
> > Is it possible with the current code? In that case how?
>
> Since version 2016-07-22:
>
> parallel --pipepart -a bigfile --recend '' -k --block -1 xz > bigfile.xz
> parallel --pipepart -a /dev/sdx --recend '' -k --block -1 xz > bigfile.xz
>
> Unfortunately the size computation of block devices only works under
> GNU/Linux.
>
> (That said: pxz exists, and it may be more relevant to use here.)
>
> /Ole

Thanks for this reply, Ole. I will test how your suggested command lines
work for me, and also look into parallel processing within xz.

Best regards
Nio
Re: Processing a big file using more CPUs
On Mon, Feb 4, 2019 at 10:19 PM Nio Wiklund wrote:
> cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz
>
> The reason why I want this is that I often create compressed images of
> the content of a drive, /dev/sdx, and I lose approximately half the
> compression improvement from gzip to xz when using parallel. The
> improvement in speed is good, 2.5 times, but I think larger blocks would
> give xz a chance to get a compression much closer to what it can get
> without parallel.
>
> Is it possible with the current code? In that case how?

Since version 2016-07-22:

parallel --pipepart -a bigfile --recend '' -k --block -1 xz > bigfile.xz
parallel --pipepart -a /dev/sdx --recend '' -k --block -1 xz > bigfile.xz

Unfortunately the size computation of block devices only works under
GNU/Linux.

(That said: pxz exists, and it may be more relevant to use here.)

/Ole
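A side note beyond the mail itself: because concatenated xz streams are
themselves valid xz input, the per-block output written by the commands
above can be read back in one pass. A rough sketch, with restored.img
used only as a placeholder name:

  # Decompress the concatenated per-block streams into a single image.
  xz -dc bigfile.xz > restored.img

  # Optionally verify the round trip against the source device.
  cmp restored.img /dev/sdx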
Processing a big file using more CPUs
Hi parallel users,

Background

EXAMPLE: Processing a big file using more CPUs

To process a big file or some output you can use --pipe to split up the
data into blocks and pipe the blocks into the processing program.

If the program is gzip -9 you can do:

cat bigfile | parallel --pipe --recend '' -k gzip -9 > bigfile.gz

This will split bigfile into blocks of 1 MB and pass them to gzip -9 in
parallel. One gzip will be run per CPU. The output of gzip -9 will be
kept in order and saved to bigfile.gz.

Question

I would like to create blocks of a suitable size for each CPU/thread for
binary files, as is possible with --pipepart --block -1 for text files
(with lines). I have tried, but I can only get the default 1 MiB block
size.

The reason why I want this is that I often create compressed images of
the content of a drive, /dev/sdx, and I lose approximately half the
compression improvement from gzip to xz when using parallel. The
improvement in speed is good, 2.5 times, but I think larger blocks would
give xz a chance to get a compression much closer to what it can get
without parallel.

Is it possible with the current code? In that case how? Otherwise I
think it would be a good idea to modify the code to make it possible.

Best regards
Nio
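For completeness, the 1 MB figure above is just --pipe's default --block
size, which can in principle be raised by hand. A sketch only, with the
500M block size and file names picked purely for illustration:

  # Same --pipe pipeline, but with larger blocks so each xz job sees
  # more context; bigger blocks generally improve the ratio at the cost
  # of more memory per job and less parallelism on small inputs.
  cat bigfile | parallel --pipe --recend '' -k --block 500M xz -9 > bigfile.xz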