On Wed, Mar 26, 2014 at 9:32 PM, David <dgpick...@aol.com> wrote:
> ETL programs like Ab Initio know how to tell parallel processes to split up
> big files and process each part separately, even when the files are linefeed
> delimited (they all agree to search up (or down) for the dividing linefeed
> closest to N bytes down file). Does anyone know of a utility that can split
> a file this way (without reading it sequentially)? Is this in gnu parallel?

GNU Parallel will do that, except it reads the input sequentially.
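
Roughly, the boundary search you describe could look like the sketch below. It is only an illustration of the idea, not what GNU Parallel does internally; the file path and chunk count are just example arguments. It seeks to each approximate offset, scans forward to the next linefeed, and emits byte ranges that workers can then read independently:

  #!/usr/bin/env python3
  # Sketch: split a file into N byte ranges of roughly equal size,
  # moving each cut point forward to the nearest linefeed so that
  # no record is split across two ranges.
  import os
  import sys

  def chunk_boundaries(path, nchunks):
      size = os.path.getsize(path)
      cuts = [0]
      with open(path, 'rb') as f:
          for i in range(1, nchunks):
              f.seek(i * size // nchunks)   # jump to the approximate offset
              f.readline()                  # advance to just past the next linefeed
              cuts.append(f.tell())
      cuts.append(size)
      # Merge ranges that collapsed onto the same linefeed
      return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]

  if __name__ == '__main__':
      for start, end in chunk_boundaries(sys.argv[1], int(sys.argv[2])):
          print(start, end)   # hand each (start, end) range to a separate worker

Only the handful of seeks touch the file up front; each worker reads just its own byte range, so nothing has to stream the whole file through a single process first.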

> It'd be nice to be able to take a list of mixed-size files and divide them
> by size into N chunks of approximately equal lines, estimated using byte
> sizes and with an algorithm for searching for the record delimiter
> (linefeed) such that no records are lost. Sort of a mixed-input leveller
> for parallel loads. If it is part of parallel, then parallel can launch
> processing for each chunk and combine the chunks.

That is what --pipe does (except that it reads the input sequentially):

  cat files* | parallel --pipe --block 10m wc

/Ole