Hi Ole, thanks for the reply. Not quite. True, I am observing the same thing (empty files 12 through 20 below), but what is bothering me is file #11, which has 13 bytes, and could have easily fit into file #10 (1092 bytes) and still been well below the 1200 threshold.
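The arithmetic behind that expectation can be sketched in Python. This is only an illustration of the greedy packing I expected, not GNU parallel's actual implementation; the function names `split_records` and `pack_records` and the 13-character sample records are hypothetical, made up for this example.

```python
def split_records(data, recstart=">"):
    """Split text into records, each beginning at a line that starts with recstart."""
    records = []
    for line in data.splitlines(keepends=True):
        if line.startswith(recstart) or not records:
            records.append(line)      # a new record begins here
        else:
            records[-1] += line       # continuation of the current record
    return records


def pack_records(records, block=1200):
    """Greedily pack whole records into chunks of at most `block` characters."""
    chunks = []
    current = ""
    for rec in records:
        # Start a new chunk only when adding this record would exceed the limit.
        if current and len(current) + len(rec) > block:
            chunks.append(current)
            current = ""
        current += rec
    if current:
        chunks.append(current)
    return chunks


# Hypothetical data: 85 records of 13 ASCII characters each (84 * 13 = 1092).
data = "".join(f">s{i:03d}\nACGTAC\n" for i in range(85))
chunks = pack_records(split_records(data), block=1200)
print(len(chunks), len(chunks[0]))  # 1 1105
```

Under this scheme the trailing 13-byte record joins the 1092-byte chunk, because 1092 + 13 = 1105 is still below 1200, so no extra chunk is created.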
Another way to have asked this question might have been: Will parallel always assume the last record is partial, if you only provide recstart? Because for some file types, it might not be feasible to provide a recend (like FASTA files, where all you can rely on is the ">" which marks the start of the header for each record.) So in these situations will parallel always kick the last single record into its own solitary process?

-rw-rw-r-- 1 staff staff 1092 Mar 29 16:25 partialpseudofasta_10.txt
-rw-rw-r-- 1 staff staff   13 Mar 29 16:25 partialpseudofasta_11.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_12.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_13.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_14.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_15.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_16.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_17.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_18.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_19.txt
-rw-rw-r-- 1 staff staff 1188 Mar 29 16:25 partialpseudofasta_1.txt
-rw-rw-r-- 1 staff staff    0 Mar 29 16:25 partialpseudofasta_20.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_2.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_3.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_4.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_5.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_6.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_7.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_8.txt
-rw-rw-r-- 1 staff staff 1200 Mar 29 16:25 partialpseudofasta_9.txt

Thanks
Owen

On Thu, Mar 29, 2012 at 5:40 PM, Ole Tange <[email protected]> wrote:
> On Fri, Mar 30, 2012 at 1:54 AM, <[email protected]> wrote:
>> Hello,
>>
>> I don't need to say how great GNU parallel is (GREAT!).
>
> Good to hear.
>
>> But for the
>> first time, I have encountered a behavior I didn't expect from it. I
>> am trying to break up a big input FASTA file (DNA sequence) using the
>> --block and --recstart options. But it always seems to create ONE
>> more file than I really want. I mean, if I have specified 10 jobs (-j
>> 10), and if the block size on the 10th job is still below my
>> specification (--block 1200), why does it make an 11th file? This
>> means that 10 jobs in parallel run, and then 1 MORE job has to run to
>> get the last record.
>
> It sounds like: https://savannah.gnu.org/bugs/?34241
>
> /Ole
