On Sun, Sep 19, 2021 at 6:39 PM Thomas Bereknyei <tombe...@gmail.com> wrote:
>
> The limitation mentioned in the man page has bitten me several times
GNU Parallel determines the number of jobslots by looking at both the input and the limits. Before starting the first job it runs a loop:

  While jobslots < requested_jobslots:
    Read a job
    Reserve file handles and a process
    If all jobs read:
      Stop
    If not enough file handles or processes:
      Stop
      Give warning about the limit

You will see that when you ask for infinite jobslots (-j0) and pass more jobs than there are file handles for:

  $ seq 1000000 | parallel -j0 true
  parallel: Warning: Only enough file handles to run 252 jobs in parallel.
  parallel: Warning: Try running 'parallel -j0 -N 252 --pipe parallel -j0'
  parallel: Warning: or increasing 'ulimit -n' (try: ulimit -n `ulimit -Hn`)
  parallel: Warning: or increasing 'nofile' in /etc/security/limits.conf
  parallel: Warning: or increasing /proc/sys/fs/file-max

The limit of 252 jobs is because a job takes 4 file handles and a normal system is configured with ~1000 file handles.

GNU Parallel tries to give you this warning ASAP and in a predictable way. So instead of starting the jobs, it only reserves the file handles and processes. If GNU Parallel had started the jobs, they might be so quick to run that you would never hit the limit of 252, because the first job may have finished before job 252 was started. But that would make the behaviour unpredictable: sometimes you would hit the limit (if your computer is slow or the jobs are long running) while other times you would not.

However, there is no need to reserve more slots than there are inputs. So if the input is only 100 jobs, GNU Parallel will leave the loop above after 100 rounds, and thus not issue the warning. See:

  $ seq 100 | parallel -j0 true
  ((No warning))

> Is there a way to remove this limitation altogether?

I do not see how to change this for -j0: you are asking for infinite jobslots, and we *know* you cannot get that. If the limit is higher than the number of jobs, then simply set jobslots=total_jobs. If the limit is lower, set jobslots=limit.
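To make the loop concrete, here is a small bash model of it. It is a sketch under assumptions, not GNU Parallel's own code: the function name simulate_jobslots is hypothetical, every job is assumed to cost exactly 4 file handles, and the number of free handles is passed in by hand (1008 below is picked only because 1008 / 4 = 252, matching the warning above).

```shell
#!/usr/bin/env bash
# Hypothetical model of the jobslot-reservation loop (not GNU Parallel's code).
# Assumptions: each job costs exactly HANDLES_PER_JOB file handles, and the
# caller states how many handles are free.
HANDLES_PER_JOB=4

# simulate_jobslots REQUESTED NUM_INPUTS FREE_HANDLES
#   REQUESTED=0 stands for -j0 (unlimited jobslots).
simulate_jobslots() {
  local requested=$1 inputs=$2 free=$3
  local jobslots=0 read_jobs=0
  while (( requested == 0 || jobslots < requested )); do
    # "Read a job": if all input has been read, stop without a warning.
    if (( read_jobs == inputs )); then
      echo "jobslots=$jobslots"
      return 0
    fi
    read_jobs=$(( read_jobs + 1 ))
    # "Reserve file handles": if there are not enough, stop and warn.
    if (( free < HANDLES_PER_JOB )); then
      echo "jobslots=$jobslots"
      echo "Warning: only enough file handles to run $jobslots jobs"
      return 0
    fi
    free=$(( free - HANDLES_PER_JOB ))
    jobslots=$(( jobslots + 1 ))
  done
  # The requested (finite) number of jobslots was reached.
  echo "jobslots=$jobslots"
}
```

Under these assumptions 'simulate_jobslots 0 1000000 1008' prints jobslots=252 followed by the warning, while 'simulate_jobslots 0 100 1008' prints jobslots=100 and no warning, mirroring the two seq examples. A finite request such as 'simulate_jobslots 10 100 1008' simply stops at jobslots=10.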
To choose between those two, we need to determine whether the limit is higher than the number of jobs, which is what the loop above does.

So while I do not see how we can change the behaviour for -j0, we could probably change it for finite values: if you run -j1000, GNU Parallel could skip reading the jobs, simply assume you will pass it more than 1000 jobs, and generate 1000 dummy jobs, even if you do not actually pass GNU Parallel 1000 jobs. This would make startup slower if you request more jobslots than you have jobs, but that might be an acceptable price to pay.

Try:

  @@ -7270,7 +7270,7 @@ sub compute_number_of_processes($) {
           my $before_getting_arg = time;
           if(!$Global::dummy_jobs) {
  -            get_args_or_jobs() or last;
  +#            get_args_or_jobs() or last;
           }
           $wait_time_for_getting_args += time - $before_getting_arg;
           $system_limit++;

/Ole