Re: parallel bug: Warning: No more file handles. when one job is delayed (reproducible with a test case)

2017-10-29 Thread Shlomi Fish
On Sun, 29 Oct 2017 00:26:15 +0200
Ole Tange wrote:

> On Sat, Oct 28, 2017 at 12:44 PM, Shlomi Fish wrote:
> 
> > I see. So what you are saying is that parallel will work fine despite the
> > warning and will continue running?  
> 
> Yep. But if you run into this problem, it might be a better idea to
> remove -k and use something like --results instead.
> 
> /Ole

Thanks, Ole! I'll investigate.

-- 
-
Shlomi Fish   http://www.shlomifish.org/
http://www.shlomifish.org/humour/ways_to_do_it.html

If a million Shakespeares had to write together, they would write like a monkey.
— based on Stephen Wright, via Nadav Har’El.

Please reply to list if it's a mailing list post - http://shlom.in/reply .



Re: parallel bug: Warning: No more file handles. when one job is delayed (reproducible with a test case)

2017-10-28 Thread Ole Tange
On Sat, Oct 28, 2017 at 12:44 PM, Shlomi Fish wrote:

> I see. So what you are saying is that parallel will work fine despite the
> warning and will continue running?

Yep. But if you run into this problem, it might be a better idea to
remove -k and use something like --results instead.

/Ole
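
A minimal sketch of the suggestion above (drop -k, use --results), not taken from the thread itself. It assumes GNU parallel is installed and that --results is given a directory, in which case each job's stdout is saved under 1/<argument>/stdout for a single input source. The scratch directory and the scaled-down job counts are illustrative choices.

```shell
#!/usr/bin/env bash
# Skip quietly when GNU parallel is not available.
if ! parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    echo "GNU parallel not found; skipping" >&2
    exit 0
fi

# Hypothetical scratch directory for the results tree.
outdir=$(mktemp -d)

# Same shape as the reproduction, scaled down: job 0 sleeps 1 second,
# jobs 1..10 do not sleep. No -k, so finished jobs are not held back.
seq 0 10 | parallel --results "$outdir" 'sleep {= $_ = $_ ? 0 : 1 =}; echo {}' >/dev/null

# Read the saved stdout files back in input order.
collected=$(for n in $(seq 0 10); do cat "$outdir/1/$n/stdout"; done)
echo "$collected"

rm -rf "$outdir"
```

The ordering is restored afterwards by reading the files in input order, so GNU Parallel itself does not have to hold output back while the slow job runs.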



Re: parallel bug: Warning: No more file handles. when one job is delayed (reproducible with a test case)

2017-10-28 Thread Shlomi Fish
Hi Ole,

On Sat, 28 Oct 2017 00:28:02 +0200
Ole Tange wrote:

> On Fri, Oct 27, 2017 at 9:23 AM, Shlomi Fish wrote:
> 
> > Thanks for your work.  
> 
> Good to know it is appreciated.
> 
> > Attached are two files to reproduce a bug I ran into with GNU parallel,
> > including the latest version, on Mageia v7 x86-64:
> >
> > shlomif@telaviv1:~$ bash run-range.bash
> > parallel: Warning: No more file handles.
> > parallel: Warning: Raising ulimit -n or /etc/security/limits.conf may help.
> > ^CCompleted!
> >
> > The run-single.bash script is delayed for n=1 and meanwhile other jobs
> > accumulate which may explain the problem. This problem caused me to lose one
> > night of uptime on an AWS instance because "parallel" got stuck, so I'd
> > appreciate an investigation and a fix.  
> 
> Your problem can be illustrated with:
> 
>   seq 0 1000 | parallel -k -t sleep '{= $_ = $_ ? 0 : 10 =};echo {}'
> 
> This will run 'sleep 10' followed by 1000 jobs of 'sleep 0'. -t causes
> the command to be printed as soon as it is started.
> 
> Because of -k, GNU Parallel must keep the output in input order. It does
> that by keeping the temporary output files of the jobs it has run open.
> Before it can close any of these files, it has to wait for the first
> job to complete. Because the other jobs complete very quickly, GNU
> Parallel runs out of file handles and warns you:
> 
>   parallel: Warning: No more file handles.
>   parallel: Warning: Raising ulimit -n or /etc/security/limits.conf may help.
> 
> But it is just a warning: as soon as the first job completes, GNU
> Parallel completes the remaining jobs.
> 
> > Also see
> > https://lists.gnu.org/archive/html/parallel/2017-07/msg6.html .  
> 
> If you used -k there as well, then we have the explanation: GNU Parallel
> does not stop. It waits for one of the jobs to complete before it can
> close more file handles.
> 

I see. So what you are saying is that parallel will work fine despite the
warning and will continue running? Since it still got stuck, the problem
is likely elsewhere. Thanks!

> 
> /Ole
> 






Re: parallel bug: Warning: No more file handles. when one job is delayed (reproducible with a test case)

2017-10-27 Thread Ole Tange
On Fri, Oct 27, 2017 at 9:23 AM, Shlomi Fish wrote:

> Thanks for your work.

Good to know it is appreciated.

> Attached are two files to reproduce a bug I ran into with GNU parallel,
> including the latest version, on Mageia v7 x86-64:
>
> shlomif@telaviv1:~$ bash run-range.bash
> parallel: Warning: No more file handles.
> parallel: Warning: Raising ulimit -n or /etc/security/limits.conf may help.
> ^CCompleted!
>
> The run-single.bash script is delayed for n=1 and meanwhile other jobs
> accumulate which may explain the problem. This problem caused me to lose one
> night of uptime on an AWS instance because "parallel" got stuck, so I'd
> appreciate an investigation and a fix.

Your problem can be illustrated with:

  seq 0 1000 | parallel -k -t sleep '{= $_ = $_ ? 0 : 10 =};echo {}'

This will run 'sleep 10' followed by 1000 jobs of 'sleep 0'. -t causes
the command to be printed as soon as it is started.
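
For a quicker look at what -k changes, here is a scaled-down variant of the command above. It is only a sketch: it assumes GNU parallel is installed, shortens the delay to 1 second, uses 6 jobs instead of 1001, and adds -j4 (an assumption, not in the original command) so that several jobs run at once even on a small machine.

```shell
#!/usr/bin/env bash
# Skip quietly when GNU parallel is not available.
if ! parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    echo "GNU parallel not found; skipping" >&2
    exit 0
fi

# Job 0 sleeps 1 second (instead of 10); jobs 1..5 do not sleep.
cmd='sleep {= $_ = $_ ? 0 : 1 =}; echo {}'

# With -k: output follows input order, so all output waits for job 0.
ordered=$(seq 0 5 | parallel -j4 -k "$cmd")

# Without -k: output is printed as jobs finish, so 0 typically comes last.
unordered=$(seq 0 5 | parallel -j4 "$cmd")

echo "with -k:    ${ordered//$'\n'/ }"
echo "without -k: ${unordered//$'\n'/ }"
```

With only a handful of jobs the file handle limit is never reached; the sketch shows the ordering behaviour, not the warning.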

Because of -k, GNU Parallel must keep the output in input order. It does
that by keeping the temporary output files of the jobs it has run open.
Before it can close any of these files, it has to wait for the first
job to complete. Because the other jobs complete very quickly, GNU
Parallel runs out of file handles and warns you:

  parallel: Warning: No more file handles.
  parallel: Warning: Raising ulimit -n or /etc/security/limits.conf may help.

But it is just a warning: as soon as the first job completes, GNU
Parallel completes the remaining jobs.
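
The warning mentions ulimit -n. As a rough sketch (bash assumed, not part of the original reply), the limits for the current shell can be inspected, and the soft limit raised up to the hard limit, like this:

```shell
#!/usr/bin/env bash
# Show the per-process open-file limits that apply to this shell.
soft=$(ulimit -Sn)
hard=$(ulimit -Hn)
echo "soft limit: $soft, hard limit: $hard"

# Raise the soft limit to the hard limit for this shell and its children.
# On some systems the kernel refuses this, so a failure is reported, not fatal.
if [ "$soft" != "$hard" ]; then
    ulimit -Sn "$hard" 2>/dev/null || echo "could not raise the soft limit" >&2
fi
echo "soft limit now: $(ulimit -Sn)"
```

The change only affects the current shell and the processes started from it; a higher hard limit has to be configured elsewhere, for example in /etc/security/limits.conf as the warning says.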

> Also see https://lists.gnu.org/archive/html/parallel/2017-07/msg6.html .

If you used -k there as well, then we have the explanation: GNU Parallel
does not stop. It waits for one of the jobs to complete before it can
close more file handles.


/Ole



parallel bug: Warning: No more file handles. when one job is delayed (reproducible with a test case)

2017-10-27 Thread Shlomi Fish
Hi all!

Thanks for your work.

Attached are two files to reproduce a bug I ran into with GNU parallel,
including the latest version, on Mageia v7 x86-64:

shlomif@telaviv1:~$ bash run-range.bash 
parallel: Warning: No more file handles.
parallel: Warning: Raising ulimit -n or /etc/security/limits.conf may help. 
^CCompleted!

The run-single.bash script is delayed for n=1 and meanwhile other jobs
accumulate which may explain the problem. This problem caused me to lose one
night of uptime on an AWS instance because "parallel" got stuck, so I'd
appreciate an investigation and a fix.

Also see https://lists.gnu.org/archive/html/parallel/2017-07/msg6.html .

Regards,

Shlomi Fish



Attachments: run-range.bash, run-single.bash (binary data)