On Mon, Jan 9, 2012 at 5:47 PM, rambach <[email protected]> wrote:
> On 1/7/2012 4:31 AM, Ole Tange wrote:
>> On Fri, Dec 16, 2011 at 12:45 PM, Ole Tange<[email protected]> wrote:
>>> On Fri, Dec 16, 2011 at 9:01 AM, rambach<[email protected]> wrote:
>>>> On 12/15/2011 11:35 PM, Ole Tange wrote:
>>>>> On Wed, Dec 14, 2011 at 2:35 PM, rambach<[email protected]> wrote:
>>>>>> On 12/12/2011 11:07 PM, Ole Tange wrote:
>>>>>>>
>>>>>>> * Only look for the job-number.
>>
>> This is now implemented. You can do:
>>
>> timeout -k 1 1 parallel -j2 --resume --joblog /tmp/joblog2 sleep {} ::: 1.1 2.2 3.3 4.4
>> parallel -j2 --resume --joblog /tmp/joblog2 sleep {} ::: 1.1 2.2 3.3 4.4;
>>
>> Please test it.
>
> thanks, very good job.
> the functionality works nice and smooth.
>
> i'm sure others will benefit from this feature as well.
>
> however, what i found during testing is that GNU Parallel has some sort of
> memleak:
> the following command
> seq 100000 | parallel -j200 "echo {}; sleep 1"
> starts with a virtual mem usage of about 38 MB, and reaches 50 MB at around
> 25000 finished jobs.
> the size of used memory increases steadily, so at 12MB per 25000 jobs, you'd
> run out of mem on a 128 MB sys pretty quick.
> the leak is independent of the --resume option and even --joblog.

The leak is not really a leak: when you use multiple input sources,
GNU Parallel has to generate all combinations of the arguments, so it
has to remember every argument seen so far. Only in the special case
of a single input source can GNU Parallel safely forget arguments it
has already seen. This optimization is now implemented in the new git
version.

/Ole
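
To illustrate why several input sources force arguments to be kept while a single source does not, here is a minimal Python sketch. It is not GNU Parallel's code (which is written in Perl), and the class name ArgumentCombiner and its methods are made up for this example. It reads the sources round-robin and pairs each newly read argument with everything already seen on the other sources.

```python
from itertools import product


class ArgumentCombiner:
    """Hypothetical illustration only; not GNU Parallel's real implementation."""

    def __init__(self, *sources):
        self.sources = [iter(s) for s in sources]
        # Arguments retained per input source so they can be reused later.
        self.remembered = [[] for _ in self.sources]

    def __iter__(self):
        if len(self.sources) == 1:
            # One input source: each argument is used exactly once,
            # so nothing needs to be stored after it is handed out.
            for arg in self.sources[0]:
                yield (arg,)
            return

        active = list(range(len(self.sources)))
        while active:
            for i in list(active):
                try:
                    new = next(self.sources[i])
                except StopIteration:
                    active.remove(i)
                    continue
                # Pair the new argument with everything seen on the other sources.
                pools = [[new] if j == i else self.remembered[j]
                         for j in range(len(self.sources))]
                yield from product(*pools)
                # Keep it: arguments read later from other sources still need it.
                self.remembered[i].append(new)

    def remembered_count(self):
        """Total number of arguments currently held in memory."""
        return sum(len(r) for r in self.remembered)


single = ArgumentCombiner(range(1000))
print(sum(1 for _ in single), single.remembered_count())

multi = ArgumentCombiner(range(100), "ab")
print(sum(1 for _ in multi), multi.remembered_count())
```

In this sketch the single-source run hands out 1000 argument tuples and retains 0 arguments, while the two-source run produces 200 combinations and ends up retaining all 102 arguments it read.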
