Re: Help in parallelizing bedtools

Stefano Capomaccio Thu, 06 Mar 2014 23:52:23 -0800

Dear Ole,
thanks for the advice.
I found the problem yesterday:
I had not set an id for the semaphore. Now I get the expected results.
This is the improved code.



for file in `ls $directory`
do
    sem --id my_id -j"$numcore" R < script.R --slave --args $file \
        $otherfile $directory > "$file".log
done
sem --wait --id my_id
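
One way to confirm that the wait now covers every job is to compare the inputs against the logs before the parsing step. This is only a minimal sketch: the helper name missing_logs is hypothetical, and it assumes the "$file".log naming used in the loop above.

```shell
#!/usr/bin/env bash
# missing_logs (hypothetical helper, not part of GNU Parallel):
# print every entry of input directory $1 that has no non-empty
# "<name>.log" in log directory $2.
missing_logs() {
    local input_dir=$1 log_dir=$2 path name
    for path in "$input_dir"/*; do
        [ -e "$path" ] || continue              # empty directory: the glob matched nothing
        name=${path##*/}                        # strip the directory part
        case $name in *.log) continue ;; esac   # skip logs if both directories are the same
        [ -s "$log_dir/$name.log" ] || printf '%s\n' "$name"
    done
}
```

Called as `missing_logs "$directory" .` right after the `sem --wait` line, it should print nothing when all jobs have written their logs.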


You said that the semaphore is slower... I will try your last line and see
if I gain something.
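
For trying the non-semaphore form, a rough sketch of how it could be wrapped is below. The function name run_all and its third argument (a log directory) are assumptions for illustration, and the job is a stand-in echo rather than the R call, so it can be tried without the real data.

```shell
#!/usr/bin/env bash
# run_all (hypothetical wrapper): run one job per entry of directory $1,
# at most $2 jobs at a time, writing one log per entry into directory $3.
# The job here is a stand-in echo; for real use, replace the quoted command
# with the R call from the thread.
# parallel returns only after every job has finished, so no separate
# wait step is needed before parsing the logs.
run_all() {
    local directory=$1 numcore=$2 log_dir=$3
    ls "$directory" | parallel -j"$numcore" "echo processed {} > '$log_dir'/{}.log"
}
```

Example call: `run_all "$directory" "$numcore" .` followed directly by the parsing step.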

Thank you for this nice piece of software.
I will surely cite you in my future publications (hopefully :-) )


Stefano


On Fri, Mar 7, 2014 at 2:12 AM, Ole Tange <ta...@gnu.org> wrote:

> On Sun, Mar 2, 2014 at 4:37 PM, Stefano Capomaccio <capemas...@gmail.com>
> wrote:
>
> > I'm a happy user of parallel 20140122
>
> Great to hear. If you like GNU Parallel:
>
> * Walk through the tutorial
> (http://www.gnu.org/software/parallel/parallel_tutorial.html)
> * Give a demo at your local user group/team/colleagues
> * Post the intro videos and tutorial on Reddit/Diaspora*/forums/blogs/
> Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
> * Request or write a review for your favourite blog or magazine
> * Invite me for your next conference
>
> If you use GNU Parallel for research:
>
> * Please cite GNU Parallel in your publications (use --bibtex)
>
> If GNU Parallel saves you money:
>
> * (Have your company) donate to FSF https://my.fsf.org/donate/
>
> > but I'm stuck on a problem with the semaphore option.
>
> Semaphore is slower than normal parallel mode and seems to have a race
> condition if you run 100s of jobs in parallel.
>
> > In the following bash code my intent is to run an R script on several
> > cores (specified by $numcore).
> >
> > for file in `ls $directory`
> > do
> >   sem -j"$numcore" R < rscript.R --slave --args $file $other_input \
> >       $directory > "$file".gw.log
> > done
> > sem --wait
>
> The above should work. I can, however, not test it, as you have not
> provided enough information. Please follow the section REPORTING BUGS
> in the man page:
>
> * A complete example that others can run that shows the problem. This
> should preferably be small and simple. A combination of yes, seq, cat,
> echo, and sleep can reproduce most errors. If your example requires
> large files, see if you can make them by something like seq 1000000 >
> file or yes | head -n 10000000 > file. If your example requires remote
> execution, see if you can use localhost - maybe using another login.
>
> * The output of your example. If your problem is not easily reproduced
> by others, the output might help them figure out the problem.
>
> * Whether you have watched the intro videos
> (http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1), walked
> through the tutorial (man parallel_tutorial), and read the EXAMPLE
> section in the man page (man parallel - search for EXAMPLE:).
>
> If you suspect the error is dependent on your environment or
> distribution, please see if you can reproduce the error on one of
> these VirtualBox images:
> http://sourceforge.net/projects/virtualboximage/files/
>
> In this case I think it is dependent on your environment, so please
> make a reproducible example on a virtual machine.
>
> > This task has to be done 32 times on 10 cores.
> >
> > I have noticed that parallel spreads the jobs correctly over the desired
> > cores, but when the for loop exhausts the files (the thirty files) it
> > does not wait until every job is done. The following lines of code are
> > executed, making you think that the analysis is done while some cores
> > are still running.
>
> With 'sem --wait' it sounds like an error.
>
> > This is not convenient because I need the output of the 32 processes to be
> > parsed after this step, and I miss two of them every time.
> > Results are indeed correct but I cannot pipe this step.
>
> A workaround:
>
> ls $directory | parallel -j"$numcore" R '<' rscript.R --slave --args \
>     {} $other_input $directory '>' {}.gw.log
>
> Also you might find --results useful. And you might even take a look
> at --shebang-wrap:
>
>        R:       #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave
>
>
> /Ole
>



-- 
.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.

*He tried to be a scientist*

Stefano Capomaccio, PhD
Università Cattolica del Sacro Cuore
Via Emilia Parmense, 84
29122 - Piacenza (PC), Italy
Phone +39 0523 599203 (office)
Phone +39 0523 599482 (lab)
email: stefano.capomac...@unicatt.it
email: capemas...@gmail.com
skype: capemaster

.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
