On Sun, Mar 2, 2014 at 4:37 PM, Stefano Capomaccio <capemas...@gmail.com> wrote:
> I'm a happy user of parallel 20140122

Great to hear.

If you like GNU Parallel:

* Walk through the tutorial
  (http://www.gnu.org/software/parallel/parallel_tutorial.html)
* Give a demo at your local user group/team/colleagues
* Post the intro videos and tutorial on Reddit/Diaspora*/forums/blogs/
  Identi.ca/Google+/Twitter/Facebook/Linkedin/mailing lists
* Request or write a review for your favourite blog or magazine
* Invite me for your next conference

If you use GNU Parallel for research:

* Please cite GNU Parallel in your publications (use --bibtex)

If GNU Parallel saves you money:

* (Have your company) donate to FSF https://my.fsf.org/donate/

> but I'm stucked in a problem with the semaphore option.

Semaphore is slower than normal parallel mode and seems to have a race
condition if you run 100s of jobs in parallel.

> In the following bash code my intent is to run on several cores (specified
> by $numcore) an R script.
>
> for file in `ls $directory`
> do
> sem -j"$numcore" R < rscript.R --slave --args $file $other_input
> $directory > "$file".gw.log
> done
> sem --wait

The above should work. I cannot test it, however, as you have not
provided enough information. Please follow the section REPORTING BUGS
in the man page:

* A complete example that others can run that shows the problem. This
  should preferably be small and simple. A combination of yes, seq,
  cat, echo, and sleep can reproduce most errors. If your example
  requires large files, see if you can make them by something like
  seq 1000000 > file or yes | head -n 10000000 > file. If your example
  requires remote execution, see if you can use localhost - maybe
  using another login.
* The output of your example. If your problem is not easily reproduced
  by others, the output might help them figure out the problem.
* Whether you have watched the intro videos
  (http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1), walked
  through the tutorial (man parallel_tutorial), and read the EXAMPLE
  section in the man page (man parallel - search for EXAMPLE:).

If you suspect the error is dependent on your environment or
distribution, please see if you can reproduce the error on one of
these VirtualBox images:

http://sourceforge.net/projects/virtualboximage/files/

In this case I think it is dependent on your environment, so please
make a reproducible example on a virtual machine.

> This task has to be done 32 times on 10 cores.
>
> I have noticed that parallel spreads correctly the job over the desired
> cores but it seems that when the for exausts the files (the thirty files)
> does not wait until every job is done and the following lines of code are
> executed making you think that the analysis is done while there are some
> cores that are running.

With 'sem --wait' in place, that sounds like an error.

> This is not convenient because I need the ouput of the 32 process to be
> parsed aftwerwards this step and I miss two of them avery time.
> Results are indeed correct but I cannot pipe this step.

A workaround:

  ls $directory | parallel -j"$numcore" R '<' rscript.R --slave --args {} $other_input $directory '>' {}.gw.log

Also you might find --results useful. And you might even take a look
at --shebang-wrap:

R:

  #!/usr/bin/parallel --shebang-wrap /usr/bin/Rscript --vanilla --slave

/Ole
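
PS: A rough sketch of the kind of small, self-contained test case asked for above. It is an assumption-laden stand-in, not your actual setup: 'sleep' and 'echo' replace the R script, the input files are generated in a temporary directory, and the names run_demo, missing_logs and the semaphore id "demo" are made up for this example. It starts 32 jobs with at most 10 at a time, waits with 'sem --wait', and then lists any input file that has no log.

```shell
#!/usr/bin/env bash
# Hypothetical test case: 'sleep' stands in for the R script.
# run_demo, missing_logs and the id "demo" are invented for this sketch.

# Print the name of every file in directory $1 that has no
# corresponding NAME.gw.log in directory $2.
missing_logs() {
    local indir=$1 logdir=$2 f
    for f in "$indir"/*; do
        [ -e "$f" ] || continue            # skip if the glob matched nothing
        [ -e "$logdir/$(basename "$f").gw.log" ] || basename "$f"
    done
}

# Create 32 dummy inputs, run one job per input (10 at a time) via sem,
# wait for all of them, then report inputs whose log is missing.
run_demo() {
    local numcore=10 demo_dir i f
    demo_dir=$(mktemp -d)
    mkdir "$demo_dir/in" "$demo_dir/log"
    for i in $(seq 32); do
        echo "$i" > "$demo_dir/in/file$i"
    done
    for f in "$demo_dir"/in/*; do
        sem --id demo -j"$numcore" \
            "sleep 1; echo finished > '$demo_dir/log/$(basename "$f").gw.log'"
    done
    sem --id demo --wait
    echo "inputs without a log after sem --wait:"
    missing_logs "$demo_dir/in" "$demo_dir/log"
}

if command -v sem >/dev/null 2>&1; then
    run_demo
else
    echo "sem (GNU Parallel) not found; skipping the demo run" >&2
fi
```

If the list printed at the end is empty, 'sem --wait' did wait for every job in that environment. If it is not, the script and its output are what a bug report would need.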