Dear All,
I have a Perl script that starts by fetching hundreds of files from a 
server. I know that GNU Parallel is well suited to fetching files from an FTP 
site by running wget commands in parallel. I would like to take advantage of 
this tool by calling it from inside my Perl script.
Is that possible? If so, an example would be very much appreciated.
Regards
Yacob


--- On Sun, 20/1/13, Nanditha Rao <[email protected]> wrote:

From: Nanditha Rao <[email protected]>
Subject: Re: Multiple jobs on a multicore machine or cluster
To: [email protected]
Date: Sunday, 20 January, 2013, 9:45

Never mind. I figured out that I had to transfer the files using --transfer 
before running them, and that they get copied to the home directory of the 
destination machine by default.
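
The fix described above could look roughly like the sketch below. It assumes every line of the 'commands' file has the form "ngspice <file>.sp", so the input file is the second word; list_inputs and run_remote are hypothetical helper names, and the server argument is a placeholder for a real user@host login.

```shell
# Hypothetical helper: print the input file named on each line of a
# commands file whose lines look like "ngspice <file>.sp" (assumption).
list_inputs() {
  awk '{ print $2 }' "$1"
}

# Sketch: feed the file names to GNU Parallel so that --transfer can copy
# each one to the remote machine before ngspice runs on it there.
# $1 = commands file, $2 = ssh login such as user@host (placeholder).
run_remote() {
  list_inputs "$1" | parallel -j +0 -S "$2" --transfer ngspice {}
}
```

It would be called as: run_remote commands username@ip_address. GNU Parallel also has --return and --cleanup options for fetching result files back and removing the transferred copies afterwards.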

On Sat, Jan 19, 2013 at 12:58 AM, Nanditha Rao <[email protected]> wrote:

On Fri, Jan 18, 2013 at 12:53 AM, Ole Tange <[email protected]> wrote:

On Thu, Jan 17, 2013 at 12:58 PM, Nanditha Rao <[email protected]> wrote:

> 1. I need to run multiple jobs on a multicore (and multithreaded) machine. I
> am using the GNU Parallel utility to distribute jobs across the cores to
> speed up the task. The commands to be executed are available in a file
> called 'commands'. I use the following command to run GNU Parallel:
>
> cat commands | parallel -j +0
>
> As per the guidance at this location (gnu parallel), this command is supposed
> to use all the cores to run this task. My machine has 2 cores and 2 threads
> per core.



I take it that you have a CPU with hyperthreading.
[Nanditha: I guess so. I am using an Intel Core i3 laptop to test this tool 
out.]





> The system monitor however shows 4 CPUs (CPU1 and CPU2 belong to
> core1, CPU3 and CPU4 belong to core2). Each job (simulation) takes about 20
> seconds to run on a single core. I ran 2 jobs in parallel using this GNU
> parallel utility with the command above. I observe in the system monitor



What system monitor are you using?
[Nanditha: gnome-system-monitor on Ubuntu]



> that, if the 2 jobs are assigned to cpu1 and cpu2 (that is the same core),
> there is obviously no speed-up.



Why obviously? Normally I measure a speedup of 30-70% when using hyperthreading.
[Nanditha: I somehow don't see a speedup. Running a single job on a single 
thread on a single core versus two threads on the same core takes the same 
time, about 20 seconds.]





> They take about 40 seconds to finish, which
> is about the time they would take if run sequentially. However, sometimes
> the tool distributes the 2 jobs to CPU1 and CPU3 or CPU4 (which means, 2
> jobs are assigned to 2 different cores). In this case, both jobs finish
> in parallel in 20 seconds.



GNU Parallel does not do the distributing; it simply spawns jobs. The
distribution is done by your operating system.
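
Since the operating system does the placement, it can help to check which logical CPUs actually share a core before drawing conclusions. A minimal sketch, assuming a Linux system that exposes CPU topology under /sys/devices/system/cpu; show_siblings is a hypothetical helper name, and the optional argument only exists so the function can be pointed at another directory:

```shell
# Print each logical CPU together with the hardware threads that share its
# core, as reported by the Linux sysfs topology files (assumed present).
show_siblings() {
  root=${1:-/sys/devices/system/cpu}
  for f in "$root"/cpu[0-9]*/topology/thread_siblings_list; do
    [ -r "$f" ] || continue        # skip if the file or glob match is missing
    cpu=${f#"$root"/}              # strip the leading directory
    cpu=${cpu%%/*}                 # keep only the "cpuN" component
    printf '%s: %s\n' "$cpu" "$(cat "$f")"
  done
}
```

Running show_siblings with no argument lists the sibling threads of every logical CPU; CPUs that report the same list are on the same core.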



> Now, I want to know if there is a way in which I can force the tool to run
> on different "cores" and not on different "threads" on the same core, so
> that there is appreciable speed-up. Any help is appreciated. Thanks!



If you are using GNU/Linux you can use taskset which can set a mask on
which cores a task can be scheduled on. If you want every other:
1010(bin) = 0xA. For a 128 core machine you could run:



cat commands | taskset 0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa parallel -j +0
[Nanditha: Tried this, thanks. But it seems it doesn't speed up the jobs 
as I had assumed earlier.]
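
The "every other CPU" mask can be generated for any CPU count rather than typed by hand. A minimal sketch, assuming the count is a multiple of 4 and that alternate logical CPUs sit on different cores (the numbering differs between machines, so that is an assumption to verify); every_other_mask is a hypothetical helper name:

```shell
# Build a taskset mask that selects every other logical CPU.
# Each hex digit 'a' (binary 1010) covers 4 CPUs, so n CPUs need n/4 digits.
# Assumes n is a multiple of 4.
every_other_mask() {
  n=$1
  digits=$(( (n + 3) / 4 ))
  mask=""
  i=0
  while [ "$i" -lt "$digits" ]; do
    mask="a$mask"
    i=$((i + 1))
  done
  printf '0x%s\n' "$mask"
}
```

For the 4-CPU laptop discussed here this would be used as: cat commands | taskset "$(every_other_mask 4)" parallel -j +0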





> 2. Also, I want to know if there is a way to run this utility over a cluster
> of machines.. say, there are four 12-core machines in a cluster (making it a
> 48-core cluster).



cat commands | parallel -j +0 -S server1,server2,server3,server4
[Nanditha: I tried this option:
cat commands|parallel -j +0 --sshlogin username@ip_address


However, I get an error that the files listed in the 'commands' file cannot be 
found. Basically I am running a simulation and invoking the commands through 
the file called 'commands'. Is there some path I need to specify for where 
the files should get copied on the destination server? Or where do they get 
copied by default, and where do I go to see my results file? This is the error 
I get (each file is part of a command that I specify in 'commands'):


decoder_node_1_line0_sim_4.sp: No such file or directory
decoder_node_1_line0_sim_3.sp: No such file or directory
decoder_node_1_line0_sim_1.sp: No such file or directory
decoder_node_1_line0_sim_2.sp: No such file or directory

My commands file contains:


ngspice decoder_node_1_line0_sim_1.sp
ngspice decoder_node_1_line0_sim_2.sp
ngspice decoder_node_1_line0_sim_3.sp
ngspice decoder_node_1_line0_sim_4.sp

and parallel is being invoked from the directory in which these files 
are present. So I expect the tool to pick these files up from the 
current directory, distribute them to the server and run them. It runs locally 
on my machine, but the -S option gives me the above error. Can you please 
suggest a fix?


Thanks!]



Please read 
http://www.gnu.org/software/parallel/man.html#example__using_remote_computers

or watch the intro videos:

https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1





/Ole




