Hi,
How are you doing this, exactly?
If you have a text file with a list of files to download, I would suggest
you use aria2c, which has this functionality built in:
$ aria2c -i urls.txt -j 100
$ aria2c --help
[...]
 -i, --input-file=FILE        Downloads URIs found in FILE. You can specify
                              multiple URIs for a single entity: separate
                              URIs on a single line using the TAB character.
                              Reads input from stdin when '-' is specified.
                              Additionally, options can be specified after
                              each line of URI. This optional line must start
                              with one or more white spaces and have one
                              option per single line. See INPUT FILE section
                              of man page for details. See also
                              --deferred-input option.

                              Possible Values: /path/to/file, -
                              Tags: #basic

 -j, --max-concurrent-downloads=N
                              Set maximum number of parallel downloads for
                              every static (HTTP/FTP) URL, torrent and
                              metalink. See also --split option.

                              Possible Values: 1-*
                              Default: 5
                              Tags: #basic
[...]
aria2:
http://aria2.sourceforge.net/
http://sourceforge.net/apps/trac/aria2/wiki
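For reference, an input file for -i could look like this (the URLs are just
placeholders; the two URIs on the first line are separated by a TAB and point
to the same file on two mirrors, and the indented line sets an option for
that download only):

$ cat urls.txt
http://example.com/file1.iso	http://mirror.example.com/file1.iso
  dir=/tmp/downloads
http://example.com/file2.iso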
On Tue, Jan 29, 2013 at 9:38 PM, yacob sen <[email protected]> wrote:
>
> Dear All,
>
> I have a Perl script that begins by fetching hundreds of files
> from a server. I know that GNU Parallel is really suited for getting files
> from an FTP site using the wget command in parallel. I would like to take
> advantage of this awesome GNU Parallel tool by calling it from inside my Perl script.
>
> Is that possible? If so, an example would be very much appreciated.
>
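To the GNU Parallel part of the question: yes, a Perl script can invoke
parallel directly, for instance by piping it one URL per line. A minimal
sketch; the @urls list, the ftp://example.com URLs, the wget command and the
-j8 limit are placeholders for whatever your script actually fetches:

use strict;
use warnings;

# Hypothetical list of URLs the script has collected.
my @urls = ('ftp://example.com/a.dat', 'ftp://example.com/b.dat');

# Pipe one URL per line to GNU Parallel, which appends each line as the
# argument to wget and runs up to 8 downloads at a time.
open(my $par, '|-', 'parallel', '-j8', 'wget')
    or die "cannot start parallel: $!";
print $par "$_\n" for @urls;
close($par) or warn "parallel exited with status $?";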
> Regards
>
> Yacob
>
>
>
> --- On Sun, 20/1/13, Nanditha Rao <[email protected]> wrote:
>
>
> From: Nanditha Rao <[email protected]>
> Subject: Re: Multiple jobs on a multicore machine or cluster
> To: [email protected]
> Date: Sunday, 20 January, 2013, 9:45
>
> Never mind. I figured out that I had to transfer the files using
> --transfer before running them, and that they get copied to the home
> directory of the destination machine by default.
>
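For reference: the "No such file or directory" errors further down happen
because parallel runs the commands remotely without copying the input files,
and when whole command lines come from a 'commands' file it cannot know which
words name files to transfer. One way around that is to pass the .sp files as
arguments and use --trc, shorthand for --transfer --return --cleanup. A
sketch, assuming each simulation writes a .raw file next to its input; adjust
the --trc argument to whatever output ngspice actually produces:

$ parallel -S server1,server2 --trc {.}.raw "ngspice {}" ::: decoder_node_*.sp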
> On Sat, Jan 19, 2013 at 12:58 AM, Nanditha Rao <[email protected]> wrote:
>
>
>
> On Fri, Jan 18, 2013 at 12:53 AM, Ole Tange <[email protected]> wrote:
>
> On Thu, Jan 17, 2013 at 12:58 PM, Nanditha Rao <[email protected]> wrote:
>
> > 1. I need to run multiple jobs on a multicore (and multithreaded)
> > machine. I am using the GNU Parallel utility to distribute jobs across
> > the cores to speed up the task. The commands to be executed are available
> > in a file called 'commands'. I use the following command to run GNU
> > Parallel.
> >
> > cat commands | parallel -j +0
> >
> > As per the guidance in the GNU Parallel documentation, this command is
> > supposed to use all the cores to run this task. My machine has 2 cores
> > and 2 threads per core.
>
> I take it that you have a CPU with hyperthreading.
>
> [Nanditha: I guess so. I am using an Intel Core i3 laptop to test this
> tool out.]
>
>
> > The system monitor however shows 4 CPUs (CPU1 and CPU2 belong to
> > core1, CPU3 and CPU4 belong to core2). Each job (simulation) takes about
> > 20 seconds to run on a single core. I ran 2 jobs in parallel using this
> > GNU Parallel utility with the command above. I observe in the system
> > monitor
>
> What system monitor are you using?
>
> [Nanditha: gnome-system-monitor on Ubuntu]
>
>
> > that, if the 2 jobs are assigned to cpu1 and cpu2 (that is, the same
> > core), there is obviously no speed-up.
>
> Why obviously? Normally I measure a speedup of 30-70% when using
> hyperthreading.
>
> [Nanditha: I somehow don't see a speedup. Running a single job on a single
> thread of a single core versus two threads on the same core takes the same
> time, about 20 seconds]
>
>
> > They take about 40 seconds to finish, which is about the time they would
> > take if run sequentially. However, sometimes the tool distributes the 2
> > jobs to CPU1 and CPU3 or CPU4 (which means, 2 jobs are assigned to 2
> > different cores). In this case, both jobs finish in parallel in 20
> > seconds.
>
> GNU Parallel does not do the distributing; it simply spawns jobs. The
> distribution is done by your operating system.
>
> > Now, I want to know if there is a way in which I can force the tool to
> > run on different "cores" and not on different "threads" on the same
> > core, so that there is appreciable speed-up. Any help is appreciated.
> > Thanks!
>
> If you are using GNU/Linux you can use taskset, which sets a mask of the
> CPUs a task may be scheduled on. If you want every other CPU:
> 1010(bin) = 0xA. For a 128-core machine you could run:
>
> cat commands | taskset 0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa parallel -j +0
>
> [Nanditha: Tried this, thanks. But it seems like it doesn't help speed up
> the jobs as I assumed earlier]
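For what it's worth, the right mask depends on how Linux numbers the
hyperthread siblings, which varies between machines. On a 2-core/4-thread
CPU, 0x5 = 0101(bin) (CPUs 0 and 2) only picks one thread per core if those
two really sit on different cores, which lscpu can confirm:

$ lscpu -e        # shows which logical CPUs share a physical core
$ cat commands | taskset 0x5 parallel -j +0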
>
>
> > 2. Also, I want to know if there is a way to run this utility over a
> > cluster of machines... say, there are four 12-core machines in a cluster
> > (making it a 48-core cluster).
>
> cat commands | parallel -j +0 -S server1,server2,server3,server4
>
> [Nanditha: I tried this option: cat commands | parallel -j +0 --sshlogin
> username@ip_address
> However, I get an error that the files listed in the 'commands' file are
> not to be found. Basically I am running a simulation and invoking the
> commands through the file called 'commands'. Is there some path I need to
> specify as to where they should get copied on the destination server? Or,
> by default, where do they get copied to, and where do I go to see my
> results file? This is the error I get (where each file is part of a
> command that I specify in 'commands'):
> decoder_node_1_line0_sim_4.sp: No such file or directory
> decoder_node_1_line0_sim_3.sp: No such file or directory
> decoder_node_1_line0_sim_1.sp: No such file or directory
> decoder_node_1_line0_sim_2.sp: No such file or directory
>
> My commands file contains:
> ngspice decoder_node_1_line0_sim_1.sp
> ngspice decoder_node_1_line0_sim_2.sp
> ngspice decoder_node_1_line0_sim_3.sp
> ngspice decoder_node_1_line0_sim_4.sp
>
> and the tool parallel is being invoked from the directory in which these
> files are present. So, I expect that the tool should pick these files up
> from the current directory, distribute them to the server and run them. It
> runs locally on my machine, but the -S option gives me the above error.
> Can you please suggest?
> Thanks!
>
>
> Please read
> http://www.gnu.org/software/parallel/man.html#example__using_remote_computers
> or watch the intro videos:
> https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
>
>
> /Ole