See http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script

Just go to runtime/deploy/bin and run the script from there.
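
As a rough sketch, assuming the seed list lives in a directory called urls, the crawl data goes to crawl, Solr runs at the default local URL and you want 2 rounds (all four values are placeholders, not taken from your setup), the invocation would look something like this. The argument order is the one I recall for the 1.x script of this era; the usage message the script prints when called without arguments is the authority.

```shell
#!/bin/sh
# Placeholder values: adjust every one of these to your own cluster.
NUTCH_BIN="runtime/deploy/bin"          # deploy-mode scripts, next to the job file
SEED_DIR="urls"                         # seed directory (on HDFS in deploy mode)
CRAWL_DIR="crawl"                       # where crawldb, linkdb and segments go
SOLR_URL="http://localhost:8983/solr/"  # assumed Solr endpoint
ROUNDS=2                                # number of generate/fetch/update rounds

# Assumed usage: crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>
CMD="./crawl $SEED_DIR $CRAWL_DIR $SOLR_URL $ROUNDS"

if [ -x "$NUTCH_BIN/crawl" ]; then
  # Run from inside bin/ so the script picks up the deploy runtime.
  (cd "$NUTCH_BIN" && $CMD)
else
  # No Nutch build here: just show what would be run.
  echo "would run in $NUTCH_BIN: $CMD"
fi
```

In deploy mode the hadoop command has to be on the PATH, since the script submits the job file to the cluster for you.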

Julien


On 29 August 2014 13:38, Meraj A. Khan <[email protected]> wrote:

> Hi Julien,
>
> I have 15 domains and they are all being fetched in a single map task, which
> does not fetch all the URLs no matter what depth or topN I give.
>
> I am submitting the Nutch job jar, which seems to be using the Crawl.java
> class. How do I use the crawl script on a Hadoop cluster? Are there any
> pointers you can share?
>
> Thanks.
> On Aug 29, 2014 4:40 AM, "Julien Nioche" <[email protected]>
> wrote:
>
> > Hi Meraj,
> >
> > The generator will place all the URLs in a single segment if they all
> > belong to the same host, for politeness reasons. Otherwise it will use
> > whichever value is passed with the -numFetchers parameter in the
> > generation step.
> >
> > Why don't you use the crawl script in /bin instead of tinkering with the
> > (now deprecated) Crawl class? It comes with a good default configuration
> > and should make your life easier.
> >
> > Julien
> >
> >
> > On 28 August 2014 06:47, Meraj A. Khan <[email protected]> wrote:
> >
> > > Hi All,
> > >
> > > I am running Nutch 1.7 on a Hadoop 2.3.0 cluster and I noticed that there
> > > is only a single reducer in the generate-partition job. The subsequent
> > > fetch then runs in only a single map task (I believe as a consequence of
> > > the single reducer in the earlier phase). How can I force Nutch to fetch
> > > in multiple map tasks? Is there a setting to force more than one reducer
> > > in the generate-partition job, so that there are more map tasks?
> > >
> > > Please also note that I have commented out the code in Crawl.java so that
> > > it does not run the link inversion phase, as I don't need scoring of the
> > > URLs that Nutch crawls; every URL is equally important to me.
> > >
> > > Thanks.
> > >
> >
> >
> >
> > --
> >
> > Open Source Solutions for Text Engineering
> >
> > http://digitalpebble.blogspot.com/
> > http://www.digitalpebble.com
> > http://twitter.com/digitalpebble
> >
>