Hi all,
where can I set parameters/configuration specific to a plugin (my own plugin)?
Thanks in advance.
Regards,
MyD
--
View this message in context:
http://www.nabble.com/Where-to-put-plugin-specific-parameters---configurations-tp22577145p22577145.html
Sent from the Nutch -
Hi all,
is it possible to set the next fetch schedule for a URL in a different crawl dir?
Example:
crawl.dir.A
- retrieve links and set the fetch schedule, but the schedule should go into crawl.dir.B
crawl.dir.B
Thanks in advance
Regards,
MyD
Well, you can always write a bash script or a Java class that does
this. Writing a Java class is probably better and easier. There is a
manual for importing Nutch into Eclipse in case you don't know how. I
needed something similar, and it turned out that using Java really
is easier...
Hi ripper,
Thanks. Do you know how to do it in Java? I tried, but haven't found the
suitable classes. Thanks in advance.
Cheers,
MyD
Can you please post the link to the manual for importing Nutch into Eclipse?
Thanks in advance.
OK, this is how I did it. I created a class in the
org.apache.nutch.crawl package, the same package as the Crawl class
(Nutch's main class, which the crawl command calls). In that
class, you invoke Crawl with the appropriate parameters. Just
look at the code once you've imported it.
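A minimal sketch of that approach, assuming Nutch 1.0's `Crawl` class with its public `main(String[])` entry point and, as far as I recall, the usage line `Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]` (check the source you imported, since options can differ between releases). The class name `CrawlLauncher` and the `urls`/`crawl` directories are hypothetical. It looks `Crawl` up by reflection so the file compiles on its own; with the Nutch jar, its `lib/` jars and the `conf/` directory on the classpath it should run the same crawl as the crawl command.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CrawlLauncher {

    /**
     * Builds arguments in the order of Crawl's usage line (assumed): urlDir [-dir d] [-depth i] [-topN N].
     * Pass null for crawlDir to leave -dir out and let Crawl use its own default directory.
     */
    static String[] buildArgs(String urlDir, String crawlDir, int depth, int topN) {
        List<String> args = new ArrayList<>();
        args.add(urlDir);
        if (crawlDir != null) {
            args.add("-dir");
            args.add(crawlDir);
        }
        args.add("-depth");
        args.add(Integer.toString(depth));
        args.add("-topN");
        args.add(Integer.toString(topN));
        return args.toArray(new String[0]);
    }

    public static void main(String[] cliArgs) throws Exception {
        // Hypothetical directories: "urls" holds the seed list, "crawl" receives crawldb, segments and index.
        String[] args = buildArgs("urls", "crawl", 10, 1000);
        System.out.println("Running Crawl with " + Arrays.toString(args));
        try {
            // Resolve Nutch's Crawl class at runtime so this file compiles without the Nutch jars.
            Class<?> crawl = Class.forName("org.apache.nutch.crawl.Crawl");
            Method crawlMain = crawl.getMethod("main", String[].class);
            // Cast to Object so the array is passed as the single String[] parameter, not spread as varargs.
            crawlMain.invoke(null, (Object) args);
        } catch (ClassNotFoundException e) {
            System.err.println("Nutch is not on the classpath: add the Nutch jar, its lib/ jars and conf/.");
        } catch (InvocationTargetException e) {
            // Unwrap and rethrow whatever Crawl.main itself threw (e.g. an IOException from a failed job).
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw new RuntimeException(cause);
        }
    }
}
```

Because `Crawl.main` is public, a launcher like this does not strictly have to live in `org.apache.nutch.crawl`; that package is only needed if you want to call package-private code directly.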
Hi,
I heard from my friends that doing incremental index updates in Nutch
is not easy. Is there a way to configure the Nutch crawler to crawl
only the changed websites and then update the existing index?
Thanks
Victor
I actually use Nutch as a large-scale search engine on two products. I think a
few things that would be nice to have are built-in options to produce an
incremental index, and maybe a Quartz scheduler to automate it completely.
One thing that would be nice is when one of us figures something
Hi,
while testing a crawl (with the crawl command) of a big website with about
1 million pages, the disk ran out of space.
So now I have:
- a crawldb with 1,112,000 URLs (112,000 URLs were tested before)
- segments with 40 GB of data
- an index with partial data
- /tmp/hadoop-root with 173 GB of temporary Hadoop data
After looking at mailing
Hi all,
I can get URLs like
http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110&from=ePortal_NewsDetail_FromHome
indexed, but not URLs like
http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=en_US&id=10110&from=ePortal_NewsDetail_FromHome
and
When I try to merge the segments of two crawls (2 GB and 1 GB respectively), I get a
very bizarre error:
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
        at
Hi,
you can look at the source code of the Crawl class, which can be used to start
Nutch with a plain java command, without Cygwin:
java -D... -classpath ... org.apache.nutch.crawl.Crawl urls -depth 10 -topN 1000
good luck
yanky
Hi,
you can put any parameters in nutch-site.xml as property settings, and read
them in your plugin class with conf.get("your property name").
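As an illustration, a plugin setting in nutch-site.xml is an ordinary `<property>` with a `<name>` and a `<value>`. The standalone sketch below (the property name `myplugin.greeting` and the class `PluginPropertySketch` are hypothetical) parses such a snippet with the JDK's DOM parser and returns the value for a name, falling back to a default, which is roughly what `conf.get(name, default)` gives your plugin. Inside Nutch you would not parse the file yourself: assuming your plugin class implements Hadoop's `Configurable`, as Nutch 1.x extension points do, you read the value in `setConf(Configuration conf)`, as shown in the comment.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PluginPropertySketch {

    // Hypothetical nutch-site.xml entry for a custom plugin.
    static final String NUTCH_SITE =
        "<?xml version=\"1.0\"?>\n"
        + "<configuration>\n"
        + "  <property>\n"
        + "    <name>myplugin.greeting</name>\n"
        + "    <value>hello</value>\n"
        + "    <description>Setting read by my own plugin.</description>\n"
        + "  </property>\n"
        + "</configuration>\n";

    /**
     * Returns the value of the named property, or defaultValue if it is absent.
     * In a real plugin the equivalent is done by Hadoop's Configuration, e.g.:
     *   public void setConf(Configuration conf) {
     *       this.conf = conf;
     *       this.greeting = conf.get("myplugin.greeting", "hi");
     *   }
     */
    static String get(String xml, String name, String defaultValue) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String propName = prop.getElementsByTagName("name").item(0).getTextContent().trim();
                if (propName.equals(name)) {
                    return prop.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return defaultValue;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(get(NUTCH_SITE, "myplugin.greeting", "hi"));
        System.out.println(get(NUTCH_SITE, "myplugin.missing", "fallback"));
    }
}
```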
good luck
yanky
Hi,
according to my understanding, in Nutch 1.0 you can configure Nutch to
recrawl on a specific schedule:
see this issue: http://issues.apache.org/jira/browse/NUTCH-61
and this class: AdaptiveFetchSchedule
By the way, there is no way to configure Nutch to recrawl only changed
websites.
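To make the idea behind AdaptiveFetchSchedule concrete, here is a simplified, standalone re-implementation of its core rule (a sketch, not Nutch's code): when a page has changed since the last fetch, the interval shrinks by a decrease rate; when it hasn't, the interval grows by an increase rate; the result is clamped to a minimum and maximum. The rates and bounds below are illustrative, and the class name `AdaptiveIntervalSketch` is hypothetical. In Nutch itself you would, as far as I know, set `db.fetch.schedule.class` to `org.apache.nutch.crawl.AdaptiveFetchSchedule` in nutch-site.xml and tune the `db.fetch.schedule.adaptive.*` properties documented in nutch-default.xml. The real class has further options (for example a sync-delta adjustment) that this leaves out.

```java
public class AdaptiveIntervalSketch {

    // Illustrative values; compare with db.fetch.schedule.adaptive.* in your nutch-default.xml.
    static final double INC_RATE = 0.4;
    static final double DEC_RATE = 0.2;
    static final double MIN_INTERVAL = 60.0;                 // seconds
    static final double MAX_INTERVAL = 365.0 * 24 * 3600;    // seconds (one year)

    /** Returns the next fetch interval in seconds, given the current one and whether the page changed. */
    static double nextInterval(double interval, boolean modified) {
        double next = modified
            ? interval * (1.0 - DEC_RATE)   // page changed: come back sooner
            : interval * (1.0 + INC_RATE);  // page unchanged: back off
        return Math.max(MIN_INTERVAL, Math.min(MAX_INTERVAL, next));
    }

    public static void main(String[] args) {
        double interval = 30.0 * 24 * 3600; // start from a 30-day interval
        boolean[] history = {false, false, true, false};
        for (boolean changed : history) {
            interval = nextInterval(interval, changed);
            System.out.printf("changed=%b -> next interval %.1f days%n", changed, interval / 86400.0);
        }
    }
}
```

Pages that keep changing drift toward the minimum interval and static pages toward the maximum, so changed pages get refetched more often; unchanged pages are still refetched eventually, which fits the point that Nutch cannot be told to recrawl only changed sites.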
Please help me, it is urgent and important. Thanks.
-- Forwarded message --
From: 陈琛 kylin.chc...@gmail.com
Date: 2009/3/19
Subject: index web
To: nutch-user@lucene.apache.org