Can nutch run with hadoop-0.20.0?

2009-07-31 Thread lei wang
In hadoop-0.20.0 the hadoop-site.xml configuration file was split into core-site.xml, hdfs-site.xml and so on. So how do I configure nutch-1.0 to run with hadoop-0.20.0? Any help will be appreciated.
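
For context, Hadoop 0.20.0 split the single hadoop-site.xml into core-site.xml, hdfs-site.xml and mapred-site.xml. A minimal sketch of how the usual cluster properties map onto the new files (hostnames and ports are placeholders, not taken from the mail):

  core-site.xml:
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode-host:9000/</value>
      </property>
    </configuration>

  hdfs-site.xml:
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    </configuration>

  mapred-site.xml:
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>jobtracker-host:9001</value>
      </property>
    </configuration>

Note that nutch-1.0 ships with an older Hadoop (0.19.x) in its lib directory, which still reads conf/hadoop-site.xml; whether the new file names are picked up depends on which Hadoop jars Nutch is actually run against.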

Re: job failed for "java.io.IOException: Task process exit with nonzero status of 255."

2009-07-14 Thread lei wang
Can anyone help me? On Tue, Jul 14, 2009 at 7:05 PM, lei wang wrote: > I run nutch to convert arc files to segments; it works well for 1 > million pages, but when I increase the page count to 500 million, it > fails with the error messages below. Can anyone he

job failed for "java.io.IOException: Task process exit with nonzero status of 255."

2009-07-14 Thread lei wang
I run nutch to convert arc files to segments; it works well for 1 million pages, but when I increase the page count to 500 million, it fails with the error messages below. Can anyone help me? java.io.IOException: Task process exit with nonzero status of 255. at org.apache.hadoop
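
Exit status 255 only says that the child task JVM died; the actual cause normally sits in that attempt's own logs on the tasktracker that ran it. A rough sketch of where to look, assuming the default Hadoop log layout (the attempt id is a placeholder):

  # on the tasktracker node that ran the failed attempt
  ls $HADOOP_HOME/logs/userlogs/
  cat $HADOOP_HOME/logs/userlogs/<attempt-id>/stderr
  cat $HADOOP_HOME/logs/userlogs/<attempt-id>/syslog

At the 500-million-page scale the stderr very often shows an OutOfMemoryError, in which case raising mapred.child.java.opts (see the "java heap space" thread further down) or lowering the number of concurrent task slots per node is the usual next step.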

Too many fetch failures

2009-07-11 Thread lei wang
This is also how I fixed this problem. On 6/21/08, Sayali Kulkarni wrote: > > Hi! > > My problem of "Too many fetch failures" as well as "shuffle error" was > resolved when I added the list of all the slave machines to the /etc/hosts > file. > > Earlier on every slave I just had the entries of th
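
For reference, the fix described here is to give every node an identical /etc/hosts listing the master and all slaves, so hostname lookups resolve the same way everywhere. A sketch with placeholder addresses and hostnames:

  # /etc/hosts -- same content on the master and on every slave
  127.0.0.1     localhost
  192.168.0.10  master
  192.168.0.11  slave1
  192.168.0.12  slave2
  192.168.0.13  slave3

The hostnames must match the ones used in conf/masters and conf/slaves, otherwise reducers may still fail to fetch map output by name.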

Re: how to allow every url to be accepted

2009-07-10 Thread lei wang
change crawl-urlfilter.txt to this: === # skip URLs containing certain characters as probable queries, etc. -[?*!@=] # skip URLs with slash-delimited segment that repeats 3+ times, to break loops -.*(/[^/]+)/[^/]+\1/[^/]+\1/ # accept hosts in MY.DOMAIN.NAME +^.*
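
A fuller sketch of a crawl-urlfilter.txt that accepts every URL, based on the stock Nutch rules (so the exact skip rules are an approximation, not necessarily the poster's file):

  # skip file:, ftp:, and mailto: urls
  -^(file|ftp|mailto):
  # skip URLs containing certain characters as probable queries, etc.
  -[?*!@=]
  # skip URLs with slash-delimited segment that repeats 3+ times, to break loops
  -.*(/[^/]+)/[^/]+\1/[^/]+\1/
  # accept everything else
  +^.*

The key change from the default file is the last rule: the domain-restricted accept line (+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/) is replaced by a catch-all so no host is filtered out.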

Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2009-07-10 Thread lei wang
Anyone help? So disappointed. On Fri, Jul 10, 2009 at 4:29 PM, lei wang wrote: > Yes, I am also running into this problem. Can anyone help? > > > On Sun, Jul 5, 2009 at 11:33 PM, xiao yang wrote: >> I often get this error message while crawling the intranet. >> Is i

job failed for "Too many fetch-failures"

2009-07-10 Thread lei wang
Hi everyone, an error occurs for me, "Too many fetch-failures", when using nutch-1.0. Can anyone help me? Thanks a lot. My hadoop-site.xml file is configured as: fs.default.name hdfs://distributed1:9000/ The name of the default file system. Either the literal string "local" or a host:port
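
The configuration in this mail is flattened by the archive; in hadoop-site.xml the quoted fragment would correspond to something like the following (the hostname comes from the mail, the rest is the standard property layout):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://distributed1:9000/</value>
    <description>The name of the default file system. Either the
    literal string "local" or a host:port for DFS.</description>
  </property>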

Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2009-07-10 Thread lei wang
Yes, I am also running into this problem. Can anyone help? On Sun, Jul 5, 2009 at 11:33 PM, xiao yang wrote: > I often get this error message while crawling the intranet. > Is it a network problem? What can I do about it? > > $bin/nutch crawl urls -dir crawl -depth 3 -topN 4 > > crawl started in:
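
MAX_FAILED_UNIQUE_FETCHES means the reduce tasks could not pull map output from the other tasktrackers, which usually points at hostname resolution or blocked ports rather than at Nutch itself. A rough check to run from every node, assuming the default tasktracker HTTP port 50060 (hostnames are placeholders):

  # verify each tasktracker's map-output HTTP server is reachable from this node
  for host in master slave1 slave2 slave3; do
    curl -s -o /dev/null -m 5 "http://$host:50060/" && echo "$host ok" || echo "$host UNREACHABLE"
  done

If any node is unreachable, fixing /etc/hosts (as in the "Too many fetch failures" thread above) or opening the port in the firewall normally clears both this error and the fetch-failure one.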

Arc to segments failed for "Task attempt_200907091108_0001_m_000520_0 failed to report status for 602 seconds. Killing!"

2009-07-09 Thread lei wang
Hi, I have been trying to convert arc files to segments these days. Nutch goes well converting 2 million pages, but it failed with "Task attempt_200907091108_0001_m_000520_0 failed to report status for 602 seconds. Killing!" when I increased the page count to 7 million. I have 10 nodes. For the hadoop-s
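
The 602-second figure lines up with the default task timeout of 600,000 ms, so the map task processing a large ARC file simply went too long without reporting progress. One common remedy, sketched with an illustrative value (not taken from the mail), is to raise the timeout in hadoop-site.xml:

  <property>
    <name>mapred.task.timeout</name>
    <!-- 30 minutes instead of the default 10 -->
    <value>1800000</value>
  </property>

Inside a custom mapper the alternative is to call reporter.progress() (or context.progress() on the new API) periodically while processing a large record, which keeps the tasktracker from declaring the task dead.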

Re: nutch crawldb failed for java heap space

2009-07-05 Thread lei wang
Would you give me the detailed solution? Thanks. On Sun, Jul 5, 2009 at 10:06 PM, lei wang wrote: > > Thanks, I think you did not read my issue carefully. > I have already tested the suggestion in your reply for a long time. > > > On Sun, Jul 5, 2009 at 9:46 PM, Julien Nioche

Re: nutch crawldb failed for java heap space

2009-07-05 Thread lei wang
tting the heapsize in the right way. This should be done in > the hadoop-site.xml using the parameter mapred.child.java.opts. Changing > hadoop-env modifies the amount of memory to be used for the services > (JobTracker, DataNodes, etc...) but not for the hadoop tasks > > HTH > > Ju
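
For completeness, the setting Julien refers to goes into conf/hadoop-site.xml and controls the heap of each child task JVM; HADOOP_HEAPSIZE in hadoop-env.sh only affects the daemons. A sketch with an illustrative value:

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>

On a node with 6 GB of RAM the -Xmx value has to be chosen together with the number of map and reduce slots, since every concurrent task gets its own JVM of that size.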

Re: nutch crawldb failed for java heap space

2009-07-03 Thread lei wang
There is no one to tell us how to keep a cluster running, so disappointed. On Fri, Jul 3, 2009 at 12:21 AM, lei wang wrote: > Hi everyone, these days a nutch problem occurred to me when I tested nutch to > index 2 million pages. > > When the program steps into the reduce stag

nutch crawldb failed for java heap space

2009-07-02 Thread lei wang
Hi everyone, these days a nutch problem occurred to me when I tested nutch to index 2 million pages. When the program steps into the reduce stage of the crawldb update, the error message given is as below. Before this test, I tried to crawl and index 1 million pages, and nutch went well. I altered the HADOOP_H

Re: How to run nutch on a 2G memory tasknode

2009-07-02 Thread lei wang
Yes, I also ran into this problem when I exported HADOOP_HEAPSIZE=2000, 4000, 6000; my memory is 6GB. On Wed, Jun 24, 2009 at 4:59 PM, SunGod wrote: > Error occurred in "crawldb TestDB/crawldb" reduce phase > > I get the error msg --- java.lang.OutOfMemoryError: Java heap space > > my command > bin/nutc

Re: splitting an index

2009-06-16 Thread lei wang
I agree with you that we should split up the index at the indexing stage. We are thinking on the same page. Maybe we can read the index directory and the segments directory through the nutch api, split the segments directory by documents, and build an index on each segments file? nutch claims that it is a dist
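
One low-tech way to get a split index without touching the Nutch API is to index each segment into its own directory, which is roughly what is described above. A sketch, assuming the usual Nutch 1.0 invocation bin/nutch index <indexdir> <crawldb> <linkdb> <segment ...> and a local crawl layout (on HDFS the segment list would come from bin/hadoop fs -ls instead; directory names are placeholders):

  # build one index per segment
  for seg in crawl/segments/*; do
    name=$(basename "$seg")
    bin/nutch index "crawl/indexes-$name" crawl/crawldb crawl/linkdb "$seg"
  done

The per-segment indexes can then be served from different nodes, or merged later (Nutch exposes its IndexMerger as bin/nutch merge) if a single index turns out to be manageable after all.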