In hadoop-0.20.0, the hadoop-site.xml configuration file was split into
core-site.xml, hdfs-site.xml, and mapred-site.xml.
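(For reference, a minimal sketch of how the old hadoop-site.xml properties are divided across the new files; the fs.default.name value is the one used later in this thread, and the other values and the host:port are assumptions:)

core-site.xml:
  <property>
    <name>fs.default.name</name>
    <value>hdfs://distributed1:9000/</value>
  </property>

hdfs-site.xml:
  <property>
    <name>dfs.replication</name>
    <!-- replication factor; 3 is an assumption -->
    <value>3</value>
  </property>

mapred-site.xml:
  <property>
    <name>mapred.job.tracker</name>
    <!-- jobtracker host:port is an assumption -->
    <value>distributed1:9001</value>
  </property>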
So how do I configure nutch-1.0 to run with hadoop-0.20.0?
Any help will be appreciated.
Can anyone help me?
On Tue, Jul 14, 2009 at 7:05 PM, lei wang wrote:
> I ran nutch to convert arc files to segments; it worked well for 1
> million pages, but when I increased the page count to 500 million, it
> failed with the error messages below. Can anyone help me?
I ran nutch to convert arc files to segments; it worked well for 1 million
pages, but when I increased the page count to 500 million, it failed with
the error messages below. Can anyone help me?
java.io.IOException: Task process exit with nonzero status of 255.
at org.apache.hadoop
This is also how I fixed this problem.
On 6/21/08, Sayali Kulkarni wrote:
>
> Hi!
>
> My problem of "Too many fetch failures" as well as "shuffle error" was
> resolved when I added the list of all the slave machines to the /etc/hosts
> file.
>
> Earlier on every slave I just had the entries of th
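(For illustration, the /etc/hosts fix described above would look something like this on every node; the hostnames and IP addresses here are made-up examples:)

# /etc/hosts -- keep the same list on the master and on every slave
127.0.0.1     localhost
192.168.1.10  master
192.168.1.11  slave1
192.168.1.12  slave2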
change crawl-urlfilter.txt to this:
===
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
# skip URLs with slash-delimited segment that repeats 3+ times, to break loops
-.*(/[^/]+)/[^/]+\1/[^/]+\1/
# accept hosts in MY.DOMAIN.NAME
+^.*
Can anyone help? So disappointed.
On Fri, Jul 10, 2009 at 4:29 PM, lei wang wrote:
> Yes, I am also running into this problem. Can anyone help?
>
>
> On Sun, Jul 5, 2009 at 11:33 PM, xiao yang wrote:
>
>> I often get this error message while crawling the intranet
>> Is it the network problem? What can I do for it?
Hi everyone, I got the error "Too many fetch-failures" when using
nutch-1.0.
Can anyone help me? Thanks a lot.
My hadoop-site.xml file is configured as:
<property>
  <name>fs.default.name</name>
  <value>hdfs://distributed1:9000/</value>
  <description>The name of the default file system. Either the literal
  string "local" or a host:port</description>
</property>
Yes, I am also running into this problem. Can anyone help?
On Sun, Jul 5, 2009 at 11:33 PM, xiao yang wrote:
> I often get this error message while crawling the intranet
> Is it the network problem? What can I do for it?
>
> $bin/nutch crawl urls -dir crawl -depth 3 -topN 4
>
> crawl started in:
Hi, I have been trying to convert arc files to segments these days. Nutch
worked well for converting 2 million pages, but when I increased the page
count to 7 million it failed with "Task
attempt_200907091108_0001_m_000520_0 failed to report status for 602
seconds. Killing!" I have 10 nodes. for the hadoop-s
Would you give me the detailed solution? Thanks.
On Sun, Jul 5, 2009 at 10:06 PM, lei wang wrote:
>
> Thanks, but I think you did not read my issue carefully.
> I have already tested the suggestion from your reply for a long time.
>
>
> On Sun, Jul 5, 2009 at 9:46 PM, Julien Nioche wrote:
> ...setting the heapsize in the right way. This should be done in
> the hadoop-site.xml using the parameter mapred.child.java.opts. Changing
> hadoop-env modifies the amount of memory to be used for the services
> (JobTracker, DataNodes, etc...) but not for the hadoop tasks
>
> HTH
>
> Julien
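(Following the advice quoted above, a minimal sketch of setting the per-task heap via mapred.child.java.opts in hadoop-site.xml; the -Xmx value is an assumption:)

<property>
  <name>mapred.child.java.opts</name>
  <!-- JVM options passed to each spawned map/reduce task -->
  <value>-Xmx1024m</value>
</property>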
There is no one to tell us how to manage a running cluster; so disappointed.
On Fri, Jul 3, 2009 at 12:21 AM, lei wang wrote:
> Hi everyone, these days a Nutch problem occurred to me when I tested
> indexing 2 million pages.
>
> When the program steps into the reduce stag
Hi everyone, these days a Nutch problem occurred to me when I tested
indexing 2 million pages.
When the program steps into the reduce stage of the crawldb update, the
error messages are given as below:
Before this test, I tried to crawl and index 1 million pages, and Nutch went
well.
I altered the HADOOP_HEAPSIZE
Yes, I also ran into this problem when I exported HADOOP_HEAPSIZE=2000,
4000, and 6000; my memory is 6GB.
On Wed, Jun 24, 2009 at 4:59 PM, SunGod wrote:
> Error occurred in "crawldb TestDB/crawldb" reduce phase
>
> I get the error message --- java.lang.OutOfMemoryError: Java heap space
>
> my command
> bin/nutc
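(Regarding the HADOOP_HEAPSIZE values tried above: as the advice quoted earlier in the thread notes, that variable sizes the Hadoop services rather than the tasks, which would explain why raising it did not cure the OutOfMemoryError. A sketch of what the setting actually does:)

# hadoop-env.sh -- HADOOP_HEAPSIZE (in MB) sizes the daemon JVMs
# (JobTracker, DataNodes, etc.), not the map/reduce task JVMs,
# so it does not fix a task-side java.lang.OutOfMemoryError
export HADOOP_HEAPSIZE=2000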
I agree with you that we should split up the index at the indexing stage.
We are thinking on the same page. Maybe we can read the index file
directory and segments directory via the Nutch API, split the segments
directory by documents, and build an index on each segments file?
Nutch claims that it is a dist