(cc to d...@nutch since you are addressing devs too)
Hey Andrzej:
>
> As you probably know, there are currently two active lines of
> development for Nutch:
> [...snip...]
>
> Regarding branch-1.2 (which is a maintenance branch after release 1.2)
> there have been pretty much no updates there, if
Use the hadoop.tmp.dir setting in nutch-site.xml to point to a disk where
plenty of space is available.
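For reference, a minimal sketch of such an override (the path /big/disk/hadoop-tmp is a placeholder; replace it with a directory on a disk with enough free space):

```xml
<!-- Sketch for conf/nutch-site.xml; /big/disk/hadoop-tmp is a placeholder path -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/big/disk/hadoop-tmp</value>
  <description>Base for Hadoop's temporary files; point it at a disk with free space.</description>
</property>
```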
Other users have previously reported similar problems which were due to a
lack of space on disk, as suggested by this:
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could
not find any valid local directory for
taskTracker/jobcache/job_local_0001/attempt_local_0001_m_32_0/outp
+1 from me. I've committed a bunch of patches today which were in 1.2 but
not in 1.3 (just one last one to do), but haven't compared with 2.0.
Having a release based on 1.3 would be great, as it would be a nice
transition towards 2.0 (delegated indexing/search, dependency management with
Ivy, separati
Hi users & devs,
As you probably know, there are currently two active lines of
development for Nutch:
* Nutch trunk, a.k.a. Nutch 2.0: this is based on a completely
redesigned storage layer that uses Apache Gora, which in turn can use
various storage implementations such as HBase, Cassandra,
Which command did you use? Merging segments is very resource-intensive, so I
try to avoid it.
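For context, segment merging in Nutch 1.x is done with the mergesegs tool. A sketch of an invocation (the crawl/ paths and the slice size are placeholders, not the poster's actual command):

```shell
# Merge all segments under crawl/segments into crawl/MERGEDsegments,
# splitting the output into slices of at most 50000 URLs each.
bin/nutch mergesegs crawl/MERGEDsegments -dir crawl/segments -slice 50000
```

The -slice option is what produces the "Slice size: N URLs" log lines quoted in this thread; slicing keeps individual output segments small, but the merge itself still needs plenty of temporary disk space.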
-Original Message-
From: Marseld Dedgjonaj
To: user
Sent: Tue, Jan 4, 2011 7:12 am
Subject: FW: Exception on segment merging
Hello,
Thank you for your response.
Let me give you more detail about the issue I have.
First, definitions. Let's say I have my own domain that I host on a dedicated
server; call it mydomain.com.
Next, call the following subdomains: answers.mydomain.com, mail.mydomain.com,
maps.mydomain.com a
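If the goal is to crawl mydomain.com together with subdomains like these, one common approach (a sketch under that assumption, not the poster's actual configuration) is to adjust conf/regex-urlfilter.txt:

```
# Hypothetical regex-urlfilter.txt entries: accept mydomain.com and any
# subdomain such as answers.mydomain.com, reject everything else.
+^https?://([a-zA-Z0-9-]+\.)*mydomain\.com/
-.
```

Rules are applied top to bottom; the first matching +/- prefix decides whether a URL is kept or discarded.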
I see in the Hadoop log that some more details about the exception are there.
Please help me figure out what to check for this error.
Here are the details:
2011-01-04 07:40:23,999 INFO segment.SegmentMerger - Slice size: 5
URLs.
2011-01-04 07:40:36,563 INFO segment.SegmentMerger - Slice size: 5
URLs.
20
Alex,
See http://wiki.apache.org/solr/FieldCollapsing for implementing this in
Solr. Since indexing and searching are delegated to Solr as of Nutch 1.3
and 2.0, this won't be implemented directly in Nutch.
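As an illustration (assuming a Solr version that supports result grouping and an indexed "site" field, as populated by Nutch's index-basic plugin; the host, port, and query are placeholders), a collapse-by-host request might look like:

```
# Hypothetical query: group results by the "site" field, one hit per host
http://localhost:8983/solr/select?q=nutch&group=true&group.field=site&group.limit=1
```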
HTH
Julien
On 4 January 2011 00:10, wrote:
> Hello,
>
> I used nutch-1.2 to index a
I am using a Hadoop cluster, so I am putting my conf files into the Nutch
source conf directory and building a Nutch job file. I am then putting the
job file into the classpath. I thought it was working fine, since it seems to
be reading the regex-urlfilter.txt from there. However, I am getting
messa
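For reference, the usual way the conf files end up in the job file is to rebuild it from the source tree. A sketch of that workflow for Nutch 1.2 (directory names, seed list, and crawl depth are placeholders):

```shell
# Rebuild the job file so it bundles conf/ (including regex-urlfilter.txt),
# then submit it to the cluster. Paths and crawl depth are placeholders.
cd nutch-1.2
ant job
hadoop jar build/nutch-1.2.job org.apache.nutch.crawl.Crawl urls -dir crawl -depth 3
```

Since the conf files are baked into the job file at build time, any change to regex-urlfilter.txt requires re-running ant job before it takes effect on the cluster.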
Hello everybody,
I have configured nutch-1.2 to crawl all URLs of a specific website.
It runs fine for a while, but now that the number of indexed URLs has grown
to more than 30'000, I get an exception on segment merging.
Has anybody seen this kind of error?
The exception is shown below.
On Tue, Jan 4, 2011 at 5:40 AM, wrote:
> Hello,
>
> I used nutch-1.2 to index a few domains. I noticed that nutch correctly
> crawled all sub-pages of the domains. By sub-pages I mean the following: for
> example, for a domain mydomain.com, all links inside it like
> mydomain.com/show/photos/1 and e