Re: Specify at least one source--a file or resource collection error

2009-09-29 Thread Jaime Martín
Hi again: I just want to be able to build Nutch in Eclipse. What version do you use? Is the last official release, 1.0, not advisable? Is any plugin or reliable svn version required? Thank you very much. On 23 September 2009 15:40, Jaime Martín james...@gmail.com wrote: Hi: I'm following the

Merging Segments Problem

2009-09-29 Thread Mina Azib
I am crawling daily and putting each new crawl into a new directory every day using the nutch crawl command. Let's say I get up to having a bunch (20+) of crawls (793 MB each). I can merge all of their segments into 1 large segment (3.4 GB) at the same time with no problem in approx. 3 hours

Multilanguage support in Nutch 1.0

2009-09-29 Thread David Jashi
Hello, all. I've got a bit of trouble with Nutch 1.0 and multilanguage support: I have a fresh install of Nutch and two analysis plugins I'd like to turn on: analysis-de (German) and analysis-ge (Georgian). Here are the innards of my seed file: ---

Re: Merging Segments Problem

2009-09-29 Thread MilleBii
There are several reports on the mailing list about this; basically, merging segments is very resource-intensive (hard disk, CPU...). A few things can be done to help: + use compressed mode for the Hadoop file system + use (pseudo-)distributed mode, since it will map-reduce in parallel using less hard disk I

[ANN] Carrot2 version 3.1.0 released

2009-09-29 Thread Stanislaw Osinski
Dear All, [Apologies for cross-posting.] This is just to let you know that we've released version 3.1.0 of Carrot2 Search Results Clustering Engine. The 3.1.0 release comes with: * Experimental support for clustering Chinese Simplified content (based on Lucene's Smart Chinese Analyzer) *

R: Using Nutch for only retriving HTML

2009-09-29 Thread O. Olson
Sorry for pushing this topic, but I would like to know if Nutch would help me get the raw HTML in my situation described below. I am sure it would be a simple answer to those who know Nutch. If not, then I guess Nutch is the wrong tool for the job. Thanks, O. O. --- Thu 24/9/09, O. Olson

Re: R: Using Nutch for only retriving HTML

2009-09-29 Thread Susam Pal
On Wed, Sep 30, 2009 at 1:39 AM, O. Olson olson_...@yahoo.it wrote: Sorry for pushing this topic, but I would like to know if Nutch would help me get the raw HTML in my situation described below. I am sure it would be a simple answer to those who know Nutch. If not then I guess Nutch is the

RE: Multilanguage support in Nutch 1.0

2009-09-29 Thread BELLINI ADAM
Hi, try activating the language-identifier plugin. You must add it in the nutch-site.xml file, in the <name>plugin.includes</name> section. It's something like this: <property> <name>plugin.includes</name>