Re: Format of the Nutch Results

2010-04-22 Thread nachonieto3
Thank you very much!!! I've tried the command as you told me... but I still have some problems... As far as I understand, it is something about JAVA_HOME, which I've already defined; I've also checked the integrity of the file. I leave you a capture of the problem, maybe someone knows what I'm doing wrong. Th
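
For anyone hitting the same JAVA_HOME problem, a quick sanity check from a shell may help; the JDK path below is only an example, so substitute wherever your JDK actually lives:

    echo $JAVA_HOME                  # should print your JDK install directory
    $JAVA_HOME/bin/java -version     # should print a Java version, not an error
    export PATH=$JAVA_HOME/bin:$PATH # make sure Nutch picks up the same JVM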

Scheduler questions, 1.1 nightly build.

2010-04-22 Thread Phil Barnett
I'm having a problem where shouldfetch is rejecting everything. I have deleted the crawl directory and started the entire crawl from scratch: rm -rf crawl; mkdir crawl; mkdir segments. I'm absolutely baffled by how this scheduler works. Is there documentation? Is the fetchtime saved somewhere ot
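
The fetch time is kept per URL in the crawldb, and the generator skips any URL whose next scheduled fetch has not arrived yet, which can look like shouldfetch rejecting everything right after a crawl. A sketch of working around it, assuming the crawl/crawldb and crawl/segments layout above and the -adddays option of the 1.x generator:

    # generate as if 31 days had passed, so URLs fetched recently
    # become due again (the default re-fetch interval is 30 days)
    bin/nutch generate crawl/crawldb crawl/segments -adddays 31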

Re: Scheduler questions, 1.1 nightly build.

2010-04-22 Thread Phil Barnett
I should add that what I really want to do is toss all previous crawl information and reindex everything every night. It's just a few servers and very low impact. My crawl on 1.0 takes about 10 minutes. On Thu, Apr 22, 2010 at 4:59 AM, Phil Barnett wrote: > I'm having a problem where shouldfetch
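
For the toss-everything-and-reindex-nightly case, a minimal cron-able sketch, assuming a seed list in a urls/ directory and the one-shot crawl command that ships with 0.9 through 1.1 (the install path, depth, and topN values are placeholders):

    #!/bin/sh
    cd /path/to/nutch       # example install location
    rm -rf crawl            # throw away all previous crawl information
    bin/nutch crawl urls -dir crawl -depth 3 -topN 1000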

Lucandra - Lucene/Solr on Cassandra: April 26, NYC

2010-04-22 Thread Otis Gospodnetic
Hello folks, Those of you in or near NYC and using Lucene or Solr should come to "Lucandra - a Cassandra-based backend for Lucene and Solr" on April 26th: http://www.meetup.com/NYC-Search-and-Discovery/calendar/12979971/ The presenter will be Lucandra's author, Jake Luciani. Please spread the

RE: Is there some arbitrary limit on content stored for use by summaries?

2010-04-22 Thread Tim Redding
Hey Arkadi, I've tried upping the value to Integer.MAX_VALUE but it still doesn't show a relevant summary. :-( Any other ideas? Tim.. -Original Message- From: arkadi.kosmy...@csiro.au [mailto:arkadi.kosmy...@csiro.au] Sent: 21 April 2010 23:29 To: nutch-user@lucene.apache.org Su

Re: Is there some arbitrary limit on content stored for use by summaries?

2010-04-22 Thread Julien Nioche
Try refetching with a different value for:

    <property>
      <name>file.content.limit</name>
      <value>65536</value>
      <description>The length limit for downloaded content, in bytes.
      If this value is nonnegative (>=0), content longer than it will be
      truncated; otherwise, no truncation at all.</description>
    </property>

Julien

On 22 April 2010 18:44, Tim Redding wrote:
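
For the override to take effect, it belongs in conf/nutch-site.xml and the pages must be refetched, since truncation happens at fetch time. A minimal sketch; note that a negative value means no truncation per the description above, and that pages fetched over HTTP are governed by the analogous http.content.limit:

    <property>
      <name>file.content.limit</name>
      <value>-1</value> <!-- negative value: never truncate content -->
    </property>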

Language specifications

2010-04-22 Thread Joshua J Pavel
Alternate question... thanks to everyone who has tried to help me through the hadoop/AIX issues with 1.0, but I'm going to need to shelve that for just a second while I work on some stuff with 0.9 again. I need to support one site that has 3 translations: English, French, and Spanish. The languag
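
If the goal is letting users search within one of the three translations, the stock language-identifier plugin is the usual route in 0.9: it guesses each page's language at parse time and indexes it in a lang field that queries can filter on. A sketch of enabling it in conf/nutch-site.xml; the other plugins listed are only an example, so keep whichever set you already use:

    <property>
      <name>plugin.includes</name>
      <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|language-identifier</value>
    </property>

A query such as lang:fr would then restrict results to pages identified as French.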

RE: Language specifications

2010-04-22 Thread Arkadi.Kosmynin
Hi Joshua, > -Original Message- > From: Joshua J Pavel [mailto:jpa...@us.ibm.com] > Sent: Friday, 23 April 2010 6:57 AM > To: nutch-user@lucene.apache.org > Subject: Language specifications > > > Alternate question... thanks to everyone who has tried to help me > through > the hadoop/AIX

Re: how to parse html files while crawling

2010-04-22 Thread cefurkan0 cefurkan0
Well, the main question is that I need files with the HTML elements removed; that is what's important, not the other things. Is this possible? On 21 April 2010 16:38, nachonieto3 wrote: > > Thank you a lot! Now I'm working on that, I have some more doubts... I'm not > able to run the command readseg... I've been consulti
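
Nutch's HTML parser already strips the markup: the extracted plain text is stored in the segment's parse_text part, and readseg can dump just that. A sketch, where the segment name is only an example timestamp:

    # dump only the parsed plain text (HTML elements already removed),
    # suppressing raw content and the other segment parts
    bin/nutch readseg -dump crawl/segments/20100422120000 dump_out \
        -nocontent -nofetch -nogenerate -noparse -noparsedata
    # the stripped text ends up in dump_out/dump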