Thank you very much! I've tried the command as you told me, but I still
have some problems. As far as I understand, it is something about JAVA_HOME,
which I have already defined; I have also checked the integrity of the file.
I'm attaching a screenshot of the problem; maybe someone knows what I'm doing wrong.
Th
I'm having a problem where shouldFetch is rejecting everything. I have
deleted the crawl directory and started the entire crawl from scratch with:
rm -rf crawl
mkdir crawl
mkdir segments
I'm absolutely baffled by how this scheduler works.
Is there documentation?
Is the fetchtime saved somewhere ot
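To answer the last question as far as I know it: yes, the fetch time is saved per URL in the CrawlDb, and shouldFetch rejects any URL whose next scheduled fetch time has not yet arrived. On 1.0 the interval is governed by db.fetch.interval.default (in seconds; the default is 2592000, i.e. 30 days, if I remember right). A hedged sketch of shortening it in conf/nutch-site.xml so everything becomes due again after one day:

```xml
<!-- conf/nutch-site.xml: sketch, not a verified fix; 86400 s = 1 day -->
<property>
  <name>db.fetch.interval.default</name>
  <value>86400</value>
</property>
```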
I should add that what I really want to do is toss all previous crawl
information and reindex everything every night. It's just a few servers and
very low impact. My crawl on 1.0 takes about 10 minutes.
On Thu, Apr 22, 2010 at 4:59 AM, Phil Barnett wrote:
> I'm having a problem where shouldfetch
Hello folks,
Those of you in or near NYC and using Lucene or Solr should come to "Lucandra -
a Cassandra-based backend for Lucene and Solr" on April 26th:
http://www.meetup.com/NYC-Search-and-Discovery/calendar/12979971/
The presenter will be Lucandra's author, Jake Luciani.
Please spread the
Hey Arkadi,
I've tried upping the value to Integer.MAX_VALUE but it still doesn't
show a relevant summary. :-(
Any other ideas?
Tim..
-Original Message-
From: arkadi.kosmy...@csiro.au [mailto:arkadi.kosmy...@csiro.au]
Sent: 21 April 2010 23:29
To: nutch-user@lucene.apache.org
Su
Try refetching with a different value for file.content.limit:

<property>
  <name>file.content.limit</name>
  <value>65536</value>
  <description>The length limit for downloaded content, in bytes.
  If this value is nonnegative (>=0), content longer than it will be
  truncated; otherwise, no truncation at all.</description>
</property>
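If the 64 KB default is what's cutting your documents short, you can override the property in conf/nutch-site.xml; per the description above, a negative value disables truncation entirely. (If the content arrives over HTTP rather than the file: protocol, the analogous property is http.content.limit.) A hedged sketch:

```xml
<!-- conf/nutch-site.xml: assumed override; -1 disables truncation -->
<property>
  <name>file.content.limit</name>
  <value>-1</value>
</property>
```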
Julien
On 22 April 2010 18:44, Tim Redding wrote:
Alternate question... thanks to everyone who has tried to help me through
the hadoop/AIX issues with 1.0, but I'm going to need to shelve that for
just a second while I work on some stuff with 0.9 again.
I need to support one site that has 3 translations: English, French, and
Spanish. The languag
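For per-language handling, Nutch ships a language-identifier plugin that detects each document's language at parse time and adds a "lang" field to the index; it is enabled by listing it in plugin.includes. The value below is only a sketch: merge language-identifier into whatever plugin list your installation already uses rather than copying this verbatim.

```xml
<!-- conf/nutch-site.xml: sketch; keep your existing plugins and append language-identifier -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)|language-identifier</value>
</property>
```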
Hi Joshua,
> -Original Message-
> From: Joshua J Pavel [mailto:jpa...@us.ibm.com]
> Sent: Friday, 23 April 2010 6:57 AM
> To: nutch-user@lucene.apache.org
> Subject: Language specifications
>
>
> Alternate question... thanks to everyone who has tried to help me
> through
> the hadoop/AIX
Well, the main question is that I need the files with the HTML elements removed.
This is what's important, not the other things.
Is this possible?
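If what you want is the page text with the markup already stripped, note that Nutch stores exactly that as the parse text of each segment, and bin/nutch readseg -dump will write it out along with the other segment data. If instead you want to post-process dumped HTML yourself, here is a small standalone sketch using only the Python standard library (strip_html is my own name, not a Nutch API):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects character data, skipping the contents of script/style tags."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def strip_html(html: str) -> str:
    """Return the text of an HTML document with all tags removed
    and whitespace normalized to single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

print(strip_html("<html><body><h1>Hi</h1><script>x=1</script><p>there</p></body></html>"))  # → Hi there
```

This only removes tags; it does not resolve character entities beyond what html.parser handles by default, so check the output against your documents.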
On 21 April 2010 16:38, nachonieto3 wrote:
>
> Thank you a lot! Now I'm working on that, and I have some more doubts... I'm not
> able to run the readseg command... I've been consulti