Adam,
I'm using Cygwin to run the scripts and EditPlus to edit the files, but
EditPlus won't let me edit the .crc file. I'll see if I can FTP the file
to a Unix machine.
Vijaya Peters
SRA International, Inc.
12500 Fair Lakes Circle
Room 3507
Fairfax, VA 22033
Tel: 703-222-9207
Hello,
Does anyone know when Nutch will use the new Hadoop version?
Thanks a lot
Tom
It has already been committed to SVN. You can check out and build from
SVN, or we will be doing a 1.1 release shortly.
Dennis
Tom Landvoigt wrote:
Hello,
Does anyone know when Nutch will use the new Hadoop version?
Thanks a lot
Tom
Hi,
You shouldn't open the .crc file; you have to open the other one, which is
part-0.
Use vi to edit part-.
If you don't find this file, your dump failed... just check the
logs/hadoop.log file.
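The advice above can be sketched as a small helper (a hedged sketch, not from the thread; the directory layout and the exact part-file names, e.g. part-00000, depend on your job):

```python
import os

def find_part_files(dump_dir):
    """Return the readable part-* output files in a dump directory,
    skipping Hadoop's .crc checksum files, which are not meant to be
    opened or edited directly."""
    return sorted(
        f for f in os.listdir(dump_dir)
        if f.startswith("part-") and not f.endswith(".crc")
    )

# Hypothetical usage: a dump directory containing part-00000 and its
# .part-00000.crc checksum sibling would yield only ["part-00000"].
```

If the list comes back empty, that matches the failure case above: check logs/hadoop.log.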
Subject: RE: how to force nutch to do a recrawl
Date: Fri, 11 Dec 2009 09:14:26
Hi,
Since I have a custom plugin which parses and indexes DC meta, I was filling
dc.description and dc.keywords... and since in Solr I was also searching in
description and keywords and displaying the title and the first 4 lines of
content, this made the noindexed pages be displayed in the
Guys is there a way you can get Luke to read the index from hdfs:// ???
Or you have to copy it out to the local filesystem?
--
-MilleBii-
On 2009-12-11 22:21, MilleBii wrote:
Guys is there a way you can get Luke to read the index from hdfs:// ???
Or you have to copy it out to the local filesystem?
Luke 0.9.9 can open indexes directly from HDFS hosted on Hadoop 0.19.x.
Luke 0.9.9.1 can do the same, but uses Hadoop 0.20.1.
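If your Hadoop version doesn't match what Luke supports, the fallback is copying the index to the local filesystem first. A minimal sketch using the standard `hadoop fs -copyToLocal` shell command (the HDFS and local paths here are hypothetical):

```python
import subprocess

def copy_index_to_local(hdfs_path, local_path, run=subprocess.run):
    """Copy a Lucene index out of HDFS with the standard
    `hadoop fs -copyToLocal` command so Luke can open it locally.
    Returns the command that was run."""
    cmd = ["hadoop", "fs", "-copyToLocal", hdfs_path, local_path]
    run(cmd, check=True)
    return cmd

# Hypothetical usage:
# copy_index_to_local("hdfs://namenode/crawl/indexes", "/tmp/indexes")
```

Once the copy finishes, point Luke at the local directory as usual.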
Hi,
We want to strip out irrelevant content from the web pages we crawl.
Examples of irrelevant content are display ads that surround the main body
of an article on a web page.
Please share your experience.
Thanks