date:20091211

RE: how to force nutch to do a recrawl

2009-12-11 Thread Peters, Vijaya

Adam, I'm using cygwin to run the scripts. I use EditPlus to edit the files. But EditPlus won't allow me to edit the crc file. I'll see if I can ftp the file to a unix machine. Vijaya Peters SRA International, Inc. 12500 Fair Lakes Circle Room 3507 Fairfax, VA 22033 Tel: 703-222-9207

Nutch with hadoop 0.20.x

2009-12-11 Thread Tom Landvoigt

Hello, does anyone know when Nutch will use the new Hadoop version? Thanks a lot, Tom

Re: Nutch with hadoop 0.20.x

2009-12-11 Thread Dennis Kubes

It has already been committed to SVN. You can pull and build an SVN release or we will be doing a 1.1 release shortly. Dennis Tom Landvoigt wrote: Hallo, Does anyone know when nutch will use the new hadoop version? Thanks a lot Tom

RE: how to force nutch to do a recrawl

2009-12-11 Thread BELLINI ADAM

Hi, you shouldn't open the crc file; you have to open the other one, which is part-0. Use vi to edit part-. If you do not find this file, your dump failed; just check the logs/hadoop.log file. Subject: RE: how to force nutch to do a recrawl Date: Fri, 11 Dec 2009 09:14:26
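The distinction drawn above, between the readable part- file and its crc companion, can be sketched in Python. This is only an illustration under stated assumptions: the dump is assumed to sit on the local filesystem, and the helper name `find_dump_parts` and its `dump_dir` argument are hypothetical, not part of Nutch or Hadoop.

```python
from pathlib import Path


def find_dump_parts(dump_dir):
    """Return the readable part-* files in a dump directory (hypothetical helper).

    Files ending in ".crc" are checksum companions and are skipped,
    since they are not meant to be opened or edited by hand.
    """
    parts = []
    for path in sorted(Path(dump_dir).iterdir()):
        if path.name.endswith(".crc"):
            continue  # checksum file, not the dump itself
        if path.is_file() and path.name.startswith("part-"):
            parts.append(path)
    return parts
```

If the returned list is empty, that matches the case described in the message: the dump probably failed, and logs/hadoop.log is the place to look.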

RE: NOINDEX, NOFOLLOW

2009-12-11 Thread BELLINI ADAM

Hi, since I have a custom plugin which parses and indexes DC meta, I was filling the dc.description and dc.keywords... and since in Solr I was also searching in description and keywords and displaying the title and the first 4 lines of content, this makes the noindexed page be displayed in the

Luke reading index in hdfs

2009-12-11 Thread MilleBii

Guys, is there a way to get Luke to read the index from hdfs://? Or do you have to copy it out to the local filesystem? -- -MilleBii-

Re: Luke reading index in hdfs

2009-12-11 Thread Andrzej Bialecki

On 2009-12-11 22:21, MilleBii wrote: Guys is there a way you can get Luke to read the index from hdfs:// ??? Or you have to copy it out to the local filesystem? Luke 0.9.9 can open indexes directly from HDFS hosted on Hadoop 0.19.x. Luke 0.9.9.1 can do the same, but uses Hadoop 0.20.1. Start

stripping irrelevant contents

2009-12-11 Thread Ted Yu

Hi, we want to strip out irrelevant content from the web pages we crawl. Examples of irrelevant content are the display ads that surround the main body of an article on a web page. Please share your experience. Thanks
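One common starting point for this kind of cleanup is a link-density heuristic: blocks that are short or consist mostly of link text tend to be ads or navigation, while the article body is long and mostly plain text. The sketch below is not a Nutch feature and not anything proposed in this thread; the class and function names and both thresholds are hypothetical, chosen only for illustration, and it uses nothing beyond Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class BlockCollector(HTMLParser):
    """Collect text per block element and track how much of it is link text."""

    BLOCK_TAGS = {"p", "div", "td", "li", "article", "section"}

    def __init__(self):
        super().__init__()
        self.blocks = []      # each entry: [total_chars, link_chars, text_pieces]
        self._in_link = 0
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            # Simplification: a nested block starts a fresh entry, and text
            # that follows its end tag inside the parent block is ignored.
            self._current = [0, 0, []]
            self.blocks.append(self._current)
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1
        elif tag in self.BLOCK_TAGS:
            self._current = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._current is None:
            return
        self._current[0] += len(text)
        if self._in_link:
            self._current[1] += len(text)
        self._current[2].append(text)


def main_text(html, max_link_density=0.5, min_chars=40):
    """Keep blocks that are long enough and not dominated by link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for total, linked, pieces in parser.blocks:
        if total == 0 or total < min_chars:
            continue  # empty or too short to be article body
        if linked / total <= max_link_density:
            kept.append(" ".join(pieces))
    return "\n".join(kept)
```

The thresholds would need tuning per site, and real pages with deeply nested markup usually call for a proper DOM parser rather than this flat pass.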
