Re: Which Operating-System do you use for Nutch

2006-09-26 Thread Tomi NA
On 9/25/06, Jim Wilson [EMAIL PROTECTED] wrote: flamebait You can get it working on Windows if you're willing to work for it. To use Nutch OOTB, you have to install Cygwin since the provided Nutch launcher is written in Bash. Members of the community have provided alternatives, such as this

Re: term frequency

2006-09-26 Thread Enis Soztutar
Chris K Wensel wrote: Hi all, I'm interested in playing with term frequency values in a Nutch index on a per-document and index-wide scope. For example, something similar to this Lucene FAQ entry: http://tinyurl.com/ra3ys So what is the 'correct' way to inspect the Nutch index for these
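
For readers unsure what "per-document" versus "index-wide" term statistics means, here is a minimal sketch in plain Python. It does not use the Nutch or Lucene APIs and does not read a real index; the two toy documents, the whitespace tokenizer, and all names are assumptions made only for illustration.

```python
from collections import Counter

# Hypothetical toy corpus standing in for indexed documents.
docs = {
    "doc1": "nutch crawls the web",
    "doc2": "nutch builds on lucene and lucene indexes the web",
}

def tokenize(text):
    """Naive whitespace tokenizer (an assumption; real analyzers do more)."""
    return text.lower().split()

# Per-document scope: how often each term occurs inside one document.
term_freq = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

# Index-wide scope: in how many documents each term appears at least once.
doc_freq = Counter()
for counts in term_freq.values():
    doc_freq.update(counts.keys())

# Index-wide scope: total occurrences of each term across all documents.
total_freq = sum(term_freq.values(), Counter())

print(term_freq["doc2"]["lucene"])  # occurrences of "lucene" in doc2
print(doc_freq["lucene"])           # number of documents containing "lucene"
print(total_freq["web"])            # occurrences of "web" across the corpus
```

A real index exposes comparable numbers through its own reader API; the sketch only pins down the three quantities being asked about.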

Re: stop an index server

2006-09-26 Thread Jim Wilson
Do you mean what crawl-urlfilter.txt line you'd need? I think the following would do it: -^http://server:port/ But I'm not convinced that this is what you were asking ... -- Jim On 9/26/06, Alvaro Cabrerizo [EMAIL PROTECTED] wrote: How could I stop an index server (started with bin/nutch
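
As a rough illustration of how a rule such as -^http://server:port/ behaves in crawl-urlfilter.txt, the sketch below mimics the first-match-wins evaluation in plain Python. It is not Nutch code: the host localhost:8080, the function names, and the use of Python's re module in place of Java regular expressions are all assumptions.

```python
import re

def load_rules(lines):
    """Parse '+regex' / '-regex' lines, skipping blanks and '#' comments."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """The first matching rule decides; a URL matching no rule is rejected."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False

rules = load_rules([
    "# hypothetical host and port, for illustration only",
    "-^http://localhost:8080/",
    "+.",
])

print(accepts(rules, "http://localhost:8080/index.html"))  # excluded by the '-' rule
print(accepts(rules, "http://example.org/"))               # falls through to '+.'
```

Because the first match wins, the exclusion line has to appear before any catch-all + rule in the file.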

How to crawl (store) only English pages?

2006-09-26 Thread Mike Smith
Hi, Is there any way to store only English pages at the crawling stage, rather than just adding the metadata lang:en to the index using the language identifier plugin? Thanks, Mike

Re: stop an index server

2006-09-26 Thread Alvaro Cabrerizo
OK, I'll try to explain it more clearly. Imagine that you have finished crawling a group of sites and you have a well-formed index. Then you configure Tomcat, create a nutch-site.xml, and add the property searcher.dir pointing to a search-servers.txt that contains this line: 127.0.0.1 4.

[ANNOUNCE] Nutch 0.8.1 available

2006-09-26 Thread Sami Siren
The Nutch Project is pleased to announce the availability of the 0.8.1 release of Nutch, the open source web-search software based on Lucene and Hadoop. The release is immediately available for download from: http://lucene.apache.org/nutch/release/ Nutch 0.8.1 is a maintenance release for 0.8