Re: [Nutch-general] Re: Using Nutch with Ferret (ruby)

2006-03-31 Thread Erik Hatcher
gave up on Ferret for the time being because of this incompatibility and am now prototyping with Solr while still using my custom XML-RPC search server for now. Erik -Mike On 3/30/06, Erik Hatcher [EMAIL PROTECTED] wrote: There is one incompatibility between Ferret and Java

Re: [Nutch-general] Re: Using Nutch with Ferret (ruby)

2006-03-30 Thread Erik Hatcher
There is one incompatibility between Ferret and Java Lucene of note. It is the UTF-8 issue that has surfaced with regard to Java Lucene. All can be well between Java Lucene and Ferret, until characters in another range are indexed, and then Ferret will blow up trying to search the
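
The snippet above is cut off before it names the character range involved. One plausible reading, stated here as an assumption rather than something the message confirms, is the difference between standard UTF-8 and the "modified UTF-8" that Java's DataOutput.writeUTF produces: the two agree for most text but diverge for U+0000 and for characters outside the Basic Multilingual Plane. A minimal Python sketch of that divergence (the function names are illustrative, not part of Lucene or Ferret):

```python
def standard_utf8(text: str) -> bytes:
    """Standard UTF-8, as most non-Java tools would write it."""
    return text.encode("utf-8")


def java_modified_utf8(text: str) -> bytes:
    """Payload bytes in Java's modified UTF-8 (no 2-byte length prefix).

    Assumption: this mirrors DataOutput.writeUTF. U+0000 becomes C0 80,
    and each UTF-16 code unit is encoded on its own, so a supplementary
    character becomes two 3-byte surrogate sequences instead of 4 bytes.
    """
    out = bytearray()
    utf16 = text.encode("utf-16-be")
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        if unit == 0:
            out += b"\xc0\x80"
        else:
            out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


if __name__ == "__main__":
    for sample in ("abc", "\u00e9", "\x00", "\U0001F600"):
        print(repr(sample), standard_utf8(sample).hex(), java_modified_utf8(sample).hex())
```

For ASCII and accented Latin text the two encodings are byte-identical, which would be consistent with an index working until a character from another range is added.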

Re: [Nutch-general] Nutch web services

2006-03-24 Thread Erik Hatcher
Nutch has a servlet that supports A9's OpenSearch API. Are you needing more capabilities than this offers? Erik On Mar 24, 2006, at 9:16 AM, Aled Jones wrote: Hi Might have asked this before, but has anyone developed web services for nutch? I know there are web services for

Re: Good man is Different than Man good in Nutch?

2005-11-30 Thread Erik Hatcher
On 29 Nov 2005, at 22:41, Victor Lee wrote: ok, now I remembered something from the book Lucene in Action, it said something about word distance. So that's why they return different results. But still, when I remembered when I went to Google Adwords and get the new Maximum CPC estimates

Re: lucene jar version

2005-11-12 Thread Erik Hatcher
and 1.4.3 would do the trick, but would be a lot to wade through. Erik regards, [EMAIL PROTECTED] - Original Message - From: Erik Hatcher [EMAIL PROTECTED] To: nutch-user@lucene.apache.org Sent: Thursday, November 10, 2005 1:51 AM Subject: Re: lucene jar version Nutch

Re: lucene jar version

2005-11-09 Thread Erik Hatcher
Nutch is using an unofficial version of Lucene, which is some build from the trunk of Subversion. In the trunk of Lucene, those methods are deprecated and thus the 1.9-rc1-dev JAR you have has them flagged as such. Erik On 9 Nov 2005, at 17:37, Kenji wrote: Hi, I'm new here.

Re: Jira - Nutch 48 - did you mean patch

2005-10-31 Thread Erik Hatcher
No, Lucene does not have a built-in query that uses regular expressions. It's trivial to write a custom Query class like WildcardQuery that does regular expression searching. In fact, I've created this and am contributing it to Lucene as soon as I can (slowly but surely). As for how
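
The idea behind a WildcardQuery-style regular expression query can be sketched without Lucene at all: walk the terms in the index, keep those the pattern matches, and union their document lists. The toy inverted index and the function name below are hypothetical and are not Lucene's API or the contributed class:

```python
import re


def regex_search(index: dict[str, set[int]], pattern: str) -> set[int]:
    """Return ids of documents containing any term that fully matches pattern."""
    compiled = re.compile(pattern)
    hits: set[int] = set()
    for term, doc_ids in index.items():
        # A term qualifies only if the whole term matches, not a substring.
        if compiled.fullmatch(term):
            hits |= doc_ids
    return hits


# Hypothetical inverted index: term -> ids of documents containing it.
toy_index = {
    "nutch": {1, 2},
    "notch": {3},
    "lucene": {2},
    "ferret": {4},
}

if __name__ == "__main__":
    print(sorted(regex_search(toy_index, r"n.tch")))
```

A real implementation would enumerate terms from the index's term dictionary rather than a Python dict, but the match-then-union shape is the same.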

Re: output format as xml?

2005-10-01 Thread Erik Hatcher
Nutch supports the OpenSearch API, which is a variant of RSS, and in XML. Erik On Sep 30, 2005, at 8:03 PM, gekkokid wrote: Howdy, if nutch doesn't support xml as a result format - it's open source so you can customise it to your needs :) _gk - Original Message - From: XIN
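
Because an OpenSearch response is RSS with a few extra elements, any XML parser can consume it. The sketch below parses a hand-written sample; the sample document, the result URLs, and the namespace URI (the one used by OpenSearch RSS 1.0, assumed here to be what the servlet emits) are illustrative assumptions, not output captured from Nutch:

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the OpenSearch RSS 1.0 extension elements.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearchrss/1.0/"

# Hypothetical response body, written by hand for illustration.
SAMPLE = """<rss version="2.0" xmlns:opensearch="http://a9.com/-/spec/opensearchrss/1.0/">
  <channel>
    <title>Search results: lucene</title>
    <opensearch:totalResults>2</opensearch:totalResults>
    <opensearch:startIndex>0</opensearch:startIndex>
    <opensearch:itemsPerPage>10</opensearch:itemsPerPage>
    <item><title>First hit</title><link>http://example.org/a</link></item>
    <item><title>Second hit</title><link>http://example.org/b</link></item>
  </channel>
</rss>"""


def parse_results(xml_text: str) -> tuple[int, list[tuple[str, str]]]:
    """Return (total result count, [(title, link), ...]) from an OpenSearch RSS body."""
    channel = ET.fromstring(xml_text).find("channel")
    total = int(channel.findtext(f"{{{OPENSEARCH_NS}}}totalResults"))
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return total, items


if __name__ == "__main__":
    print(parse_results(SAMPLE))
```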

Re: [Nutch-general] VOTE: (Re: RSS Feed Parser)

2005-08-11 Thread Erik Hatcher
+1 - with it disabled there isn't much risk. On Aug 11, 2005, at 6:07 PM, Andrzej Bialecki wrote: Chris Mattmann wrote: Hi Zaheed, Thanks for the nice comments. I went ahead and wrote an HTML page that summarizes what I sent to Zaheed with respect to installing the parse-rss plugin.

Re: [Nutch-general] number of indexed pages

2005-07-29 Thread Erik Hatcher
Two options: bin/nutch readdb crawl/db -stats or use Luke (Google for luke lucene) to open the Lucene index. Erik On Jul 28, 2005, at 9:44 PM, blackwater dev wrote: After I finish a crawl...what is the best way to go into my crawl directory and get the number of indexed pages?

Re: [Nutch-general] Re: RDF plugin questions

2005-07-21 Thread Erik Hatcher
this integration task much more difficult than it already is. Greetings, Stefan On 19.07.2005 at 14:57, Erik Hatcher wrote: Hi, I'm embarking on an adventure with Nutch to crawl 19th century digital scholarly archives (like the Rossetti Archive, where I work) for the nines.org system

Nutch + RDF for scholarly archives

2005-06-29 Thread Erik Hatcher
Is anyone here using Nutch for crawling digital scholarly archives? If so, are you also harvesting and indexing additional metadata? My group (http://www.patacriticism.org) is considering using Nutch to crawl a specific set of sites and index the HTML as full-text and also retrieve any