Fwd: [Collex] application#index (ActionController::RoutingError) no route found to match \/nines/ escape(document.title) u,\ with {:method=>:get}

2007-07-10 Thread Erik Hatcher
Please reply to me directly as well, as I'm not on the nutch-dev list regularly. I'm curious ... Googlebot, Yahoo Slurp, and now CazoodleBot (based on Nutch) are hitting our site at http://www.nines.org and I get all sorts of invalid links crawled. Is our site doing something wrong in

Re: [VOTE] Commiter access for Stefan Groschupf

2005-12-17 Thread Erik Hatcher
+1 On Dec 16, 2005, at 4:50 PM, Andrzej Bialecki wrote: Hi, During the past year and more Stefan participated actively in the development, and contributed many high-quality patches. He's been spending considerable effort on addressing many issues in JIRA, and proposing fixes and

Re: [Nutch-dev] RE: [proposal] Generic Markup Language Parser

2005-11-25 Thread Erik Hatcher
On 24 Nov 2005, at 23:49, Chris Mattmann wrote: Dublin core may is good for semantic web, but not for a content storage. I completely disagree with that. Me too. In fact, I think many people would disagree with that in fact. Dublin core is a standard metadata model for electronic

Re: Lucene or Nutch

2005-11-09 Thread Erik Hatcher
Yes, Lucene is the best fit for what you're after. Nutch is built on Lucene, and adds web crawling on top. You don't need a web crawler, so using Lucene directly is the best fit - of course you'll have to write code to integrate Lucene. Erik On 9 Nov 2005, at 08:48, Klaus wrote:

Re: Javacc

2005-11-07 Thread Erik Hatcher
What version of Ant are you using and what version of Lucene? The latest trunk version of Lucene has gone back to using the javacc task in Ant, which is a facade that handles all the various versions of JavaCC, so my hunch is that your Ant distribution needs to be updated. Erik On 6

Re: nutch downloads

2005-10-12 Thread Erik Hatcher
Joshua, We have received your message. I'm only remotely involved with Nutch, so I'm prodding other committers to Nutch to please update the links to take advantage of the mirroring system in place. Please - someone reply back volunteering to correct this ASAP. Erik On Oct 11,

Re: work on Nutch made Index with Lukes HighFreqTerms

2005-09-05 Thread Erik Hatcher
Nils, Your message is best directed to java-user@lucene.apache.org (please subscribe before sending to this address). Erik On Sep 5, 2005, at 3:22 PM, Nils Hoeller wrote: Hi, I now have implemented the Top Ten Term Search I have asked about before. I just need to filter the stuff

Re: junit test failed

2005-08-28 Thread Erik Hatcher
This has nothing to do with the version of Ant. JUnit's JAR file needs to be in ANT_HOME/lib for junit to work. Erik On Aug 28, 2005, at 9:45 PM, Fuad Efendi wrote: Check version of ANT! Line 173: <junit printsummary="yes" haltonfailure="no" fork="yes" dir="${basedir}"

Re: Information extraction

2005-07-26 Thread Erik Hatcher
Further on the information extraction idea, consider what the SIMILE team at MIT are doing... http://simile.mit.edu The lower-case semantic web is gaining a lot of momentum these days, and I'm a strong proponent and student of it at the moment. Scraping rich information from a site is

Fwd: svn commit: r220056 - /lucene/nutch/trunk/src/test/org/apache/nutch/plugin/TestPluginSystem.java

2005-07-21 Thread Erik Hatcher
For grins I tried to see if I had commit access to fix the misspelling myself. Lo and behold I do! I hope I didn't step on any toes by committing this - if so let me know and I'll be more patient and submit patches. I'm a newbie to Nutch and definitely don't want to step in to