Please reply to me directly as well, as I'm not on the nutch-dev list
regularly.
I'm curious ... Googlebot, Yahoo Slurp, and now CazoodleBot (based on
Nutch) are hitting our site at http://www.nines.org and I get all
sorts of invalid links crawled. Is our site doing something wrong in
+1
On Dec 16, 2005, at 4:50 PM, Andrzej Bialecki wrote:
Hi,
During the past year and more, Stefan has participated actively in the
development and contributed many high-quality patches. He's been
spending considerable effort on addressing many issues in JIRA, and
proposing fixes and
On 24 Nov 2005, at 23:49, Chris Mattmann wrote:
Dublin Core may be good for the semantic web, but not for content
storage.
I completely disagree with that.
Me too.
In fact, I think many people would disagree
with that. Dublin Core is a standard metadata model for
electronic
Yes, Lucene is the best fit for what you're after. Nutch is built on
Lucene and adds web crawling on top. You don't need a web crawler,
so using Lucene directly makes the most sense; of course, you'll have to
write code to integrate it.
Erik
On 9 Nov 2005, at 08:48, Klaus wrote:
What version of Ant are you using and what version of Lucene?
The latest trunk version of Lucene has gone back to using the
javacc task in Ant, which is a facade that handles all the various
versions of JavaCC, so my hunch is that your Ant distribution needs
to be updated.
Erik
On 6
Joshua,
We have received your message. I'm only remotely involved with
Nutch, so I'm prodding the other Nutch committers to update the
links to take advantage of the mirroring system in place.
Please, someone reply volunteering to correct this ASAP.
Erik
On Oct 11,
Nils,
Your message is best directed to java-user@lucene.apache.org (please
subscribe before sending to this address).
Erik
On Sep 5, 2005, at 3:22 PM, Nils Hoeller wrote:
Hi,
I have now implemented the Top Ten Term Search I
asked about before.
I just need to filter the stuff
This has nothing to do with the version of Ant. JUnit's JAR file
needs to be in ANT_HOME/lib for Ant's junit task to work.
Erik
On Aug 28, 2005, at 9:45 PM, Fuad Efendi wrote:
Check version of ANT!
Line 173: <junit printsummary="yes" haltonfailure="no" fork="yes"
dir="${basedir}"
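Erik's point is that Ant's junit task needs JUnit's JAR in ANT_HOME/lib, regardless of the Ant version. A quick way to confirm that is a small check like the one below. This is a minimal sketch using only the JDK. The class name `JUnitJarCheck` is hypothetical, and it assumes JUnit's JAR file name starts with `junit` (for example `junit-4.x.jar`). Your copy may be named differently.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: reports whether a JUnit JAR sits in ANT_HOME/lib. */
public class JUnitJarCheck {

    /** Returns the sorted names of files in libDir matching the assumed pattern junit*.jar. */
    static List<String> findJUnitJars(Path libDir) throws IOException {
        List<String> found = new ArrayList<>();
        if (!Files.isDirectory(libDir)) {
            return found; // no lib directory at all, so no JUnit JAR either
        }
        // Glob match on the file name only; case-sensitive on most Unix file systems.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(libDir, "junit*.jar")) {
            for (Path p : stream) {
                found.add(p.getFileName().toString());
            }
        }
        Collections.sort(found); // directory order is unspecified, so sort for stable output
        return found;
    }

    public static void main(String[] args) throws IOException {
        String antHome = System.getenv("ANT_HOME");
        if (antHome == null) {
            System.out.println("ANT_HOME is not set");
            return;
        }
        Path lib = Paths.get(antHome, "lib");
        List<String> jars = findJUnitJars(lib);
        if (jars.isEmpty()) {
            System.out.println("No junit*.jar in " + lib + "; copy JUnit's JAR there");
        } else {
            System.out.println("Found in " + lib + ": " + jars);
        }
    }
}
```

Run it with the same ANT_HOME that your `ant` command uses. If it reports no match, copying JUnit's JAR into that directory is the fix Erik describes.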
Further on the information extraction idea, consider what the SIMILE
team at MIT are doing... http://simile.mit.edu
The lower-case semantic web is gaining a lot of momentum these days,
and I'm a strong proponent and student of it at the moment. Scraping
rich information from a site is
For grins, I tried to see if I had commit access to fix the
misspelling myself. Lo and behold, I do! I hope I didn't step on any
toes by committing this - if so let me know and I'll be more patient
and submit patches. I'm a newbie to Nutch and definitely don't want
to step in to