Hi,
I'm using Nutch to crawl different intranet sites. The idea is to use
the crawl-urlfilter to tell the crawler to stay inside the seeded
domain. I don't want it to follow links all around my intranet and crawl
the same sites twice. This ideally means I'd have to rewrite the
nutch-site.xml each
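For what it's worth, the usual place for this is conf/crawl-urlfilter.txt (regex-urlfilter.txt in later Nutch versions) rather than nutch-site.xml. A minimal sketch, with MY.DOMAIN.NAME standing in for your actual host:

```
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):

# skip URLs containing characters that usually mark dynamic queries
-[?*!@=]

# accept only hosts inside the seeded domain
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/

# skip everything else
-.
```

Rules are applied top to bottom and the first match wins, so the final `-.` is what actually keeps the crawler from wandering off-domain.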
Hey!
Were you able to nail this down? Can you share your findings / code?
Best,
Animesh
David Jashi wrote:
By the way, Otis, what should one do to make the found words highlight in
the search results?
If the found word is not in the same form as the search term, it's not
highlighted.
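The usual cause is that the highlighter does plain string matching while the index stores stemmed terms, so "crawled" in the text never matches the query "crawl". One fix is to stem both sides before comparing. A toy sketch of the idea (this is NOT the Nutch or Lucene highlighter API, and `stem` is a deliberately naive suffix stripper; real code would reuse the analyzer that built the index):

```java
// Toy highlighter: wraps any document word whose stem matches the
// stemmed query term in <b>...</b>. Naive suffix stripping only.
public class StemHighlighter {
    static String stem(String w) {
        String s = w.toLowerCase();
        if (s.endsWith("ing") && s.length() > 5) return s.substring(0, s.length() - 3);
        if (s.endsWith("ed")  && s.length() > 4) return s.substring(0, s.length() - 2);
        if (s.endsWith("s")   && s.length() > 3) return s.substring(0, s.length() - 1);
        return s;
    }

    static String highlight(String text, String queryTerm) {
        String target = stem(queryTerm);
        StringBuilder out = new StringBuilder();
        for (String word : text.split(" ")) {
            if (out.length() > 0) out.append(' ');
            // Compare stems, but emit the original surface form.
            out.append(stem(word).equals(target) ? "<b>" + word + "</b>" : word);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: Nutch <b>crawled</b> the <b>crawling</b> sites
        System.out.println(highlight("Nutch crawled the crawling sites", "crawl"));
    }
}
```

In a real Lucene setup the equivalent is to hand the Highlighter the same analyzer used at index time instead of rolling your own stemmer.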
On Wed,
Howie - can you please send me the code for the same?
Howie Wang wrote:
Any tips on how to stem the search query text?
Take a look at
org.apache.nutch.searcher.Query
Thanks for the tips. I think I have something that works
for me. I ended up creating a QueryFilter called
Susam Pal wrote:
Hi,
Indeed, the answer is no, and many people have asked this on this list
before. Martin has very nicely explained the problems and a possible
solution. I'll just add what I have thought of. I have often
wondered what it would take to create a nice configurable
On 2010-03-10 19:26, conficio wrote:
Susam Pal wrote:
[snip]
I read a lot of posts regarding redirected URLs but didn't find a solution!
From: mbel...@msn.com
To: nutch-user@lucene.apache.org; mille...@gmail.com
Subject: RE: Content of redirected urls empty
Date: Tue, 9 Mar 2010 16:59:05 +
Hi,
I don't know if you found a few minutes to
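In case it helps while this thread waits for an answer: one knob worth checking is http.redirect.max. By default it is 0, which means the fetcher records the redirect target for a later fetch round instead of following it immediately, so the redirected URL's content can look empty in the current segment. A sketch of the override in nutch-site.xml (the property name is from nutch-default.xml; the value 3 is just an example):

```
<property>
  <name>http.redirect.max</name>
  <value>3</value>
  <description>Follow up to 3 redirects immediately instead of
  recording the target for a later fetch round.</description>
</property>
```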
Is this an appropriate place to ask what hardware and OS people are running?
If not, sorry for the spam. :)
Right now I am experimenting with three Intel Atom 330 based computers
running Fedora Core.
Jesse
int GetRandomNumber()
{
    return 4; // Chosen by fair roll of dice.
              // Guaranteed to be random.
}