Hello,
Just my 2 cents: the Intranet crawl functionality is VERY confusing.
If it was just taken out of the tutorial, and out of the set of
commands, that would actually help A LOT: I understood many many
things about Nutch once I tried so-called whole-web crawling, where
one has to use every
Hello,
Is it possible to have more than one Nutch application on one Nutch
installation?
What I would like to do is have several (4-5) indexes
relating to independent websites, searchable independently but with
just one Nutch install (i.e., one Tomcat webapp).
On the indexing side, this
helpful...
Thanks, Frank
On 3/6/06, Ravi Chintakunta [EMAIL PROTECTED] wrote:
Hi Frank,
Have a look at this thread.
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg03014.html
- Ravi
On 3/6/06, Franz Werfel [EMAIL PROTECTED] wrote:
Hello,
Is it possible to have more than one
Hello,
Sorry, this is probably in the documentation somewhere, but I couldn't find it.
How can I index and search accented words without accents?
For example: Portégé (a model for Toshiba laptops) would be indexed
as portege; and the search for portégé would be equivalent to the
search for portege
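
A minimal sketch of the accent folding asked about here, in plain Java and independent of Nutch. It assumes the approach of decomposing each character with java.text.Normalizer and dropping the combining marks; the class name AccentFolder and the method fold are hypothetical and are not part of Nutch or Lucene.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not a Nutch or Lucene class.
final class AccentFolder {
    private AccentFolder() {}

    // Returns the lower-cased text with accents removed.
    static String fold(String text) {
        // NFD splits "é" into "e" followed by a combining acute accent.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left over after decomposition.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Port\u00e9g\u00e9" is "Portégé" written with Unicode escapes.
        System.out.println(fold("Port\u00e9g\u00e9")); // prints "portege"
    }
}
```

For the two searches to be equivalent, the same folding would have to be applied both to the text being indexed and to the query terms.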
On Windows XP the Tomcat service is labeled Apache Tomcat so it is
filed under A...
On 2/17/06, Michael Ji [EMAIL PROTECTED] wrote:
hi,
I use Cygwin on Windows XP as the platform for Nutch, and
somehow I found I couldn't kill the web service.
No matter how many times I run tomcat5/bin/catalina.sh
stop,
Thank you, but shouldn't this be a part of the analyzer?
Lucene has analyzers that do this by default, why not Nutch?
Thanks,
Frank.
On 2/20/06, Howie Wang [EMAIL PROTECTED] wrote:
I threw this code together a while ago and it seems to work for me.
The performance could probably be improved,
Hello,
Is it possible not to index certain pages based on their content or on
their size (and not on their URL)? If so, how?
Thanks,
F.
Hello,
I am new to Nutch and would like to use a custom title for indexed HTML pages.
It seems that by default the title used is the content of the <title>
tag in <html><head>
For the site I need to index (let's call it example.com), this title
only holds the url of the site, so almost every page