Re: project vitality? / less documentation is more!

2006-03-07 Thread Franz Werfel
Hello, Just my 2 cents: the Intranet crawl functionality is VERY confusing. If it were just taken out of the tutorial, and out of the set of commands, that would actually help A LOT: I understood many, many things about Nutch once I tried so-called whole-web crawling, where one has to use every

Multi-applications?

2006-03-06 Thread Franz Werfel
Hello, Is it possible to have more than one Nutch application on one Nutch installation? What I would like to do is have several (4-5) indexes relating to independent websites, searchable independently but with just one Nutch install (i.e., one Tomcat webapp). On the indexing side, this

Re: Multi-applications?

2006-03-06 Thread Franz Werfel
helpful... Thanks, Frank On 3/6/06, Ravi Chintakunta [EMAIL PROTECTED] wrote: Hi Frank, Have a look at this thread. http://www.mail-archive.com/nutch-user@lucene.apache.org/msg03014.html - Ravi On 3/6/06, Franz Werfel [EMAIL PROTECTED] wrote: Hello, Is it possible to have more than one

No Accents

2006-02-20 Thread Franz Werfel
Hello, Sorry, this is probably in the documentation somewhere, but I couldn't find it. How can I index and search accented words without accents? For example: Portégé (a model of Toshiba laptops) would be indexed as portege; and the search for portégé would be equivalent to the search for portege
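
The accent folding asked about here (Portégé indexed and searched as portege) can be illustrated with a minimal sketch in plain Java. This is not Nutch or Lucene code and not the code posted in the thread; the class name `AccentFolder` and the method `fold` are hypothetical, and the sketch assumes a JDK that provides `java.text.Normalizer` (Java 6 or later). It only shows the string transformation that would have to be applied to both indexed terms and query terms.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Nutch or Lucene.
public class AccentFolder {

    // Returns the input lowercased with combining accent marks removed.
    public static String fold(String input) {
        // NFD decomposition splits "é" into "e" plus a combining acute accent.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches combining marks; deleting them keeps the base letters.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Port\u00e9g\u00e9" is "Portégé" written with Unicode escapes.
        System.out.println(fold("Port\u00e9g\u00e9")); // prints "portege"
    }
}
```

Applying the same function at index time and at query time is what makes a search for portégé match a document indexed as portege. Letters that have no decomposition (for example "ø") are left unchanged by this approach.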

Re: shutdown tomcat web service

2006-02-20 Thread Franz Werfel
On Windows XP the Tomcat service is labeled Apache Tomcat, so it is filed under A... On 2/17/06, Michael Ji [EMAIL PROTECTED] wrote: hi, I use Cygwin on Windows XP as the platform for Nutch; somehow, I found I couldn't kill the web service. No matter how many times I use tomcat5/bin/catalina.sh stop,

Re: No Accents

2006-02-20 Thread Franz Werfel
Thank you, but shouldn't this be a part of the analyzer? Lucene has analyzers that do this by default, why not Nutch? Thanks, Frank. On 2/20/06, Howie Wang [EMAIL PROTECTED] wrote: I threw this code together a while ago and it seems to work for me. The performance could probably be improved,

Filter based on content

2006-01-24 Thread Franz Werfel
Hello, Is it possible not to index certain pages based on their content or on their size (and not on their URL)? If so, how? Thanks, F.

Custom title for indexing

2005-11-02 Thread Franz Werfel
Hello, I am new to Nutch and would like to use a custom title for indexed HTML pages. It seems that by default the title used is the content of the title tag in <html><head>. For the site I need to index (let's call it example.com), this title only holds the URL of the site, so almost every page