Hi,
I'm using Nutch to crawl different intranet sites. The idea is to use
the crawl-urlfilter to tell the crawler to stay inside the seeded
domain. I don't want it to follow links all around my intranet and crawl
the same sites twice. This ideally means I'd have to rewrite the
nutch-site.xml each
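For what it's worth, the usual place for this is conf/crawl-urlfilter.txt (regex-urlfilter.txt in later Nutch versions) rather than nutch-site.xml. A minimal sketch, with MY.DOMAIN.NAME standing in for your actual host:

```
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):

# skip URLs containing characters that usually mark dynamic queries
-[?*!@=]

# accept only hosts inside the seeded domain
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/

# skip everything else
-.
```

Rules are applied top to bottom and the first match wins, so the final `-.` is what actually keeps the crawler from wandering off-domain.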
Hey!
Were you able to nail this down? Can you share your findings / code?
Best,
Animesh
David Jashi wrote:
By the way, Otis, what should one do to make the found words highlight in
the search results?
If the found word is not in the same form as the search term, it's not
highlighted.
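The usual cause is that the highlighter does plain string matching while the index stores stemmed terms, so "crawled" in the text never matches the query "crawl". One fix is to stem both sides before comparing. A toy sketch of the idea (this is NOT the Nutch or Lucene highlighter API, and `stem` is a deliberately naive suffix stripper; real code would reuse the analyzer that built the index):

```java
// Toy highlighter: wraps any document word whose stem matches the
// stemmed query term in <b>...</b>. Naive suffix stripping only.
public class StemHighlighter {
    static String stem(String w) {
        String s = w.toLowerCase();
        if (s.endsWith("ing") && s.length() > 5) return s.substring(0, s.length() - 3);
        if (s.endsWith("ed")  && s.length() > 4) return s.substring(0, s.length() - 2);
        if (s.endsWith("s")   && s.length() > 3) return s.substring(0, s.length() - 1);
        return s;
    }

    static String highlight(String text, String queryTerm) {
        String target = stem(queryTerm);
        StringBuilder out = new StringBuilder();
        for (String word : text.split(" ")) {
            if (out.length() > 0) out.append(' ');
            // Compare stems, but emit the original surface form.
            out.append(stem(word).equals(target) ? "<b>" + word + "</b>" : word);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: Nutch <b>crawled</b> the <b>crawling</b> sites
        System.out.println(highlight("Nutch crawled the crawling sites", "crawl"));
    }
}
```

In a real Lucene setup the equivalent is to hand the Highlighter the same analyzer used at index time instead of rolling your own stemmer.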
On Wed,
Howie - can you please send me the code for the same?
Howie Wang wrote:
Any tips on how to stem the search query text?
Take a look at
org.apache.nutch.searcher.Query
Thanks for the tips. I think I have something that works
for me. I ended up creating a QueryFilter called
Susam Pal wrote:
Hi,
Indeed, the answer is no, and many people have asked this on this list
before. Martin has very nicely explained the problems and a possible
solution. I'll just add what I have thought of. I have often
wondered what it would take to create a nice configurable
On 2010-03-10 19:26, conficio wrote:
Susam Pal wrote:
[snip]
I read a lot of posts regarding redirected URLs but didn't find a solution!
From: mbel...@msn.com
To: nutch-user@lucene.apache.org; mille...@gmail.com
Subject: RE: Content of redirected urls empty
Date: Tue, 9 Mar 2010 16:59:05 +
Hi,
I don't know if you found a few minutes to
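In case it helps while this thread waits for an answer: one knob worth checking is http.redirect.max. By default it is 0, which means the fetcher records the redirect target for a later fetch round instead of following it immediately, so the redirected URL's content can look empty in the current segment. A sketch of the override in nutch-site.xml (the property name is from nutch-default.xml; the value 3 is just an example):

```
<property>
  <name>http.redirect.max</name>
  <value>3</value>
  <description>Follow up to 3 redirects immediately instead of
  recording the target for a later fetch round.</description>
</property>
```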
Is this an appropriate place to ask what hardware and OS people are running?
If not, sorry for the spam. :)
Right now I am experimenting with three Intel Atom 330 based computers
running Fedora Core.
Jesse
int GetRandomNumber()
{
    return 4; // Chosen by fair roll of dice.
              // Guaranteed to be random.
}