Re: newbie questions

2009-12-01 Thread Mischa Tuffield
Hello Brian, Getting a response from another newbie here, so I could be wrong (do excuse if I am). If you are attempting to run a search index from the filesystem you need to have the following in your nutch-site.xml : <property> <name>fs.default.name</name> <value>file:</value>

Re: odd warnings

2009-12-01 Thread Andrzej Bialecki
Jesse Hires wrote: What is segments.gen and segments_2? The warning I am getting happens when I dedup two indexes. I create index1 and index2 through generate/fetch/index/...etc. index1 is an index of 1/2 the segments. index2 is an index of the other 1/2. The warning is happening on both

RE: recrawl.sh stopped at depth 7/10 without error

2009-12-01 Thread BELLINI ADAM
hi, any idea guys ?? thanx From: mbel...@msn.com To: nutch-user@lucene.apache.org Subject: RE: recrawl.sh stopped at depth 7/10 without error Date: Fri, 27 Nov 2009 20:11:12 + hi, this is the main loop of my recrawl.sh do echo --- Beginning crawl at depth `expr

using lucene and nutch in searches with OR operator

2009-12-01 Thread julianum
Hello everyone, in the application I'm developing I use plain Nutch for searches with the AND operator, and Lucene (for lack of support in Nutch) for searches with the OR operator. However, the search takes about 4 times longer than the equivalent search in Nutch. That's why I'm crawling the field

NYC Search & Discovery Meetup

2009-12-01 Thread Otis Gospodnetic
Hello, For those living in or near NYC, you may be interested in joining (and/or presenting?) at the NYC Search & Discovery Meetup. Topics are: search, machine learning, data mining, NLP, information gathering, information extraction, etc. http://www.meetup.com/NYC-Search-and-Discovery/ Our

crawl dates with fetch interval 0

2009-12-01 Thread reinhard schwab
i'm observing crawl dates, which have fetch interval with value 0. when i dump the segment, i see Recno:: 33 URL:: http://www.wachauclimbing.net/home/impressum-disclaimer/comment-page-1/ CrawlDatum:: Version: 7 Status: 65 (signature) Fetch time: Tue Dec 01 23:41:15 CET 2009 Modified time: Thu