Andrzej Bialecki wrote:
On 2009-12-22 16:07, Claudio Martella wrote:
Andrzej Bialecki wrote:
On 2009-12-22 13:16, Claudio Martella wrote:
Yes, I'm aware of that. The problem is that I have some fields of the
SolrDocument that I want to compute by text analysis (basically I want
to do some smart keywords extraction), so I have to get in the middle
between crawling and indexing.
From: claudio.marte...@tis.bz.it
To: nutch-user@lucene.apache.org
Subject: Re: Accessing crawled data
Hi,
actually I completely mis-explained myself. I'll try to make myself
clear: I'd like to extract the information in the segments by using the
parsers.
This means I can basically use
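The "smart keywords extraction" discussed in this thread would typically run between parsing and indexing, for example inside a custom Nutch IndexingFilter plugin. A minimal self-contained sketch of one possible extraction step is below; it only shows the analysis itself (picking the most frequent terms), and `KeywordSketch`, the token-length threshold, and the frequency heuristic are illustrative choices, not Nutch APIs:

```java
import java.util.*;
import java.util.stream.*;

// Sketch of a keyword-extraction step that could run on parsed page text
// before indexing. The plugin wiring (IndexingFilter, NutchDocument) is
// omitted; this is only the text-analysis part.
public class KeywordSketch {

    // Return the n most frequent lower-cased terms, ignoring short tokens.
    // A real implementation would use a stop-word list and a smarter scorer.
    static List<String> topKeywords(String text, int n) {
        Map<String, Long> counts = Arrays.stream(text.toLowerCase().split("\\W+"))
                .filter(t -> t.length() > 3)
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(topKeywords("nutch crawls pages, nutch parses pages", 2));
    }
}
```

The extracted terms would then be added as an extra field on the document before it is sent to Solr.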
/ dump_folder
-nofetch -nogenerate -noparse -noparsedata -noparsetext
This command will return only the content (the source pages).
Hope it helps.
Date: Thu, 17 Dec 2009 15:32:33 +0100
From: claudio.marte...@tis.bz.it
To: nutch-user@lucene.apache.org
Subject: Re: Accessing crawled data
Hi,
if you don't want to re-fetch already fetched pages,
I can think of 3 possibilities:
a/ set a very high fetch interval
b/ use a customized fetch schedule class instead of DefaultFetchSchedule;
implement there a method
public boolean shouldFetch(Text url, CrawlDatum datum, long curTime)
which returns false for pages that should not be fetched again.
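Option b/ boils down to a predicate over the stored crawl data. Below is a minimal self-contained sketch of that predicate: `NeverRefetchSchedule` is a hypothetical name, and Nutch's Text/CrawlDatum parameters are replaced by plain Java values so the sketch compiles on its own; a real implementation would extend Nutch's fetch-schedule base class and be selected through the db.fetch.schedule.class property. Option a/, by comparison, corresponds to raising the db.fetch.interval.default value (in seconds) in nutch-site.xml.

```java
// Sketch of option b/: a fetch schedule that never re-fetches a page that
// has already been fetched once. Nutch types are simplified to plain Java
// values here so the example is self-contained; in Nutch, shouldFetch
// receives the URL as Text and the crawl history as a CrawlDatum.
public class NeverRefetchSchedule {

    // Return true only for URLs that have never been fetched before.
    // "fetchedBefore" stands in for the state a real CrawlDatum carries.
    public boolean shouldFetch(String url, boolean fetchedBefore, long curTime) {
        return !fetchedBefore;
    }

    public static void main(String[] args) {
        NeverRefetchSchedule schedule = new NeverRefetchSchedule();
        System.out.println(schedule.shouldFetch("http://example.com/", false, 0L));
        System.out.println(schedule.shouldFetch("http://example.com/", true, 0L));
    }
}
```

With this in place the generator would simply never select already-fetched URLs, at the cost of never picking up page changes.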