Re: Targeting Specific Links

2009-10-23 Thread Andrzej Bialecki
Eric Osgood wrote: Andrzej, based on what you suggested below, I have begun to write my own scoring plugin: Great! In distributeScoreToOutlinks(), if the link contains the string I'm looking for, I set its score to kept_score and add a flag to the metaData in parseData (KEEP, true). How

Re: Targeting Specific Links

2009-10-22 Thread Eric Osgood
Andrzej, based on what you suggested below, I have begun to write my own scoring plugin: in distributeScoreToOutlinks(), if the link contains the string I'm looking for, I set its score to kept_score and add a flag to the metaData in parseData (KEEP, true). How do I check for this flag

Re: Targeting Specific Links

2009-10-22 Thread Eric Osgood
Also, in the scoring-links plugin, I set the return value of ScoringFilter.generatorSortValue() to Float.MIN_VALUE for all URLs and it still fetched everything. Maybe Float.MIN_VALUE isn't the correct value to set so that a link never gets fetched? Thanks, Eric On Oct 22, 2009, at 1:10 PM,
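
A note on the value discussed above: in Java, Float.MIN_VALUE is the smallest positive float, not the most negative one, so it sorts above zero and above every negative score. The sketch below only demonstrates that language fact. Whether it explains the behavior reported in this message is an assumption, since the snippets here do not show how the generator uses the sort value. The class and method names (SortValueDemo, lowestSortValue, sortDescending) are hypothetical and not part of Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; it does not use or depend on any Nutch API.
class SortValueDemo {

    // The most negative finite float. Float.MIN_VALUE is NOT this value:
    // it is the smallest positive float.
    static float lowestSortValue() {
        return -Float.MAX_VALUE;
    }

    // Returns a copy sorted from highest to lowest, the order in which a
    // ranking by sort value would typically pick candidates (assumption).
    static List<Float> sortDescending(List<Float> values) {
        List<Float> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.reverseOrder());
        return sorted;
    }

    public static void main(String[] args) {
        // Float.MIN_VALUE is greater than zero.
        System.out.println("MIN_VALUE > 0: " + (Float.MIN_VALUE > 0.0f));

        // A link scored Float.MIN_VALUE outranks links scored 0 or below.
        List<Float> ranked = sortDescending(
                Arrays.asList(-1.0f, Float.MIN_VALUE, 0.0f, lowestSortValue()));
        System.out.println("Ranked sort values: " + ranked);
    }
}
```

If the goal is a value that sorts below every other score, -Float.MAX_VALUE (or Float.NEGATIVE_INFINITY) is the candidate to try; a low sort value by itself may still not exclude a link if the fetch list has room for it.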

Re: Targeting Specific Links

2009-10-07 Thread Andrzej Bialecki
Eric Osgood wrote: Andrzej, How would I check for a flag during fetch? You would check for a flag during generation - please check ScoringFilter.generatorSortValue(), that's where you can check for a flag and set the sort value to Float.MIN_VALUE - this way the link will never be selected

Targeting Specific Links

2009-10-06 Thread Eric Osgood
Is there a way to inspect the list of links that Nutch finds per page and then, at that point, choose which links I want to include or exclude? That is the ideal remedy to my problem. Eric Osgood - Cal Poly - Computer Engineering Moon Valley Software

Re: Targeting Specific Links

2009-10-06 Thread Andrzej Bialecki
Eric Osgood wrote: Is there a way to inspect the list of links that Nutch finds per page and then, at that point, choose which links I want to include or exclude? That is the ideal remedy to my problem. Yes, look at ParseOutputFormat, you can make this decision there. There are two standard

Re: Targeting Specific Links

2009-10-06 Thread Eric Osgood
Andrzej, How would I check for a flag during fetch? Maybe this explanation can shed some light: Ideally, I would like to check the list of links for each page while still needing a total of X links per page. If I find the links I want, I add them to the list up until X; if I don't reach X, I

Targeting Specific Links for Crawling

2009-10-05 Thread Eric
Does anyone know if it is possible to target only certain links for crawling dynamically during a crawl? My goal would be to write a plugin for this functionality but I don't know where to start. Thanks, EO

Re: Targeting Specific Links for Crawling

2009-10-05 Thread Andrzej Bialecki
Eric wrote: Does anyone know if it is possible to target only certain links for crawling dynamically during a crawl? My goal would be to write a plugin for this functionality but I don't know where to start. URLFilter plugins may be what you want. -- Best regards, Andrzej Bialecki ___.

RE: Targeting Specific Links for Crawling

2009-10-05 Thread BELLINI ADAM
Eric wrote: Does anyone know if it is possible to target only certain links for crawling dynamically during a crawl? My goal would be to write a plugin for this functionality but I don't know where to start. URLFilter plugins may be what you want. -- Best

Re: Targeting Specific Links for Crawling

2009-10-05 Thread Eric
Eric wrote: Does anyone know if it is possible to target only certain links for crawling dynamically during a crawl? My goal would be to write a plugin for this functionality but I don't know where to start. URLFilter plugins

RE: Targeting Specific Links for Crawling

2009-10-05 Thread BELLINI ADAM
can just set a regular expression to accept only those kinds of links Date: Mon, 5 Oct 2009 21:39:52 +0200 From: a...@getopt.org To: nutch-user@lucene.apache.org Subject: Re: Targeting Specific Links for Crawling Eric wrote: Does anyone know if it is possible to target only certain
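
The regular-expression suggestion in this thread can be illustrated with a minimal standalone sketch. It mirrors the general contract of a Nutch URLFilter (return the URL to keep it, null to drop it) but does not implement the Nutch interface or its plugin wiring. The class name RegexLinkFilter and the example pattern are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical standalone filter; not the Nutch URLFilter interface itself.
class RegexLinkFilter {

    private final Pattern accept;

    // acceptRegex: a pattern that a URL must contain to be kept.
    RegexLinkFilter(String acceptRegex) {
        this.accept = Pattern.compile(acceptRegex);
    }

    // Returns the URL unchanged if it matches, or null to reject it.
    String filter(String url) {
        return accept.matcher(url).find() ? url : null;
    }

    // Applies the filter to a page's outlinks and keeps only accepted ones.
    List<String> filterAll(List<String> urls) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (filter(url) != null) {
                kept.add(url);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Example pattern (assumption): keep only links under /products/.
        RegexLinkFilter filter = new RegexLinkFilter("/products/");
        List<String> outlinks = Arrays.asList(
                "http://example.com/products/1",
                "http://example.com/about",
                "http://example.com/products/2?ref=home");
        System.out.println(filter.filterAll(outlinks));
    }
}
```

A static pattern like this decides per URL only. Choosing links based on what else was found on the same page, as asked elsewhere in the thread, needs logic beyond a plain URL filter.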