[jira] Closed: (NUTCH-271) Meta-data per URL/site/section

2006-07-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-271?page=all ] Andrzej Bialecki closed NUTCH-271. --- Resolution: Fixed I'm closing this issue, because this functionality can be achieved by using a combination of CrawlDatum.metaData and url/scoring

[jira] Commented: (NUTCH-271) Meta-data per URL/site/section

2006-07-19 Thread Stefan Neufeind (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_1246 ] Stefan Neufeind commented on NUTCH-271: --- Does somebody have an existing demo-plugin for that, that would catch URL-prefixes from a file and in case matches
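The lookup Stefan asks about (read URL prefixes from a file, then tag URLs that match) can be sketched in plain Java. This is a minimal sketch, not the subcollection plugin and not a Nutch API. The class name PrefixMetadata, the tab-separated "prefix<TAB>value" file format and the longest-prefix-wins rule are all assumptions made for illustration. In a real plugin, the returned value might be stored in CrawlDatum.metaData, as the closing comment on this issue suggests.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of Nutch): maps URL prefixes to a
 * metadata value such as a site or section id.
 */
class PrefixMetadata {
    // prefix -> value; TreeMap keeps entries sorted, which helps when debugging
    private final Map<String, String> prefixes = new TreeMap<>();

    /** Parses lines of the assumed form "prefix<TAB>value"; blank lines and '#' comments are skipped. */
    static PrefixMetadata parse(List<String> lines) {
        PrefixMetadata pm = new PrefixMetadata();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            int tab = line.indexOf('\t');
            if (tab <= 0) continue; // malformed line: no prefix or no separator
            pm.prefixes.put(line.substring(0, tab), line.substring(tab + 1).trim());
        }
        return pm;
    }

    /** Returns the value of the longest prefix matching the URL, or null if none matches. */
    String lookup(String url) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> e : prefixes.entrySet()) {
            String p = e.getKey();
            if (url.startsWith(p) && p.length() > bestLen) {
                best = e.getValue();
                bestLen = p.length();
            }
        }
        return best;
    }
}
```

Preferring the longest prefix lets a section entry such as http://example.com/docs/ override a site-wide entry for http://example.com/.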

Re: [jira] Commented: (NUTCH-271) Meta-data per URL/site/section

2006-07-19 Thread Sami Siren
0.8 has a subcollection plugin. It can add a subcollection id to a set of URLs, and you can then limit searching to those subcollections. Is that what you're after? -- Sami Siren Stefan Neufeind (JIRA) wrote: [ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_1246 ]

Re: [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt

2006-07-19 Thread Sami Siren
Andrzej Bialecki (JIRA) wrote: [ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244 ] Andrzej Bialecki commented on NUTCH-293: - I'm working on this patch to commit it. Just a quick note to Sami: Math.max() is

Re: [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt

2006-07-19 Thread Andrzej Bialecki
Sami Siren wrote: Andrzej Bialecki (JIRA) wrote: [ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244 ] Andrzej Bialecki commented on NUTCH-293: - I'm working on this patch to commit it. Just a quick note to
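Andrzej's note about Math.max() is cut off in this archive, so the sketch below does not claim to reproduce the committed NUTCH-293 patch. It illustrates one common way to honour Crawl-delay: read the value (in seconds) from the robots.txt group that applies to the crawler, then take the larger of that value and a locally configured per-server delay. The class and method names are hypothetical. Group handling is deliberately simplified: only exact, case-insensitive agent names and "*" are recognised, and a group naming the agent takes precedence over "*".

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch; not Nutch's robots.txt parser. */
class CrawlDelaySketch {
    /** Crawl-delay in milliseconds for agentName, or -1 if robots.txt declares none. */
    static long crawlDelayMillis(List<String> robotsLines, String agentName) {
        String agent = agentName.toLowerCase(Locale.ROOT);
        boolean groupMatchesAgent = false, groupMatchesStar = false;
        boolean lastWasUserAgent = false; // consecutive User-agent lines form one group
        long specific = -1, wildcard = -1;
        for (String raw : robotsLines) {
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue; // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasUserAgent) { // a new group starts here
                    groupMatchesAgent = false;
                    groupMatchesStar = false;
                }
                String v = value.toLowerCase(Locale.ROOT);
                if (v.equals("*")) groupMatchesStar = true;
                else if (v.equals(agent)) groupMatchesAgent = true;
                lastWasUserAgent = true;
                continue;
            }
            lastWasUserAgent = false;
            if (field.equals("crawl-delay")) {
                long millis;
                try {
                    millis = (long) (Double.parseDouble(value) * 1000); // seconds -> ms
                } catch (NumberFormatException e) {
                    continue; // ignore unparsable values
                }
                if (millis < 0) continue;
                if (groupMatchesAgent && specific < 0) specific = millis;
                if (groupMatchesStar && wildcard < 0) wildcard = millis;
            }
        }
        return specific >= 0 ? specific : wildcard;
    }

    /** Assumed policy: never fetch faster than either the local setting or the site's request. */
    static long effectiveDelay(long configuredMillis, long robotsMillis) {
        return Math.max(configuredMillis, robotsMillis);
    }
}
```

Because the "not declared" case returns -1, passing it to effectiveDelay simply leaves the configured delay in force.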

[jira] Updated: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it.

2006-07-19 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-323?page=all ] Stefan Groschupf updated NUTCH-323: --- Attachment: MapWritableCopyConstructor.patch The attached patch adds a copy constructor to MapWritable and uses it in the CrawlDatum.set method. However
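The bug in NUTCH-323 is reference aliasing: after set(), two records share one metadata map, so a change made through one shows up in the other. The sketch below reproduces the effect with a plain HashMap. The Datum class and its two set variants are hypothetical stand-ins, not CrawlDatum or MapWritable. Note that HashMap's copy constructor makes a shallow copy. That is enough here because the values are immutable Strings, but mutable values (such as Writable objects) may need a deeper copy.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a record carrying a metadata map (not Nutch's CrawlDatum). */
class Datum {
    Map<String, String> metaData = new HashMap<>();

    /** Buggy variant: both objects end up sharing the same map instance. */
    void setByReference(Datum that) {
        this.metaData = that.metaData;
    }

    /** Fixed variant: the copy constructor gives this object its own map. */
    void setByCopy(Datum that) {
        this.metaData = new HashMap<>(that.metaData);
    }
}
```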

[jira] Closed: (NUTCH-293) support for Crawl-delay in Robots.txt

2006-07-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-293?page=all ] Andrzej Bialecki closed NUTCH-293. --- Fix Version/s: 0.8-dev Resolution: Fixed Patch applied with minor changes. Thank you! support for Crawl-delay in Robots.txt

[jira] Closed: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it.

2006-07-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-323?page=all ] Andrzej Bialecki closed NUTCH-323. --- Resolution: Fixed Assignee: Andrzej Bialecki Patch applied to trunk/ . This should solve some serious issues with CrawlDatum.metaData handling in

[jira] Closed: (NUTCH-321) Scoring API deficiency

2006-07-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-321?page=all ] Andrzej Bialecki closed NUTCH-321. --- Resolution: Fixed Patch applied to trunk/ . NOTE: this requires a (trivial) change in any custom scoring plugin. Most likely, to accommodate for the

[jira] Created: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored

2006-07-19 Thread Stefan Groschupf (JIRA)
db.score.link.internal and db.score.link.external are ignored - Key: NUTCH-324 URL: http://issues.apache.org/jira/browse/NUTCH-324 Project: Nutch Issue Type: Improvement

[jira] Updated: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored

2006-07-19 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-324?page=all ] Stefan Groschupf updated NUTCH-324: --- Attachment: InternalAndExternalLinkScoreFactor.patch Multiply the score of a page during distributeScoreToOutlink with db.score.link.internal or
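The idea behind the NUTCH-324 patch, scaling the score passed to each outlink by db.score.link.internal or db.score.link.external, can be sketched as a standalone function. This is not the patch itself or the OPIC scoring plugin's code. The class and method names are hypothetical, and treating a link as internal when both URLs share the same host (case-insensitively) is an assumption of this sketch.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical sketch of per-link-type score factors; not Nutch code. */
class LinkScoreSketch {
    /** Lower-cased host of a URL, or "" if the URL has no host component. */
    static String host(String url) {
        String h = URI.create(url).getHost();
        return h == null ? "" : h.toLowerCase(Locale.ROOT);
    }

    /**
     * Score handed to one outlink: the base share multiplied by the internal
     * factor (same host, by assumption) or the external factor otherwise.
     */
    static float outlinkScore(String fromUrl, String toUrl, float baseScore,
                              float internalFactor, float externalFactor) {
        String fromHost = host(fromUrl);
        boolean internal = !fromHost.isEmpty() && fromHost.equals(host(toUrl));
        return baseScore * (internal ? internalFactor : externalFactor);
    }
}
```

With an internal factor of 1.0 and an external factor of 0.5, for example, a link to another site receives half the score that a same-site link would.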

[jira] Resolved: (NUTCH-319) OPICScoringFilter should use logging API instead of printStackTrace

2006-07-19 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-319?page=all ] Stefan Groschupf resolved NUTCH-319. Resolution: Won't Fix Sorry, that is bogus since it is written to the logging stream. OPICScoringFilter should use logging API instead of