[ http://issues.apache.org/jira/browse/NUTCH-271?page=all ]
Andrzej Bialecki closed NUTCH-271.
---
Resolution: Fixed
I'm closing this issue, because this functionality can be achieved by using a
combination of CrawlDatum.metaData and url/scoring
[ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_1246 ]
Stefan Neufeind commented on NUTCH-271:
---
Does anybody have an existing demo plugin for that, one that would read
URL prefixes from a file and, in case a URL matches
0.8 has a subcollection plugin. It can add a subcollection id to a set of URLs,
and then you can limit searching to those subcollections. Is that what you're
after?
--
Sami Siren
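For readers wondering what "catch URL prefixes from a file" could look like, here is a minimal sketch of the matching step only. It assumes a hypothetical file format of one `prefix<TAB>tag` pair per line; the class `PrefixTagger` and its methods are illustrative names, not the subcollection plugin's code or any Nutch API. In a real plugin the returned tag would be stored in CrawlDatum.metaData or used as a subcollection id.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: maps URLs to tags by longest matching prefix. */
class PrefixTagger {
    private final Map<String, String> prefixToTag = new LinkedHashMap<>();

    /** Loads "prefix<TAB>tag" lines; blank lines and '#' comments are skipped. */
    PrefixTagger(Reader source) {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    prefixToTag.put(parts[0].trim(), parts[1].trim());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the tag of the longest prefix the URL starts with, or null. */
    String tagFor(String url) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> e : prefixToTag.entrySet()) {
            String prefix = e.getKey();
            if (url.startsWith(prefix) && prefix.length() > bestLen) {
                best = e.getValue();
                bestLen = prefix.length();
            }
        }
        return best;
    }
}
```

Preferring the longest prefix lets a specific section (say, a /docs/ subtree) override a site-wide tag without depending on the order of lines in the file.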
Stefan Neufeind (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_1246 ]
Andrzej Bialecki (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244 ]
Andrzej Bialecki commented on NUTCH-293:
-
I'm working on this patch to commit it. Just a quick note to Sami: Math.max() is
Sami Siren wrote:
Andrzej Bialecki (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244 ]
Andrzej Bialecki commented on NUTCH-293:
-
I'm working on this patch to commit it. Just a quick note to
[ http://issues.apache.org/jira/browse/NUTCH-323?page=all ]
Stefan Groschupf updated NUTCH-323:
---
Attachment: MapWritableCopyConstructor.patch
The attached patch adds a copy constructor to MapWritable and uses it in the
CrawlDatum.set method. However
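To illustrate why a copy constructor matters in a set() method, here is a minimal sketch using a hypothetical `MetaRecord` class with a plain HashMap standing in for the metadata map; it is not Nutch's CrawlDatum or MapWritable. If set() simply assigned the other record's map, both records would share one map and an edit to either would show up in both.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a record that carries a mutable metadata map. */
class MetaRecord {
    private Map<String, String> metaData = new HashMap<>();

    MetaRecord() { }

    /** Copy constructor: the copy gets its own map with the same entries. */
    MetaRecord(MetaRecord other) {
        this.metaData = new HashMap<>(other.metaData);
    }

    Map<String, String> getMetaData() {
        return metaData;
    }

    /** Copies state from another record without aliasing its map. */
    void set(MetaRecord that) {
        // "this.metaData = that.metaData" would let two records share one map,
        // so later edits on either record would leak into the other.
        this.metaData = new HashMap<>(that.metaData);
    }
}
```

The values here are immutable Strings, so copying the map itself is enough; with mutable values (as Writables are) a safe copy would also need to duplicate each value.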
[ http://issues.apache.org/jira/browse/NUTCH-293?page=all ]
Andrzej Bialecki closed NUTCH-293.
---
Fix Version/s: 0.8-dev
Resolution: Fixed
Patch applied with minor changes. Thank you!
support for Crawl-delay in Robots.txt
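Crawl-delay is a non-standard robots.txt extension, commonly read as the number of seconds a crawler should wait between requests to the same site. The following is a minimal, simplified sketch of extracting it; the class `CrawlDelayParser` is hypothetical and is not the parser in the NUTCH-293 patch. Group handling is simplified: consecutive User-agent lines form one group, and a group applies if its agent token is a substring of our agent name, with "*" as the fallback.

```java
import java.util.Locale;

/** Hypothetical, simplified robots.txt reader that only extracts Crawl-delay. */
class CrawlDelayParser {
    /**
     * Returns the Crawl-delay in seconds for the given agent, falling back to
     * the "*" group, or -1 if no applicable directive is present.
     */
    static double crawlDelay(String robotsTxt, String agent) {
        String wanted = agent.toLowerCase(Locale.ROOT);
        double forAgent = -1, forStar = -1;
        boolean inAgentGroup = false, inStarGroup = false;
        boolean lastWasUserAgent = false;
        for (String raw : robotsTxt.split("\r?\n")) {
            String line = raw.replaceAll("#.*", "").trim(); // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasUserAgent) { // a User-agent after other fields starts a new group
                    inAgentGroup = false;
                    inStarGroup = false;
                }
                String ua = value.toLowerCase(Locale.ROOT);
                if (ua.equals("*")) {
                    inStarGroup = true;
                } else if (!ua.isEmpty() && wanted.contains(ua)) {
                    inAgentGroup = true;
                }
                lastWasUserAgent = true;
            } else {
                lastWasUserAgent = false;
                if (field.equals("crawl-delay")) {
                    try {
                        double d = Double.parseDouble(value);
                        if (d >= 0 && inAgentGroup) forAgent = d;
                        if (d >= 0 && inStarGroup) forStar = d;
                    } catch (NumberFormatException ignored) {
                        // malformed values are skipped
                    }
                }
            }
        }
        return forAgent >= 0 ? forAgent : forStar;
    }
}
```

A fetcher would then wait at least this many seconds between requests to the host, typically taking the larger of this value and its own configured politeness delay.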
[ http://issues.apache.org/jira/browse/NUTCH-323?page=all ]
Andrzej Bialecki closed NUTCH-323.
---
Resolution: Fixed
Assignee: Andrzej Bialecki
Patch applied to trunk/ . This should solve some serious issues with
CrawlDatum.metaData handling in
[ http://issues.apache.org/jira/browse/NUTCH-321?page=all ]
Andrzej Bialecki closed NUTCH-321.
---
Resolution: Fixed
Patch applied to trunk/ .
NOTE: this requires a (trivial) change in any custom scoring plugin. Most
likely, to accommodate the
db.score.link.internal and db.score.link.external are ignored
-
Key: NUTCH-324
URL: http://issues.apache.org/jira/browse/NUTCH-324
Project: Nutch
Issue Type: Improvement
[ http://issues.apache.org/jira/browse/NUTCH-324?page=all ]
Stefan Groschupf updated NUTCH-324:
---
Attachment: InternalAndExternalLinkScoreFactor.patch
Multiply the score of a page during distributeScoreToOutlink by
db.score.link.internal or
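The patch description suggests weighting each outlink's share of a page's score by db.score.link.internal or db.score.link.external, depending on the link type. Below is a minimal sketch of that idea, assuming that "internal" means the source and target URLs share the same host and that the score is split evenly among outlinks, OPIC-style. The class `OutlinkScorer` and its method are hypothetical; this is not the NUTCH-324 patch or the OPICScoringFilter code.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of splitting a page's score among its outlinks. */
class OutlinkScorer {
    private final float internalFactor; // e.g. the value of db.score.link.internal
    private final float externalFactor; // e.g. the value of db.score.link.external

    OutlinkScorer(float internalFactor, float externalFactor) {
        this.internalFactor = internalFactor;
        this.externalFactor = externalFactor;
    }

    /** Returns the score contribution passed to each outlink. */
    Map<String, Float> distribute(String fromUrl, float pageScore, List<String> outlinks) {
        Map<String, Float> result = new LinkedHashMap<>();
        if (outlinks.isEmpty()) {
            return result;
        }
        float share = pageScore / outlinks.size(); // even split among outlinks
        String fromHost = host(fromUrl);
        for (String to : outlinks) {
            boolean internal = fromHost != null && fromHost.equals(host(to));
            result.put(to, share * (internal ? internalFactor : externalFactor));
        }
        return result;
    }

    /** Lower-cased host of a URL, or null if it cannot be determined. */
    private static String host(String url) {
        try {
            String h = URI.create(url).getHost();
            return h == null ? null : h.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return null; // unparsable URL: treated as external
        }
    }
}
```

With both factors at 1.0 this reduces to a plain even split, which matches the behaviour the issue title describes when the two properties are ignored.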
[ http://issues.apache.org/jira/browse/NUTCH-319?page=all ]
Stefan Groschupf resolved NUTCH-319.
Resolution: Won't Fix
Sorry, that is bogus, since it is written to the logging stream.
OPICScoringFilter should use the logging API instead of