[ http://issues.apache.org/jira/browse/NUTCH-319?page=all ]
Stefan Groschupf resolved NUTCH-319.
Resolution: Won't Fix
Sorry, that is bogus since it is written to the logging stream.
> OPICScoringFilter should use logging API instead of printStackTr
[ http://issues.apache.org/jira/browse/NUTCH-324?page=all ]
Stefan Groschupf updated NUTCH-324:
---
Attachment: InternalAndExternalLinkScoreFactor.patch
Multiply the score of a page during distributeScoreToOutlink with
db.score.link.internal or db.score.
db.score.link.internal and db.score.link.external are ignored
-
Key: NUTCH-324
URL: http://issues.apache.org/jira/browse/NUTCH-324
Project: Nutch
Issue Type: Improvement
C
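The comment on NUTCH-324 describes multiplying a page's score by a factor that depends on whether the outlink is internal or external. A minimal sketch of that idea follows. It is not the attached patch: the class, the method names, and the rule that "internal" means "same host" are all assumptions made for illustration, and the two factor parameters only stand in for the values of db.score.link.internal and db.score.link.external.

```java
import java.net.URI;

public class LinkScoreSketch {

    /** Assumption: a link is "internal" when both URLs share the same host. */
    static boolean sameHost(String fromUrl, String toUrl) {
        String from = URI.create(fromUrl).getHost();
        String to = URI.create(toUrl).getHost();
        return from != null && from.equalsIgnoreCase(to);
    }

    /**
     * Scales a score by the internal or external factor.
     * internalFactor / externalFactor are hypothetical stand-ins for the
     * configured db.score.link.internal / db.score.link.external values.
     */
    static float scaledScore(float score, String fromUrl, String toUrl,
                             float internalFactor, float externalFactor) {
        float factor = sameHost(fromUrl, toUrl) ? internalFactor : externalFactor;
        return score * factor;
    }

    public static void main(String[] args) {
        // Same host, so the internal factor is applied.
        System.out.println(scaledScore(2.0f,
                "http://a.example/x", "http://a.example/y", 0.5f, 2.0f));
        // Different hosts, so the external factor is applied.
        System.out.println(scaledScore(2.0f,
                "http://a.example/x", "http://b.example/y", 0.5f, 2.0f));
    }
}
```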
Hello,
I am starting a new dotcom and am looking for a developer to design a
website and a webcrawler to add content to the site. I have a list of
requirements, if you or anyone you know is interested please e-mail me
directly at [EMAIL PROTECTED]
Thank you,
BRIAN
Brian M.B. K
[ http://issues.apache.org/jira/browse/NUTCH-321?page=all ]
Andrzej Bialecki closed NUTCH-321.
---
Resolution: Fixed
Patch applied to trunk/ .
NOTE: this requires a (trivial) change in any custom scoring plugin. Most
likely, to accommodate for the futur
[ http://issues.apache.org/jira/browse/NUTCH-323?page=all ]
Andrzej Bialecki closed NUTCH-323.
---
Resolution: Fixed
Assignee: Andrzej Bialecki
Patch applied to trunk/ . This should solve some serious issues with
CrawlDatum.metaData handling in C
[ http://issues.apache.org/jira/browse/NUTCH-293?page=all ]
Andrzej Bialecki closed NUTCH-293.
---
Fix Version/s: 0.8-dev
Resolution: Fixed
Patch applied with minor changes. Thank you!
> support for Crawl-delay in Robots.txt
> ---
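For readers unfamiliar with the directive in NUTCH-293: Crawl-delay is a non-standard robots.txt line whose value is a delay in seconds. The sketch below shows one way such a line could be read. It is not the applied patch; the class and method names are hypothetical, and it deliberately ignores User-agent grouping, returning the first valid Crawl-delay value it sees.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

public class CrawlDelaySketch {

    /** Returns the first valid Crawl-delay found, converted to milliseconds. */
    static OptionalLong crawlDelayMillis(List<String> robotsLines) {
        for (String raw : robotsLines) {
            // Strip a trailing "# comment" and surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim();
            if (!key.equalsIgnoreCase("Crawl-delay")) {
                continue;
            }
            try {
                double seconds = Double.parseDouble(line.substring(colon + 1).trim());
                if (Double.isFinite(seconds) && seconds >= 0) {
                    return OptionalLong.of(Math.round(seconds * 1000));
                }
            } catch (NumberFormatException e) {
                // Malformed value: skip this line and keep scanning.
            }
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("User-agent: *", "Crawl-delay: 2.5 # be polite");
        System.out.println(crawlDelayMillis(lines));
    }
}
```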
[ http://issues.apache.org/jira/browse/NUTCH-323?page=all ]
Stefan Groschupf updated NUTCH-323:
---
Attachment: MapWritableCopyConstructor.patch
The attached patch adds a copy constructor to MapWritable and uses it in the
CrawlDatum.set method. However
CrawlDatum.set just reference a mapWritable of a other object but not copy it.
--
Key: NUTCH-323
URL: http://issues.apache.org/jira/browse/NUTCH-323
Project: Nutch
I
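The difference between referencing another object's map and copying it, which is the problem NUTCH-323 reports, can be illustrated without Nutch at all. The sketch below uses java.util.HashMap as a stand-in for MapWritable, and the Datum class and its two methods are hypothetical names, not the CrawlDatum API.

```java
import java.util.HashMap;
import java.util.Map;

public class MetaDataCopySketch {

    /** Hypothetical stand-in for a record that carries a metadata map. */
    static class Datum {
        Map<String, String> metaData = new HashMap<>();

        /** Shares the other object's map: later changes show up in both objects. */
        void setByReference(Datum other) {
            this.metaData = other.metaData;
        }

        /** Copies the entries via HashMap's copy constructor: the objects stay independent. */
        void setByCopy(Datum other) {
            this.metaData = new HashMap<>(other.metaData);
        }
    }

    public static void main(String[] args) {
        Datum source = new Datum();
        Datum shared = new Datum();
        Datum copied = new Datum();

        shared.setByReference(source);
        copied.setByCopy(source);

        // Modify the source after both set calls.
        source.metaData.put("key", "value");

        System.out.println(shared.metaData.containsKey("key"));
        System.out.println(copied.metaData.containsKey("key"));
    }
}
```

With the reference version, a change made to one object's metadata silently appears in the other; the copy version avoids that at the cost of one extra map allocation per call.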
Sami Siren wrote:
Andrzej Bialecki (JIRA) wrote:
[
http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244
]Andrzej Bialecki commented on NUTCH-293:
-
I'm working on this patch to commit it. Just a quick note to Sam
Andrzej Bialecki (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244 ]
Andrzej Bialecki commented on NUTCH-293:
-
I'm working on this patch to commit it. Just a quick note to Sami: Math.max()
is n
[
http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244 ]
Andrzej Bialecki commented on NUTCH-293:
-
I'm working on this patch to commit it. Just a quick note to Sami: Math.max()
is not optimal, because it always p
0.8 has a subcollection plugin. It can add a subcollection id for a set of URLs,
and then you can limit searching to subcollections. Is that what you're
after?
--
Sami Siren
Stefan Neufeind (JIRA) wrote:
[ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_1246 ]
[
http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_1246 ]
Stefan Neufeind commented on NUTCH-271:
---
Does somebody have an existing demo-plugin for that, that would catch
URL-prefixes from a file and in case matches ar
[ http://issues.apache.org/jira/browse/NUTCH-271?page=all ]
Andrzej Bialecki closed NUTCH-271.
---
Resolution: Fixed
I'm closing this issue, because this functionality can be achieved by using a
combination of CrawlDatum.metaData and url/scoring filters
[ http://issues.apache.org/jira/browse/NUTCH-173?page=all ]
Andrzej Bialecki closed NUTCH-173.
---
Resolution: Fixed
Patch applied to trunk/ . Thank you!
> PerHost Crawling Policy ( crawl.ignore.external.links )
> ---
The recommended plugin example has a compile error due to a missing
library. The error is:
compile:
[echo] Compiling plugin: recommended
[javac] Compiling 3 source files to
/usr/local/nutch/build/recommended/classes
[javac]
/usr/local/nutch/src/plugin/recommended/src/java/org/apache/n
Fetcher discards ProtocolStatus, doesn't store redirected pages
---
Key: NUTCH-322
URL: http://issues.apache.org/jira/browse/NUTCH-322
Project: Nutch
Issue Type: Bug
Compo