[jira] Resolved: (NUTCH-319) OPICScoringFilter should use logging API instead of printStackTrace

2006-07-19 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-319?page=all ] Stefan Groschupf resolved NUTCH-319. Resolution: Won't Fix Sorry, that is bogus since it is written to the logging stream. > OPICScoringFilter should use logging API instead of printStackTr

[jira] Updated: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored

2006-07-19 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-324?page=all ] Stefan Groschupf updated NUTCH-324: --- Attachment: InternalAndExternalLinkScoreFactor.patch Multiply the score of a page during distributeScoreToOutlink with db.score.link.internal or db.score.
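The idea in the NUTCH-324 patch description, scaling the score handed to an outlink by an internal or external factor, might be sketched roughly as follows. This is only an illustration, not the attached patch: `LinkScoreSketch`, `isInternal`, and `outlinkScore` are hypothetical names, the two factors stand in for the values of `db.score.link.internal` and `db.score.link.external`, and treating "same host" as the meaning of an internal link is an assumption.

```java
import java.net.URI;

// Hypothetical helper, not Nutch code: scales the score passed to an outlink.
final class LinkScoreSketch {
    private LinkScoreSketch() {}

    // Assumption: a link counts as "internal" when both URLs share a host.
    static boolean isInternal(String fromUrl, String toUrl) {
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        // equalsIgnoreCase(null) is false, so a missing host means "external".
        return fromHost != null && fromHost.equalsIgnoreCase(toHost);
    }

    // internalFactor and externalFactor stand in for the configured values of
    // db.score.link.internal and db.score.link.external.
    static float outlinkScore(String fromUrl, String toUrl, float score,
                              float internalFactor, float externalFactor) {
        float factor = isInternal(fromUrl, toUrl) ? internalFactor : externalFactor;
        return score * factor;
    }
}
```

With factors of 0.5 (internal) and 0.25 (external), a score of 2.0 would be passed on as 1.0 to a page on the same host and as 0.5 to a page on another host.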

[jira] Created: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored

2006-07-19 Thread Stefan Groschupf (JIRA)
db.score.link.internal and db.score.link.external are ignored - Key: NUTCH-324 URL: http://issues.apache.org/jira/browse/NUTCH-324 Project: Nutch Issue Type: Improvement C

Webcrawler

2006-07-19 Thread Brian M.B. Keaney
Hello, I am starting a new dotcom and am looking for a developer to design a website and a webcrawler to add content to the site. I have a list of requirements, if you or anyone you know is interested please e-mail me directly at [EMAIL PROTECTED] Thank you, BRIAN Brian M.B. K

[jira] Closed: (NUTCH-321) Scoring API deficiency

2006-07-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-321?page=all ] Andrzej Bialecki closed NUTCH-321. --- Resolution: Fixed Patch applied to trunk/ . NOTE: this requires a (trivial) change in any custom scoring plugin. Most likely, to accommodate for the futur

[jira] Closed: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it.

2006-07-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-323?page=all ] Andrzej Bialecki closed NUTCH-323. --- Resolution: Fixed Assignee: Andrzej Bialecki Patch applied to trunk/ . This should solve some serious issues with CrawlDatum.metaData handling in C

[jira] Closed: (NUTCH-293) support for Crawl-delay in Robots.txt

2006-07-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-293?page=all ] Andrzej Bialecki closed NUTCH-293. --- Fix Version/s: 0.8-dev Resolution: Fixed Patch applied with minor changes. Thank you! > support for Crawl-delay in Robots.txt > ---

[jira] Updated: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it.

2006-07-19 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-323?page=all ] Stefan Groschupf updated NUTCH-323: --- Attachment: MapWritableCopyConstructor.patch The attached patch adds a copy constructor to the map writable and uses it in the CrawlDatum.set method. However
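The difference between referencing another object's map and copying it through a copy constructor can be illustrated with a small sketch. It does not use the real Nutch classes: `MetaMap` and `Datum` are hypothetical stand-ins for the map writable and for CrawlDatum, and `setByReference` / `setByCopy` are made-up names for the behaviour before and after a fix of this kind.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the map writable.
final class MetaMap {
    private final Map<String, String> entries = new HashMap<>();

    MetaMap() {}

    // Copy constructor: the new map gets its own entries.
    MetaMap(MetaMap other) {
        entries.putAll(other.entries);
    }

    void put(String key, String value) { entries.put(key, value); }

    String get(String key) { return entries.get(key); }
}

// Hypothetical stand-in for CrawlDatum.
final class Datum {
    private MetaMap metaData = new MetaMap();

    MetaMap getMetaData() { return metaData; }

    // Problematic variant: both objects now share one map instance.
    void setByReference(Datum that) { this.metaData = that.metaData; }

    // Copying variant: later changes to 'that' do not leak into 'this'.
    void setByCopy(Datum that) { this.metaData = new MetaMap(that.metaData); }
}
```

In the reference variant, a later `put` on the source object is visible through the target as well; in the copying variant, the target keeps the values it had at the time of the call.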

[jira] Created: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it.

2006-07-19 Thread Stefan Groschupf (JIRA)
CrawlDatum.set just reference a mapWritable of a other object but not copy it. -- Key: NUTCH-323 URL: http://issues.apache.org/jira/browse/NUTCH-323 Project: Nutch I

Re: [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt

2006-07-19 Thread Andrzej Bialecki
Sami Siren wrote: Andrzej Bialecki (JIRA) wrote: [ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244 ] Andrzej Bialecki commented on NUTCH-293: - I'm working on this patch to commit it. Just a quick note to Sam

Re: [jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt

2006-07-19 Thread Sami Siren
Andrzej Bialecki (JIRA) wrote: [ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244 ] Andrzej Bialecki commented on NUTCH-293: - I'm working on this patch to commit it. Just a quick note to Sami: Math.max() is n

[jira] Commented: (NUTCH-293) support for Crawl-delay in Robots.txt

2006-07-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-293?page=comments#action_12422244 ] Andrzej Bialecki commented on NUTCH-293: - I'm working on this patch to commit it. Just a quick note to Sami: Math.max() is not optimal, because it always p

Re: [jira] Commented: (NUTCH-271) Meta-data per URL/site/section

2006-07-19 Thread Sami Siren
0.8 has a subcollection plugin. It can add a subcollection id for a set of URLs, and then you can limit searching to subcollections. Is that what you're after? -- Sami Siren Stefan Neufeind (JIRA) wrote: [ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_1246 ]

[jira] Commented: (NUTCH-271) Meta-data per URL/site/section

2006-07-19 Thread Stefan Neufeind (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-271?page=comments#action_1246 ] Stefan Neufeind commented on NUTCH-271: --- Does somebody have an existing demo-plugin for that, that would catch URL-prefixes from a file and in case matches ar

[jira] Closed: (NUTCH-271) Meta-data per URL/site/section

2006-07-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-271?page=all ] Andrzej Bialecki closed NUTCH-271. --- Resolution: Fixed I'm closing this issue, because this functionality can be achieved by using a combination of CrawlDatum.metaData and url/scoring filters

[jira] Closed: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links )

2006-07-19 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-173?page=all ] Andrzej Bialecki closed NUTCH-173. --- Resolution: Fixed Patch applied to trunk/ . Thank you! > PerHost Crawling Policy ( crawl.ignore.external.links ) > ---

error in recommended plugin example

2006-07-19 Thread Chris Stephens
The recommended plugin example has a compile error due to a missing library. The error is: compile: [echo] Compiling plugin: recommended [javac] Compiling 3 source files to /usr/local/nutch/build/recommended/classes [javac] /usr/local/nutch/src/plugin/recommended/src/java/org/apache/n

[jira] Created: (NUTCH-322) Fetcher discards ProtocolStatus, doesn't store redirected pages

2006-07-19 Thread Andrzej Bialecki (JIRA)
Fetcher discards ProtocolStatus, doesn't store redirected pages --- Key: NUTCH-322 URL: http://issues.apache.org/jira/browse/NUTCH-322 Project: Nutch Issue Type: Bug Compo