Proxy Exceptions

2006-02-01 Thread Guenter, Matthias
Hello, would you think it is a good idea to add a proxy exception list to org.apache.nutch.protocol.http.api.HttpBase and org.apache.nutch.protocol.http.HttpResponse? Without one, scanning intranets and extranets in one go creates problems. E.g. <property> <name>http.proxy.exception.list</name>
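A minimal sketch of what such an exception list could look like, assuming the property holds a comma-separated list of host names. The class ProxyExceptions and its method useProxy are hypothetical names for illustration only; they are not part of Nutch, and the message does not say how the list would be parsed or matched.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not a Nutch class): decides whether a host bypasses the proxy. */
final class ProxyExceptions {
    private final Set<String> exceptions = new HashSet<>();

    /**
     * @param list comma-separated host names, assumed here to be the value
     *             of the proposed http.proxy.exception.list property
     */
    ProxyExceptions(String list) {
        if (list == null) {
            return; // no exceptions configured: every host goes through the proxy
        }
        for (String host : list.split(",")) {
            String h = host.trim().toLowerCase(Locale.ROOT);
            if (!h.isEmpty()) {
                exceptions.add(h);
            }
        }
    }

    /** Returns false only for hosts that appear in the exception list (case-insensitive). */
    boolean useProxy(String host) {
        if (host == null) {
            return true;
        }
        return !exceptions.contains(host.trim().toLowerCase(Locale.ROOT));
    }
}
```

With a list such as "intranet.example.com, wiki.example.com", a fetcher could call useProxy(url.getHost()) before opening a connection and connect directly when it returns false. Exact host matching is the simplest choice; suffix or wildcard matching would be a separate design decision.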

[jira] Commented: (NUTCH-192) meta data support for CrawlDatum

2006-02-01 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364788 ] Stefan Groschupf commented on NUTCH-192: That's true. In any case I don't want to store the class id map, since if we do that, you are right, we can use strings. What

[jira] Commented: (NUTCH-192) meta data support for CrawlDatum

2006-02-01 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364795 ] Stefan Groschupf commented on NUTCH-192: A perfect plan, I will do that and commit a new patch. :) THANKS! meta data support for CrawlDatum

[jira] Closed: (NUTCH-194) Nutch-169 introduced two tiny bugs

2006-02-01 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-194?page=all ] Andrzej Bialecki closed NUTCH-194: Fix Version: 0.8-dev Resolution: Fixed Applied. Thanks! Nutch-169 introduced two tiny bugs Key:

Integrating Nutch w/Alexa

2006-02-01 Thread Ken Krugler
Hi there, has anybody looked into running Nutch with Alexa, i.e. using their data store as the source for data that you'd typically be fetching? The fact that their APIs are Perl & C based would make this non-trivial, I imagine. I tried searching on their documentation site for Java - kind

[jira] Commented: (NUTCH-196) lib-xml and lib-log4j plugins

2006-02-01 Thread Andrzej Bialecki (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-196?page=comments#action_12364861 ] Andrzej Bialecki commented on NUTCH-196: I don't think it's necessary for the core to use anything other than the standard XML APIs. I specifically meant the plugins

[jira] Updated: (NUTCH-185) XMLParser is configurable xml parser plugin.

2006-02-01 Thread Rida Benjelloun (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-185?page=all ] Rida Benjelloun updated NUTCH-185: Summary: XMLParser is configurable xml parser plugin. (was: XMLParser is configurable plugin. It use XPath and namespaces to do the mapping between

Re: Cmd line for running plugins

2006-02-01 Thread Matt Kangas
+1 That would help NUTCH-87 (whitelist urlfilter). The code is all in one plugin, but there is a utility class that needs to be called from the command line. On Feb 1, 2006, at 4:35 PM, Andrzej Bialecki wrote: Hi, I just found out that it's not possible to invoke main() methods of

Re: Cmd line for running plugins

2006-02-01 Thread Stefan Groschupf
+1 On 01.02.2006 at 22:35, Andrzej Bialecki wrote: Hi, I just found out that it's not possible to invoke main() methods of plugins through the bin/nutch script. Sometimes it's useful for testing and debugging - I can do it from within Eclipse, because I have all plugins on the

[jira] Created: (NUTCH-197) NullPointerException in TaskRunner if application jar does not have lib directory

2006-02-01 Thread Owen O'Malley (JIRA)
NullPointerException in TaskRunner if application jar does not have lib directory Key: NUTCH-197 URL: http://issues.apache.org/jira/browse/NUTCH-197 Project: Nutch Type: Bug
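The issue text is cut off here, so the actual cause and fix are not shown. One common source of this kind of failure in Java is that File.listFiles() returns null, rather than an empty array, when the path is not an existing directory. The sketch below illustrates a null-safe listing under that assumption; the class LibJars, the method listLibJars and the jobDir parameter are hypothetical and are not taken from TaskRunner.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not TaskRunner code): collects jars from an optional lib directory. */
final class LibJars {
    /** Returns the .jar files under jobDir/lib, or an empty list if lib does not exist. */
    static List<File> listLibJars(File jobDir) {
        List<File> jars = new ArrayList<>();
        // listFiles() returns null if "lib" is missing or is not a directory
        File[] entries = new File(jobDir, "lib").listFiles();
        if (entries == null) {
            return jars; // no lib directory: nothing to add to the classpath
        }
        for (File f : entries) {
            if (f.isFile() && f.getName().endsWith(".jar")) {
                jars.add(f);
            }
        }
        return jars;
    }
}
```

Returning an empty list keeps the caller's loop over the jars unchanged, so a job jar without a lib directory is handled the same way as one with an empty lib directory.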

[jira] Resolved: (NUTCH-197) NullPointerException in TaskRunner if application jar does not have lib directory

2006-02-01 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-197?page=all ] Doug Cutting resolved NUTCH-197: Fix Version: 0.8-dev Resolution: Fixed I just committed this. Thanks, Owen! NullPointerException in TaskRunner if application jar does not have lib

[jira] Updated: (NUTCH-192) meta data support for CrawlDatum

2006-02-01 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-192?page=all ] Stefan Groschupf updated NUTCH-192: Attachment: metadata010206.patch As discussed... meta data support for CrawlDatum Key: NUTCH-192

[jira] Commented: (NUTCH-192) meta data support for CrawlDatum

2006-02-01 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364923 ] Doug Cutting commented on NUTCH-192: I'm worried that this will substantially slow things. I'd like to see some effort made to ensure that: 1. If no metadata is used,