Hello
Would you think it is a good idea to add a proxy exception list to
org.apache.nutch.protocol.http.api.HttpBase and
org.apache.nutch.protocol.http.HttpResponse? When scanning intra- and extranets
in one go, it otherwise creates problems. E.g.
<property>
<name>http.proxy.exception.list</name>
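Here is a minimal sketch of how such a list could be consulted before choosing a proxy. It assumes the property value is a comma-separated list of host names, and that an entry starting with "." matches the domain and all of its subdomains. The `ProxyExceptions` class and these matching rules are hypothetical and are not part of `HttpBase` or `HttpResponse`.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: decides whether a URL's host should bypass the proxy,
// given a value like the proposed http.proxy.exception.list property.
class ProxyExceptions {
    private final List<String> entries = new ArrayList<>();

    // Assumed format: comma-separated host names; ".example.com" also matches subdomains.
    ProxyExceptions(String propertyValue) {
        if (propertyValue != null) {
            for (String raw : propertyValue.split(",")) {
                String e = raw.trim().toLowerCase(Locale.ROOT);
                if (!e.isEmpty()) {
                    entries.add(e);
                }
            }
        }
    }

    // Returns true if the host matches an entry, i.e. it should be fetched without the proxy.
    boolean bypassProxy(URI uri) {
        String host = uri.getHost();
        if (host == null) {
            return false; // opaque or host-less URIs: keep the default (proxy) behaviour
        }
        host = host.toLowerCase(Locale.ROOT);
        for (String e : entries) {
            if (e.startsWith(".")) {
                if (host.endsWith(e) || host.equals(e.substring(1))) {
                    return true;
                }
            } else if (host.equals(e)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ProxyExceptions ex = new ProxyExceptions("localhost, .intranet.example");
        System.out.println(ex.bypassProxy(URI.create("http://wiki.intranet.example/page"))); // true
        System.out.println(ex.bypassProxy(URI.create("http://www.apache.org/")));            // false
    }
}
```

In a fetcher, a `true` result would mean opening the connection directly instead of through the configured proxy host and port.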
[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364788 ]
Stefan Groschupf commented on NUTCH-192:
That's true. In any case, I don't want to store the class id map; if we did
that, you are right that we could just use strings.
What
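To illustrate the string-based alternative under discussion: instead of storing a table that maps class ids to classes, each metadata value can carry its class name as a string. The sketch below uses a dependency-free stand-in interface, `SimpleWritable` (hypothetical, shaped like Hadoop's `Writable`), and a hypothetical `TextValue` type. It is not the NUTCH-192 patch.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Hadoop-style Writable so the sketch has no dependencies.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type; the public no-arg constructor lets it be created by class name.
class TextValue implements SimpleWritable {
    String text = "";

    public TextValue() {}

    TextValue(String text) { this.text = text; }

    public void write(DataOutput out) throws IOException { out.writeUTF(text); }

    public void readFields(DataInput in) throws IOException { text = in.readUTF(); }
}

class MetadataCodec {
    // Each entry is written as: key, value class name, value fields.
    // No shared class-id table is stored; the class name travels with every value.
    static void write(Map<String, SimpleWritable> meta, DataOutput out) throws IOException {
        out.writeInt(meta.size());
        for (Map.Entry<String, SimpleWritable> e : meta.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue().getClass().getName());
            e.getValue().write(out);
        }
    }

    static Map<String, SimpleWritable> read(DataInput in) throws IOException {
        int n = in.readInt();
        Map<String, SimpleWritable> meta = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            String key = in.readUTF();
            String className = in.readUTF();
            SimpleWritable value;
            try {
                // Recreate the value from its class name, then let it read its own fields.
                value = (SimpleWritable) Class.forName(className)
                        .getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException ex) {
                throw new IOException("cannot instantiate " + className, ex);
            }
            value.readFields(in);
            meta.put(key, value);
        }
        return meta;
    }
}
```

The trade-off is a few extra bytes per entry for the class name. In exchange, records can be read back without any shared lookup table.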
[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364795 ]
Stefan Groschupf commented on NUTCH-192:
A perfect plan; I will do so and commit a new patch. :)
THANKS!
meta data support for CrawlDatum
[ http://issues.apache.org/jira/browse/NUTCH-194?page=all ]
Andrzej Bialecki closed NUTCH-194:
---
Fix Version: 0.8-dev
Resolution: Fixed
Applied. Thanks!
Nutch-169 introduced two tiny bugs
--
Key:
Hi there,
Has anybody looked into running Nutch with Alexa? I.e. using their
data store as the source for data that you'd typically be fetching?
Since their APIs are Perl- and C-based, I imagine this would be
non-trivial.
I tried searching on their documentation site for Java - kind
[ http://issues.apache.org/jira/browse/NUTCH-196?page=comments#action_12364861 ]
Andrzej Bialecki commented on NUTCH-196:
-
I don't think it's necessary for the core to use anything other than the
standard XML APIs. I specifically meant the plugins
[ http://issues.apache.org/jira/browse/NUTCH-185?page=all ]
Rida Benjelloun updated NUTCH-185:
--
Summary: XMLParser is a configurable XML parser plugin. (was: XMLParser
is a configurable plugin. It uses XPath and namespaces to do the mapping between
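The following sketch shows namespace-aware XPath mapping done entirely with the standard JAXP APIs (`javax.xml.parsers` and `javax.xml.xpath`), with no third-party XML library. The `XPathFieldMapper` class, the single `dc` prefix bound to the Dublin Core namespace, and the sample document are illustrative assumptions. They are not XMLParser's actual configuration format.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical mapper: evaluates one namespace-qualified XPath expression against a document.
class XPathFieldMapper {
    // Assumed prefix binding for the example: "dc" -> Dublin Core elements namespace.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    static String extract(String xml, String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for namespace-qualified XPath steps to match
        Document doc = dbf.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "dc".equals(prefix) ? DC_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return DC_NS.equals(uri) ? "dc" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return DC_NS.equals(uri)
                        ? Collections.singletonList("dc").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        // String result: text content of the first matching node, or "" if none matches.
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<record xmlns:dc=\"" + DC_NS + "\"><dc:title>Nutch</dc:title></record>";
        System.out.println(extract(xml, "/record/dc:title")); // Nutch
    }
}
```

A configurable plugin would presumably read the prefix bindings and the field-to-expression mapping from its configuration rather than hard-coding them as done here.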
+1
That would help NUTCH-87 (whitelist urlfilter). The code is all in
one plugin, but there is a utility class that needs to be called from
the command line.
On Feb 1, 2006, at 4:35 PM, Andrzej Bialecki wrote:
Hi,
I just found out that it's not possible to invoke main() methods of
+1
On 01.02.2006 at 22:35, Andrzej Bialecki wrote:
Hi,
I just found out that it's not possible to invoke main() methods of
plugins through the bin/nutch script. Sometimes it's useful for
testing and debugging; I can do it from within Eclipse, because I
have all plugins on the
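Here is a hypothetical sketch of the general mechanism. It loads a class through a class loader that can see a plugin's jars and calls its static `main(String[])` reflectively. The `PluginMainLauncher` and `DemoTool` classes, the argument order, and the classpath handling are assumptions for illustration. This is not how `bin/nutch` or Nutch's plugin repository actually resolves plugins.

```java
import java.io.File;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;

// Hypothetical launcher: PluginMainLauncher <jar-or-dir>[<sep><jar-or-dir>...] <className> [args...]
class PluginMainLauncher {
    // Loads className through the given loader and calls its public static main(String[]).
    static void runMain(ClassLoader loader, String className, String[] args) throws Exception {
        Class<?> cls = Class.forName(className, true, loader);
        Method main = cls.getMethod("main", String[].class);
        if (!Modifier.isStatic(main.getModifiers())) {
            throw new IllegalArgumentException(className + ".main is not static");
        }
        // Cast to Object so the array is passed as the single argument, not spread as varargs.
        main.invoke(null, (Object) args);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PluginMainLauncher <classpath> <className> [args...]");
            System.exit(1);
        }
        // Build a class loader over the caller-supplied plugin jars/directories.
        String[] entries = args[0].split(File.pathSeparator);
        URL[] urls = new URL[entries.length];
        for (int i = 0; i < entries.length; i++) {
            urls[i] = new File(entries[i]).toURI().toURL();
        }
        try (URLClassLoader loader =
                 new URLClassLoader(urls, PluginMainLauncher.class.getClassLoader())) {
            runMain(loader, args[1], Arrays.copyOfRange(args, 2, args.length));
        }
    }
}

// Hypothetical stand-in "plugin" class, used only to demonstrate runMain.
class DemoTool {
    static String lastArg;

    public static void main(String[] args) {
        lastArg = args.length > 0 ? args[0] : null;
    }
}
```

Using the launcher's own class loader as the parent keeps core classes shared with the plugin. A real plugin system may need a different parent, for example to isolate plugin dependencies.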
NullPointerException in TaskRunner if application jar does not have lib directory
---
Key: NUTCH-197
URL: http://issues.apache.org/jira/browse/NUTCH-197
Project: Nutch
Type: Bug
[ http://issues.apache.org/jira/browse/NUTCH-197?page=all ]
Doug Cutting resolved NUTCH-197:
Fix Version: 0.8-dev
Resolution: Fixed
I just committed this. Thanks, Owen!
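The issue description is not quoted here. However, a frequent cause of this kind of NullPointerException is that `java.io.File.listFiles()` returns `null`, not an empty array, when the directory does not exist. The following minimal sketch shows the defensive pattern using a hypothetical `LibClasspath` helper. It is not the committed fix.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: collects the entries of <workDir>/lib for a task classpath.
class LibClasspath {
    static List<String> libJars(File workDir) {
        List<String> classpath = new ArrayList<>();
        // listFiles() returns null (not an empty array) if lib/ does not exist,
        // so iterating over the result unchecked would throw NullPointerException.
        File[] libs = new File(workDir, "lib").listFiles();
        if (libs != null) { // tolerate application jars without a lib/ directory
            for (File lib : libs) {
                classpath.add(lib.getPath());
            }
        }
        return classpath;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(libJars(dir));
    }
}
```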
[ http://issues.apache.org/jira/browse/NUTCH-192?page=all ]
Stefan Groschupf updated NUTCH-192:
---
Attachment: metadata010206.patch
As discussed...
meta data support for CrawlDatum
Key: NUTCH-192
[ http://issues.apache.org/jira/browse/NUTCH-192?page=comments#action_12364923 ]
Doug Cutting commented on NUTCH-192:
I'm worried that this will substantially slow things down.
I'd like to see some effort made to ensure that:
1. If no metadata is used,
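One hypothetical way to keep the cost low for records that carry no metadata is for the serializer to write a one-byte presence flag and skip the map entirely when it is empty. On the read side, no map is allocated. `RecordWithMeta` is an illustrative class with a single assumed `score` field. It does not reflect `CrawlDatum`'s actual serialization format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical record: one fixed field plus optional string metadata.
class RecordWithMeta {
    float score;
    Map<String, String> meta; // null or empty means "no metadata"

    void write(DataOutput out) throws IOException {
        out.writeFloat(score);
        boolean hasMeta = meta != null && !meta.isEmpty();
        out.writeBoolean(hasMeta); // the only overhead (1 byte) when there is no metadata
        if (hasMeta) {
            out.writeInt(meta.size());
            for (Map.Entry<String, String> e : meta.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        }
    }

    void readFields(DataInput in) throws IOException {
        score = in.readFloat();
        meta = null; // no map is allocated for records without metadata
        if (in.readBoolean()) {
            int n = in.readInt();
            meta = new LinkedHashMap<>();
            for (int i = 0; i < n; i++) {
                meta.put(in.readUTF(), in.readUTF());
            }
        }
    }

    static int serializedSize(RecordWithMeta r) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        r.write(new DataOutputStream(bytes));
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        RecordWithMeta plain = new RecordWithMeta();
        plain.score = 1.0f;
        System.out.println(serializedSize(plain)); // 5: 4-byte float + 1-byte flag

        RecordWithMeta tagged = new RecordWithMeta();
        tagged.meta = new LinkedHashMap<>();
        tagged.meta.put("lang", "en");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        tagged.write(new DataOutputStream(bytes));
        RecordWithMeta copy = new RecordWithMeta();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.meta); // {lang=en}
    }
}
```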