[ http://issues.apache.org/jira/browse/NUTCH-21?page=comments#action_12320763 ]
Stephan Strittmatter commented on NUTCH-21:
---
I will verify the unit tests by next week!
parser plugin for MS PowerPoint slides
Hi all.
I'm a Java programmer who wants to help in the development of Nutch.
I've never been involved in a free software project over the Internet.
Where to begin? How to help?
TIA..
it works great (see the new function below). But we'll have to add
commons-lang (http://jakarta.apache.org/commons/lang/) to the libraries.
Are there any objections? What is the procedure for adding it?
There's already commons-logging, in nutch libs, so I think there's no
problem to add
[ http://issues.apache.org/jira/browse/NUTCH-65?page=all ]
Michael Nebel updated NUTCH-65:
---
Attachment: MoreIndexingFilter.diff
commons-lang-2.1.jar
MoreIndexingFilter.java
As Jerome suggested, I changed the function
There's already commons-logging, in nutch libs, so I think there's no
problem to add commons-lang.
Moreover, it is under the Apache License, so there's no problem.
I will add it while committing your patch.
No objections for adding commons-lang to the nutch lib.
As it is a generic lib, I plan
Dear Developers!
I tested nutch 0.7 with all the parser plugins, and found the following:
-
The fetch breaks with, e.g., the following:
-
050901 110915
Kelvin Tan wrote:
Each of these stages will be handled in its own thread (except for HTML parsing
and scoring, which may actually benefit from having multiple threads). With the
introduction of non-blocking IO, I think threads should be used only where
parallel computation offers performance
In some cases, though, focused crawling requirements may require
extra data to be stored, which is not useful for whole-web, for
example, storing a url's parent and seed url and its depth
(essential for crawl scopes).
Sounds like meta data for a page. :)
Some time ago I submitted a patch to
[ http://issues.apache.org/jira/browse/NUTCH-65?page=all ]
Jerome Charron closed NUTCH-65:
---
Resolution: Fixed
Patch committed (http://svn.apache.org/viewcvs.cgi?rev=265794&view=rev)
index-more plugin can't parse large set of modification-date
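
For readers unfamiliar with the problem behind this issue, here is a minimal sketch of parsing a modification date that may arrive in several formats. It is not the committed patch: the thread indicates the fix relies on commons-lang, whereas this sketch uses only java.text as a stand-in. The class name LenientDateParser, the pattern list, and the choice to return null on failure are all assumptions made for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class LenientDateParser {

    // Hypothetical pattern list; a real deployment would extend it with
    // whatever formats its servers actually emit.
    private static final String[] PATTERNS = {
        "EEE, dd MMM yyyy HH:mm:ss zzz",  // RFC 1123 style
        "EEEE, dd-MMM-yy HH:mm:ss zzz",   // RFC 850 style
        "EEE MMM d HH:mm:ss yyyy",        // asctime style
        "yyyy-MM-dd HH:mm:ss",
        "yyyy-MM-dd"
    };

    /**
     * Tries each pattern in order and returns the first successful parse.
     * Values without an explicit zone are interpreted as GMT.
     * Returns null when no pattern matches (an assumption of this sketch).
     */
    public static Date parse(String value) {
        if (value == null) {
            return null;
        }
        String trimmed = value.trim();
        for (String pattern : PATTERNS) {
            // SimpleDateFormat is not thread-safe, so create one per attempt.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setTimeZone(TimeZone.getTimeZone("GMT"));
            format.setLenient(false);
            try {
                return format.parse(trimmed);
            } catch (ParseException e) {
                // This pattern did not match; fall through to the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Date date = parse("Thu, 01 Sep 2005 11:09:15 GMT");
        System.out.println(date == null ? "unparseable" : String.valueOf(date.getTime()));
    }
}
```

Ordering the patterns from most to least specific matters here, because SimpleDateFormat.parse accepts a matching prefix and would otherwise let a date-only pattern swallow a value that also carries a time.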