[jira] Commented: (NUTCH-21) parser plugin for MS PowerPoint slides

2005-09-01 Thread Stephan Strittmatter (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-21?page=comments#action_12320763 ] Stephan Strittmatter commented on NUTCH-21: --- I will verify the unit tests by next week! parser plugin for MS PowerPoint slides

How to help?

2005-09-01 Thread Dani
Hi all. I'm a Java programmer who wants to help with the development of Nutch. I've never been involved in a free software project over the Internet. Where do I begin? How can I help? TIA.

Re: [jira] Commented: (NUTCH-65) index-more plugin can't parse large set of modification-date

2005-09-01 Thread Jérôme Charron
It works great (see the new function below). But we'll have to add commons-lang (http://jakarta.apache.org/commons/lang/) to the libraries. Are there any objections? What is the procedure for adding it? There's already commons-logging in the nutch libs, so I think there's no problem to add
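The function referred to above is not included in this excerpt. As a rough illustration of the technique behind a fix for NUTCH-65 (try several date patterns in turn and keep the first one that consumes the whole string), here is a minimal sketch using only the JDK. commons-lang 2.x provides `DateUtils.parseDate(String, String[])` for the same job. The class name `ModDateParser` and the pattern list are hypothetical, not taken from the actual patch.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/**
 * Hypothetical helper (not the NUTCH-65 patch): tries a list of date
 * patterns in order and returns the first parse that consumes the whole
 * input, similar in spirit to commons-lang's DateUtils.parseDate.
 */
public class ModDateParser {

    // Illustrative pattern list; an assumption, not the list used in Nutch.
    private static final String[] PATTERNS = {
        "EEE, dd MMM yyyy HH:mm:ss zzz",  // RFC 1123, e.g. "Thu, 01 Sep 2005 11:09:15 GMT"
        "EEEE, dd-MMM-yy HH:mm:ss zzz",   // RFC 1036
        "EEE MMM d HH:mm:ss yyyy",        // ANSI C asctime()
        "yyyy-MM-dd'T'HH:mm:ss'Z'",       // ISO 8601 in UTC
        "yyyy-MM-dd"                      // plain ISO date
    };

    /** Returns the date in epoch milliseconds, or -1 if no pattern matches. */
    public static long parse(String text) {
        if (text == null) {
            return -1;
        }
        String s = text.trim();
        for (String pattern : PATTERNS) {
            // SimpleDateFormat is not thread-safe, so build a fresh one per attempt.
            SimpleDateFormat fmt = new SimpleDateFormat(pattern, Locale.US);
            fmt.setLenient(false);
            // Used when the pattern carries no zone of its own.
            fmt.setTimeZone(TimeZone.getTimeZone("GMT"));
            ParsePosition pos = new ParsePosition(0);
            Date d = fmt.parse(s, pos);
            // Accept only if the entire string was consumed by this pattern.
            if (d != null && pos.getIndex() == s.length()) {
                return d.getTime();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(parse("Thu, 01 Sep 2005 11:09:15 GMT"));
        System.out.println(parse("2005-09-01"));
        System.out.println(parse("not a date"));
    }
}
```

Returning -1 instead of throwing lets an indexing filter simply skip a field it cannot parse; rejecting partial matches keeps a short pattern such as `yyyy-MM-dd` from silently accepting a longer timestamp.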

[jira] Updated: (NUTCH-65) index-more plugin can't parse large set of modification-date

2005-09-01 Thread Michael Nebel (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-65?page=all ] Michael Nebel updated NUTCH-65: --- Attachment: MoreIndexingFilter.diff, commons-lang-2.1.jar, MoreIndexingFilter.java --- As Jerome suggested, I changed the function

Re: [jira] Commented: (NUTCH-65) index-more plugin can't parse large set of modification-date

2005-09-01 Thread Jérôme Charron
There's already commons-logging in the nutch libs, so I think there's no problem adding commons-lang. Moreover, it is under the Apache License, so there's no problem. I will add it while committing your patch. No objections to adding commons-lang to the nutch lib. As it is a generic lib, I plan

nutch 0.7 bug?

2005-09-01 Thread [EMAIL PROTECTED]
Dear Developers! I tested nutch 0.7 with all the parser plugins and found the following: - The fetch broke with, e.g., the following: - 050901 110915

Re: Event queues vs threads

2005-09-01 Thread Doug Cutting
Kelvin Tan wrote: Each of these stages will be handled in its own thread (except for HTML parsing and scoring, which may actually benefit from having multiple threads). With the introduction of non-blocking IO, I think threads should be used only where parallel computation offers performance

Re: To mapred or not

2005-09-01 Thread Stefan Groschupf
In some cases, though, focused crawling requirements may require extra data to be stored, which is not useful for whole-web crawling, for example, storing a URL's parent, its seed URL, and its depth (essential for crawl scopes). Sounds like metadata for a page. :) Some time ago I submitted a patch to

[jira] Closed: (NUTCH-65) index-more plugin can't parse large set of modification-date

2005-09-01 Thread Jerome Charron (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-65?page=all ] Jerome Charron closed NUTCH-65: --- Resolution: Fixed Patch committed (http://svn.apache.org/viewcvs.cgi?rev=265794&view=rev) index-more plugin can't parse large set of modification-date