Do you wish to test the OpenOffice content extractor plugin? It adds more than
180 new file formats to Nutch but hasn't been tested at large scale yet.
Stefan
On 18.05.2004 at 21:32, Byron Miller wrote:
I would be able to test this next week. I do about 10
million pages a day.
--- [EMAIL PROTECTED] wrote:
Forgot one thing:
Has anyone run the crawler with the plugin on, fetching a substantial
number of URLs, say 500,000? How does it perform?
-------------------------------------------------------
This SF.Net email is sponsored by: SourceForge.net Broadband
Sign-up now for SourceForge Broadband and get the fastest
6.0/768 connection for only $19.95/mo for the first 3 months!
http://ads.osdn.com/?ad_id=2562&alloc_id=6184&op=click
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
---------------------------------------------------------------
open technology: http://www.media-style.com
open source: http://www.weta-group.net
open discussion: http://www.text-mining.org