Set up a URL filter that accepts only *.doc URLs, then install and enable the
Word parser. That is all you need to do.
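
As a rough illustration of the URL filter idea, here is a minimal sketch in Python of how an ordered list of accept/reject rules could single out .doc URLs. It is not Nutch code: the RULES list, the accept() function and the example URLs are hypothetical names made up for this sketch, and the "+" / "-" prefixes only assume a first-match-wins rule style. Check the filter configuration shipped with your Nutch version for the real syntax.

```python
import re

# Hypothetical rule list: "+" accepts, "-" rejects; the first matching rule decides.
RULES = [
    ("+", re.compile(r"\.doc$", re.IGNORECASE)),  # accept URLs ending in .doc
    ("-", re.compile(r".")),                      # reject everything else
]


def accept(url: str) -> bool:
    """Return True if the first rule that matches the URL is an accept rule."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    # No rule matched at all (e.g. an empty string): reject by default.
    return False


if __name__ == "__main__":
    for url in ("http://example.com/report.doc", "http://example.com/index.html"):
        print(url, accept(url))
```

The pattern is anchored at the end of the URL, so a URL with a query string after ".doc" would be rejected by this sketch; loosen the pattern if your site serves documents that way.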
On 28.03.2005 at 07:12, Eric Money wrote:
Hi all,
If I want to search a site but am only interested in the
files with a .doc suffix, how should I modify Nutch to
fetch all of these files? Any comments and experiences
are appreciated. Thanks in advance.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net