Thanks. I looked in nutch-default.xml and found the following property:

<property>
  <name>plugin.folder</name>
  <value>plugins</value>
  <description>A Directory where nutch plugin are located</description>
</property>

which is the only thing related to plugins, but I did not find the "parse-(text|html)" value. Also, should I include the following property:

<property>
  <name>urlfilter.regex.file</name>
  <value>regex-urlfilter.txt</value>
  <description>Name of file on CLASSPATH containing default regular expressions used by RegexURLFilter.</description>
</property>

Thanks for your advice.

On Mon, 28 Mar 2005 11:56:00 -0800 (PST), thomas delnoij <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am new to Nutch as well, so please correct me if I
> am wrong.
>
> > Thanks. Could you please be more specific, how to
> > setup the url filter?
>
> The url filter should be set up in the
> regex-urlfilter.txt file. As far as I can tell, urls
> ending with the .doc extension are included.
>
> The word parser is installed by updating the
> nutch-site.xml file. You need to copy the entries from
> nutch-default.xml that you like to change.
>
> In your case, I think you need to copy the
> plugin.includes property, and change parse-(text|html)
> to parse-(text|html|msword).
>
> Hope this helps.
>
> Rgrds,
>
> Thomas
>
> > something like http://mysite.doc? But how can I get
> > all doc files at mysite
> > if the doc is at http://mysite/1/2/~user/a.doc.
> >
> > Is there any reference for word parser? I don't know
> > how to use it, thank you.
> >
> > On Mon, 28 Mar 2005 14:59:57 +0200, Stefan Groschupf
> > <[EMAIL PROTECTED]> wrote:
> > > Setup a url filter for any *.doc and install and
> > > use the word parser,
> > > that is all you need to do...
> > >
> > > On 28.03.2005 at 07:12, Eric Money wrote:
> > >
> > > > Hi all,
> > > >
> > > > If I wanna search a site but only interested in the
> > > > files with .doc suffix, how should I re-write nutch to
> > > > get all these files? Any comments and experiences
> > > > are appreciated, thanks all in advance.
> > >
> > > ---------------------------------------------------------------
> > > company: http://www.media-style.com
> > > forum: http://www.text-mining.org
> > > blog: http://www.find23.net
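
For reference, the change Thomas describes in the quoted message could be sketched as a nutch-site.xml override along these lines. This is only a sketch under assumptions: the <nutch-conf> root element and every plugin name in the value other than the parse-(text|html|msword) group are guesses at what a nutch-default.xml of this era contains. Copy the real plugin.includes value from your own nutch-default.xml and change only the parse-(text|html) part.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: local overrides of entries copied from nutch-default.xml. -->
<!-- ASSUMPTION: root element name; use whatever your nutch-default.xml uses. -->
<nutch-conf>
  <property>
    <name>plugin.includes</name>
    <!-- ASSUMPTION: all names except the parse group are illustrative.
         Copy the value from your nutch-default.xml and only extend
         parse-(text|html) to parse-(text|html|msword). -->
    <value>protocol-http|urlfilter-regex|parse-(text|html|msword)|index-basic|query-(basic|site|url)</value>
  </property>
</nutch-conf>
```

Going by Thomas's note that only the entries you want to change need to be copied, urlfilter.regex.file would presumably only need to appear in nutch-site.xml if the filter rules live in a file other than regex-urlfilter.txt.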
