Thanks. I looked in nutch-default.xml and found the following property:

<property>
  <name>plugin.folder</name>
  <value>plugins</value>
  <description>A Directory where nutch plugin are located</description>
</property>

This is the only plugin-related property I could find; I did not find the
"parse-(text|html)" value.

Also, should I include the following property:

<property>
  <name>urlfilter.regex.file</name>
  <value>regex-urlfilter.txt</value>
  <description>Name of file on CLASSPATH containing default regular
  expressions used by RegexURLFilter.</description>
</property>


Thanks for your advice.


On Mon, 28 Mar 2005 11:56:00 -0800 (PST), thomas delnoij
<[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I am new to Nutch as well, so please correct me if I
> am wrong.
> 
> > Thanks. Could you please be more specific, how to
> > setup the url filter?
> 
> The url filter should be set up in the
> regex-urlfilter.txt file. As far as I can tell, urls
> ending with the .doc extension are included.
> 
> The word parser is installed by updating the
> nutch-site.xml file. You need to copy the entries from
> nutch-default.xml that you would like to change.
> 
> In your case, I think you need to copy the
> plugin.includes property, and change parse-(text|html)
> to parse-(text|html|msword).
> 
> Hope this helps.
> 
> Rgrds,
> 
> Thomas
> 
> 
> > something like http://mysite.doc? But how can I get
> > all doc files at mysite
> > if the doc is at http://mysite/1/2/~user/a.doc.
> >
> > Is there any reference for word parser? I don't know
> > how to use it, thank you.
> >
> >
> > On Mon, 28 Mar 2005 14:59:57 +0200, Stefan Groschupf
> > <[EMAIL PROTECTED]> wrote:
> > > Set up a url filter for any *.doc and install and use the word parser,
> > > that is all you need to do...
> > >
> > > On 28.03.2005 at 07:12, Eric Money wrote:
> > >
> > > > Hi all,
> > > >
> > > > If I wanna search a site but am only interested in the
> > > > files with .doc suffix, how should I re-write nutch to
> > > > get all these files? Any comments and experiences
> > > > are appreciated, thanks all in advance.
> > > >
> > > >
> > >
> > > ---------------------------------------------------------------
> > > company:                http://www.media-style.com
> > > forum:          http://www.text-mining.org
> > > blog:                   http://www.find23.net
> > >
> > >
> >
>
