Where are you from, Sergio?
Sergio Morales wrote:
>
> Hi Payo,
>
> You need to add the right plugins to your Nutch configuration file. Here is
> an excerpt from my installation:
>
> NUTCH_HOME\conf\nutch-site.xml:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>plugin.includes</name>
>     <value>nutch-extensionpoints|ontology|protocol-ftp|protocol-httpclient|urlfilter-regex|parse-(text|html|pdf|rtf|msword|js|mspowerpoint|msexcel|oo|rss)|index-(basic|more)|query-(basic|site|url|more)|summary-lucene|scoring-opic</value>
>   </property>
> ...
>
> Using the above configuration, I am able to index text, HTML, PDF, Excel,
> etc.
>
> I'm not sure about XML; I think there is already an enhancement request for
> it in JIRA.
>
> I hope this helps,
>
> Sergio
>
> ----- Original Message ----
> From: payo <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Friday, 19 October, 2007 4:16:20 PM
> Subject: Re: Indexing documents
>
> Goethe wrote:
>>
>> payo wrote:
>>>
>>> Hi,
>>>
>>> My questions are:
>>>
>>> 1. Can Nutch index PDF, HTML and XML documents?
>>>
>>> 2. Can Nutch index remote documents?
>>>
>>> Thanks
>>
>> Yes to both questions. For the first question, Nutch already comes with
>> the plugins needed to index those file types.
>
> Where can I find information on this?
>
> --
> View this message in context:
> http://www.nabble.com/Indexing-documents-tf4653264.html#a13295436
> Sent from the Nutch - User mailing list archive at Nabble.com.
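The plugin.includes value in the quoted nutch-site.xml is a regular expression over plugin names. The sketch below checks which plugin names that pattern would enable. It assumes that each plugin name must match the whole pattern, as Java's String.matches requires; I have not confirmed that this is exactly how Nutch's plugin loader applies it. The candidate names in the usage example are illustrative only, and "parse-xml" is a hypothetical name rather than a plugin known to ship with Nutch.

```python
import re

# plugin.includes value copied from the nutch-site.xml excerpt above.
PLUGIN_INCLUDES = (
    "nutch-extensionpoints|ontology|protocol-ftp|protocol-httpclient|"
    "urlfilter-regex|parse-(text|html|pdf|rtf|msword|js|mspowerpoint|msexcel|oo|rss)|"
    "index-(basic|more)|query-(basic|site|url|more)|summary-lucene|scoring-opic"
)


def is_included(plugin_id: str, includes: str = PLUGIN_INCLUDES) -> bool:
    """Return True if the entire plugin name matches the includes regex.

    Assumption: the name must match the whole pattern (a full match, like
    Java's String.matches), not just contain a matching substring.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    # Illustrative names; "parse-xml" is hypothetical.
    for pid in ["parse-pdf", "parse-html", "parse-xml",
                "protocol-http", "protocol-httpclient"]:
        print(pid, is_included(pid))
```

Under the full-match assumption, "protocol-http" is not enabled even though "protocol-httpclient" is. A parser for a format, such as XML, only loads if its plugin name appears in the pattern.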
