Hi Payo,

You need to enable the right plugins in your Nutch configuration file. Here is an
extract from my installation:

NUTCH_HOME\conf\nutch-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
 <property>
   <name>plugin.includes</name>
   <value>nutch-extensionpoints|ontology|protocol-ftp|protocol-httpclient|urlfilter-regex|parse-(text|html|pdf|rtf|msword|js|mspowerpoint|msexcel|oo|rss)|index-(basic|more)|query-(basic|site|url|more)|summary-lucene|scoring-opic</value>
 </property>
...

With the above configuration, I am able to index plain text, HTML, PDF, Excel and other document types.
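The plugin.includes value is a regular expression matched against plugin ids, so one quick way to see what a given value enables is to test ids against it. Here is a minimal Python sketch. It assumes, as I understand Nutch's plugin loader to work, that the whole plugin id has to match the pattern (not just a substring). The ids checked are only examples, and "parse-xml" is a hypothetical id used for illustration.

```python
import re

# The plugin.includes value from the nutch-site.xml extract above, copied verbatim.
PLUGIN_INCLUDES = (
    "nutch-extensionpoints|ontology|protocol-ftp|protocol-httpclient|"
    "urlfilter-regex|parse-(text|html|pdf|rtf|msword|js|mspowerpoint|"
    "msexcel|oo|rss)|index-(basic|more)|query-(basic|site|url|more)|"
    "summary-lucene|scoring-opic"
)


def is_included(plugin_id: str, includes: str = PLUGIN_INCLUDES) -> bool:
    """Return True if plugin_id matches the includes regex as a whole.

    Assumption: the entire id must match (re.fullmatch), so "protocol-http"
    is NOT enabled just because "protocol-httpclient" is.
    """
    return re.fullmatch(includes, plugin_id) is not None


if __name__ == "__main__":
    # Example ids only; "parse-xml" is hypothetical and may not exist in your Nutch version.
    for pid in ["parse-pdf", "parse-html", "parse-xml", "protocol-http", "protocol-httpclient"]:
        status = "enabled" if is_included(pid) else "not enabled"
        print(f"{pid}: {status}")
```

If a parser you need shows up as "not enabled", add its id to the alternation in plugin.includes.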

I'm not sure about XML; I think there is already an enhancement request for it in
JIRA.

I hope this helps,

Sergio

----- Original Message ----
From: payo <[EMAIL PROTECTED]>
To: [email protected]
Sent: Friday, 19 October, 2007 4:16:20 PM
Subject: Re: Indexing documents




Goethe wrote:
> 
> 
> 
> payo wrote:
>> 
>> Hi
>> 
>> my questions are
>> 
>> 1.- Can Nutch index PDF, HTML and XML documents?
>> 
>> 2.- Can Nutch index remote documents?
>> 
>> thanks
>> 
> 
> Yes to both questions. For the first question, Nutch already comes with
> the plugins needed to index those file types.
> 
> 

Where can I find information on this?

-- 
View this message in context: 
http://www.nabble.com/Indexing-documents-tf4653264.html#a13295436
Sent from the Nutch - User mailing list archive at Nabble.com.

