Christian Herta wrote:
I tried to index my local file system according to the FAQ: http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6

But if I add the plugin into the nutch-site.xml file like this:

      <property>
        <name>plugin.includes</name>
        <value>protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
      </property>


Try with:

<value>protocol-(file|http)|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>

If it does not work, consult your log file, logs/hadoop.log, for more specific information about the problem.
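
For reference, here is roughly how the suggested value would sit in nutch-site.xml. This is only a sketch: the property wrapper is taken from the question above, and the enclosing <configuration> element is an assumption about the rest of the file, which was not shown.

```xml
<!-- Sketch only: the <configuration> wrapper is assumed; it was not part of the original message. -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- The value suggested above: adds urlfilter-regex, parse-js, summary-basic and scoring-opic. -->
    <value>protocol-(file|http)|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
  </property>
</configuration>
```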



Additionally I have another question:
 * Is it possible to use a directory on the HDFS filesystem as a
spool directory to index from?

Not directly, but if you can expose[1] HDFS via some available protocol, then it is also possible to index the contents of HDFS.

One could also write a protocol-hdfs plugin to do the job.

--
 Sami Siren


[1]http://issues.apache.org/jira/browse/HADOOP-4
