Christian Herta wrote:
I tried to index my local file system according to the FAQ:
http://wiki.apache.org/nutch/FAQ#head-c721b23b43b15885f5ea7d8da62c1c40a37878e6
But if I add the plugin into the nutch-site.xml file like this:
<property>
<name>plugin.includes</name>
<value>protocol-file|protocol-http|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
Try with:
<value>protocol-(file|http)|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
If it does not work, consult your log file, logs/hadoop.log, for more
specific information about your problem.
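
For reference, here is a rough sketch of how the whole property might sit
in conf/nutch-site.xml. The <configuration> wrapper and the XML declaration
are my assumption (they are what Hadoop-based Nutch releases use); only the
<value> line comes from the suggestion above, so adapt the rest to your
install.

```xml
<?xml version="1.0"?>
<!-- Assumed wrapper: Hadoop-style Nutch config files use <configuration> -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- protocol-(file|http) enables both the file: and http: protocol plugins -->
    <value>protocol-(file|http)|urlfilter-regex|parse-(text|html|js)|index-basic|query-(basic|site|url)|summary-basic|scoring-opic</value>
  </property>
</configuration>
```

With urlfilter-regex enabled, the filter rules also have to let file: URLs
through. As far as I remember, the regex filter file shipped in conf/ skips
them by default, so that is worth checking if nothing gets fetched.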
Additionally I have another question:
* Is there a possibility to use a directory of the HDFS Filesystem as a
spool directory to index from?
Not directly, but if you can expose[1] HDFS via some available protocol
then it is possible to index the contents of HDFS as well.
One could also write a protocol-hdfs plugin to do the job.
--
Sami Siren
[1] http://issues.apache.org/jira/browse/HADOOP-4