Hi,
I am using the Nutch 1.x source and need to add a new field to the index at run time and write
it to Solr. I am using the index-more filter to add the new field. I made changes to
schema.xml (in both Nutch and Solr), the index-more plugin and nutch-site.xml, then rebuilt
using ant.
I am writing the index to Solr, but I don’t see the “myname” field in the returned document JSON.
How can I verify whether the index-more filter is actually executed? Is any other
configuration change needed in nutch-site.xml?
Below are my changes.
MoreIndexingFilter.java
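// In MoreIndexingFilter (index-more plugin): add a constant "myname" field to every document
@Override
public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
    CrawlDatum datum, Inlinks inlinks) throws IndexingException {
  doc.add("myname", "manish");
  return doc;
}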
schema.xml (Solr and Nutch)
<field name="myname" type="string" stored="true" indexed="true"/>
nutch-site.xml (added index-more)
<property>
<name>plugin.includes</name>
<value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)|index-more</value>
<description>Regular expression naming plugin directory names to
include. Any plugin not matching this expression is excluded.
In any case you need at least include the nutch-extensionpoints plugin. By
default Nutch includes crawling just HTML and plain text via HTTP,
and basic indexing and search plugins. In order to use HTTPS please enable
protocol-httpclient, but be aware of possible intermittent problems with the
underlying commons-httpclient library. Set parsefilter-naivebayes for
classification based focused crawler.
</description>
</property>
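One quick check on the configuration side: Nutch reads plugin.includes as a Java regular expression, and as far as I understand its plugin repository, a plugin is enabled only when its whole plugin id matches. The standalone sketch below (the class name PluginIncludesCheck is made up for illustration) tests a few plugin ids against the value above using java.util.regex and Matcher.matches(). It does not load Nutch, so it only confirms the regex, not that the filter actually ran.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class PluginIncludesCheck {
    // Value copied from the plugin.includes property above.
    static final Pattern INCLUDES = Pattern.compile(
        "protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)"
        + "|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)|index-more");

    // Assumption: a plugin counts as included only if the WHOLE id matches.
    public static boolean isIncluded(String pluginId) {
        return INCLUDES.matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList(
            "index-basic", "index-anchor", "index-more", "index-metadata", "indexer-solr");
        for (String id : ids) {
            System.out.println(id + " -> " + (isIncluded(id) ? "included" : "excluded"));
        }
    }
}
```

If index-more shows as included here, the regex itself is unlikely to be the reason the field is missing.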
Please suggest.
Thanks
Manish Verma
AML Search
+1 669 224 9924