Hi,

I am using the Nutch 1.x source and need to add a new field at indexing time
and write it to Solr. I am using the index-more filter to add the new field. I
made changes to schema.xml (in both Nutch and Solr), the index-more plugin, and
nutch-site.xml, and rebuilt Nutch with the ant command.

The index is written to Solr, but I don't see the "myname" field in the
returned document JSON. How can I check whether the index-more filter was
executed? Is any other configuration change needed in nutch-site.xml?

My changes are below.

MoreIndexingFilter.java

  public NutchDocument filter(NutchDocument doc, Parse parse, Text url,
      CrawlDatum datum, Inlinks inlinks) throws IndexingException {
    // Add the new field with a constant value to every document.
    doc.add("myname", "manish");
    return doc;
  }


schema.xml (Solr and Nutch)

    <field name="myname" type="string" stored="true" indexed="true"/>

nutch-site.xml (added index-more)

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)|index-more</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable 
  protocol-httpclient, but be aware of possible intermittent problems with the 
  underlying commons-httpclient library. Set parsefilter-naivebayes for 
  classification based focused crawler.
  </description>
</property>
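
A quick sanity check on the plugin.includes value above. This is a minimal
sketch, assuming Nutch activates a plugin only when its id matches the whole
expression; the class and method names are made up for illustration and are
not part of Nutch.

```java
import java.util.regex.Pattern;

class PluginIncludesCheck {

    // Assumption: the plugin id has to match the entire plugin.includes regex.
    static boolean isIncluded(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Value copied from the nutch-site.xml property above.
        String pluginIncludes = "protocol-http|urlfilter-regex|parse-(html|tika)"
            + "|index-(basic|anchor)|indexer-solr|scoring-opic"
            + "|urlnormalizer-(pass|regex|basic)|index-more";

        for (String id : new String[] {"index-basic", "index-more", "indexer-solr"}) {
            System.out.println(id + " included: " + isIncluded(pluginIncludes, id));
        }
    }
}
```

If index-more is reported as included here, the expression itself is probably
not what keeps the field out of the Solr document.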




Please suggest.


Thanks
Manish Verma
AML Search
+1 669 224 9924
