Hi,
You can quickly check which fields will be added to your NutchDocument
before it is passed to Solr by using the IndexingFiltersChecker tool. The
tool is available in both the trunk and 2.x codebases and can be run from
the command line.

On Monday, June 3, 2013, stone2dbone <[email protected]> wrote:
> I would like to know how to add a field to an index using Nutch 1.6 and Solr
> 4.0. I have tried using the index-static, index-extra and index-metadata
> plugins, all to no avail. I have modified
>
> nutch-default.xml:
>
> <property>
>   <name>index.static</name>
>   <value>display_type:page</value>
>   <description>
>   A simple plugin called at indexing that adds fields with static data.
>   You can specify a list of fieldname:fieldcontent per nutch job.
>   It can be useful when collections can't be created by urlpatterns,
>   like in subcollection, but on a job-basis.
>   </description>
> </property>
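
For reference, the value above should yield a single field named display_type
with the content page. The sketch below illustrates that interpretation in
plain Java. It is not the index-static plugin's own code: the class name is
hypothetical, and treating commas as the separator between several
fieldname:fieldcontent entries is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified illustration of how an index.static value such as
// "display_type:page" maps to field names and contents.
// NOT the index-static plugin's implementation; the ',' separator between
// entries is an assumption made for this sketch.
final class StaticFieldsSketch {

  static Map<String, String> parse(String indexStatic) {
    Map<String, String> fields = new LinkedHashMap<>();
    if (indexStatic == null || indexStatic.trim().isEmpty()) {
      return fields;
    }
    for (String entry : indexStatic.split(",")) {
      // Split on the first ':' only, so the content may itself contain ':'.
      int colon = entry.indexOf(':');
      if (colon <= 0) {
        continue; // skip entries without a "fieldname:" prefix
      }
      String name = entry.substring(0, colon).trim();
      String content = entry.substring(colon + 1).trim();
      fields.put(name, content);
    }
    return fields;
  }
}
```
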
>
> nutch-site.xml:
>
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-httpclient|urlfilter-regex|parse-(html|tika|metatags)|index-(anchor|basic|metadata|static)|scoring-opic|urlnormalizer-(pass|regex|basic)|urlfilter-suffix</value>
>   <description>Regular expression naming plugin directory names to
>   include.  Any plugin not matching this expression is excluded.
>   In any case you need to include at least the nutch-extensionpoints plugin. By
>   default Nutch includes crawling just HTML and plain text via HTTP,
>   and basic indexing and search plugins. In order to use HTTPS please enable
>   protocol-httpclient, but be aware of possible intermittent problems with the
>   underlying commons-httpclient library.
>   </description>
> </property>
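
Because plugin.includes is a regular expression over plugin directory names,
you can check outside Nutch whether a given plugin is selected. A minimal
sketch, assuming the whole plugin name has to match the expression (as with
java.util.regex.Matcher.matches()); the class and method names are
hypothetical and no Nutch classes are loaded.

```java
import java.util.regex.Pattern;

// Reports whether a plugin directory name is selected by a plugin.includes
// expression. Assumption: the entire name must match (Matcher.matches()),
// not just a substring of it.
final class PluginIncludesCheck {

  static boolean isIncluded(String pluginIncludes, String pluginId) {
    return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
  }
}
```

With the value quoted above, isIncluded(value, "index-static") and
isIncluded(value, "index-metadata") both return true, while
isIncluded(value, "index-more") returns false.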
>
> I expected the result in Solr to look similar to the following:
>
> <doc>
>   <arr name="content">
>     <str>Untitled Document text goes here and more text and more</str>
>   </arr>
>   <str name="title">Untitled Document</str>
>   <str name="segment">20130603095157</str>
>   <float name="boost">0.65465367</float>
>   <str name="digest">30fd854c798cf159085934c50561dccb</str>
>   <date name="tstamp">2013-06-03T13:52:12.593Z</date>
>   <str name="id">https://...</str>
>   <str name="url">https://...</str>
>   <long name="_version_">1436829905573642240</long>
>   <str name="display_type">page</str>
> </doc>
>
> But I do not see my added field.
>
> I believe index-extra is deprecated, but I thought index-static and
> index-metadata should still work.
>
> Must I write a custom plugin? If so, I ultimately would like to change the
> value of the added field depending on the MIME type parsed, e.g.:
> if (application/msword or application/pdf) {doc.add("display_type", "doc")}
> if (text/html) {doc.add("display_type", "pages")}
> if (video/mpeg) {doc.add("display_type", "video")}
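
If a custom plugin does turn out to be necessary, the mapping described above
can be kept in a small helper that your indexing filter calls. A minimal
sketch in plain Java; the class and method names are hypothetical, and it
deliberately leaves out the Nutch plugin wiring (the IndexingFilter
implementation, plugin.xml, and where the content type is read from).

```java
import java.util.Locale;

// Hypothetical helper: maps a parsed MIME type to a display_type value,
// following the rules in the message above. A custom indexing filter
// would add the returned value to the document as "display_type".
final class DisplayTypes {

  static String displayTypeFor(String contentType) {
    if (contentType == null) {
      return null;
    }
    // Drop parameters such as "; charset=UTF-8" and normalise the case.
    int semicolon = contentType.indexOf(';');
    String mime = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
        .trim().toLowerCase(Locale.ROOT);
    switch (mime) {
      case "application/msword":
      case "application/pdf":
        return "doc";
      case "text/html":
        return "pages";
      case "video/mpeg":
        return "video";
      default:
        return null; // no display_type for other types
    }
  }
}
```

Keeping the mapping separate from the filter makes it testable without a
crawl, and returning null for unknown types lets the filter skip the field.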
>
> Any assistance would be greatly appreciated.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-add-field-to-index-tp4067894.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>

-- 
*Lewis*
