OPENNLP problems

Patrick Mi Tue, 28 May 2013 22:10:19 -0700

Hi there,

I checked out branch_4x and applied the latest patch,
LUCENE-2899-current.patch, but I ran into two problems.


First problem:

Following the wiki page instructions, I set up a field with this type, aiming
to keep only nouns and verbs and facet on the field:
==
<fieldType name="text_opennlp_nvf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory" tokenizerModel="opennlp/en-token.bin"/>
    <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
    <filter class="solr.FilterPayloadsFilterFactory" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
    <filter class="solr.StripPayloadsFilterFactory"/>
  </analyzer>
</fieldType>
==

I struggled to get that working until I added the extra parameter
keepPayloads="true", as below:
==
<filter class="solr.FilterPayloadsFilterFactory" keepPayloads="true" payloadList="NN,NNS,NNP,NNPS,VB,VBD,VBG,VBN,VBP,VBZ,FW"/>
==

Question: am I doing the right thing, or is this a mistake on the wiki?

Second problem:

When I posted the document XML to Solr one document at a time, the result
was what I expected:

<add>
<doc>
  <field name="id">1</field>
  <field name="text_opennlp_nvf">check in the hotel</field>
</doc>
</add>

However, if I put multiple documents into the same XML file and post it in
one go, only the first document gets processed (only 'check' and 'hotel'
show up in the facet result):

<add>
<doc>
  <field name="id">1</field>
  <field name="text_opennlp_nvf">check in the hotel</field>
</doc>
<doc>
  <field name="id">2</field>
  <field name="text_opennlp_nvf">removes the payloads</field>
</doc>
<doc>
  <field name="id">3</field>
  <field name="text_opennlp_nvf">retains only nouns and verbs </field>
</doc>
</add>

I see the same problem when I update the data using a CSV upload.
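In the meantime, the only thing that has worked for me is posting one document per message. A minimal Python sketch of that stopgap, assuming the standard Solr XML `<add>` format shown here: it splits a batch into single-document `<add>` messages. `split_add_message` is a hypothetical helper name; the code only builds the messages, and sending each one to the update handler (whose URL depends on your setup) is left out. It does not explain or fix the underlying problem.

```python
import xml.etree.ElementTree as ET

# The batch from the example above (the third value keeps its trailing space).
BATCH = """<add>
<doc>
  <field name="id">1</field>
  <field name="text_opennlp_nvf">check in the hotel</field>
</doc>
<doc>
  <field name="id">2</field>
  <field name="text_opennlp_nvf">removes the payloads</field>
</doc>
<doc>
  <field name="id">3</field>
  <field name="text_opennlp_nvf">retains only nouns and verbs </field>
</doc>
</add>"""


def split_add_message(batch_xml):
    """Hypothetical helper: return one <add> message per <doc> in the batch."""
    root = ET.fromstring(batch_xml)
    if root.tag != "add":
        raise ValueError("expected an <add> root element")
    messages = []
    for doc in root.findall("doc"):
        # Wrap each <doc> in its own <add> element so it can be posted separately.
        single = ET.Element("add")
        single.append(doc)
        messages.append(ET.tostring(single, encoding="unicode"))
    return messages


if __name__ == "__main__":
    for msg in split_add_message(BATCH):
        print(msg)
```

Each printed message can then be posted on its own, the same way the single-document file was.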

Is this a bug, or something I did wrong?

Thanks in advance!

Regards,
Patrick
