Hi,

I have Solr 5.5.0 configured with UIMA and Tika. I am running into issues when doing atomic updates on documents that are already indexed.
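For context, an atomic update of the kind described here would look roughly like the sketch below in Solr's XML update format. The document id and the new title value are hypothetical placeholders, and the uniqueKey is assumed to be "id", which the post does not confirm.

```xml
<add>
  <doc>
    <!-- "id" is assumed to be the uniqueKey; DOC_ID is a placeholder value -->
    <field name="id">DOC_ID</field>
    <!-- update="set" asks Solr to replace the existing value of this field -->
    <field name="title" update="set">Hypothetical new title</field>
  </doc>
</add>
```

A request like this sent to /update would go through whichever update chain that handler uses, so with a default chain containing the UIMA processor, the partial document would presumably reach that processor before the update is applied.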
<updateRequestProcessorChain name="uima" default="true">
  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters">
        <int name="ngramsize">3</int>
      </lst>
      <!-- analysisEngine must contain an AE descriptor inside the specified path in the classpath. -->
      <str name="analysisEngine"><Path to my Analysis Engine></str>
      <!-- Set to true if you want to continue indexing even if text processing fails.
           Default is false. That is, Solr throws RuntimeException and never indexed
           documents entirely in your session. -->
      <bool name="ignoreErrors">true</bool>
      <!-- This is optional. It is used for logging when text processing fails.
           If logField is not specified, uniqueKey will be used as logField.
      <str name="logField">id</str>
      -->
      <!-- analyzeFields must contain the input fields that need to be analyzed by UIMA. -->
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <!-- wanted to use field 'text' but solr-uima has known bug for 'multiValued' types field parsing; hence using multiple fields -->
        <arr name="fields">
          <str>content</str>
          <str>title</str>
        </arr>
      </lst>
      <!-- Field mapping describes which features of which types should go in a field. -->
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.TokenAnnotation</str>
          <lst name="mapping">
            <str name="feature">coveredText</str>
            <str name="field">posVals</str>
          </lst>
          <lst name="mapping">
            <str name="feature">posTag</str>
            <str name="field">posTags</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>

// Tika configuration

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="fmap.Last-Modified">last_modified</str>
    <!-- ignore undeclared fields -->
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
  <!-- Optional: Specify a path to a tika configuration file. See the Tika docs for details. -->
  <!-- <str name="tika.config">/my/path/to/tika.config</str> -->
  <!-- Optional: Specify one or more date formats to parse. See DateUtil.DEFAULT_DATE_FORMATS for default date formats -->
  <lst name="date.formats">
    <str>yyyy-MM-dd</str>
  </lst>
</requestHandler>

My schema has the fields 'title' and 'content', which are used by UIMA and copied to 'text' using copyField.

<field name="title" type="text_general" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="content" type="text_general" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true" />
<field name="text" type="text_general" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true" />

<copyField source="title" dest="text"/>
<copyField source="content" dest="text"/>

I tried removing stored="true" from the 'text' field, but no luck.
This issue, https://issues.apache.org/jira/browse/SOLR-8528, says it's fixed, but I am still facing the problem. Can someone please help me with this?

Thanks,
Srini

--
http://cheyuta-helpinghands.blogspot.com