Hi -

New Solr user here.  I am using Solr Cell to index files (PDF, doc, docx,
txt, htm, etc.), and there is a good chance that a new file will duplicate
the content of an existing file under a different file name.  To avoid
indexing duplicates, I am using Solr's deduplication feature:

  <updateRequestProcessorChain name="dedupe">
    <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
      <bool name="enabled">true</bool>
      <str name="signatureField">id</str>
      <bool name="overwriteDupes">true</bool>
      <str name="fields">attr_content</str>
      <str name="signatureClass">org.apache.solr.update.processor.</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

How do I get the "id" value after Solr has processed the document?  Is there
some way to modify the response to the curl request so that the id is
returned?  I need this id because I would like to rename the file to the id
value.  I could probably do a Solr search after the fact to get the id field
based on attr_stream_name, but I would like to make only one request.
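
For reference, the two-request fallback I have in mind would look roughly
like the sketch below.  It only builds the lookup URL for the second request.
The /solr/select path, the wt=json response format, and the helper name
build_id_lookup_url are assumptions on my part, not something taken from my
config; attr_stream_name is the field Solr Cell fills in with the uploaded
file name in my setup.

```python
from urllib.parse import urlencode

# Assumption: same host/port as the extract request, default select handler.
SOLR_BASE = "http://localhost:8080/solr"


def build_id_lookup_url(stream_name, base=SOLR_BASE):
    """Build a select URL that asks only for the id of the document whose
    attr_stream_name matches the uploaded file name (hypothetical helper)."""
    # Escape backslashes and double quotes so the name can sit inside a
    # quoted phrase in the query string.
    escaped = stream_name.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'attr_stream_name:"{escaped}"',  # match on the stored file name
        "fl": "id",                            # return only the id field
        "wt": "json",                          # assumption: JSON response writer
        "rows": 1,
    }
    return f"{base}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_id_lookup_url("myfile.pdf"))
```

This works only after the commit has made the document searchable, and it
costs the extra round trip I am trying to avoid.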

curl 'http://localhost:8080/solr/update/extract?uprefix=attr_&fmap.content=attr_content&commit=true' \
  -F "myfi...@myfile.pdf"

Thanks,
Bill
