Hello Solr team,

I want to index multiple files into a single Solr index entry, all under the same id.
We are using Solr 4.1.


I tried it with the following source fragment:

    public void addContentSet(ContentSet contentSet) throws SearchProviderException {

                                ...

            ContentStreamUpdateRequest csur =
                    generateCSURequest(contentSet.getIndexId(), contentSet);
            String indexId = contentSet.getIndexId();

            ConcurrentUpdateSolrServer server = serverPool.getUpdateServer(indexId);
            server.request(csur);

                                ...
    }

    private ContentStreamUpdateRequest generateCSURequest(String indexId, ContentSet contentSet)
            throws IOException {
        ContentStreamUpdateRequest csur =
                new ContentStreamUpdateRequest(confStore.getExtractUrl());

        ModifiableSolrParams parameters = csur.getParams();
        if (parameters == null) {
            parameters = new ModifiableSolrParams();
        }

        parameters.set("literalsOverride", "false");

        // maps the Tika default content attribute to the attribute named 'fulltext'
        parameters.set("fmap.content", SearchSystemAttributeDef.FULLTEXT.getName());
        // create an empty content stream; this seems necessary for ContentStreamUpdateRequest
        csur.addContentStream(new ImaContentStream());

        for (Content content : contentSet.getContentList()) {
            csur.addContentStream(new ImaContentStream(content));
            // for each content stream add additional attributes
            parameters.add("literal." + SearchSystemAttributeDef.CONTENT_ID.getName(),
                    content.getBinaryObjectId().toString());
            parameters.add("literal." + SearchSystemAttributeDef.CONTENT_KEY.getName(),
                    content.getContentKey());
            parameters.add("literal." + SearchSystemAttributeDef.FILE_NAME.getName(),
                    content.getContentName());
            parameters.add("literal." + SearchSystemAttributeDef.MIME_TYPE.getName(),
                    content.getMimeType());
        }

        parameters.set("literal.id", indexId);

        // adding some other attributes
        ...

        csur.setParams(parameters);

        return csur;
    }

While debugging I can see that 'server.request(csur)' reads the buffer of every
ImaContentStream.
Looking at the Solr catalina log, I can also see that the attached files reach the
Solr servlet.

INFO: Releasing directory:/data/V-4-1/master0/data/index
Apr 25, 2013 5:48:07 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [master0] webapp=/solr-4-1 path=/update/extract 
params={literal.searchconnectortest15_c8150e41_cc49_4a ...... 
&literal.id=26afa5dc-40ad-442a-ac79-0e7880c06aa1& .....
{add=[26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910940958720), 
26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910971367424), 
26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910976610304), 
26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910983950336), 
26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910989193216), 
26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910995484672)]} 0 58


But only the last file in the content list ends up in the index.
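
My guess at what happens, based on the log showing six adds for the same id, each with its own version number: every content stream becomes a separate document, and each add with an already existing uniqueKey replaces the previous one. The following is only a minimal sketch of that assumed behaviour using a plain map, not Solr code; the class name, method name and the id "doc-1" are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Solr code: an index keyed by the unique id,
// where adding a document with an existing id replaces the earlier one.
final class OverwriteSketch {

    // Adds each {id, body} pair in order; a later pair with the same id wins.
    static Map<String, String> addAll(List<String[]> docs) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String[] doc : docs) {
            index.put(doc[0], doc[1]);
        }
        return index;
    }

    public static void main(String[] args) {
        List<String[]> docs = new ArrayList<>();
        docs.add(new String[] {"doc-1", "text of file 1"});
        docs.add(new String[] {"doc-1", "text of file 2"});
        docs.add(new String[] {"doc-1", "text of file 3"});
        // Three adds, but the map holds a single entry for "doc-1".
        System.out.println(addAll(docs));
    }
}
```

If this assumption is right, it would explain why six files are received but only one survives.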


My schema.xml has the following field definitions:

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>

    <field name="contentkey" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="contentid" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="contentfilename" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="contentmimetype" type="string" indexed="true" stored="true" multiValued="true"/>

    <field name="fulltext" type="text_general" indexed="true" stored="true" multiValued="true"/>


I'm using the Tika-based ExtractingRequestHandler, which can extract text from binary files.



  <requestHandler name="/update/extract"
                  startup="lazy"
                  class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
      <str name="lowernames">true</str>
      <str name="uprefix">ignored_</str>

      <!-- capture link hrefs but ignore div attributes -->
      <str name="captureAttr">true</str>
      <str name="fmap.a">links</str>
      <str name="fmap.div">ignored_</str>

    </lst>
  </requestHandler>

Is it possible to index multiple files under the same id?
Is it necessary to implement my own RequestHandler?
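
One alternative I am considering instead of a custom RequestHandler is to extract the text on the client and send a single document whose multiValued fields hold one value per file. Below is a rough sketch of just the merging step, with plain Java collections standing in for the Solr document. `MergeSketch`, `FileInfo` and `merge` are hypothetical names, the text extraction is assumed to have happened already, and only the field names are taken from my schema.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collapse several files into one multi-valued document.
final class MergeSketch {

    // Hypothetical holder for one file whose text was already extracted.
    static final class FileInfo {
        final String contentId;
        final String contentKey;
        final String fileName;
        final String mimeType;
        final String text;

        FileInfo(String contentId, String contentKey, String fileName,
                 String mimeType, String text) {
            this.contentId = contentId;
            this.contentKey = contentKey;
            this.fileName = fileName;
            this.mimeType = mimeType;
            this.text = text;
        }
    }

    // Builds one document: a single id plus one value per file in each field.
    static Map<String, List<String>> merge(String indexId, List<FileInfo> files) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        add(doc, "id", indexId);
        for (FileInfo f : files) {
            add(doc, "contentid", f.contentId);
            add(doc, "contentkey", f.contentKey);
            add(doc, "contentfilename", f.fileName);
            add(doc, "contentmimetype", f.mimeType);
            add(doc, "fulltext", f.text);
        }
        return doc;
    }

    // Appends a value to the named field, creating the value list on first use.
    private static void add(Map<String, List<String>> doc, String field, String value) {
        doc.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
    }
}
```

With SolrJ, I assume each of these values would then be copied into one SolrInputDocument via addField and sent as a regular update, but I have not tried this yet.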

With best regards Mark


