Hello Solr team,

I want to index multiple files into one Solr document, all with the same id. We are using Solr 4.1.
I tried it with the following code fragment:

    // SolrJ 4.x classes used below
    import java.io.IOException;
    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
    import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public void addContentSet(ContentSet contentSet) throws SearchProviderException {
        ...
        ContentStreamUpdateRequest csur = generateCSURequest(contentSet.getIndexId(), contentSet);
        String indexId = contentSet.getIndexId();
        ConcurrentUpdateSolrServer server = serverPool.getUpdateServer(indexId);
        server.request(csur);
        ...
    }

    private ContentStreamUpdateRequest generateCSURequest(String indexId, ContentSet contentSet)
            throws IOException {
        ContentStreamUpdateRequest csur = new ContentStreamUpdateRequest(confStore.getExtractUrl());

        ModifiableSolrParams parameters = csur.getParams();
        if (parameters == null) {
            parameters = new ModifiableSolrParams();
        }

        parameters.set("literalsOverride", "false");

        // maps the Tika default content attribute to the attribute named 'fulltext'
        parameters.set("fmap.content", SearchSystemAttributeDef.FULLTEXT.getName());

        // create an empty content stream; this seems necessary for ContentStreamUpdateRequest
        csur.addContentStream(new ImaContentStream());

        for (Content content : contentSet.getContentList()) {
            csur.addContentStream(new ImaContentStream(content));
            // for each content stream add additional attributes
            parameters.add("literal." + SearchSystemAttributeDef.CONTENT_ID.getName(),
                    content.getBinaryObjectId().toString());
            parameters.add("literal." + SearchSystemAttributeDef.CONTENT_KEY.getName(),
                    content.getContentKey());
            parameters.add("literal." + SearchSystemAttributeDef.FILE_NAME.getName(),
                    content.getContentName());
            parameters.add("literal." + SearchSystemAttributeDef.MIME_TYPE.getName(),
                    content.getMimeType());
        }

        parameters.set("literal.id", indexId);

        // adding some other attributes
        ...

        csur.setParams(parameters);
        return csur;
    }

While debugging I can see that 'server.request(csur)' reads the buffer of each ImaContentStream. In the Solr catalina log I can also see that the attached files reach the Solr servlet.
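A side note on the literal parameters in the code above: as far as I know, ModifiableSolrParams.add() appends a value to a key while set() replaces all of its values, and a ContentStreamUpdateRequest carries a single parameter set for the whole request. So the per-file literal.* values are not bound to individual content streams; they accumulate in one request-wide list per key. The toy class below (ToyParams is hypothetical, not a SolrJ class) models only those add/set semantics:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model (hypothetical, NOT SolrJ) of multi-valued request parameters:
// add() appends a value under a key, set() replaces all values for that key.
public class ToyParams {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    public void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    public void set(String name, String value) {
        List<String> single = new ArrayList<>();
        single.add(value);
        values.put(name, single);
    }

    public List<String> get(String name) {
        return values.getOrDefault(name, new ArrayList<>());
    }

    public static void main(String[] args) {
        ToyParams params = new ToyParams();
        String[] contentIds = {"c1", "c2", "c3"}; // hypothetical content ids
        for (String id : contentIds) {
            // one request-wide key collects the value of every file
            params.add("literal.contentid", id);
        }
        params.set("literal.id", "doc-1");
        System.out.println(params.get("literal.contentid")); // [c1, c2, c3]
        System.out.println(params.get("literal.id"));        // [doc-1]
    }
}
```

Running main prints [c1, c2, c3] and then [doc-1]: every file's value lands under the same key, with nothing tying a value to the stream it came from.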
The relevant log entries:

    INFO: Releasing directory:/data/V-4-1/master0/data/index
    Apr 25, 2013 5:48:07 AM org.apache.solr.update.processor.LogUpdateProcessor finish
    INFO: [master0] webapp=/solr-4-1 path=/update/extract params={literal.searchconnectortest15_c8150e41_cc49_4a ...... &literal.id=26afa5dc-40ad-442a-ac79-0e7880c06aa1& ..... {add=[26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910940958720), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910971367424), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910976610304), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910983950336), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910989193216), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910995484672)]} 0 58

But only the last file in the content list ends up in the index.

My schema.xml has the following field definitions:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
    <field name="contentkey" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="contentid" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="contentfilename" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="contentmimetype" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="fulltext" type="text_general" indexed="true" stored="true" multiValued="true"/>

I'm using the Tika-based ExtractingRequestHandler, which can extract text from binary files:

    <requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler" >
      <lst name="defaults">
        <str name="lowernames">true</str>
        <str name="uprefix">ignored_</str>
        <!-- capture link hrefs but ignore div attributes -->
        <str name="captureAttr">true</str>
        <str name="fmap.a">links</str>
        <str name="fmap.div">ignored_</str>
      </lst>
    </requestHandler>

Is it possible to index multiple files under the same id? Or is it necessary to implement my own RequestHandler?
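The log above lists six add entries with the same id but different version numbers. That suggests the extract handler created one document per content stream; because Solr replaces an existing document that has the same uniqueKey (here 'id'), only the last one would remain. The toy model below (ToyIndex and docForFile are hypothetical, not Solr code) reproduces that effect and, for contrast, shows that merging the per-file values into one multi-valued document before a single add keeps all of them:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy index (hypothetical, NOT Solr code): documents are keyed by their
// uniqueKey, and adding a document whose id already exists replaces it.
public class ToyIndex {
    private final Map<String, Map<String, List<String>>> docs = new LinkedHashMap<>();

    public void add(String id, Map<String, List<String>> fields) {
        docs.put(id, fields); // overwrite semantics: the old document is discarded
    }

    public int size() {
        return docs.size();
    }

    public Map<String, List<String>> get(String id) {
        return docs.get(id);
    }

    // Hypothetical helper: builds one single-file document.
    public static Map<String, List<String>> docForFile(String fileName) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        List<String> names = new ArrayList<>();
        names.add(fileName);
        fields.put("contentfilename", names);
        return fields;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        String id = "26afa5dc-40ad-442a-ac79-0e7880c06aa1";
        String[] files = {"a.pdf", "b.doc", "c.txt"}; // hypothetical file names

        // One add per file, all with the same id: each add replaces the previous one
        for (String file : files) {
            index.add(id, docForFile(file));
        }
        System.out.println(index.size());                         // 1
        System.out.println(index.get(id).get("contentfilename")); // [c.txt]

        // Contrast: merge per-file values into one multi-valued document, then add once
        Map<String, List<String>> merged = new LinkedHashMap<>();
        for (String file : files) {
            merged.computeIfAbsent("contentfilename", k -> new ArrayList<>()).add(file);
        }
        index.add(id, merged);
        System.out.println(index.get(id).get("contentfilename")); // [a.pdf, b.doc, c.txt]
    }
}
```

Running main prints 1, [c.txt], and then [a.pdf, b.doc, c.txt]. This only mimics the overwrite behavior; it says nothing about whether the extract handler itself can be configured to merge streams.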
With best regards,
Mark