I just skimmed your post, but I'm responding to the last bit. If you have <uniqueKey> defined as "id" in schema.xml then no, you cannot have multiple documents with the same ID. Whenever a new doc comes in it replaces the old doc with that ID.
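To make the replace-on-same-ID behavior concrete, here is a rough plain-Java sketch. It is only a model: the class `UniqueKeyModel` and its helpers `index` and `compositeId` are made up for illustration, it uses a `Map` as a stand-in for an index keyed on the uniqueKey, and it does not call SolrJ or talk to a server. The composite ID scheme (set ID plus a per-file key) is just one assumed way of keeping IDs distinct.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of uniqueKey semantics. NOT SolrJ; all names here are hypothetical.
class UniqueKeyModel {

    // Stand-in for an index keyed on the uniqueKey field ("id").
    // Each doc is {id, content}; a later add with the same id replaces the earlier one.
    static Map<String, String> index(List<String[]> docs) {
        Map<String, String> index = new LinkedHashMap<>();
        for (String[] doc : docs) {
            index.put(doc[0], doc[1]);
        }
        return index;
    }

    // Assumed scheme for a distinct id per file: set id plus a per-file key.
    static String compositeId(String setId, String contentKey) {
        return setId + "_" + contentKey;
    }

    public static void main(String[] args) {
        String setId = "26afa5dc-40ad-442a-ac79-0e7880c06aa1";
        String[] files = {"file-a", "file-b", "file-c"};

        // Every file is added under the same id, as in the quoted log.
        List<String[]> sameId = new ArrayList<>();
        // Every file is added under its own composite id.
        List<String[]> distinctIds = new ArrayList<>();
        for (String file : files) {
            sameId.add(new String[] {setId, file});
            distinctIds.add(new String[] {compositeId(setId, file), file});
        }

        Map<String, String> collapsed = index(sameId);
        System.out.println(collapsed.size());           // prints 1
        System.out.println(collapsed.get(setId));       // prints file-c (the last add wins)
        System.out.println(index(distinctIds).size());  // prints 3
    }
}
```

With a shared ID only the last file survives, which matches what you describe. With distinct IDs all three are kept, and the set ID could still go into a separate field so the files can be queried as a group.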
You can remove the <uniqueKey> definition and do what you want, but there are very few Solr installations with no <uniqueKey> and it's probably a better idea to make your IDs truly unique.

Best
Erick

On Thu, May 23, 2013 at 6:14 AM, <mark.ka...@t-systems.com> wrote:
> Hello solr team,
>
> I want to index multiple fields into one solr index entity, with the same id.
> We are using solr 4.1
>
> I try it with following source fragment:
>
> public void addContentSet(ContentSet contentSet) throws
>         SearchProviderException {
>
>     ...
>
>     ContentStreamUpdateRequest csur =
>         generateCSURequest(contentSet.getIndexId(), contentSet);
>     String indexId = contentSet.getIndexId();
>
>     ConcurrentUpdateSolrServer server =
>         serverPool.getUpdateServer(indexId);
>     server.request(csur);
>
>     ...
> }
>
> private ContentStreamUpdateRequest generateCSURequest(String indexId,
>         ContentSet contentSet)
>         throws IOException {
>     ContentStreamUpdateRequest csur = new
>         ContentStreamUpdateRequest(confStore.getExtractUrl());
>
>     ModifiableSolrParams parameters = csur.getParams();
>     if (parameters == null) {
>         parameters = new ModifiableSolrParams();
>     }
>
>     parameters.set("literalsOverride", "false");
>
>     // maps the tika default content attribute to the Attribute with name
>     // 'fulltext'
>     parameters.set("fmap.content",
>         SearchSystemAttributeDef.FULLTEXT.getName());
>     // create an empty content stream, this seams necessary for
>     // ContentStreamUpdateRequest
>     csur.addContentStream(new ImaContentStream());
>
>     for (Content content : contentSet.getContentList()) {
>         csur.addContentStream(new ImaContentStream(content));
>         // for each content stream add additional attributes
>         parameters.add("literal." +
>             SearchSystemAttributeDef.CONTENT_ID.getName(),
>             content.getBinaryObjectId().toString());
>         parameters.add("literal." +
>             SearchSystemAttributeDef.CONTENT_KEY.getName(), content.getContentKey());
>         parameters.add("literal." +
>             SearchSystemAttributeDef.FILE_NAME.getName(), content.getContentName());
>         parameters.add("literal." +
>             SearchSystemAttributeDef.MIME_TYPE.getName(), content.getMimeType());
>     }
>
>     parameters.set("literal.id ", indexId);
>
>     // adding some other attributes
>     ...
>
>     csur.setParams(parameters);
>
>     return csur;
> }
>
> During debugging I can see that the method 'server.request(csur)' read for
> each ImaContentStream the buffer.
> When I'm looking on solr catalina log I see that the attached files reach the
> solr servlet.
>
> INFO: Releasing directory:/data/V-4-1/master0/data/index
> Apr 25, 2013 5:48:07 AM org.apache.solr.update.processor.LogUpdateProcessor
> finish
> INFO: [master0] webapp=/solr-4-1 path=/update/extract
> params={literal.searchconnectortest15_c8150e41_cc49_4a ......
> &literal.id=26afa5dc-40ad-442a-ac79-0e7880c06aa1& .....
> {add=[26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910940958720),
> 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910971367424),
> 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910976610304),
> 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910983950336),
> 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910989193216),
> 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910995484672)]} 0 58
>
> But only the latest in the content list will be indexed.
>
> My schema.xml has the following field definitions:
>
> <field name="id" type="string" indexed="true" stored="true"
>        required="true" />
> <field name="content" type="text_general" indexed="false" stored="true"
>        multiValued="true"/>
>
> <field name="contentkey" type="string" indexed="true" stored="true"
>        multiValued="true"/>
> <field name="contentid" type="string" indexed="true" stored="true"
>        multiValued="true"/>
> <field name="contentfilename " type="string" indexed="true" stored="true"
>        multiValued="true"/>
> <field name="contentmimetype" type="string" indexed="true" stored="true"
>        multiValued="true"/>
>
> <field name="fulltext" type="text_general" indexed="true" stored="true"
>        multiValued="true"/>
>
> I'm using the tika ExtractingRequestHandler which can extract binary files.
>
> <requestHandler name="/update/extract"
>                 startup="lazy"
>                 class="solr.extraction.ExtractingRequestHandler" >
>   <lst name="defaults">
>     <str name="lowernames">true</str>
>     <str name="uprefix">ignored_</str>
>
>     <!-- capture link hrefs but ignore div attributes -->
>     <str name="captureAttr">true</str>
>     <str name="fmap.a">links</str>
>     <str name="fmap.div">ignored_</str>
>
>   </lst>
> </requestHandler>
>
> Is it possible to index multiple files with the same id?
> It is necessary to implement my own RequestHandler?
>
> With best regards Mark