Hello, I am also a newbie and want to do almost exactly the same thing. I was planning on doing the equivalent of:

<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8" />
  <document>
    <entity name="f" processor="FileListEntityProcessor"
            baseDir="***"
            fileName=".*xml"
            rootEntity="false"
            dataSource="null">

      <!-- changed: rootEntity="false" added -->
      <entity name="record"
              processor="XPathEntityProcessor"
              stream="false"
              rootEntity="false"
              forEach="/record"
              url="${f.fileAbsolutePath}">
        <!-- changed: commonField="true" added -->
        <field column="ID" xpath="/record/@id" commonField="true"/>

        <!-- Address -->
        <entity name="record_adr"
                processor="XPathEntityProcessor"
                stream="false"
                forEach="/record/address"
                url="${f.fileAbsolutePath}">
          <field column="address_street" xpath="/record/address/@street" />
          <field column="address_state" xpath="/record/address//@state" />
          <field column="address_type" xpath="/record/address//@type" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

With this, ID is no longer unique within Solr (so it could not be the
uniqueKey in schema.xml): there would be multiple "documents" with a
given ID, one for each address. You can then search on ID and get the
three addresses, and you can also search on an address more sensibly.

I have not been able to try this yet, as other issues are still to be
dealt with.

Comments?

>Hi
>I may be completely off on this, being new to SOLR, but I am not sure
>how to index related groups of fields in a document and preserve
>their 'grouping'. I would appreciate any help on this. Detailed
>description of the problem below.
>
>I am trying to index an entity that can have multiple occurrences in
>the same document - e.g. Address. The address could be Shipping,
>Home, Office etc. Each address element has multiple values in it
>like street, state etc. Thus each address element is a group, with
>the state and street in one address element being related to each other.
>
>It looks like this in my source xml
>
><record>
>  <coreInfo id="123" , .../>
>  <address street="XYZ1" state="CA" ...type="home" />
>  <address street="XYZ2" state="CA" ... type="Office"/>
>  <address street="XYZ3" state="CA" ....type="Other"/>
></record>
>
>I have set up my DIH to treat these as entities as below
>
><dataConfig>
>  <dataSource type="FileDataSource" encoding="UTF-8" />
>  <document>
>    <entity name="f" processor="FileListEntityProcessor"
>            baseDir="***"
>            fileName=".*xml"
>            rootEntity="false"
>            dataSource="null">
>      <entity
>          name="record"
>          processor="XPathEntityProcessor"
>          stream="false"
>          forEach="/record"
>          url="${f.fileAbsolutePath}">
>        <field column="ID" xpath="/record/@id" />
>
>        <!-- Address -->
>        <entity
>            name="record_adr"
>            processor="XPathEntityProcessor"
>            stream="false"
>            forEach="/record/address"
>            url="${f.fileAbsolutePath}">
>          <field column="address_street" xpath="/record/address/@street" />
>          <field column="address_state" xpath="/record/address//@state" />
>          <field column="address_type" xpath="/record/address//@type" />
>        </entity>
>      </entity>
>    </entity>
>  </document>
></dataConfig>
>
>
>The problem is as follows. DIH seems to treat these as entities, but
>solr seems to flatten them out on indexing to fields in a document
>(losing the entity part).
>
>So when I search for an ID, in the response all the street fields
>are bunched together, followed by all the state fields, type etc.
>Thus I can't associate which street address corresponds to which
>address type in the response.
>
>What seems harder is this - say I need to query on 'Street' = XYZ1 and
>type="Office". This should NOT return a document, since the street for
>the office address is "XYZ2" and not "XYZ1". However, when I query for
>address_street:"XYZ1" and address_type:"Office" I get back this document.
>
>The problem seems to be that while DIH allows 'entities' within a
>document, the SOLR schema does not preserve them - it 'flattens' all
>of them out as indices for the document.
>
>I could work around the problem by creating SOLR fields like
>"home_address_street" and "office_address_street" and doing some xpath
>mapping. However, I don't want to do that, as we can have multiple
>'other' addresses. Also, I have other fields whose type is not as easily
>distinguished as address.
>
>As I mentioned, being new to SOLR I might have completely goofed on
>how to set it up - I would much appreciate any direction on it. I am
>using SOLR 1.3
>
>Regards,
>Guna

--
===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================