Hello,

I am also a newbie and wanted to do almost exactly the same thing.
I was planning on doing the equivalent of:

<dataConfig>
    <dataSource type="FileDataSource" encoding="UTF-8" />
    <document>
      <entity name ="f" processor="FileListEntityProcessor"
              baseDir="***"
              fileName=".*xml"
              rootEntity="false"
              dataSource="null" >
         <!-- changed: added rootEntity="false" -->
         <entity
           name="record"
           processor="XPathEntityProcessor"
           stream="false"
           rootEntity="false"
           forEach="/record"
           url="${f.fileAbsolutePath}">
                 <!-- changed: added commonField="true" -->
                 <field column="ID" xpath="/record/@id" commonField="true"/>
                 <!-- Address  -->
                  <entity
                     name="record_adr"
                     processor="XPathEntityProcessor"
                     stream="false"
                     forEach="/record/address"
                     url="${f.fileAbsolutePath}">
                          <field column="address_street" xpath="/record/address/@street" />
                          <field column="address_state"  xpath="/record/address//@state" />
                          <field column="address_type"   xpath="/record/address//@type" />
                </entity>
            </entity>
      </entity>
    </document>
</dataConfig>

ID is no longer unique within Solr: there would be multiple "documents"
with a given ID, one for each address. You can then search on ID and get
back all three addresses, and you can also search on an address more
sensibly, because each address's street, state and type stay together in
one document.
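
One thing this relies on: ID must not be the schema's uniqueKey. Solr
replaces an existing document when a new one arrives with the same unique
key value, so with ID as the key the address documents for a record would
overwrite each other and only one would survive. A rough schema.xml
sketch, untested, assuming the "string" field type from the example
schema and reusing the field names from the config above:

```xml
<fields>
  <!-- Shared by every address document of one record, so NOT unique -->
  <field name="ID"             type="string" indexed="true" stored="true"/>
  <!-- One address per document, so these need not be multi-valued -->
  <field name="address_street" type="string" indexed="true" stored="true"/>
  <field name="address_state"  type="string" indexed="true" stored="true"/>
  <field name="address_type"   type="string" indexed="true" stored="true"/>
</fields>

<!-- Do NOT use ID as the unique key. Either leave <uniqueKey> out or
     point it at a separate field that is unique per address. -->
<!-- <uniqueKey>ID</uniqueKey> -->
```

With one address per document, address_street:"XYZ1" AND
address_type:"Office" only matches when a single address carries both
values, while ID:123 still brings back all three addresses.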

I have not been able to try this yet as other issues are still to be
dealt with.

Comments?????

>Hi
>I may be completely off on this, being new to SOLR, but I am not sure
>how to index related groups of fields in a document and preserve
>their 'grouping'. I would appreciate any help on this. A detailed
>description of the problem is below.
>
>I am trying to index an entity that can have multiple occurrences in  
>the same document - e.g. Address.  The address could be Shipping,  
>Home, Office etc.   Each address element has multiple values in it  
>like street, state etc.    Thus each address element is a group with  
>the state and street in one address element being related to each other.
>
>It looks like this in my source XML:
>
><record>
>    <coreInfo id="123" .../>
>    <address street="XYZ1" state="CA" ... type="home"/>
>    <address street="XYZ2" state="CA" ... type="Office"/>
>    <address street="XYZ3" state="CA" ... type="Other"/>
></record>
>
>I have set up my DIH to treat these as entities, as below:
>
><dataConfig>
>    <dataSource type="FileDataSource" encoding="UTF-8" />
>    <document>
>      <entity name ="f" processor="FileListEntityProcessor"
>              baseDir="***"
>              fileName=".*xml"
>              rootEntity="false"
>              dataSource="null" >
>         <entity
>            name="record"
>            processor="XPathEntityProcessor"
>            stream="false"
>            forEach="/record"
>            url="${f.fileAbsolutePath}">
>                 <field column="ID" xpath="/record/@id" />
>
>                 <!-- Address  -->
>                  <entity
>                      name="record_adr"
>                      processor="XPathEntityProcessor"
>                      stream="false"
>                      forEach="/record/address"
>                      url="${f.fileAbsolutePath}">
>                          <field column="address_street" xpath="/record/address/@street" />
>                          <field column="address_state"  xpath="/record/address//@state" />
>                          <field column="address_type"   xpath="/record/address//@type" />
>               </entity>
>            </entity>
>      </entity>
>    </document>
></dataConfig>
>
>
>The problem is as follows.  DIH seems to treat these as entities but  
>solr seems to flatten them out on indexing to fields in a document  
>(losing the entity part).
>
>So when I search for an ID, all the street fields in the response
>are bunched together, followed by all the state fields, then the type
>fields, etc. Thus I can't tell which street address corresponds to
>which address type in the response.
>
>What seems harder is this: say I need to query on street = "XYZ1" and
>type = "Office". This should NOT return a document, since the street
>for the office address is "XYZ2" and not "XYZ1". However, when I query
>for address_street:"XYZ1" and address_type:"Office" I get back this
>document.
>
>The problem seems to be that while DIH allows 'entities' within a
>document, the SOLR schema does not preserve them: it 'flattens' all
>of them out into multi-valued fields of the one document.
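
To illustrate the flattening: with Guna's config, the sample record ends
up indexed roughly as if this single document had been posted in Solr's
XML update format (a sketch; the state values are left out for brevity):

```xml
<add>
  <doc>
    <field name="ID">123</field>
    <!-- every address row adds another value to the same fields -->
    <field name="address_street">XYZ1</field>
    <field name="address_street">XYZ2</field>
    <field name="address_street">XYZ3</field>
    <field name="address_type">home</field>
    <field name="address_type">Office</field>
    <field name="address_type">Other</field>
  </doc>
</add>
```

Nothing links the first address_street value to the first address_type
value, so address_street:"XYZ1" AND address_type:"Office" matches: each
clause is satisfied by a different address.
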
>
>I could work around the problem by creating SOLR fields like
>"home_address_street" and "office_address_street" and doing some XPath
>mapping. However, I don't want to do that, as we can have multiple
>'other' addresses. Also, I have other fields whose type is not as
>easily distinguished as it is for address.
>
>As I mentioned, being new to SOLR I might have completely goofed in the
>way I set it up; I would much appreciate any direction on it. I am using
>SOLR 1.3.
>
>Regards,
>Guna

-- 

===============================================================
Fergus McMenemie               Email:fer...@twig.me.uk
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
