Thanks Shalin,
I'll try this asap. Yes, you did understand the sample schema I've been
playing with.
Just a couple of questions to clarify for my own understanding your
proposal.
1) the column "comboId" doesn't exist on the dB (yet it is specified as a
separate "column" for both "owners" and "vets" in your specification). I
didn't realize you could add "virtual" columns.
2) the "pets" entity did have a field/column id. The fact that you omitted
it, was an oversight, or not necessary, I suppose.
3) Your statement about uniqueKey needs some clarification:
   - I do have the following in my schema.xml: <uniqueKey>id</uniqueKey>, I
also added <uniqueKey>comboId</uniqueKey>. Are both necessary?

Thanks

** julio

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 04, 2008 11:01 AM
To: solr-user@lucene.apache.org
Subject: Re: How to describe 2 entities in dataConfig for the DataImporter?

Hi Julio,

The following are my assumptions after studying your given data-config
examples

1. The column id is present in all three tables -- vets, owners and pets.
2. Vets and owners are independent of each other, there is no join required
between them 3. There is a parent-child relationship between owners and pets
(joined on owner_id column)

The whole problem relates to the fact that both Vets and Owners have a
primary key with the same name -- "id". If "id" is the uniqueKey for the
Solr schema, then some records from owners overwrite vets when the value for
the "id" is the same.

To solve this problem, we must have a uniqueKey in schema.xml which has
different values when coming from Vets and Owners even when value of "id"
column is the same in both the tables. At the same time, for the join to
work, the original value of "id" should be kept as is.

I believe the following data-config.xml should solve your use-case:

<document name="doc-1">
    <entity name="vets" pk="id"
        query="select id,first_name,last_name FROM vets"
        transformer="TemplateTransformer">
        <field column="id" />
        <field column="comboId" template="vets-${vets.id}" />
        <field column="first_name" name="userName" />
        <field column="last_name" name="userName" />
    </entity>
    <entity name="owners" pk="id"
        query="select id,first_name,last_name FROM owners"
        transformer="TemplateTransformer">
        <field column="id" />
        <field column="comboId" template="owners-${owners.id}" />
        <field column="first_name" name="userName" />
        <field column="last_name" name="userName" />

        <entity name="pets" pk="id"
            query="SELECT id,name,birth_date,type_id FROM pets WHERE
owner_id='${owners.id}'">
            <field column="name" name="name" />
            <field column="birth_date" name="birthDate" />
        </entity>
    </entity>
</document>

Here, comboId is the uniqueKey for your Solr documents. The "id" field need
not exist in your schema.xml, it will only be used for joining the tables
and discarded if it does not exist in the schema.xml.

Hope this helps.

On Wed, Jun 4, 2008 at 10:54 PM, Julio Castillo <[EMAIL PROTECTED]>
wrote:

> Noble,
> Thanks for continuing to assist me on trying to come up a config that 
> works.
> A couple of questions/clarifications:
> 1) I had to introduce the "artificial" comboID and the transformer 
> because of a conflict with a parallel entity on the "id" ("vets" and
"owners").
> 2) I don't think there is a conflict with the petID because prior to 
> the introduction of "vets" I had "owners" with no "id" issues regarding
"pets".
> 3) The conflict was introduced the moment I tried to add "vets".
> Unfortunately by introducing the transformer, for "owners", the "pets"
> relationships stopped working.
>
> Below are 3 specifications. The first 2 work in isolation, when 
> combined(last one) it doesn't work.
>
> * CASE 1 (Works -nested entities -no conflicts on ids -no transformer)
>
> <document name="doc-1">
>  <entity name="owners" pk="id"
>            query="select id,first_name,last_name FROM owners">
>     <field column="id"           name="id"/>
>     <field column="first_name" name="userName"/>
>    <field column="last_name"    name="userName"/>
>
>    <entity name="pets" pk="id"
>                query="SELECT id,name,birth_date,type_id FROM pets 
> WHERE owner_id='${owners.id}'"
>                parentDeltaQuery="SELECT id FROM owners WHERE 
> id=${pets.owner_id}">
>        <field column="id"         name="id"/>
>        <field column="name"       name="name"/>
>        <field column="birth_date" name="birthDate"/>
>    </entity>
>  </entity>
> </document>
>
>
> * CASE 2 (Works -parallel independent entities -introduced transformer 
> to avoid id conflicts)
>
> <document name="doc-1">
>  <entity name="vets" pk="id"
>             query="select id,first_name,last_name FROM vets"
>            transformer="TemplateTransformer">
>    <field column="id"           name="comboId" template="vets-${vets.id
> }"/>
>     <field column="first_name" name="userName"/>
>    <field column="last_name"    name="userName"/>
>  </entity>
>   <entity name="owners" pk="id"
>            query="select id,first_name,last_name FROM owners"
>            transformer="TemplateTransformer">
>    <field column="id"           name="comboId"
> template="owners-${owners.id}"/>
>    <field column="first_name" name="userName"/>
>    <field column="last_name"    name="userName"/>
>  </entity>
> </document>
>
>
> * CASE 3 (Commented out "vets" to simplify case. Nested entities don't
> work:
> "Document [null] missing required field: id")
>
> <document name="doc-1">
>  <!--entity name="vets" pk="id"
>             query="select id,first_name,last_name FROM vets"
>            transformer="TemplateTransformer">
>    <field column="id"           name="comboId" template="vets-${vets.id
> }"/>
>     <field column="first_name" name="userName"/>
>    <field column="last_name"    name="userName"/>
>   </entity-->
>   <entity name="owners" pk="id"
>            query="select id,first_name,last_name FROM owners"
>            transformer="TemplateTransformer">
>    <field column="id"           name="comboId"
> template="owners-${owners.id}"/>
>    <field column="first_name" name="userName"/>
>    <field column="last_name"    name="userName"/>
>
>    <entity name="pets" pk="id"
>                query="SELECT id,name,birth_date,type_id FROM pets 
> WHERE owner_id='${owners.id}'"
>                parentDeltaQuery="SELECT id FROM owners WHERE 
> id=${pets.owner_id}">
>        <field column="id"         name="id"/>
>        <field column="name"       name="name"/>
>        <field column="birth_date" name="birthDate"/>
>    </entity>
>  </entity>
> </document>
>
>
> The debug output for one row from the dataImporter while iterating 
> over pets where the first row owner_id=1 (which gets transformed to 
> 'owners-1' -where owner_id is a fk to owners id column) shows as 
> follows:
> "SELECT id,name,birth_date,type_id FROM pets WHERE owner_id='owners-1'
>
> I believe the issue on somehow having to "untransform" the owners-id 
> prior to comparison with pets foreign key owner_id
>
> Thanks again
>
> ** julio
>
> -----Original Message-----
> From: Noble Paul ??????? ?????? [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, June 03, 2008 10:30 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to describe 2 entities in dataConfig for the
DataImporter?
>
> The id in pet should be  aliased to 'petid' , because id is coming 
> from both entities there is a conflict <entity name="owners" pk="id"
>          query="select id,first_name,last_name FROM owners"
>          transformer="TemplateTransformer">
>      <field column="comboId"  template="owners-${owners.id}"/>
>      <field column="id" />
>      <field column="first_name" name="userName"/>
>      <field column="last_name"  name="userName"/>
>
>      <entity name="pets"
>              query="SELECT id,name,birth_date,type_id FROM pets WHERE 
> owner_id='${owners.id}'"
>              parentDeltaQuery="SELECT id FROM owners WHERE 
> id=${pets.owner_id}">
>          <field column="id"          name="petid"/>
>          <field column="name"        name="name"/>
>          <field column="birth_date" name="birthDate"/>
>      </entity>
>  </entity>
>
>
> On Wed, Jun 4, 2008 at 10:37 AM, Noble Paul ??????? ??????
> <[EMAIL PROTECTED]> wrote:
> > hi julio,
> > You must create an extra field for 'comboid' because you really need 
> > the 'id' for your sub-entities. Your data-config must look as follows.
> > The pet also has a field called 'id' . It is not a good idea. call 
> > it 'petid' or something (both in dataconfig and schema.xml). Please 
> > make sure that the field names are unique .
> >
> >
> > <entity name="owners" pk="id"
> >           query="select id,first_name,last_name FROM owners"
> >           transformer="TemplateTransformer">
> >       <field column="comboId"  template="owners-${owners.id}"/>
> >       <field column="id" />
> >       <field column="first_name" name="userName"/>
> >       <field column="last_name"  name="userName"/>
> >
> >       <entity name="pets" pk="id"
> >               query="SELECT id,name,birth_date,type_id FROM pets 
> > WHERE owner_id='${owners.id}'"
> >               parentDeltaQuery="SELECT id FROM owners WHERE 
> > id=${pets.owner_id}">
> >           <field column="id"          name="id"/>
> >           <field column="name"        name="name"/>
> >           <field column="birth_date" name="birthDate"/>
> >       </entity>
> >   </entity>
> >
> >
> > On Wed, Jun 4, 2008 at 5:50 AM, Julio Castillo 
> > <[EMAIL PROTECTED]>
> wrote:
> >> Hi Noble,
> >> I had forgotten to also list comboId as a uniqueKey in the 
> >> schema.xml
> file.
> >> But that didn't make a difference.
> >> It still complained about the "Document [null] missing required field:
> id"
> >> for each row it ran into of the outer entity.
> >>
> >> If you look at the debug output of the entity:pets (see below on 
> >> original message).
> >> The query looks like this:
> >> "SELECT id,name,birth_date,type_id FROM pets WHERE owner_id='owners-1'
> >>
> >> This is the problem lies, because, the owner_id in the pets table 
> >> is currently a number and thus will not match the modified combo id 
> >> generated for the owners' id column.
> >>
> >> So, somehow, I need to be able to either remove the 'owners-' 
> >> suffix before comparing, or append the same suffix to the 
> >> pets.owner_id value prior to comparing.
> >>
> >> Thanks
> >>
> >> ** julio
> >>
> >> -----Original Message-----
> >> From: Noble Paul ??????? ?????? [mailto:[EMAIL PROTECTED]
> >> Sent: Monday, June 02, 2008 9:20 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: How to describe 2 entities in dataConfig for the
> DataImporter?
> >>
> >> hi Julio,
> >> delete my previous response. In your schema , 'id' is the uniqueKey.
> >> make  'comboid' the unique key. Because that is the target field 
> >> name coming out of the entity 'owners'
> >>
> >> --Noble
> >>
> >> On Tue, Jun 3, 2008 at 9:46 AM, Noble Paul ??????? ??????
> >> <[EMAIL PROTECTED]> wrote:
> >>> The field 'id' is repeated for pet also rename it to something 
> >>> else say  <entity name="pets" pk="id"
> >>>               query="SELECT id,name,birth_date,type_id FROM pets 
> >>> WHERE owner_id='${owners.id}'"
> >>>               parentDeltaQuery="SELECT id FROM owners WHERE 
> >>> id=${pets.owner_id}">
> >>>           <field column="id"          name="petid"/>
> >>> </entity>
> >>>
> >>> --Noble
> >>>
> >>> On Tue, Jun 3, 2008 at 3:28 AM, Julio Castillo 
> >>> <[EMAIL PROTECTED]>
> >> wrote:
> >>>> Shalin,
> >>>> I experimented with it, and the null pointer exception has been 
> >>>> taken care of. Thank you.
> >>>>
> >>>> I have a different problem now. I believe it is a 
> >>>> syntax/specification problem.
> >>>>
> >>>> When importing data, I got the following exceptions:
> >>>> SEVERE: Exception while adding:
> >>>> SolrInputDocumnt[{comboId=comboId(1.0)={owners-9},
> >>>> userName=userName(1.0)={[David, Schroeder]}}]
> >>>>
> >>>> org.apache.solr.common.SolrException: Document [null] missing 
> >>>> required
> >>>> field: id
> >>>>        at
> >>>>
> >> org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.j
> >> av
> >> a:289)
> >>>>        at
> >>>> org.apache.solr.handler.dataimport.DataImportHandler$1.upload(Dat
> >>>> aI
> >>>> mp
> >>>> ortHand
> >>>> ler.java:263)
> >>>>        ...
> >>>>
> >>>> The problem arises the moment I try to include nested entities (e.g.
> >>>> pets -the problem does not occur if I don't use the transformer, 
> >>>> but I have to use the transformer because other unrelated 
> >>>> entities also have
> >> id's).
> >>>> My data config file looks as follows.
> >>>>
> >>>> <dataConfig>
> >>>>  <document name="doc-1">
> >>>>    <entity name="owners" pk="id"
> >>>>            query="select id,first_name,last_name FROM owners"
> >>>>            transformer="TemplateTransformer">
> >>>>        <field column="id"  name="comboId"
> >> template="owners-${owners.id}"/>
> >>>>        <field column="first_name" name="userName"/>
> >>>>        <field column="last_name"  name="userName"/>
> >>>>
> >>>>        <entity name="pets" pk="id"
> >>>>                query="SELECT id,name,birth_date,type_id FROM pets 
> >>>> WHERE owner_id='${owners.id}'"
> >>>>                parentDeltaQuery="SELECT id FROM owners WHERE 
> >>>> id=${pets.owner_id}">
> >>>>            <field column="id"          name="id"/>
> >>>>            <field column="name"        name="name"/>
> >>>>            <field column="birth_date" name="birthDate"/>
> >>>>        </entity>
> >>>>    </entity>
> >>>>  </document>
> >>>> </dataConfig>
> >>>>
> >>>> The debug output of the data import looks as follows:
> >>>>
> >>>> ....
> >>>> - <lst name="verbose-output">
> >>>>  - <lst name="entity:owners">
> >>>>    - <lst name="document#1">
> >>>>      <str name="query">select id,first_name,last_name FROM
> owners</str>
> >>>>      <str name="time-taken">0:0:0.15</str>
> >>>>      <str>----------- row #1-------------</str>
> >>>>      <int name="id">1</int>
> >>>>      <str name="first_name">George</str>
> >>>>      <str name="last_name">Franklin</str>
> >>>>      <str>---------------------------------------------</str>
> >>>>      - <lst name="transformer:TemplateTransformer">
> >>>>        <str>---------------------------------------------</str>
> >>>>        <str name="id">owners-1</str>
> >>>>        <str name="first_name">George</str>
> >>>>        <str name="last_name">Franklin</str>
> >>>>        <str>---------------------------------------------</str>
> >>>>        - <lst name="entity:pets">
> >>>>          <str name="query">SELECT id,name,birth_date,type_id FROM 
> >>>> pets WHERE owner_id='owners-1'</str>
> >>>>          <str name="time-taken">0:0:0.0</str>
> >>>>          </lst>
> >>>>      </lst>
> >>>>  </lst>
> >>>> + <lst name="document#1">
> >>>> ....
> >>>>
> >>>> Thanks again
> >>>>
> >>>> ** julio
> >>>>
> >>>>
> >>>> -----Original Message-----
> >>>> From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
> >>>> Sent: Saturday, May 31, 2008 10:26 AM
> >>>> To: solr-user@lucene.apache.org
> >>>> Subject: Re: How to describe 2 entities in dataConfig for the
> >> DataImporter?
> >>>>
> >>>> Hi Julio,
> >>>>
> >>>> I've fixed the bug, can you please replace the exiting 
> >>>> TemplateTransformer.java in the SOLR-469.patch and use the 
> >>>> attached TemplateTransformer.java file. We'll add the changes to 
> >>>> our next
> patch.
> >>>> Sorry for all the trouble.
> >>>>
> >>>> On Sat, May 31, 2008 at 10:31 PM, Noble Paul ??????? ??????
> >>>> <[EMAIL PROTECTED]> wrote:
> >>>>> julio,
> >>>>> Looks like it is a bug.
> >>>>> We can give u a new TemplateTransformer.java which we will 
> >>>>> incorporate in the next patch --Noble
> >>>>>
> >>>>> On Sat, May 31, 2008 at 12:24 AM, Julio Castillo 
> >>>>> <[EMAIL PROTECTED]> wrote:
> >>>>>> I'm sorry Shalin, but I still get the same Null Pointer exception.
> >>>>>> This is my complete dataconfig.xml (I remove the parallel 
> >>>>>> entity to narrow down the scope of the problem).
> >>>>>> <dataConfig>
> >>>>>>  <document name="doc-1">
> >>>>>>    <entity name="vets" pk="id"
> >>>>>>            query="select id as idAlias,first_name,last_name 
> >>>>>> FROM
> vets"
> >>>>>>            deltaQuery="SELECT id as idAlias FROM vets WHERE 
> >>>>>> last_modified > '${dataimporter.last_index_time}'"
> >>>>>>            transformer="TemplateTransformer">
> >>>>>>        <field column="id"              name="id"
> >>>>>> template="vets-${vets.idAlias}"/>
> >>>>>>        <field column="first_name"      name="userName"/>
> >>>>>>        <field column="last_name"       name="userName"/>
> >>>>>>    </entity>
> >>>>>>  </document>
> >>>>>> </dataConfig>
> >>>>>>
> >>>>>> Thanks again.
> >>>>>>
> >>>>>> ** julio
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED]
> >>>>>> Sent: Friday, May 30, 2008 11:38 AM
> >>>>>> To: solr-user@lucene.apache.org
> >>>>>> Subject: Re: How to describe 2 entities in dataConfig for the
> >>>> DataImporter?
> >>>>>>
> >>>>>> The surname is used just as an example of a field.
> >>>>>>
> >>>>>> The NullPointerException is because the same field "id" tries 
> >>>>>> to use it's own value in a template. The template cannot 
> >>>>>> contain the same field on which it is being applied. I'd 
> >>>>>> suggest that you get the id aliased to another name, for 
> >>>>>> example using a query "select id as idAlias from vets" and then
> >>>>>> use:
> >>>>>> <field column="id" template="vets-${vets.idAlias}" />
> >>>>>>
> >>>>>> That should work, let me know if you face a problem.
> >>>>>>
> >>>>>> On Fri, May 30, 2008 at 10:40 PM, Julio Castillo 
> >>>>>> <[EMAIL PROTECTED]>
> >>>>>> wrote:
> >>>>>>> Thanks for all the leads.
> >>>>>>> I did get however a null pointer exception while implementing it:
> >>>>>>>
> >>>>>>> May 30, 2008 9:57:50 AM
> >>>>>>> org.apache.solr.handler.dataimport.EntityProcessorBase
> >> applyTransformer
> >>>>>>> WARNING: transformer threw error java.lang.NullPointerException
> >>>>>>>   at
> >>>>>>>
> >> org.apache.solr.handler.dataimport.TemplateTransformer.transformRow
> >> (T
> >> emplate
> >> Transformer.java:55)
> >>>>>>>   at
> >>>>>>>
> >> org.apache.solr.handler.dataimport.EntityProcessorBase.applyTransfo
> >> rm
> >> er(Enti
> >> tyProcessorBase.java:186)
> >>>>>>>
> >>>>>>> Looking at the source code, it appears that the resolverMap is
> null.
> >>>>>>> The resolver returned null given the entityName.
> >>>>>>>
> >>>>>>>
> >>>>>>> ** julio
> >>>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Noble Paul ??????? ?????? [mailto:[EMAIL PROTECTED]
> >>>>>>> Sent: Thursday, May 29, 2008 11:10 PM
> >>>>>>> To: solr-user@lucene.apache.org
> >>>>>>> Subject: Re: How to describe 2 entities in dataConfig for the
> >>>>>> DataImporter?
> >>>>>>>
> >>>>>>> Sorry I forgot to mention that.
> >>>>>>> http://wiki.apache.org/solr/DataImportHandler#head-a6916b30b5d
> >>>>>>> 76
> >>>>>>> 05
> >>>>>>> a9
> >>>>>>> 90
> >>>>>>> fb03c4
> >>>>>>> ff461b3736496a9
> >>>>>>> --Noble
> >>>>>>>
> >>>>>>> On Fri, May 30, 2008 at 11:37 AM, Shalin Shekhar Mangar 
> >>>>>>> <[EMAIL PROTECTED]> wrote:
> >>>>>>>> You need to enable TemplateTransformer for your entity. For
> example:
> >>>>>>>> <entity name="owners" pk="id" query="...."
> >>>>>>>> transformer="TemplateTransformer">
> >>>>>>>>
> >>>>>>>> On Fri, May 30, 2008 at 11:31 AM, Julio Castillo 
> >>>>>>>> <[EMAIL PROTECTED]> wrote:
> >>>>>>>>> Noble,
> >>>>>>>>> I tried the template setting for the "id" field, but I 
> >>>>>>>>> didn't notice any different behavior. I also didn't see 
> >>>>>>>>> where this would be
> >>>> reflected.
> >>>>>>>>> I looked at the fields and the debug output for the 
> >>>>>>>>> dataImporter and couldn't see any reference to a modified id 
> >>>>>>>>> name (per the template instructions).
> >>>>>>>>>
> >>>>>>>>> The behavior in the end seemed to be the same. Did I miss
> anything?
> >>>>>>>>> I assume that the <uniqueKey>id</uniqueKey> setting in the 
> >>>>>>>>> schema.xml remains the same?
> >>>>>>>>>
> >>>>>>>>> Thanks again
> >>>>>>>>>
> >>>>>>>>> ** julio
> >>>>>>>>>
> >>>>>>>>> -----Original Message-----
> >>>>>>>>> From: Noble Paul ??????? ?????? 
> >>>>>>>>> [mailto:[EMAIL PROTECTED]
> >>>>>>>>> Sent: Thursday, May 29, 2008 9:46 PM
> >>>>>>>>> To: solr-user@lucene.apache.org
> >>>>>>>>> Subject: Re: How to describe 2 entities in dataConfig for 
> >>>>>>>>> the
> >>>>>>> DataImporter?
> >>>>>>>>>
> >>>>>>>>> Consider constructing the id concatenating an extra string 
> >>>>>>>>> for each document . You can construct that field using the
> >>>> TeplateTransformer.
> >>>>>>>>> in the entity owners keep the id as
> >>>>>>>>>
> >>>>>>>>> <field column="id" name="id" 
> >>>>>>>>> template="owners-${owners.id}"/> and in vets <field column="id"
name="id"
> >>>>>>>>> template="vets-${vets.id}"/>
> >>>>>>>>>
> >>>>>>>>> or anything else which can make it unique
> >>>>>>>>>
> >>>>>>>>> --Noble
> >>>>>>>>>
> >>>>>>>>> On Fri, May 30, 2008 at 10:05 AM, Shalin Shekhar Mangar 
> >>>>>>>>> <[EMAIL PROTECTED]> wrote:
> >>>>>>>>>> That will happen only if id is the uniqueKey in Solr and 
> >>>>>>>>>> the id coming from both your tables have same values. In 
> >>>>>>>>>> that case, they will overwrite each other. You will need a 
> >>>>>>>>>> separate uniqueKey (on other than id field).
> >>
> >>
> >
> >
> >
> > --
> > --Noble Paul
> >
>
>
>
> --
> --Noble Paul
>
>


--
Regards,
Shalin Shekhar Mangar.

Reply via email to