One thing I notice in your configuration...the child entity has this:

cacheLookup="ent1.uid"

but your parent entity doesn't have a "uid" field.  

Also, you have these 3 transformers:  
RegexTransformer,DateFormatTransformer,TemplateTransformer

but none of your columns seem to make use of these.  Are you sure you need them?
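
As a rough, untested sketch of what I mean: assuming the parent's aliased "id" column is the value that should match "id1" in Table2, the entities might look like the following. I have also added the "from" that appears to be missing from the parent query and dropped the transformer attributes. Both of those changes are guesses about your intent.

```xml
<!-- Sketch only: assumes ent1.id (sid aliased as id) is the key matching Table2.id1 -->
<entity name="ent1" query="select sid as id from Table1 a">
  <field column="id" name="id" boost="0.5"/>

  <!-- cacheLookup now points at a column the parent query actually returns -->
  <entity name="ent2" query="select id1,rid from Table2"
          processor="CachedSqlEntityProcessor"
          cacheKey="id1" cacheLookup="ent1.id">
    <field column="rid" name="related_id"/>
  </entity>
</entity>
```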

In any case I am suspicious there may still be bugs in 3.6.1 related to 
CachedSqlEntityProcessor, so if you are able to create a failing unit test and 
post it to JIRA that would be helpful.  If you need to, you can use the 3.5 DIH 
jar with Solr 3.6.1.  Also, I do not think SOLR-3360 should affect you 
unless you're using the "threads" parameter.  Both SOLR-3360 & SOLR-3430 fixed 
bugs related to CachedSqlEntityProcessor that were introduced in 3.6.0 (from 
SOLR-3411 and SOLR-2482 respectively).
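
If you do fall back to the 3.5 DIH jar, one way to load it is a <lib> directive in solrconfig.xml. This is only a sketch: the directory is a made-up path, and the jar file names are what I would expect from the 3.5.0 distribution, so check them against your download. Make sure the 3.6.1 DIH jar is not also on the classpath.

```xml
<!-- solrconfig.xml: hypothetical directory holding only the 3.5.0 DIH jars -->
<lib dir="/path/to/solr-3.5.0-dih/" regex="apache-solr-dataimporthandler-.*\.jar" />
```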

Finally, if you are at all able to test this on 4.0-beta, I would greatly 
appreciate it!  SOLR-3411/SOLR-3360 were never applied to version 4.0 because 
"threads" support was removed entirely.  However, SOLR-2482/SOLR-3430 were 
applied to 4.0 also.  If we have any more SOLR-2482 bugs lingering in 4.0 these 
really need to be fixed so any testing help would be much appreciated.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: mechravi25 [mailto:mechrav...@yahoo.co.in] 
Sent: Tuesday, August 14, 2012 8:04 AM
To: solr-user@lucene.apache.org
Subject: Dataimport Handler in solr 3.6.1

I am indexing some data using DataImportHandler (DIH) files in Solr 3.6.1,
and I am using a nested entity in my handler file.
I noticed a scenario in which, instead of only the records that should be
fetched for a document, all the records present in the table are indexed.

The following is how the data should ideally be indexed.
For a document A, I am trying to index the two values B and C as a
multivalued field:

<id>A</id>
<related_id>
<str>B</str>
<str>C</str>
</related_id>

This is how the output should look. I have used the same DIH file with Solr
1.4 and 3.5, and the data was indexed correctly, as shown above, in both
versions.

But in Solr 3.6.1, the data was indexed differently. In my table, there
are 4 values (B, C, D, E) in the related_id field.
This is how the data is indexed in 3.6.1:

<id>A</id>
<related_id>
<str>B</str>
<str>C</str>
<str>D</str>
<str>E</str>
</related_id>

Ideally, the values D and E should not get indexed under id "A". The same
thing happens for the other id records.


The following is the content of the DIH file:


<entity name="ent1" query="select sid as id Table1 a "
        transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">

    <field column="id" name="id" boost="0.5"/>

    <entity name="ent2" query="select id1,rid from Table2 "
            processor="CachedSqlEntityProcessor" cacheKey="id1" cacheLookup="ent1.uid"
            transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer">

        <field column="rid" name="related_id"/>

    </entity>

</entity>

I tried changing the CachedSqlEntityProcessor to SqlEntityProcessor and
then indexed the same data, but I still faced the same issue.

When I googled a bit, I found this URL:
https://issues.apache.org/jira/browse/SOLR-3360


I am not sure if issue SOLR-3360 describes the same scenario as the one I
have mentioned above.

Please guide me.

Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dataimport-Handler-in-solr-3-6-1-tp4001149.html
Sent from the Solr - User mailing list archive at Nabble.com.

