Hi,

I may have been a little too fast with my response. 

After reading a bit more, I imagine you meant running the full-import with the 
entity param set to the root entity for full import, and running the 
delta-import with the entity param set to the delta entity. Is that correct?

Regards

Constantin


-----Original Message-----
From: Constantin Wolber [mailto:constantin.wol...@medicalcolumbus.de] 
Sent: Thursday, June 20, 2013 16:42
To: solr-user@lucene.apache.org
Subject: RE: DataImportHandler: Problems with delta-import and 
CachedSqlEntityProcessor

Hi,

and thanks for the answer. But I'm a little bit confused about what you are 
suggesting. 
I have not really used the rootEntity attribute before. From what I read in 
the documentation, as far as I can tell, that would result in two documents, 
one for each root entity (maybe with the same id, which would probably result 
in only one document being stored).

It would be great if you could just sketch the setup with the entities I 
provided, because currently I have no idea how to do it. 

Regards

Constantin


-----Original Message-----
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:noble.p...@gmail.com]
Sent: Thursday, June 20, 2013 15:42
To: solr-user@lucene.apache.org
Subject: Re: DataImportHandler: Problems with delta-import and 
CachedSqlEntityProcessor

It is possible to create two separate root entities: one for full-import and 
another for delta. For the delta-import you can skip the cache that way.
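
A minimal sketch of what such a setup might look like, reusing the entities 
from the quoted message below. The entity names article_full, article_delta, 
supplier_delta and attributes_delta are hypothetical, as are the WHERE clauses 
added to the uncached child queries (they assume numeric key columns). This is 
an untested illustration of the idea, not a verified configuration:

```xml
<document>
  <!-- Root entity used for full-import only; child entities keep the cache. -->
  <entity name="article_full" dataSource="ds1" pk="myownid"
          query="SELECT * FROM article">
    <field column="myownid" name="id"/>
    <entity name="supplier" dataSource="ds2"
            processor="CachedSqlEntityProcessor"
            query="SELECT * FROM supplier WHERE status=1"
            cacheKey="SUPPLIER_ID"
            cacheLookup="article_full.ARTICLE_SUPPLIER_ID"/>
    <entity name="attributes" dataSource="ds1"
            processor="CachedSqlEntityProcessor"
            query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
            cacheKey="ARTICLE_ID"
            cacheLookup="article_full.myownid"/>
  </entity>

  <!-- Root entity used for delta-import only; child entities are NOT cached
       and instead run one small query per changed article. -->
  <entity name="article_delta" dataSource="ds1" pk="myownid"
          query="SELECT * FROM article"
          deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
          deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}">
    <field column="myownid" name="id"/>
    <!-- Assumption: SUPPLIER_ID and ARTICLE_ID are numeric, so no quoting. -->
    <entity name="supplier_delta" dataSource="ds2"
            query="SELECT * FROM supplier WHERE status=1 AND SUPPLIER_ID=${article_delta.ARTICLE_SUPPLIER_ID}"/>
    <entity name="attributes_delta" dataSource="ds1"
            query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes WHERE ARTICLE_ID=${article_delta.myownid}"/>
  </entity>
</document>
<!-- Each import would then name its root entity via the entity request
     parameter (handler path is whatever the request handler is mapped to):
       ?command=full-import&entity=article_full
       ?command=delta-import&entity=article_delta -->
```

Without the entity parameter, all root entities are executed, so a plain 
full-import would presumably process every article twice under the same id.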



On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber < 
constantin.wol...@medicalcolumbus.de> wrote:

> Hi,
>
> I searched for a solution for quite some time but did not manage to 
> find any real hints on how to fix it.
>
>
> I'm using Solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in 
> a Tomcat 6 container.
>
> My data import setup is basically the following:
>
> Data-config.xml:
>
> <entity
>         name="article"
>         dataSource="ds1"
>         query="SELECT * FROM article"
>         deltaQuery="SELECT myownid FROM articleHistory WHERE 
> modified_date &gt; '${dih.last_index_time}'"
>         deltaImportQuery="SELECT * FROM article WHERE 
> myownid=${dih.delta.myownid}"
>         pk="myownid">
>         <field column="myownid" name="id"/>
>
>         <entity
>                 name="supplier"
>                 dataSource="ds2"
>                 query="SELECT * FROM supplier WHERE status=1"
>                 processor="CachedSqlEntityProcessor"
>                 cacheKey="SUPPLIER_ID"
>                 cacheLookup="article.ARTICLE_SUPPLIER_ID">
>         </entity>
>
>         <entity
>                 name="attributes"
>                 dataSource="ds1"
>                 query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+'
> Value:'+ATTRIBUTE_VALUE FROM attributes"
>                 cacheKey="ARTICLE_ID"
>                 cacheLookup="article.myownid"
>                 processor="CachedSqlEntityProcessor">
>         </entity>
> </entity>
>
>
> Ok now for the problem:
>
> At first I tried everything without the cache, but the full-import 
> took a very long time because the attributes query is pretty slow 
> compared to the rest. As a result I got a processing speed of around 150 
> documents/s.
> After switching everything to the CachedSqlEntityProcessor, the full 
> import ran at about 4000 documents/s.
>
> So full import is running quite fine. Now I wanted to use the delta 
> import. When running the delta import I was expecting the ramp-up time 
> to be about the same as in full import, since I need to load the whole 
> supplier and attributes tables into the cache in the first step. But 
> according to the log file, the weird thing is that Solr seems to refresh 
> the cache for every single document that is processed. So currently my 
> delta-import is a lot slower than the full-import. I even tried to add 
> the deltaImportQuery parameter to the entity, but it doesn't change the 
> behavior at all (of course I know it is not supposed to change anything in 
> the setup I run).
>
> The following solutions would be possible in my opinion:
>
> 1. Is there any way to tell the config to ignore the cache when 
> running a delta import? That would help already because we are talking 
> about a maximum of 500 documents changed in 15 minutes compared to 
> over 5 million documents in total.
> 2. Get Solr to not refresh the cache for every document.
>
> Best Regards
>
> Constantin Wolber
>
>


--
-----------------------------------------------------
Noble Paul
