It is possible to create two separate root entities: one for full-import and another for delta-import. That way, the delta-import can skip the cache.
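A rough sketch of what that could look like, reusing the queries from your config. The entity name `article_delta` and the parameterized WHERE clauses in its sub-entities are my assumptions, not something taken from your setup, so adjust column names and quoting to your schema:

```xml
<document>
  <!-- Root entity for full-import: sub-entities stay cached. -->
  <entity name="article" dataSource="ds1" pk="myownid"
          query="SELECT * FROM article">
    <field column="myownid" name="id"/>
    <entity name="supplier" dataSource="ds2"
            query="SELECT * FROM supplier WHERE status=1"
            processor="CachedSqlEntityProcessor"
            cacheKey="SUPPLIER_ID"
            cacheLookup="article.ARTICLE_SUPPLIER_ID"/>
    <entity name="attributes" dataSource="ds1"
            query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
            processor="CachedSqlEntityProcessor"
            cacheKey="ARTICLE_ID"
            cacheLookup="article.myownid"/>
  </entity>

  <!-- Root entity for delta-import (hypothetical name): no cache,
       each sub-entity runs one small query per changed article.
       The plain query is only used if this entity is ever run
       with full-import. -->
  <entity name="article_delta" dataSource="ds1" pk="myownid"
          query="SELECT * FROM article"
          deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date &gt; '${dih.last_index_time}'"
          deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}">
    <field column="myownid" name="id"/>
    <!-- Assumed filter: look up only the supplier of the current article. -->
    <entity name="supplier_delta" dataSource="ds2"
            query="SELECT * FROM supplier WHERE status=1 AND SUPPLIER_ID='${article_delta.ARTICLE_SUPPLIER_ID}'"/>
    <!-- Assumed filter: fetch only the attributes of the current article. -->
    <entity name="attributes_delta" dataSource="ds1"
            query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes WHERE ARTICLE_ID='${article_delta.myownid}'"/>
  </entity>
</document>
```

You would then pick the root entity per request with the `entity` parameter, e.g. `command=full-import&entity=article` and `command=delta-import&entity=article_delta`. I have not tested this against 4.3.0, so please verify that the parameter is honored for delta-import on your version. With at most 500 changed documents per run, the per-row sub-queries should cost far less than loading both tables into the cache.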

On Thu, Jun 20, 2013 at 1:50 PM, Constantin Wolber <constantin.wol...@medicalcolumbus.de> wrote:

> Hi,
>
> I searched for a solution for quite some time but did not manage to find
> any real hints on how to fix it.
>
> I'm using solr 4.3.0 1477023 - simonw - 2013-04-29 15:10:12 running in a
> tomcat 6 container.
>
> My data import setup is basically the following:
>
> Data-config.xml:
>
> <entity
>     name="article"
>     dataSource="ds1"
>     query="SELECT * FROM article"
>     deltaQuery="SELECT myownid FROM articleHistory WHERE modified_date > '${dih.last_index_time}'"
>     deltaImportQuery="SELECT * FROM article WHERE myownid=${dih.delta.myownid}"
>     pk="myownid">
>   <field column="myownid" name="id"/>
>
>   <entity
>       name="supplier"
>       dataSource="ds2"
>       query="SELECT * FROM supplier WHERE status=1"
>       processor="CachedSqlEntityProcessor"
>       cacheKey="SUPPLIER_ID"
>       cacheLookup="article.ARTICLE_SUPPLIER_ID">
>   </entity>
>
>   <entity
>       name="attributes"
>       dataSource="ds1"
>       query="SELECT ARTICLE_ID,'Key:'+ATTRIBUTE_KEY+' Value:'+ATTRIBUTE_VALUE FROM attributes"
>       cacheKey="ARTICLE_ID"
>       cacheLookup="article.myownid"
>       processor="CachedSqlEntityProcessor">
>   </entity>
> </entity>
>
> Ok, now for the problem:
>
> At first I tried everything without the cache, but the full-import took a
> very long time because the attributes query is pretty slow compared to the
> rest. As a result I got a processing speed of around 150 documents/s.
> When switching everything to the CachedSqlEntityProcessor, the full import
> ran at 4000 documents/s.
>
> So the full import is running quite fine. Now I wanted to use the delta
> import. When running the delta import I was expecting the ramp-up time to
> be about the same as in the full import, since I need to load the whole
> supplier and attributes tables into the cache in the first step. But when
> looking into the log file, the weird thing is that solr seems to refresh
> the cache for every single document that is processed. So currently my
> delta-import is a lot slower than the full-import. I even tried to add the
> deltaImportQuery parameter to the entity, but it doesn't change the
> behavior at all (of course I know it is not supposed to change anything in
> the setup I run).
>
> The following solutions would be possible in my opinion:
>
> 1. Is there any way to tell the config to ignore the cache when running a
>    delta import? That would help already, because we are talking about a
>    maximum of 500 documents changed in 15 minutes compared to over
>    5 million documents in total.
> 2. Get solr to not refresh the cache for every document.
>
> Best Regards
>
> Constantin Wolber

--
-----------------------------------------------------
Noble Paul