CachedSQLentity processor is using unbounded hashmap
-----------------------------------------------------

Key: SOLR-1867
URL: https://issues.apache.org/jira/browse/SOLR-1867
Project: Solr
Issue Type: Bug
Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: barani

I am using CachedSqlEntityProcessor in DIH to index data. Here is a sample data-config structure:

<entity name="x" query="select * from x">   ---> object
  <entity name="y" query="select * from y" processor="CachedSqlEntityProcessor" cachekey="y.id" cachevalue="x.id"/>   --> object properties
</entity>

For every object I retrieve the corresponding object properties in my subqueries.

I run into OutOfMemory errors (OOM) very often, and I assume that is the trade-off of using CachedSqlEntityProcessor. My assumption is that with CachedSqlEntityProcessor the indexing happens as follows:

1. Entity x is executed and the entire table is stored in the cache.
2. Entity y is executed and the entire table is stored in the cache.
3. Finally, the comparison happens through a hash map.

So the memory allocated to the Solr JVM always has to be greater than or equal to the size of the data in the tables.

One more issue is that even after Solr completes indexing, the memory used during indexing is not released. I could still see the JVM consuming 1.5 GB after indexing completed. I tried the Java HotSpot options but did not see any difference. GC is not invoked even after a long time when using CachedSqlEntityProcessor.

The main issue seems to be that the CachedSqlEntityProcessor cache is an unbounded HashMap, with no option to bound it.

Reference:
http://n3.nabble.com/Need-info-on-CachedSQLentity-processor-tt698418.html#a698418
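
For illustration only, here is a minimal sketch of what a bounded cache could look like, using the standard java.util.LinkedHashMap eviction hook. The class name BoundedRowCache and the maxEntries parameter are hypothetical. This is not DataImportHandler code and not an existing DIH option; it only shows the general technique of capping a map's size with least-recently-used eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or DIH: a map that holds at most
// maxEntries entries and evicts the least recently accessed one when full.
class BoundedRowCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedRowCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true drops the eldest entry
        return size() > maxEntries;
    }
}
```

With a bound like this, an evicted key would have to be fetched from the database again on a cache miss, so a real fix would trade some extra queries for a fixed memory ceiling.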