Thanks for all the great work on Solr. We have used it for over a year and have been very satisfied with it.
However, we have noticed that some of the recent changes have affected import caching in a not-so-good way. We are using Solr 4.2.0 with both full and delta imports. We only use a delta import query on the root entity (our object model does not safely support updates to the nested entities). Here is a snippet of the config XML:

<entity name="product" pk="ID"
        query="..."
        deltaImportQuery="..."
        deltaQuery="..."
        deletedPkQuery="...">
  <field column="ID" name="id" />
  <field column="NAME" name="name" />
  ...
  <entity name="productSize" query="..."
          processor="CachedSqlEntityProcessor"
          cacheKey="PRODUCT_ID"
          cacheLookup="product.ID">
    <entity name="productSizeAttributes" query="..."
            processor="CachedSqlEntityProcessor"
            cacheKey="SIZE_ID"
            cacheLookup="productSize.SIZE_ID"
            transformer="LogTransformer"
            logLevel="info"
            logTemplate="The size for product ${product.ID} is ${productSizeAttributes}">
      <field column="SIZE_ID" name="size" />
      <field column="SIZE_NAME" name="sizeName" />
      <field column="SIZE_CODE" name="sizeCode"/>
    </entity>
  </entity>
</entity>

Delta imports that used to take about 30 seconds now run indefinitely and eventually cause an OutOfMemoryError, even with a huge multi-GB heap. Here is the stack trace:

java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
    at java.lang.StringBuilder.append(StringBuilder.java:119)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at java.util.AbstractCollection.toString(AbstractCollection.java:422)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.solr.common.SolrInputField.toString(SolrInputField.java:215)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at java.util.AbstractCollection.toString(AbstractCollection.java:422)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.solr.common.SolrInputDocument.toString(SolrInputDocument.java:192)
    at java.lang.String.valueOf(String.java:2826)
    at java.lang.StringBuilder.append(StringBuilder.java:115)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:524)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:404)
    at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:353)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:219)
    at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:451)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:489)
    at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468)

The frame at DocBuilder.buildDocument (DocBuilder.java:524) in Solr 4.2.0 is this logging call:

SolrException.log(LOG, "Exception while processing: "
    + epw.getEntity().getName() + " document : " + doc, e);

The doc.toString() here appends all of the SolrInputFields to the string. Why are the SolrInputFields so big? It is hard to say, because the original exception is never logged. After debugging for a few days, it appears that during a delta import the cache is destroyed prematurely. DocBuilder.buildDocument is called once for each row returned by the deltaQuery, and in its finally block it calls destroy() on all of the EntityProcessorWrappers. That eventually calls destroy() on EntityProcessorBase, which destroys the cacheSupport and then sets it to null. On every subsequent buildDocument call, EntityProcessorBase.init() runs again, sees that the isFirstInit flag is false, and skips re-initializing the cache (which arguably should not be destroyed until the last row returned by the deltaQuery has been processed). So when the rows for the nested entities are fetched, the cache behavior is skipped, the SQL is re-executed, and every single row from the nested entities is loaded as new fields on each document. Thus, if a query returns 100,000 productSize records, every product after the first ends up with all 100,000 productSizes attached to it. This behavior makes delta imports unusable whenever caching is used, in any release since this functionality was changed.
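To sanity-check our reading of the code, here is a small, self-contained model of the lifecycle we think we are seeing. These are not the real Solr classes; only the names EntityProcessorBase, init(), destroy(), cacheSupport and isFirstInit mirror actual members, while the class DeltaCacheLifecycleSketch, the nestedRowsFor() method and the hard-coded data are purely our illustration:

import java.util.*;

public class DeltaCacheLifecycleSketch {

  // Simplified stand-in for EntityProcessorBase's cache handling, not the real class.
  static class EntityProcessorBase {
    boolean isFirstInit = true;
    Map<String, List<String>> cacheSupport;   // stand-in for DIHCacheSupport

    void init() {
      // The cache is only ever built on the very first init().
      if (isFirstInit) {
        cacheSupport = new HashMap<>();
        cacheSupport.put("1", List.of("S", "M", "L"));
        isFirstInit = false;
      }
      // Later init() calls leave cacheSupport null because isFirstInit is already false.
    }

    List<String> nestedRowsFor(String productId) {
      if (cacheSupport != null) {
        // Cached path: only the rows keyed by this product.
        return cacheSupport.getOrDefault(productId, List.of());
      }
      // Cache gone: "re-execute the SQL" and hand back every nested row.
      return List.of("S", "M", "L", "...every productSize row in the table...");
    }

    void destroy() {
      // Called from buildDocument's finally block after every delta row.
      cacheSupport = null;
    }
  }

  public static void main(String[] args) {
    EntityProcessorBase ep = new EntityProcessorBase();
    // buildDocument is invoked once per row returned by the deltaQuery.
    for (String productId : List.of("1", "2", "3")) {
      try {
        ep.init();
        System.out.println("product " + productId + " -> " + ep.nestedRowsFor(productId));
      } finally {
        ep.destroy();   // premature: the cache never comes back after the first row
      }
    }
  }
}

Run this way, the first product gets its sizes from the cache, destroy() then throws the cache away, and every later product falls through to the uncached path and picks up every nested row, which would explain the enormous SolrInputFields that doc.toString() is trying to print above.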
We have also noticed that caching does not seem to be honored when the SQL statement contains resolvable tokens (${...}). We can work around those two queries by disabling caching, but I cannot disable caching on the other 20 queries; imports would take hours.

Has anyone else seen this?

Keith Naas