[ 
https://issues.apache.org/jira/browse/SOLR-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854729#action_12854729
 ] 

Lance Norskog edited comment on SOLR-1867 at 4/8/10 1:30 AM:
-------------------------------------------------------------

The leak possibly comes from the use of ThreadLocal. All of the classes below are in 
contrib/..../dataimport:

Context.java:

{noformat}
static final ThreadLocal<Context> CURRENT_CONTEXT = new ThreadLocal<Context>();
{noformat}

DocBuilder.buildDocument():

{noformat}
ContextImpl ctx = new ContextImpl(entity, vr, null,
    pk == null ? Context.FULL_DUMP : Context.DELTA_DUMP,
    session, parentCtx, this);
entityProcessor.init(ctx);
Context.CURRENT_CONTEXT.set(ctx);
{noformat}

If the CachedSqlEntityProcessor saves its cached rows in the Context, this may be the 
problem.

ThreadLocal is notorious for causing memory leaks: the thread gets reused (for 
example by a thread pool), but the code never nulls out the thread-local value, 
so everything that value references stays reachable.

The DIH needs to call Context.CURRENT_CONTEXT.set(null) before the request 
returns, if the DIH index operation is synchronous. It probably should do this 
anyway for safety.
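
To illustrate the reuse problem described above, here is a minimal sketch, not DIH code. The class and method names (ThreadLocalLeakSketch, handleRequest, valueSurvivesOnPooledThread) are hypothetical, and a byte array stands in for the Context. It runs two tasks on the same pooled thread and reports whether the second task can still see the value the first one set.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the DIH Context handling; not the real Solr classes.
class ThreadLocalLeakSketch {
    static final ThreadLocal<Object> CURRENT = new ThreadLocal<>();

    /** Simulates one request: sets the thread-local and optionally clears it when done. */
    static void handleRequest(Object ctx, boolean clear) {
        CURRENT.set(ctx);
        try {
            // ... the import work would run here ...
        } finally {
            if (clear) {
                // set(null) would also release the reference; remove() additionally
                // drops the entry from the thread's map.
                CURRENT.remove();
            }
        }
    }

    /** Returns true if a later task on the same pooled thread still sees the old value. */
    static boolean valueSurvivesOnPooledThread(boolean clear) {
        ExecutorService pool = Executors.newFixedThreadPool(1); // one worker, so it is reused
        try {
            pool.submit(() -> handleRequest(new byte[1024], clear)).get();
            return pool.submit(() -> CURRENT.get() != null).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the cleanup in the finally block, the value stays attached to the worker thread for as long as the thread lives, which is the behaviour suspected here.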

> CachedSQLentity processor is using unbounded hashmap 
> -----------------------------------------------------
>
>                 Key: SOLR-1867
>                 URL: https://issues.apache.org/jira/browse/SOLR-1867
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: barani
>
> I am using cachedSqlEntityprocessor in DIH to index the data. Please find a 
> sample dataconfig structure, 
> <entity x query="select * from x"> ---> object 
> <entity y query="select * from y" processor="cachedSqlEntityprocessor" 
> cachekey=y.id cachevalue=x.id> --> object properties 
> For each and every object I would be retrieving the corresponding object 
> properties (in my subqueries). 
> I run into OOM very often, and I think that's a trade-off of using 
> cachedSqlEntityprocessor. 
> My assumption is that when I use cachedSqlEntityprocessor the indexing 
> happens as follows: 
> First, entity x gets executed and the entire table gets stored in cache. 
> Next, entity y gets executed and the entire table gets stored in cache. 
> Finally, the comparison happens through a hash map. 
> So I always need to have the memory allocated to the SOLR JVM greater than or 
> equal to the size of the data present in the tables.
> One more issue is that even after SOLR completes indexing, the memory used 
> previously is not getting released. I could still see the JVM consuming 1.5 
> GB after the indexing completes. I tried to use Java HotSpot options but 
> didn't see any difference. GC is not getting invoked even after a long time 
> when using the CachedSQLentity processor.
> The main issue seems to be the fact that the CachedSQLentity processor cache is 
> an unbounded HashMap, with no option to bound it. 
> Reference: 
> http://n3.nabble.com/Need-info-on-CachedSQLentity-processor-tt698418.html#a698418
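
For reference, one common way to bound a map in plain Java is a LinkedHashMap with removeEldestEntry overridden. The sketch below is only an illustration of that idea. BoundedRowCache is a hypothetical name, and nothing like it is claimed to exist in DIH. Note that an evicted entry would have to be fetched from the database again on the next lookup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded LRU cache; not part of the DataImportHandler.
class BoundedRowCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedRowCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true: least recently used entry is eldest
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

With a limit of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", since "a" was used more recently.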
