Re: Solr 8.7.0 memory leak?

2021-01-29 Thread Chris Hostetter


: there are not many OOM stack details printed in the solr log file; it just
: says "Not enough memory", and it's killed by oom.sh (Solr's script).

not many isn't the same as none ... can you tell us *ANYTHING* about what 
the logs look like? ... as i said: it's not just the details of the OOM 
that would be helpful: any details about what the solr logs say solr is 
doing while the memory is growing (before the OOM) would also be helpful.
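
(a rough sketch of how to pull that out, assuming a service-style install with
logs under /var/solr/logs -- adjust the paths, and note the oom killer log name
here is from memory, so double check it:)

  # anything interesting while the heap is climbing?
  grep -iE "ERROR|WARN|OutOfMemoryError" /var/solr/logs/solr.log | tail -n 100

  # bin/oom_solr.sh normally leaves its own small note next to the GC logs
  ls -l /var/solr/logs/solr_oom_killer-*.log 2>/dev/null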

: My question (issue) is not whether it's OOM or not; the issue is why the JVM
: memory usage keeps growing but never goes down. That's not how Java programs
: work: a normal Java process can use a lot of memory, but it throws it away
: after use instead of keeping it in memory with references.

you're absolutely right -- that's how a java program should behave, and 
that's what i'm seeing when I try to reproduce what you're describing with 
solr 8.7.0 by running a few nodes, creating a collection and waiting.

In other words: i can't reproduce what you are seeing based on the 
information you've provided -- so the only thing i can do is to ask you 
for more information: what you see in the logs, what your configs are, the 
exact steps you take to trigger this situation, etc...

Please help us help you so we can figure out what is causing the 
behavior you are seeing and try to fix it

: > Knowing exactly what your config looks like would help, knowing exactly
: > what you do before you see the OOM would help (are you really just creating
: > the collections, or is it actually necessary to index some docs into those
: > collections before you see this problem start to happen? what do the logs
: > say during the time when the heap usage is just growing w/o explanation?
: > what is the stack trace of the OOM? what does a heap analysis show in
: > terms of large/leaked objects? etc.)
: >
: > You have to help us understand the minimally viable steps we need
: > to execute to see the behavior you see
: >
: > https://cwiki.apache.org/confluence/display/SOLR/UsingMailingLists


-Hoss
http://www.lucidworks.com/


Re: Solr 8.7.0 memory leak?

2021-01-28 Thread Luke
Thanks Hoss and Shawn for helping.

there are not many OOM stack details printed in the solr log file; it just
says "Not enough memory", and it's killed by oom.sh (Solr's script).

My question (issue) is not whether it's OOM or not; the issue is why the JVM
memory usage keeps growing but never goes down. That's not how Java programs
work: a normal Java process can use a lot of memory, but it throws it away
after use instead of keeping it in memory with references.

After trying Solr 8.7.0 for one week, I went back to Solr 8.6.2 (config, plugins
and third-party libraries are all the same, Xmx=6G); now I can see JVM memory
usage go up and down. I can see it go up when I am creating collections, but
it goes back down once the collection is created completely.

I think I should stick with 8.6.2 until I can find a proper config or
stable version.

thanks again

Derrick

On Thu, Jan 28, 2021 at 11:43 PM Chris Hostetter 
wrote:

>
> : Does the config file matter? I am using a custom config instead of
> : _default; my config is from Solr 8.6.2 with a custom solrconfig.xml
>
> Well, it depends on what's *IN* the custom config ... maybe you are using
> some built in functionality that has a bug but didn't get triggered by my
> simple test case -- or maybe you have custom components that have memory
> leaks.
>
> The point of the question was to try and understand where/how you are
> running into an OOM i can't reproduce.
>
> Knowing exactly what your config looks like would help, knowing exactly 
> what you do before you see the OOM would help (are you really just creating 
> the collections, or is it actually necessary to index some docs into those 
> collections before you see this problem start to happen? what do the logs 
> say during the time when the heap usage is just growing w/o explanation? 
> what is the stack trace of the OOM? what does a heap analysis show in 
> terms of large/leaked objects? etc.)
>
> You have to help us understand the minimally viable steps we need
> to execute to see the behavior you see
>
> https://cwiki.apache.org/confluence/display/SOLR/UsingMailingLists
>
> -Hoss
> http://www.lucidworks.com/
>
>


Re: Solr 8.7.0 memory leak?

2021-01-28 Thread Chris Hostetter


: Does the config file matter? I am using a custom config instead of 
: _default; my config is from Solr 8.6.2 with a custom solrconfig.xml

Well, it depends on what's *IN* the custom config ... maybe you are using 
some built in functionality that has a bug but didn't get triggered by my 
simple test case -- or maybe you have custom components that have memory 
leaks.

The point of the question was to try and understand where/how you are 
running into an OOM i can't reproduce.

Knowing exactly what your config looks like would help, knowing exactly 
what you do before you see the OOM would help (are you really just creating 
the collections, or is it actually necessary to index some docs into those 
collections before you see this problem start to happen? what do the logs 
say during the time when the heap usage is just growing w/o explanation? 
what is the stack trace of the OOM? what does a heap analysis show in 
terms of large/leaked objects? etc.)
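
(for the heap analysis part, the stock JDK tools are enough to get started -- 
a rough sketch, with <solr-pid> filled in by hand:)

  # class histogram of live objects (note: forces a full GC first)
  jmap -histo:live <solr-pid> | head -n 40

  # or take a full heap dump and open it in MAT / VisualVM
  jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>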

You have to help us understand the minimally viable steps we need 
to execute to see the behavior you see

https://cwiki.apache.org/confluence/display/SOLR/UsingMailingLists

-Hoss
http://www.lucidworks.com/



Re: Solr 8.7.0 memory leak?

2021-01-28 Thread Shawn Heisey

On 1/27/2021 9:00 PM, Luke wrote:

it's killed by an OOME exception. The problem is that I just created empty
collections and the Solr JVM keeps growing and never goes down. There is no
data at all. At the beginning I set Xxm=6G, then 10G, now 15G; Solr 8.7
always uses all of it and it will be killed by oom.sh once the JVM usage
reaches 100%.


We are stuck until we know what resource is running out and causing the 
OOME.  To know that we will need to see the actual exception.
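
One way to make sure the next occurrence gets captured, as a sketch -- this 
assumes a service-style install that reads solr.in.sh, so adjust paths to 
your layout:

  # in solr.in.sh: ask the JVM to write a heap dump when the OOME happens
  SOLR_OPTS="$SOLR_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/solr/logs"

  # after the next kill, look for the exception text around the event
  grep -B 2 -A 20 "OutOfMemoryError" /var/solr/logs/solr.log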


Thanks,
Shawn


Re: Solr 8.7.0 memory leak?

2021-01-28 Thread Luke Oak
Thanks Chris,  

Does the config file matter? I am using a custom config instead of 
_default; my config is from Solr 8.6.2 with a custom solrconfig.xml

Derrick

Sent from my iPhone

> On Jan 28, 2021, at 2:48 PM, Chris Hostetter  wrote:
> 
> 
> FWIW, I just tried using 8.7.0 to run:
>bin/solr -m 200m -e cloud -noprompt
> 
> And then set up the following bash one-liner to poll the heap metrics...
> 
> while : ; do date; echo "node 8989" && (curl -sS 
> http://localhost:8983/solr/admin/metrics | grep memory.heap); echo "node 
> 7574" && (curl -sS http://localhost:8983/solr/admin/metrics | grep 
> memory.heap) ; sleep 30; done
> 
> ...what i saw was about what i expected ... heap usage slowly grew on both 
> nodes as bits of garbage were generated (as expected considering the 
> metrics requests, let alone typical background threads) until eventually it 
> garbage collected back down to low usage w/o ever encountering an OOM or 
> crash...
> 
> 
> Thu Jan 28 12:38:47 MST 2021
> node 8989
>  "memory.heap.committed":209715200,
>  "memory.heap.init":209715200,
>  "memory.heap.max":209715200,
>  "memory.heap.usage":0.7613688659667969,
>  "memory.heap.used":159670624,
> node 7574
>  "memory.heap.committed":209715200,
>  "memory.heap.init":209715200,
>  "memory.heap.max":209715200,
>  "memory.heap.usage":0.7713688659667969,
>  "memory.heap.used":161767776,
> Thu Jan 28 12:39:17 MST 2021
> node 8989
>  "memory.heap.committed":209715200,
>  "memory.heap.init":209715200,
>  "memory.heap.max":209715200,
>  "memory.heap.usage":0.7813688659667969,
>  "memory.heap.used":163864928,
> node 7574
>  "memory.heap.committed":209715200,
>  "memory.heap.init":209715200,
>  "memory.heap.max":209715200,
>  "memory.heap.usage":0.7913688659667969,
>  "memory.heap.used":165962080,
> Thu Jan 28 12:39:47 MST 2021
> node 8989
>  "memory.heap.committed":209715200,
>  "memory.heap.init":209715200,
>  "memory.heap.max":209715200,
>  "memory.heap.usage":0.8063688659667969,
>  "memory.heap.used":169107808,
> node 7574
>  "memory.heap.committed":209715200,
>  "memory.heap.init":209715200,
>  "memory.heap.max":209715200,
>  "memory.heap.usage":0.8113688659667969,
>  "memory.heap.used":170156384,
> Thu Jan 28 12:40:17 MST 2021
> node 8989
>  "memory.heap.committed":209715200,
>  "memory.heap.init":209715200,
>  "memory.heap.max":209715200,
>  "memory.heap.usage":0.3428504943847656,
>  "memory.heap.used":71900960,
> node 7574
>  "memory.heap.committed":209715200,
>  "memory.heap.init":209715200,
>  "memory.heap.max":209715200,
>  "memory.heap.usage":0.3528504943847656,
>  "memory.heap.used":73998112,
> 
> 
> 
> 
> 
> 
> -Hoss
> http://www.lucidworks.com/


Re: Solr 8.7.0 memory leak?

2021-01-28 Thread Chris Hostetter


FWIW, I just tried using 8.7.0 to run:
bin/solr -m 200m -e cloud -noprompt

And then set up the following bash one-liner to poll the heap metrics...

while : ; do date; echo "node 8989" && (curl -sS 
http://localhost:8983/solr/admin/metrics | grep memory.heap); echo "node 7574" 
&& (curl -sS http://localhost:8983/solr/admin/metrics | grep memory.heap) ; 
sleep 30; done

...what i saw was about what i expected ... heap usage slowly grew on both 
nodes as bits of garbage were generated (as expected considering the 
metrics requests, let alone typical background threads) until eventually it 
garbage collected back down to low usage w/o ever encountering an OOM or 
crash...


Thu Jan 28 12:38:47 MST 2021
node 8989
  "memory.heap.committed":209715200,
  "memory.heap.init":209715200,
  "memory.heap.max":209715200,
  "memory.heap.usage":0.7613688659667969,
  "memory.heap.used":159670624,
node 7574
  "memory.heap.committed":209715200,
  "memory.heap.init":209715200,
  "memory.heap.max":209715200,
  "memory.heap.usage":0.7713688659667969,
  "memory.heap.used":161767776,
Thu Jan 28 12:39:17 MST 2021
node 8989
  "memory.heap.committed":209715200,
  "memory.heap.init":209715200,
  "memory.heap.max":209715200,
  "memory.heap.usage":0.7813688659667969,
  "memory.heap.used":163864928,
node 7574
  "memory.heap.committed":209715200,
  "memory.heap.init":209715200,
  "memory.heap.max":209715200,
  "memory.heap.usage":0.7913688659667969,
  "memory.heap.used":165962080,
Thu Jan 28 12:39:47 MST 2021
node 8989
  "memory.heap.committed":209715200,
  "memory.heap.init":209715200,
  "memory.heap.max":209715200,
  "memory.heap.usage":0.8063688659667969,
  "memory.heap.used":169107808,
node 7574
  "memory.heap.committed":209715200,
  "memory.heap.init":209715200,
  "memory.heap.max":209715200,
  "memory.heap.usage":0.8113688659667969,
  "memory.heap.used":170156384,
Thu Jan 28 12:40:17 MST 2021
node 8989
  "memory.heap.committed":209715200,
  "memory.heap.init":209715200,
  "memory.heap.max":209715200,
  "memory.heap.usage":0.3428504943847656,
  "memory.heap.used":71900960,
node 7574
  "memory.heap.committed":209715200,
  "memory.heap.init":209715200,
  "memory.heap.max":209715200,
  "memory.heap.usage":0.3528504943847656,
  "memory.heap.used":73998112,






-Hoss
http://www.lucidworks.com/


Re: Solr 8.7.0 memory leak?

2021-01-28 Thread Chris Hostetter


: Hi, I am using Solr 8.7.0, CentOS 7, Java 8.
: 
: I just created a few collections with no data; memory keeps growing but
: never goes down, until I get an OOM and Solr is killed

Are you using a custom config set, or just the _default configs?

If you start up this single node with something like -Xmx5g and create 
5 collections and do nothing else, how long does it take you to see the 
OOM?
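
(i.e. something along these lines, using the stock _default configset -- a 
sketch from memory, so double check the flags against bin/solr help:)

  # single node, cloud mode with embedded ZK, 5g heap
  bin/solr start -c -m 5g

  # five empty collections, then just wait and watch the heap
  for i in 1 2 3 4 5; do
    bin/solr create -c "leaktest$i" -shards 1 -replicationFactor 1
  done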



-Hoss
http://www.lucidworks.com/


Re: Solr 8.7.0 memory leak?

2021-01-28 Thread Luke
and here is the GC log from when I create a collection (just creating the
collection, nothing else)

{Heap before GC invocations=1530 (full 412):
 garbage-first heap   total 10485760K, used 10483431K [0x00054000,
0x000540405000, 0x0007c000)
  region size 4096K, 0 young (0K), 0 survivors (0K)
 Metaspace   used 70694K, capacity 75070K, committed 75260K, reserved
1116160K
  class spaceused 7674K, capacity 8836K, committed 8956K, reserved
1048576K
2021-01-28T21:24:18.396+0800: 34029.526: [GC pause (G1 Evacuation Pause)
(young)
Desired survivor size 33554432 bytes, new threshold 15 (max 15)
, 0.0034128 secs]
   [Parallel Time: 2.2 ms, GC Workers: 4]
  [GC Worker Start (ms): Min: 34029525.7, Avg: 34029526.1, Max:
34029527.3, Diff: 1.6]
  [Ext Root Scanning (ms): Min: 0.0, Avg: 1.0, Max: 1.4, Diff: 1.4,
Sum: 4.1]
  [Update RS (ms): Min: 0.3, Avg: 0.6, Max: 0.7, Diff: 0.4, Sum: 2.2]
 [Processed Buffers: Min: 2, Avg: 2.8, Max: 4, Diff: 2, Sum: 11]
  [Scan RS (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
  [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0,
Sum: 0.0]
  [Object Copy (ms): Min: 0.0, Avg: 0.1, Max: 0.2, Diff: 0.2, Sum: 0.2]
  [Termination (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 0.6]
 [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 4]
  [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum:
0.0]
  [GC Worker Total (ms): Min: 0.6, Avg: 1.8, Max: 2.2, Diff: 1.6, Sum:
7.2]
  [GC Worker End (ms): Min: 34029527.9, Avg: 34029527.9, Max:
34029527.9, Diff: 0.0]
   [Code Root Fixup: 0.0 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 0.0 ms]
   [Other: 1.2 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 0.9 ms]
  [Ref Enq: 0.0 ms]
  [Redirty Cards: 0.0 ms]
  [Humongous Register: 0.1 ms]
  [Humongous Reclaim: 0.0 ms]
  [Free CSet: 0.0 ms]
   [Eden: 0.0B(512.0M)->0.0B(512.0M) Survivors: 0.0B->0.0B Heap:
10237.7M(10240.0M)->10237.7M(10240.0M)]
Heap after GC invocations=1531 (full 412):
 garbage-first heap   total 10485760K, used 10483431K [0x00054000,
0x000540405000, 0x0007c000)
  region size 4096K, 0 young (0K), 0 survivors (0K)
 Metaspace   used 70694K, capacity 75070K, committed 75260K, reserved
1116160K
  class spaceused 7674K, capacity 8836K, committed 8956K, reserved
1048576K
}
 [Times: user=0.01 sys=0.00, real=0.01 secs]
2021-01-28T21:24:18.400+0800: 34029.529: Total time for which application
threads were stopped: 0.0044183 seconds, Stopping threads took: 0.500
seconds
{Heap before GC invocations=1531 (full 412):

On Thu, Jan 28, 2021 at 1:23 PM Luke  wrote:

> Mike,
>
> No, it's not Docker. It is just one Solr node (service) which connects to
> an external ZooKeeper; below are the JVM settings and memory usage.
>
> There are 25 collections which have only about 2000 documents in total. I am
> wondering why Solr uses so much memory.
>
> -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent
> -XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
> -XX:+PrintTenuringDistribution -XX:+UseG1GC -XX:+UseGCLogFileRotation
> -XX:+UseLargePages -XX:-OmitStackTraceInFastThrow -XX:GCLogFileSize=20M
> -XX:MaxGCPauseMillis=250 -XX:NumberOfGCLogFiles=9
> -XX:OnOutOfMemoryError=/mnt/ume/software/solr-8.7.0-3/bin/oom_solr.sh 8983 /mnt/ume/logs/solr2
> -Xloggc:/mnt/ume/logs/solr2/solr_gc.log -Xms6g -Xmx10g -Xss256k -verbose:gc
> [image: image.png]
>
> On Thu, Jan 28, 2021 at 4:40 AM Mike Drob  wrote:
>
>> Are you running these in docker containers?
>>
>> Also, I’m assuming this is a typo but just in case the setting is Xmx :)
>>
>> Can you share the OOM stack trace? It’s not always running out of memory,
>> sometimes Java throws OOM for file handles or threads.
>>
>> Mike
>>
>> On Wed, Jan 27, 2021 at 10:00 PM Luke  wrote:
>>
>> > Shawn,
>> >
>> > it's killed by OOME exception. The problem is that I just created empty
>> > collections and the Solr JVM keeps growing and never goes down. there
>> is no
>> > data at all. at the beginning, I set Xxm=6G, then 10G, now 15G, Solr 8.7
>> > always use all of them and it will be killed by oom.sh once jvm usage
>> > reachs 100%.
>> >
>> > I have another solr 8.6.2 cloud(3 nodes) in separated environment ,
>> which
>> > have over 100 collections, the Xxm = 6G , jvm is always 4-5G.
>> >
>> >
>> >
>> > On Thu, Jan 28, 2021 at 2:56 AM Shawn Heisey 
>> wrote:
>> >
>> > > On 1/27/2021 5:08 PM, Luke Oak wrote:
>> > > > I just created a few collections and no data, memory keeps growing
>> but
>> > > never go down, until I got OOM and solr is killed
>> > > >
>> > > > Any reason?
>> > >
>> > > Was Solr killed by the operating system's oom killer or did the death
>> > > start with a Java OutOfMemoryError exception?
>> > >
>> > > If it was the OS, then the entire system doesn't have enough memory
>> for
>> > > the 

Re: Solr 8.7.0 memory leak?

2021-01-28 Thread Luke
Mike,

No, it's not Docker. It is just one Solr node (service) which connects to
an external ZooKeeper; below are the JVM settings and memory usage.

There are 25 collections which have only about 2000 documents in total. I am
wondering why Solr uses so much memory.

-XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent
-XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails
-XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution
-XX:+UseG1GC -XX:+UseGCLogFileRotation -XX:+UseLargePages
-XX:-OmitStackTraceInFastThrow -XX:GCLogFileSize=20M -XX:MaxGCPauseMillis=250
-XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/mnt/ume/software/solr-8.7.0-3/bin/oom_solr.sh 8983 /mnt/ume/logs/solr2
-Xloggc:/mnt/ume/logs/solr2/solr_gc.log -Xms6g -Xmx10g -Xss256k -verbose:gc
[image: image.png]
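
(in case the screenshot doesn't come through: a command-line way to watch the 
same thing is jstat, a sketch assuming the Solr PID is known:)

  # old gen occupancy (O) and full GC count (FGC), sampled every 5 seconds
  jstat -gcutil <solr-pid> 5000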

On Thu, Jan 28, 2021 at 4:40 AM Mike Drob  wrote:

> Are you running these in docker containers?
>
> Also, I’m assuming this is a typo but just in case the setting is Xmx :)
>
> Can you share the OOM stack trace? It’s not always running out of memory,
> sometimes Java throws OOM for file handles or threads.
>
> Mike
>
> On Wed, Jan 27, 2021 at 10:00 PM Luke  wrote:
>
> > Shawn,
> >
> > it's killed by OOME exception. The problem is that I just created empty
> > collections and the Solr JVM keeps growing and never goes down. there is
> no
> > data at all. at the beginning, I set Xxm=6G, then 10G, now 15G, Solr 8.7
> > always use all of them and it will be killed by oom.sh once jvm usage
> > reachs 100%.
> >
> > I have another solr 8.6.2 cloud(3 nodes) in separated environment , which
> > have over 100 collections, the Xxm = 6G , jvm is always 4-5G.
> >
> >
> >
> > On Thu, Jan 28, 2021 at 2:56 AM Shawn Heisey 
> wrote:
> >
> > > On 1/27/2021 5:08 PM, Luke Oak wrote:
> > > > I just created a few collections and no data, memory keeps growing
> but
> > > never go down, until I got OOM and solr is killed
> > > >
> > > > Any reason?
> > >
> > > Was Solr killed by the operating system's oom killer or did the death
> > > start with a Java OutOfMemoryError exception?
> > >
> > > If it was the OS, then the entire system doesn't have enough memory for
> > > the demands that are made on it.  The problem might be Solr, or it
> might
> > > be something else.  You will need to either reduce the amount of memory
> > > used or increase the memory in the system.
> > >
> > > If it was a Java OOME exception that led to Solr being killed, then
> some
> > > resource (could be heap memory, but isn't always) will be too small and
> > > will need to be increased.  To figure out what resource, you need to
> see
> > > the exception text.  Such exceptions are not always recorded -- it may
> > > occur in a section of code that has no logging.
> > >
> > > Thanks,
> > > Shawn
> > >
> >
>


Re: Solr 8.7.0 memory leak?

2021-01-27 Thread Mike Drob
Are you running these in docker containers?

Also, I’m assuming this is a typo but just in case the setting is Xmx :)

Can you share the OOM stack trace? It’s not always running out of memory,
sometimes Java throws OOM for file handles or threads.

Mike

On Wed, Jan 27, 2021 at 10:00 PM Luke  wrote:

> Shawn,
>
> it's killed by OOME exception. The problem is that I just created empty
> collections and the Solr JVM keeps growing and never goes down. there is no
> data at all. at the beginning, I set Xxm=6G, then 10G, now 15G, Solr 8.7
> always use all of them and it will be killed by oom.sh once jvm usage
> reachs 100%.
>
> I have another solr 8.6.2 cloud(3 nodes) in separated environment , which
> have over 100 collections, the Xxm = 6G , jvm is always 4-5G.
>
>
>
> On Thu, Jan 28, 2021 at 2:56 AM Shawn Heisey  wrote:
>
> > On 1/27/2021 5:08 PM, Luke Oak wrote:
> > > I just created a few collections and no data, memory keeps growing but
> > never go down, until I got OOM and solr is killed
> > >
> > > Any reason?
> >
> > Was Solr killed by the operating system's oom killer or did the death
> > start with a Java OutOfMemoryError exception?
> >
> > If it was the OS, then the entire system doesn't have enough memory for
> > the demands that are made on it.  The problem might be Solr, or it might
> > be something else.  You will need to either reduce the amount of memory
> > used or increase the memory in the system.
> >
> > If it was a Java OOME exception that led to Solr being killed, then some
> > resource (could be heap memory, but isn't always) will be too small and
> > will need to be increased.  To figure out what resource, you need to see
> > the exception text.  Such exceptions are not always recorded -- it may
> > occur in a section of code that has no logging.
> >
> > Thanks,
> > Shawn
> >
>


Re: Solr 8.7.0 memory leak?

2021-01-27 Thread Luke
Shawn,

it's killed by an OOME exception. The problem is that I just created empty
collections and the Solr JVM keeps growing and never goes down. There is no
data at all. At the beginning I set Xxm=6G, then 10G, now 15G; Solr 8.7
always uses all of it and it will be killed by oom.sh once the JVM usage
reaches 100%.

I have another Solr 8.6.2 cloud (3 nodes) in a separate environment, which
has over 100 collections; the Xxm = 6G and the JVM is always at 4-5G.



On Thu, Jan 28, 2021 at 2:56 AM Shawn Heisey  wrote:

> On 1/27/2021 5:08 PM, Luke Oak wrote:
> > I just created a few collections and no data, memory keeps growing but
> never go down, until I got OOM and solr is killed
> >
> > Any reason?
>
> Was Solr killed by the operating system's oom killer or did the death
> start with a Java OutOfMemoryError exception?
>
> If it was the OS, then the entire system doesn't have enough memory for
> the demands that are made on it.  The problem might be Solr, or it might
> be something else.  You will need to either reduce the amount of memory
> used or increase the memory in the system.
>
> If it was a Java OOME exception that led to Solr being killed, then some
> resource (could be heap memory, but isn't always) will be too small and
> will need to be increased.  To figure out what resource, you need to see
> the exception text.  Such exceptions are not always recorded -- it may
> occur in a section of code that has no logging.
>
> Thanks,
> Shawn
>


Re: Solr 8.7.0 memory leak?

2021-01-27 Thread Shawn Heisey

On 1/27/2021 5:08 PM, Luke Oak wrote:

I just created a few collections with no data; memory keeps growing but never 
goes down, until I get an OOM and Solr is killed

Any reason?


Was Solr killed by the operating system's oom killer or did the death 
start with a Java OutOfMemoryError exception?


If it was the OS, then the entire system doesn't have enough memory for 
the demands that are made on it.  The problem might be Solr, or it might 
be something else.  You will need to either reduce the amount of memory 
used or increase the memory in the system.


If it was a Java OOME exception that led to Solr being killed, then some 
resource (could be heap memory, but isn't always) will be too small and 
will need to be increased.  To figure out what resource, you need to see 
the exception text.  Such exceptions are not always recorded -- it may 
occur in a section of code that has no logging.


Thanks,
Shawn


Solr 8.7.0 memory leak?

2021-01-27 Thread Luke Oak
Hi, I am using Solr 8.7.0, CentOS 7, Java 8.

I just created a few collections with no data; memory keeps growing but never 
goes down, until I get an OOM and Solr is killed.

Any reason?

Thanks

Sent from my iPhone

memory leak?

2019-07-17 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
I am having problems with a SolrCloud where both nodes become unresponsive. I 
am wondering whether there is some sort of memory leak. Attached is a portion 
of the solr_gc.log from around the time that the problem starts. Have you any 
suggestions how to diagnose and address this issue?

Thanks
2019-07-17T04:24:35.810-0400: 65828.261: Total time for which application 
threads were stopped: 0.8963997 seconds, Stopping threads took: 0.8958437 
seconds
2019-07-17T04:24:42.059-0400: 65834.510: [GC (CMS Initial Mark) [1 
CMS-initial-mark: 20963525K(35389440K)] 24965285K(45219840K), 0.2193216 secs] 
[Times: user=0.61 sys=0.27, real=0.22 secs] 
2019-07-17T04:24:42.279-0400: 65834.730: Total time for which application 
threads were stopped: 3.4630816 seconds, Stopping threads took: 3.2432089 
seconds
2019-07-17T04:24:42.279-0400: 65834.730: [CMS-concurrent-mark-start]
2019-07-17T04:24:43.281-0400: 65835.732: Total time for which application 
threads were stopped: 0.0005248 seconds, Stopping threads took: 0.0001216 
seconds
2019-07-17T04:24:43.282-0400: 65835.733: [CMS-concurrent-mark: 1.002/1.003 
secs] [Times: user=5.18 sys=0.19, real=1.01 secs] 
2019-07-17T04:24:43.282-0400: 65835.733: [CMS-concurrent-preclean-start]
2019-07-17T04:24:43.377-0400: 65835.829: [CMS-concurrent-preclean: 0.096/0.096 
secs] [Times: user=0.14 sys=0.05, real=0.09 secs] 
2019-07-17T04:24:43.378-0400: 65835.829: 
[CMS-concurrent-abortable-preclean-start]
2019-07-17T04:24:44.282-0400: 65836.733: Total time for which application 
threads were stopped: 0.0005190 seconds, Stopping threads took: 0.0001509 
seconds
2019-07-17T04:24:46.282-0400: 65838.733: Total time for which application 
threads were stopped: 0.0005607 seconds, Stopping threads took: 0.0001641 
seconds
 CMS: abort preclean due to time 2019-07-17T04:24:49.495-0400: 65841.946: 
[CMS-concurrent-abortable-preclean: 5.343/6.117 secs] [Times: user=11.07 
sys=0.79, real=6.12 secs] 
2019-07-17T04:24:49.496-0400: 65841.947: [GC (CMS Final Remark) [YG occupancy: 
4287360 K (9830400 K)]{Heap before GC invocations=4 (full 3):
 par new generation   total 9830400K, used 4287360K [0x7fd06000, 
0x7fd33000, 0x7fd33000)
  eden space 7864320K,  51% used [0x7fd06000, 0x7fd1564c5bf8, 
0x7fd24000)
  from space 1966080K,  12% used [0x7fd24000, 0x7fd24f61a718, 
0x7fd2b800)
  to   space 1966080K,   0% used [0x7fd2b800, 0x7fd2b800, 
0x7fd33000)
 concurrent mark-sweep generation total 35389440K, used 20963525K 
[0x7fd33000, 0x7fdba000, 0x7fdba000)
 Metaspace   used 47785K, capacity 49718K, committed 49972K, reserved 51200K
2019-07-17T04:24:49.496-0400: 65841.947: [GC (CMS Final Remark) 
2019-07-17T04:24:49.496-0400: 65841.947: [ParNew
Desired survivor size 1811939328 bytes, new threshold 8 (max 8)
- age   1:  305474592 bytes,  305474592 total
- age   2:  101513064 bytes,  406987656 total
- age   3:  118054640 bytes,  525042296 total
- age   4:792 bytes,  525043088 total
- age   5:   12209536 bytes,  537252624 total
: 4287360K->543113K(9830400K), 0.7519541 secs] 25250886K->21506639K(45219840K), 
0.7521144 secs] [Times: user=2.62 sys=0.39, real=0.75 secs] 
Heap after GC invocations=5 (full 3):
 par new generation   total 9830400K, used 543113K [0x7fd06000, 
0x7fd33000, 0x7fd33000)
  eden space 7864320K,   0% used [0x7fd06000, 0x7fd06000, 
0x7fd24000)
  from space 1966080K,  27% used [0x7fd2b800, 0x7fd2d9262690, 
0x7fd33000)
  to   space 1966080K,   0% used [0x7fd24000, 0x7fd24000, 
0x7fd2b800)
 concurrent mark-sweep generation total 35389440K, used 20963525K 
[0x7fd33000, 0x7fdba000, 0x7fdba000)
 Metaspace   used 47785K, capacity 49718K, committed 49972K, reserved 51200K
}
2019-07-17T04:24:50.248-0400: 65842.699: [Rescan (parallel) , 0.1138935 
secs]2019-07-17T04:24:50.362-0400: 65842.813: [weak refs processing, 0.0007099 
secs]2019-07-17T04:24:50.363-0400: 65842.814: [class unloading, 0.0343465 
secs]2019-07-17T04:24:50.397-0400: 65842.848: [scrub symbol table, 0.0137971 
secs]2019-07-17T04:24:50.411-0400: 65842.862: [scrub string table, 0.0009367 
secs][1 CMS-remark: 20963525K(35389440K)] 21506639K(45219840K), 0.9220238 secs] 
[Times: user=3.09 sys=0.42, real=0.92 secs] 
2019-07-17T04:24:50.418-0400: 65842.869: Total time for which application 
threads were stopped: 0.9225790 seconds, Stopping threads took: 0.0001272 
seconds
2019-07-17T04:24:50.418-0400: 65842.869: [CMS-concurrent-sweep-start]
2019-07-17T04:24:50.418-0400: 65842.869: [CMS-concurrent-sweep: 0.000/0.000 
secs] [Times: user=0.01 sys=0.00, real=0.00 secs] 
2019-07-17T04:24:50.418-0400: 65842.869: [CMS-concurrent-reset-start]
2019-07-17T04:24:50.663-0400: 65843.114: [CMS-concurrent-reset: 0.245/0.245 
secs] [Times: user=0.33 sys=0.16, real=0.25 secs] 
2019-07-17T04:24:52.663-0400: 65845.11

Re: Memory Leak in 7.3 to 7.4

2018-08-06 Thread Tim Allison
+1 to Shawn's and Erick's points about isolating Tika in a separate jvm.

Y, please do let us know:  u...@tika.apache.org  We might be able to
help out, and you, in turn, can help the community figure out what's
going on; see e.g.: https://issues.apache.org/jira/browse/TIKA-2703
On Sun, Aug 5, 2018 at 1:22 PM Shawn Heisey  wrote:
>
> On 8/2/2018 5:30 AM, Thomas Scheffler wrote:
> > my final verdict is the upgrade to Tika 1.17. If I downgrade the libraries 
> > just for tika back to 1.16 and keep the rest of SOLR 7.4.0 the heap usage 
> > after about 85 % of the index process and manual trigger of the garbage 
> > collector is about 60-70 MB (That low!!!)
> >
> > My problem now is that we have several setups that triggers this reliably 
> > but there is no simple test case that „fails“ if Tika 1.17 or 1.18 is used. 
> > I also do not know if the error is inside Tika or inside the glue code that 
> > makes Tika usable in SOLR.
>
> If downgrading Tika fixes the issue, then it doesn't seem (to me) very
> likely that Solr's glue code for ERH has a problem. If it's not Solr's
> code that has the problem, there will be nothing we can do about it
> other than change the Tika library included with Solr.
>
> Before filing an issue, you should discuss this with the Tika project on
> their mailing list.  They'll want to make sure that they can fix the
> problem in a future version.  It might not be an actual memory leak ...
> it could just be that one of the documents you're trying to index is one
> that Tika requires a huge amount of memory to handle.  But it could be a
> memory leak.
>
> If you know which document is being worked on when it runs out of
> memory, can you try not including that document in your indexing, to see
> if it still has a problem?
>
> Please note that it is strongly recommended that you do not use the
> Extracting Request Handler in production.  Tika is prone to many
> problems, and those problems will generally affect Solr if Tika is being
> run inside Solr.  Because of this, it is recommended that you write a
> separate program using Tika that handles extracting information from
> documents and sending that data to Solr.  If that program crashes, Solr
> remains operational.
>
> There is already an issue to upgrade Tika to the latest version in Solr,
> but you've said that you tried 1.18 already with no change to the
> problem.  So whatever the problem is, it will need to be solved in 1.19
> or later.
>
> Thanks,
> Shawn
>


Re: Memory Leak in 7.3 to 7.4

2018-08-05 Thread Shawn Heisey

On 8/2/2018 5:30 AM, Thomas Scheffler wrote:

my final verdict is the upgrade to Tika 1.17. If I downgrade the libraries just 
for tika back to 1.16 and keep the rest of SOLR 7.4.0 the heap usage after 
about 85 % of the index process and manual trigger of the garbage collector is 
about 60-70 MB (That low!!!)

My problem now is that we have several setups that triggers this reliably but 
there is no simple test case that „fails“ if Tika 1.17 or 1.18 is used. I also 
do not know if the error is inside Tika or inside the glue code that makes Tika 
usable in SOLR.


If downgrading Tika fixes the issue, then it doesn't seem (to me) very 
likely that Solr's glue code for ERH has a problem. If it's not Solr's 
code that has the problem, there will be nothing we can do about it 
other than change the Tika library included with Solr.


Before filing an issue, you should discuss this with the Tika project on 
their mailing list.  They'll want to make sure that they can fix the 
problem in a future version.  It might not be an actual memory leak ... 
it could just be that one of the documents you're trying to index is one 
that Tika requires a huge amount of memory to handle.  But it could be a 
memory leak.


If you know which document is being worked on when it runs out of 
memory, can you try not including that document in your indexing, to see 
if it still has a problem?


Please note that it is strongly recommended that you do not use the 
Extracting Request Handler in production.  Tika is prone to many 
problems, and those problems will generally affect Solr if Tika is being 
run inside Solr.  Because of this, it is recommended that you write a 
separate program using Tika that handles extracting information from 
documents and sending that data to Solr.  If that program crashes, Solr 
remains operational.
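
As a very rough sketch of that approach, using the standalone tika-app jar 
(the jar name/version, collection name, and field name here are just 
placeholders; the JSON step assumes jq 1.6+ for safe quoting):

  # extract the text outside of Solr
  java -jar tika-app-1.18.jar --text /path/to/file.pdf > /tmp/file.txt

  # build a JSON document and send it to Solr yourself
  jq -n --arg id doc1 --rawfile body /tmp/file.txt '[{id: $id, text_txt: $body}]' \
    | curl 'http://localhost:8983/solr/mycollection/update?commit=true' \
        -H 'Content-Type: application/json' --data-binary @-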


There is already an issue to upgrade Tika to the latest version in Solr, 
but you've said that you tried 1.18 already with no change to the 
problem.  So whatever the problem is, it will need to be solved in 1.19 
or later.


Thanks,
Shawn



Re: Memory Leak in 7.3 to 7.4

2018-08-02 Thread Vincenzo D'Amore
Does this script also save a memory dump of the JVM?

Ciao,
Vincenzo

--
mobile: 3498513251
skype: free.dev

> On 2 Aug 2018, at 17:53, Erick Erickson  wrote:
> 
> Thomas:
> 
> You've obviously done a lot of work to track this, but maybe you can
> do even more ;).
> 
> Here's a link to a program that uses Tika to parse docs _on the client_:
> https://lucidworks.com/2012/02/14/indexing-with-solrj/
> 
> If you take out all the DB and Solr parts, you're left with something
> that just parses docs with Tika. My idea here is to feed it your docs
> and see if there are these noticeable memory differences between the
> versions of Tika.  A heap dump if there are would help the Tika folks
> enormously in tracking this down.
> 
> And if there's no memory creep, that points toward the glue code in Solr.
> 
> I also have to add that this kind of thing is one of the reasons we
> generally recommend that production systems do not use
> ExtractingRequestHandler. There are other reasons outlined in the link
> above
> 
> Best,
> Erick
> 
> On Thu, Aug 2, 2018 at 4:30 AM, Thomas Scheffler
>  wrote:
>> Hi,
>> 
>> my final verdict is the upgrade to Tika 1.17. If I downgrade the libraries 
>> just for tika back to 1.16 and keep the rest of SOLR 7.4.0 the heap usage 
>> after about 85 % of the index process and manual trigger of the garbage 
>> collector is about 60-70 MB (That low!!!)
>> 
>> My problem now is that we have several setups that triggers this reliably 
>> but there is no simple test case that „fails“ if Tika 1.17 or 1.18 is used. 
>> I also do not know if the error is inside Tika or inside the glue code that 
>> makes Tika usable in SOLR.
>> 
>> Should I file an issue for this?
>> 
>> kind regards,
>> 
>> Thomas
>> 
>> 
>>> On 02.08.2018 at 12:06, Thomas Scheffler wrote:
>>> 
>>> Hi,
>>> 
>>> we noticed a memory leak in a rather small setup. 40.000 metadata documents 
>>> with nearly as much files that have „literal.*“ fields with it. While 7.2.1 
>>> has brought some tika issues (due to a beta version) the real problems 
>>> started to appear with version 7.3.0 which are currently unresolved in 
>>> 7.4.0. Memory consumption is out-of-roof. Where previously 512MB heap was 
>>> enough, now 6G aren’t enough to index all files.
>>> I am now to a point where I can track this down to the libraries in 
>>> solr-7.4.0/contrib/extraction/lib/. If I replace them all by the libraries 
>>> shipped with 7.2.1 the problem disappears. As most files are PDF documents 
>>> I tried updating pdfbox to 2.0.11 and tika to 1.18 with no solution to the 
>>> problem. I will next try to downgrade these single libraries back to 2.0.6 
>>> and 1.16 to see if these are the source of the memory leak.
>>> 
>>> In the mean time I would like to know if anybody else experienced the same 
>>> problems?
>>> 
>>> kind regards,
>>> 
>>> Thomas
>> 
>> 


Re: Memory Leak in 7.3 to 7.4

2018-08-02 Thread Erick Erickson
Thomas:

You've obviously done a lot of work to track this, but maybe you can
do even more ;).

Here's a link to a program that uses Tika to parse docs _on the client_:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

If you take out all the DB and Solr parts, you're left with something
that just parses docs with Tika. My idea here is to feed it your docs
and see if there are these noticeable memory differences between the
versions of Tika.  A heap dump if there are would help the Tika folks
enormously in tracking this down.

And if there's no memory creep, that points toward the glue code in Solr.

I also have to add that this kind of thing is one of the reasons we
generally recommend that production systems do not use
ExtractingRequestHandler. There are other reasons outlined in the link
above

Best,
Erick

On Thu, Aug 2, 2018 at 4:30 AM, Thomas Scheffler
 wrote:
> Hi,
>
> my final verdict is the upgrade to Tika 1.17. If I downgrade the libraries 
> just for tika back to 1.16 and keep the rest of SOLR 7.4.0 the heap usage 
> after about 85 % of the index process and manual trigger of the garbage 
> collector is about 60-70 MB (That low!!!)
>
> My problem now is that we have several setups that triggers this reliably but 
> there is no simple test case that „fails“ if Tika 1.17 or 1.18 is used. I 
> also do not know if the error is inside Tika or inside the glue code that 
> makes Tika usable in SOLR.
>
> Should I file an issue for this?
>
> kind regards,
>
> Thomas
>
>
>> On 02.08.2018 at 12:06, Thomas Scheffler wrote:
>>
>> Hi,
>>
>> we noticed a memory leak in a rather small setup. 40.000 metadata documents 
>> with nearly as much files that have „literal.*“ fields with it. While 7.2.1 
>> has brought some tika issues (due to a beta version) the real problems 
>> started to appear with version 7.3.0 which are currently unresolved in 
>> 7.4.0. Memory consumption is out-of-roof. Where previously 512MB heap was 
>> enough, now 6G aren’t enough to index all files.
>> I am now to a point where I can track this down to the libraries in 
>> solr-7.4.0/contrib/extraction/lib/. If I replace them all by the libraries 
>> shipped with 7.2.1 the problem disappears. As most files are PDF documents I 
>> tried updating pdfbox to 2.0.11 and tika to 1.18 with no solution to the 
>> problem. I will next try to downgrade these single libraries back to 2.0.6 
>> and 1.16 to see if these are the source of the memory leak.
>>
>> In the mean time I would like to know if anybody else experienced the same 
>> problems?
>>
>> kind regards,
>>
>> Thomas
>
>


Re: Memory Leak in 7.3 to 7.4

2018-08-02 Thread Thomas Scheffler
Hi,

My final verdict is that the cause is the upgrade to Tika 1.17. If I downgrade 
the libraries just for Tika back to 1.16 and keep the rest of SOLR 7.4.0, the 
heap usage after about 85 % of the index process and a manual trigger of the 
garbage collector is about 60-70 MB (that low!!!)

My problem now is that we have several setups that trigger this reliably, but 
there is no simple test case that „fails“ if Tika 1.17 or 1.18 is used. I also 
do not know if the error is inside Tika or inside the glue code that makes Tika 
usable in SOLR.

Should I file an issue for this?

kind regards,

Thomas


> On 02.08.2018 at 12:06, Thomas Scheffler wrote:
> 
> Hi,
> 
> we noticed a memory leak in a rather small setup. 40.000 metadata documents 
> with nearly as much files that have „literal.*“ fields with it. While 7.2.1 
> has brought some tika issues (due to a beta version) the real problems 
> started to appear with version 7.3.0 which are currently unresolved in 7.4.0. 
> Memory consumption is out-of-roof. Where previously 512MB heap was enough, 
> now 6G aren’t enough to index all files.
> I am now to a point where I can track this down to the libraries in 
> solr-7.4.0/contrib/extraction/lib/. If I replace them all by the libraries 
> shipped with 7.2.1 the problem disappears. As most files are PDF documents I 
> tried updating pdfbox to 2.0.11 and tika to 1.18 with no solution to the 
> problem. I will next try to downgrade these single libraries back to 2.0.6 
> and 1.16 to see if these are the source of the memory leak.
> 
> In the mean time I would like to know if anybody else experienced the same 
> problems?
> 
> kind regards,
> 
> Thomas






Re: Memory Leak in 7.3 to 7.4

2018-08-02 Thread Thomas Scheffler
Hi,

SOLR ships with a script that handles OOM errors and produces log files 
for every case with content like this:

Running OOM killer script for process 9015 for Solr on port 28080
Killed process 9015

This script works ;-)

kind regards

Thomas



> On 02.08.2018 at 12:28, Vincenzo D'Amore wrote:
> 
> Not clear if you had experienced an OOM error.
> 
> On Thu, Aug 2, 2018 at 12:06 PM Thomas Scheffler <
> thomas.scheff...@uni-jena.de> wrote:
> 
>> Hi,
>> 
>> we noticed a memory leak in a rather small setup. 40.000 metadata
>> documents with nearly as much files that have „literal.*“ fields with it.
>> While 7.2.1 has brought some tika issues (due to a beta version) the real
>> problems started to appear with version 7.3.0 which are currently
>> unresolved in 7.4.0. Memory consumption is out-of-roof. Where previously
>> 512MB heap was enough, now 6G aren’t enough to index all files.
>> I am now to a point where I can track this down to the libraries in
>> solr-7.4.0/contrib/extraction/lib/. If I replace them all by the libraries
>> shipped with 7.2.1 the problem disappears. As most files are PDF documents
>> I tried updating pdfbox to 2.0.11 and tika to 1.18 with no solution to the
>> problem. I will next try to downgrade these single libraries back to 2.0.6
>> and 1.16 to see if these are the source of the memory leak.
>> 
>> In the mean time I would like to know if anybody else experienced the same
>> problems?
>> 
>> kind regards,
>> 
>> Thomas
>> 
> 
> 
> --
> Vincenzo D'Amore






Re: Memory Leak in 7.3 to 7.4

2018-08-02 Thread Vincenzo D'Amore
Not clear if you had experienced an OOM error.

In the meantime, if you haven't already added them, these can be useful:

-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/store/solr-logs/dump.hprof

This is my GC_TUNE config - a 32GB server and 16GB reserved for JVM
(-Xms16G -Xmx16G)

export GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/store/solr-logs/dump.hprof \
"


On Thu, Aug 2, 2018 at 12:06 PM Thomas Scheffler <
thomas.scheff...@uni-jena.de> wrote:

> Hi,
>
> we noticed a memory leak in a rather small setup. 40.000 metadata
> documents with nearly as much files that have „literal.*“ fields with it.
> While 7.2.1 has brought some tika issues (due to a beta version) the real
> problems started to appear with version 7.3.0 which are currently
> unresolved in 7.4.0. Memory consumption is out-of-roof. Where previously
> 512MB heap was enough, now 6G aren’t enough to index all files.
> I am now to a point where I can track this down to the libraries in
> solr-7.4.0/contrib/extraction/lib/. If I replace them all by the libraries
> shipped with 7.2.1 the problem disappears. As most files are PDF documents
> I tried updating pdfbox to 2.0.11 and tika to 1.18 with no solution to the
> problem. I will next try to downgrade these single libraries back to 2.0.6
> and 1.16 to see if these are the source of the memory leak.
>
> In the mean time I would like to know if anybody else experienced the same
> problems?
>
> kind regards,
>
> Thomas
>


-- 
Vincenzo D'Amore


Memory Leak in 7.3 to 7.4

2018-08-02 Thread Thomas Scheffler
Hi,

we noticed a memory leak in a rather small setup: 40.000 metadata documents 
with nearly as many files that have „literal.*“ fields with them. While 7.2.1 
brought some Tika issues (due to a beta version), the real problems started to 
appear with version 7.3.0 and are currently unresolved in 7.4.0. Memory 
consumption is through the roof. Where previously a 512MB heap was enough, now 
6G isn't enough to index all the files.
I am now at a point where I can track this down to the libraries in 
solr-7.4.0/contrib/extraction/lib/. If I replace them all with the libraries 
shipped with 7.2.1 the problem disappears. As most files are PDF documents I 
tried updating pdfbox to 2.0.11 and Tika to 1.18, with no solution to the 
problem. I will next try to downgrade these single libraries back to 2.0.6 and 
1.16 to see if they are the source of the memory leak.
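
For reference, the swap I am describing is basically this (the 7.2.1 path is 
just wherever the old distribution is unpacked):

  # back up the 7.4.0 extraction libs and drop in the ones shipped with 7.2.1
  cd solr-7.4.0/contrib/extraction
  mv lib lib-7.4.0-orig
  cp -r /path/to/solr-7.2.1/contrib/extraction/lib lib
  # restart Solr afterwards so the replaced jars are picked up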

In the mean time I would like to know if anybody else experienced the same 
problems?

kind regards,

Thomas




Re: Possible memory leak with VersionBucket objects

2017-09-25 Thread Sundeep T
Sorry, I meant we are "not" running Solr in cloud mode

On Mon, Sep 25, 2017 at 1:29 PM, Sundeep T  wrote:

> Yes, but that issue seems specific to SolrCloud like I mentioned. We are
> running Solr in cloud mode and don't have Zookeeper configured
>
> Thanks
> Sundeep
>
> On Mon, Sep 25, 2017 at 12:52 PM, Steve Rowe  wrote:
>
>> Hi Sundeep,
>>
>> This looks to me like 
>> / , which was fixed in
>> Solr 7.0.
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>> > On Sep 25, 2017, at 2:42 PM, Sundeep T  wrote:
>> >
>> > Hello,
>> >
>> > We are running our solr 6.4.2 instance on a single node without
>> zookeeper. So, we are not using solr cloud. We have been ingesting about
>> 50k messages per second into this instance spread over 4 cores.
>> >
>> > When we looked at the heapdump we see that it has there are around 385
>> million instances of VersionBucket objects taking about 8gb memory. This
>> number seems to grow based on the number of cores into which we are
>> ingesting data into.PFA a screen cap of heap recording.
>> >
>> > Browsing through the jira list we saw a similar issue -
>> https://issues.apache.org/jira/browse/SOLR-9803
>> >
>> > This issue is recently resolved by Erick. But this issue seems be
>> specifically tied to SolrCloud mode and Zookeeper. We are not using any of
>> these.
>> >
>> > So, we are thinking this could be another issue. Any one has ideas on
>> what this could be and if there is a fix for it?
>> >
>> > Thanks
>> > Sundeep
>>
>>
>


Re: Possible memory leak with VersionBucket objects

2017-09-25 Thread Sundeep T
Yes, but that issue seems specific to SolrCloud like I mentioned. We are
running Solr in cloud mode and don't have Zookeeper configured

Thanks
Sundeep

On Mon, Sep 25, 2017 at 12:52 PM, Steve Rowe  wrote:

> Hi Sundeep,
>
> This looks to me like  /
> , which was fixed in
> Solr 7.0.
>
> --
> Steve
> www.lucidworks.com
>
> > On Sep 25, 2017, at 2:42 PM, Sundeep T  wrote:
> >
> > Hello,
> >
> > We are running our solr 6.4.2 instance on a single node without
> zookeeper. So, we are not using solr cloud. We have been ingesting about
> 50k messages per second into this instance spread over 4 cores.
> >
> > When we looked at the heapdump we see that it has there are around 385
> million instances of VersionBucket objects taking about 8gb memory. This
> number seems to grow based on the number of cores into which we are
> ingesting data into.PFA a screen cap of heap recording.
> >
> > Browsing through the jira list we saw a similar issue -
> https://issues.apache.org/jira/browse/SOLR-9803
> >
> > This issue is recently resolved by Erick. But this issue seems be
> specifically tied to SolrCloud mode and Zookeeper. We are not using any of
> these.
> >
> > So, we are thinking this could be another issue. Any one has ideas on
> what this could be and if there is a fix for it?
> >
> > Thanks
> > Sundeep
>
>


Re: Possible memory leak with VersionBucket objects

2017-09-25 Thread Steve Rowe
Hi Sundeep,

This looks to me like  / 
, which was fixed in Solr 7.0.

--
Steve
www.lucidworks.com

> On Sep 25, 2017, at 2:42 PM, Sundeep T  wrote:
> 
> Hello,
> 
> We are running our solr 6.4.2 instance on a single node without zookeeper. 
> So, we are not using solr cloud. We have been ingesting about 50k messages 
> per second into this instance spread over 4 cores. 
> 
> When we looked at the heapdump we see that it has there are around 385 
> million instances of VersionBucket objects taking about 8gb memory. This 
> number seems to grow based on the number of cores into which we are ingesting 
> data into.PFA a screen cap of heap recording.
> 
> Browsing through the jira list we saw a similar issue 
> -https://issues.apache.org/jira/browse/SOLR-9803
> 
> This issue is recently resolved by Erick. But this issue seems be 
> specifically tied to SolrCloud mode and Zookeeper. We are not using any of 
> these.
> 
> So, we are thinking this could be another issue. Any one has ideas on what 
> this could be and if there is a fix for it?
> 
> Thanks
> Sundeep



Possible memory leak with VersionBucket objects

2017-09-25 Thread Sundeep T
Hello,

We are running our solr 6.4.2 instance on a single node without zookeeper.
So, we are not using solr cloud. We have been ingesting about 50k messages
per second into this instance spread over 4 cores.

When we looked at the heap dump we saw that there are around 385
million instances of VersionBucket objects taking about 8 GB of memory. This
number seems to grow based on the number of cores into which we are
ingesting data. PFA a screen cap of the heap recording.
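
(for anyone who wants to check their own instance, a quick way to count these 
objects without a full profiler -- a rough sketch, assuming the JDK tools are 
on the PATH and the Solr PID is known:)

  # live-object histogram, filtered to the suspect class (forces a GC first)
  jmap -histo:live <solr-pid> | grep VersionBucket

  # or take a full dump for MAT / VisualVM
  jmap -dump:live,format=b,file=/tmp/solr.hprof <solr-pid>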

Browsing through the jira list we saw a similar issue -
https://issues.apache.org/jira/browse/SOLR-9803

This issue is recently resolved by Erick. But this issue seems be
specifically tied to SolrCloud mode and Zookeeper. We are not using any of
these.

So, we are thinking this could be another issue. Any one has ideas on what
this could be and if there is a fix for it?

Thanks
Sundeep


Re: Solr memory leak

2017-09-10 Thread Hendrik Haddorp
I didn't mean to say that the fix is not in 7.0. I just stated that I 
do not see it listed in the release notes 
(https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310230&version=12335718).


Thanks for explaining the release process.

regards,
Hendrik

On 10.09.2017 17:32, Erick Erickson wrote:

There will be no 6.7. Once the X+1 version is released, all past fixes
are applied as minor releases to the last released version of the
previous major release. So now that 7.0 has been cut, there might be a
6.6.2 (6.6.1 was just released) but no 6.7. Current unreleased JIRAs
are parked on the 6.x branch (as opposed to branch_6_6) for convenience. If
anyone steps up to release 6.6.2, they can include this.

Why do you say this isn't in 7.0? The "Fix Versions" clearly states
so, as does CHANGES.txt for 7.0. The new file is in the 7.0 branch.


If you need it in 6x you have a couple of options:

1> agitate for a 6.6.2 with this included
2> apply the patch yourself and compile it locally

Best,
Erick

On Sun, Sep 10, 2017 at 6:04 AM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:

Hi,

looks like SOLR-10506 didn't make it into 6.6.1. I do however also not see
it listed in the current release notes for 6.7 or 7.0:
 https://issues.apache.org/jira/projects/SOLR/versions/12340568
 https://issues.apache.org/jira/projects/SOLR/versions/12335718

Is there any rough idea already when 6.7 or 7.0 will be released?

thanks,
Hendrik


On 28.08.2017 18:31, Erick Erickson wrote:

Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including
it.

On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

That would be a really good reason for a 6.7.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Aug 28, 2017, at 8:48 AM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

It is, unfortunately, not committed for 6.7.





-Original message-

From:Markus Jelsma <markus.jel...@openindex.io>
Sent: Monday 28th August 2017 17:46
To: solr-user@lucene.apache.org
Subject: RE: Solr memory leak

See https://issues.apache.org/jira/browse/SOLR-10506
Fixed for 7.0

Markus



-Original message-

From:Hendrik Haddorp <hendrik.hadd...@gmx.net>
Sent: Monday 28th August 2017 17:42
To: solr-user@lucene.apache.org
Subject: Solr memory leak

Hi,

we noticed that triggering collection reloads on many collections has
a
good chance to result in an OOM-Error. To investigate that further I
did
a simple test:
  - Start solr with a 2GB heap and 1GB Metaspace
  - create a trivial collection with a few documents (I used only 2
fields and 100 documents)
  - trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr
6.6
worked better but also failed after 1100 loops.

When looking at the memory usage on the Solr dashboard it looks like
the
space left after GC cycles gets less and less. Then Solr gets very
slow,
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
my last run this was actually for the Metaspace. So it looks like more
and more heap and metaspace is being used by just constantly reloading
a
trivial collection.

regards,
Hendrik





Re: Solr memory leak

2017-09-10 Thread Erick Erickson
There will be no 6.7. Once the X+1 version is released, all past fixes
are applied as minor releases to the last released version of the
previous major release. So now that 7.0 has been cut, there might be a
6.6.2 (6.6.1 was just released) but no 6.7. Current unreleased JIRAs
are parked on the 6.x branch (as opposed to branch_6_6) for convenience. If
anyone steps up to release 6.6.2, they can include this.

Why do you say this isn't in 7.0? The "Fix Versions" clearly states
so, as does CHANGES.txt for 7.0. The new file is in the 7.0 branch.


If you need it in 6x you have a couple of options:

1> agitate for a 6.6.2 with this included
2> apply the patch yourself and compile it locally

Best,
Erick

On Sun, Sep 10, 2017 at 6:04 AM, Hendrik Haddorp
<hendrik.hadd...@gmx.net> wrote:
> Hi,
>
> looks like SOLR-10506 didn't make it into 6.6.1. I do however also not see
> it listed in the current release notes for 6.7 or 7.0:
> https://issues.apache.org/jira/projects/SOLR/versions/12340568
> https://issues.apache.org/jira/projects/SOLR/versions/12335718
>
> Is there any rough idea already when 6.7 or 7.0 will be released?
>
> thanks,
> Hendrik
>
>
> On 28.08.2017 18:31, Erick Erickson wrote:
>>
>> Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including
>> it.
>>
>> On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>>
>>> That would be a really good reason for a 6.7.
>>>
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>>
>>>
>>>> On Aug 28, 2017, at 8:48 AM, Markus Jelsma <markus.jel...@openindex.io>
>>>> wrote:
>>>>
>>>> It is, unfortunately, not committed for 6.7.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -Original message-
>>>>>
>>>>> From:Markus Jelsma <markus.jel...@openindex.io>
>>>>> Sent: Monday 28th August 2017 17:46
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: RE: Solr memory leak
>>>>>
>>>>> See https://issues.apache.org/jira/browse/SOLR-10506
>>>>> Fixed for 7.0
>>>>>
>>>>> Markus
>>>>>
>>>>>
>>>>>
>>>>> -Original message-
>>>>>>
>>>>>> From:Hendrik Haddorp <hendrik.hadd...@gmx.net>
>>>>>> Sent: Monday 28th August 2017 17:42
>>>>>> To: solr-user@lucene.apache.org
>>>>>> Subject: Solr memory leak
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> we noticed that triggering collection reloads on many collections has
>>>>>> a
>>>>>> good chance to result in an OOM-Error. To investigate that further I
>>>>>> did
>>>>>> a simple test:
>>>>>>  - Start solr with a 2GB heap and 1GB Metaspace
>>>>>>  - create a trivial collection with a few documents (I used only 2
>>>>>> fields and 100 documents)
>>>>>>  - trigger a collection reload in a loop (I used SolrJ for this)
>>>>>>
>>>>>> Using Solr 6.3 the test started to fail after about 250 loops. Solr
>>>>>> 6.6
>>>>>> worked better but also failed after 1100 loops.
>>>>>>
>>>>>> When looking at the memory usage on the Solr dashboard it looks like
>>>>>> the
>>>>>> space left after GC cycles gets less and less. Then Solr gets very
>>>>>> slow,
>>>>>> as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
>>>>>> my last run this was actually for the Metaspace. So it looks like more
>>>>>> and more heap and metaspace is being used by just constantly reloading
>>>>>> a
>>>>>> trivial collection.
>>>>>>
>>>>>> regards,
>>>>>> Hendrik
>>>>>>
>


Re: Solr memory leak

2017-09-10 Thread Hendrik Haddorp

Hi,

looks like SOLR-10506 didn't make it into 6.6.1. I do, however, also not
see it listed in the current release notes for 6.7 or 7.0:

https://issues.apache.org/jira/projects/SOLR/versions/12340568
https://issues.apache.org/jira/projects/SOLR/versions/12335718

Is there any rough idea yet when 6.7 or 7.0 will be released?

thanks,
Hendrik

On 28.08.2017 18:31, Erick Erickson wrote:

Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including it.

On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood <wun...@wunderwood.org> wrote:

That would be a really good reason for a 6.7.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Aug 28, 2017, at 8:48 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:

It is, unfortunately, not committed for 6.7.





-Original message-

From:Markus Jelsma <markus.jel...@openindex.io>
Sent: Monday 28th August 2017 17:46
To: solr-user@lucene.apache.org
Subject: RE: Solr memory leak

See https://issues.apache.org/jira/browse/SOLR-10506
Fixed for 7.0

Markus



-Original message-

From:Hendrik Haddorp <hendrik.hadd...@gmx.net>
Sent: Monday 28th August 2017 17:42
To: solr-user@lucene.apache.org
Subject: Solr memory leak

Hi,

we noticed that triggering collection reloads on many collections has a
good chance to result in an OOM-Error. To investigate that further I did
a simple test:
 - Start solr with a 2GB heap and 1GB Metaspace
 - create a trivial collection with a few documents (I used only 2
fields and 100 documents)
 - trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6
worked better but also failed after 1100 loops.

When looking at the memory usage on the Solr dashboard it looks like the
space left after GC cycles gets less and less. Then Solr gets very slow,
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
my last run this was actually for the Metaspace. So it looks like more
and more heap and metaspace is being used by just constantly reloading a
trivial collection.

regards,
Hendrik





Re: Solr memory leak

2017-08-30 Thread Hendrik Haddorp
Did you get an answer? Would really be nice to have that in the next 
release.


On 28.08.2017 18:31, Erick Erickson wrote:

Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including it.

On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood <wun...@wunderwood.org> wrote:

That would be a really good reason for a 6.7.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)



On Aug 28, 2017, at 8:48 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:

It is, unfortunately, not committed for 6.7.





-Original message-

From:Markus Jelsma <markus.jel...@openindex.io>
Sent: Monday 28th August 2017 17:46
To: solr-user@lucene.apache.org
Subject: RE: Solr memory leak

See https://issues.apache.org/jira/browse/SOLR-10506
Fixed for 7.0

Markus



-Original message-

From:Hendrik Haddorp <hendrik.hadd...@gmx.net>
Sent: Monday 28th August 2017 17:42
To: solr-user@lucene.apache.org
Subject: Solr memory leak

Hi,

we noticed that triggering collection reloads on many collections has a
good chance to result in an OOM-Error. To investigate that further I did
a simple test:
 - Start solr with a 2GB heap and 1GB Metaspace
 - create a trivial collection with a few documents (I used only 2
fields and 100 documents)
 - trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6
worked better but also failed after 1100 loops.

When looking at the memory usage on the Solr dashboard it looks like the
space left after GC cycles gets less and less. Then Solr gets very slow,
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
my last run this was actually for the Metaspace. So it looks like more
and more heap and metaspace is being used by just constantly reloading a
trivial collection.

regards,
Hendrik





Re: Solr memory leak

2017-08-28 Thread Erick Erickson
Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including it.

On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> That would be a really good reason for a 6.7.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Aug 28, 2017, at 8:48 AM, Markus Jelsma <markus.jel...@openindex.io> 
>> wrote:
>>
>> It is, unfortunately, not committed for 6.7.
>>
>>
>>
>>
>>
>> -Original message-
>>> From:Markus Jelsma <markus.jel...@openindex.io>
>>> Sent: Monday 28th August 2017 17:46
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Solr memory leak
>>>
>>> See https://issues.apache.org/jira/browse/SOLR-10506
>>> Fixed for 7.0
>>>
>>> Markus
>>>
>>>
>>>
>>> -Original message-
>>>> From:Hendrik Haddorp <hendrik.hadd...@gmx.net>
>>>> Sent: Monday 28th August 2017 17:42
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Solr memory leak
>>>>
>>>> Hi,
>>>>
>>>> we noticed that triggering collection reloads on many collections has a
>>>> good chance to result in an OOM-Error. To investigate that further I did
>>>> a simple test:
>>>> - Start solr with a 2GB heap and 1GB Metaspace
>>>> - create a trivial collection with a few documents (I used only 2
>>>> fields and 100 documents)
>>>> - trigger a collection reload in a loop (I used SolrJ for this)
>>>>
>>>> Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6
>>>> worked better but also failed after 1100 loops.
>>>>
>>>> When looking at the memory usage on the Solr dashboard it looks like the
>>>> space left after GC cycles gets less and less. Then Solr gets very slow,
>>>> as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In
>>>> my last run this was actually for the Metaspace. So it looks like more
>>>> and more heap and metaspace is being used by just constantly reloading a
>>>> trivial collection.
>>>>
>>>> regards,
>>>> Hendrik
>>>>
>>>
>


Re: Solr memory leak

2017-08-28 Thread Walter Underwood
That would be a really good reason for a 6.7.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 28, 2017, at 8:48 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> 
> It is, unfortunately, not committed for 6.7.
> 
> 
> 
> 
> 
> -Original message-
>> From:Markus Jelsma <markus.jel...@openindex.io>
>> Sent: Monday 28th August 2017 17:46
>> To: solr-user@lucene.apache.org
>> Subject: RE: Solr memory leak
>> 
>> See https://issues.apache.org/jira/browse/SOLR-10506
>> Fixed for 7.0
>> 
>> Markus
>> 
>> 
>> 
>> -Original message-
>>> From:Hendrik Haddorp <hendrik.hadd...@gmx.net>
>>> Sent: Monday 28th August 2017 17:42
>>> To: solr-user@lucene.apache.org
>>> Subject: Solr memory leak
>>> 
>>> Hi,
>>> 
>>> we noticed that triggering collection reloads on many collections has a 
>>> good chance to result in an OOM-Error. To investigate that further I did 
>>> a simple test:
>>> - Start solr with a 2GB heap and 1GB Metaspace
>>> - create a trivial collection with a few documents (I used only 2 
>>> fields and 100 documents)
>>> - trigger a collection reload in a loop (I used SolrJ for this)
>>> 
>>> Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 
>>> worked better but also failed after 1100 loops.
>>> 
>>> When looking at the memory usage on the Solr dashboard it looks like the 
>>> space left after GC cycles gets less and less. Then Solr gets very slow, 
>>> as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In 
>>> my last run this was actually for the Metaspace. So it looks like more 
>>> and more heap and metaspace is being used by just constantly reloading a 
>>> trivial collection.
>>> 
>>> regards,
>>> Hendrik
>>> 
>> 



RE: Solr memory leak

2017-08-28 Thread Markus Jelsma
It is, unfortunately, not committed for 6.7.



 
 
-Original message-
> From:Markus Jelsma <markus.jel...@openindex.io>
> Sent: Monday 28th August 2017 17:46
> To: solr-user@lucene.apache.org
> Subject: RE: Solr memory leak
> 
> See https://issues.apache.org/jira/browse/SOLR-10506
> Fixed for 7.0
> 
> Markus
> 
>  
>  
> -Original message-
> > From:Hendrik Haddorp <hendrik.hadd...@gmx.net>
> > Sent: Monday 28th August 2017 17:42
> > To: solr-user@lucene.apache.org
> > Subject: Solr memory leak
> > 
> > Hi,
> > 
> > we noticed that triggering collection reloads on many collections has a 
> > good chance to result in an OOM-Error. To investigate that further I did 
> > a simple test:
> >  - Start solr with a 2GB heap and 1GB Metaspace
> >  - create a trivial collection with a few documents (I used only 2 
> > fields and 100 documents)
> >  - trigger a collection reload in a loop (I used SolrJ for this)
> > 
> > Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 
> > worked better but also failed after 1100 loops.
> > 
> > When looking at the memory usage on the Solr dashboard it looks like the 
> > space left after GC cycles gets less and less. Then Solr gets very slow, 
> > as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In 
> > my last run this was actually for the Metaspace. So it looks like more 
> > and more heap and metaspace is being used by just constantly reloading a 
> > trivial collection.
> > 
> > regards,
> > Hendrik
> > 
> 


RE: Solr memory leak

2017-08-28 Thread Markus Jelsma
See https://issues.apache.org/jira/browse/SOLR-10506
Fixed for 7.0

Markus

 
 
-Original message-
> From:Hendrik Haddorp <hendrik.hadd...@gmx.net>
> Sent: Monday 28th August 2017 17:42
> To: solr-user@lucene.apache.org
> Subject: Solr memory leak
> 
> Hi,
> 
> we noticed that triggering collection reloads on many collections has a 
> good chance to result in an OOM-Error. To investigate that further I did 
> a simple test:
>  - Start solr with a 2GB heap and 1GB Metaspace
>  - create a trivial collection with a few documents (I used only 2 
> fields and 100 documents)
>  - trigger a collection reload in a loop (I used SolrJ for this)
> 
> Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 
> worked better but also failed after 1100 loops.
> 
> When looking at the memory usage on the Solr dashboard it looks like the 
> space left after GC cycles gets less and less. Then Solr gets very slow, 
> as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In 
> my last run this was actually for the Metaspace. So it looks like more 
> and more heap and metaspace is being used by just constantly reloading a 
> trivial collection.
> 
> regards,
> Hendrik
> 


Solr memory leak

2017-08-28 Thread Hendrik Haddorp

Hi,

we noticed that triggering collection reloads on many collections has a 
good chance to result in an OOM-Error. To investigate that further I did 
a simple test:

- Start solr with a 2GB heap and 1GB Metaspace
- create a trivial collection with a few documents (I used only 2 
fields and 100 documents)

- trigger a collection reload in a loop (I used SolrJ for this)
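
(The reload loop itself was nothing special. Roughly, it was a SolrJ sketch like
the one below -- the URL, collection name and iteration count here are
placeholders rather than the exact values I used, and it assumes a SolrJ 6.x
client:)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class ReloadLoop {
    public static void main(String[] args) throws Exception {
        // any node of the cluster will do; placeholder URL
        try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            for (int i = 0; i < 2000; i++) {
                // issue a Collections API RELOAD for the test collection
                CollectionAdminRequest.reloadCollection("test").process(client);
            }
        }
    }
}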

Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 
worked better but also failed after 1100 loops.


When looking at the memory usage on the Solr dashboard it looks like the 
space left after GC cycles gets less and less. Then Solr gets very slow, 
as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In 
my last run this was actually for the Metaspace. So it looks like more 
and more heap and metaspace is being used by just constantly reloading a 
trivial collection.


regards,
Hendrik


Re: maxwarmingSearchers and memory leak

2017-03-05 Thread SOLR4189
1) We've actually got 60 to 80 GB of index on the machine (in the image below
you can see that the size of the index on the machine is 82GB, because all of
the index is in the path /opt/solr):
<http://lucene.472066.n3.nabble.com/file/n4323509/size.jpg> 

2) Our commits run with autoSoftCommit every 15 minutes and autoHardCommit
every 30 minutes, and our commits take only 10 seconds (a rough sketch of that
configuration is shown after this list).

3) ConcurrentLFUCaches (that you saw in the image in the previous message)
aren't filterCaches, they are fieldValueCaches

4) Solr top:
<http://lucene.472066.n3.nabble.com/file/n4323509/top.jpg> 

5) We don't know if this is related to the problem, but all our SOLR servers are
virtual servers.
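
For reference, a commit setup like the one in 2) would look roughly like this in
solrconfig.xml (a sketch only: the elements are standard Solr update handler
settings, but the values are inferred from the description above, and
openSearcher=false for hard commits is assumed rather than taken from our real
config):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit every 30 minutes; does not open a new searcher -->
  <autoCommit>
    <maxTime>1800000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit every 15 minutes; opens a new searcher -->
  <autoSoftCommit>
    <maxTime>900000</maxTime>
  </autoSoftCommit>
</updateHandler>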




--
View this message in context: 
http://lucene.472066.n3.nabble.com/maxwarmingSearchers-and-memory-leak-tp4321937p4323509.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: maxwarmingSearchers and memory leak

2017-03-01 Thread Shawn Heisey
On 2/26/2017 6:40 AM, SOLR4189 wrote:
> Shawn, you are right.
> * OS vendor and version 
> CentosOS 6.5
>
> * Java vendor and version
> OpenJDK version 1.8.0_20
> OpenJDK 64-bit Server VM (build 25.20-b23)
>
> * Servlet container used to start Solr. 
> Catalina(tomcat7)
>
> * Total amount of memory in the server. 
> 30 GB
>
> * Max heap size for Solr. 
> 8GB(JVM)
>
> * An idea of exactly what is running on the server. 
> On our servers runs solr service only and splunk forwarder
>
> * Total index size and document count being handled by Solr (add up all 
> indexes). 
> 60GB and 2.6 million on one shard

You say that you've got 60GB of index, but the screenshot seems to
indicate that you've actually got 160 to 180GB of index on the machine. 
With approximately 22GB of memory available for caching (30GB total
minus 8GB for Solr's heap), you don't have enough memory for good
performance.  Your commits are probably taking so long to finish that
additional commits are coming in and trying to open new searchers before
the previous commits are done.  Increasing the memory or splitting the
index onto more machines might help performance.

With 2.6 million documents in an index shard, whenever the system
creates a filterCache entry for that shard, it will be 325000 bytes.  If
enough of these entries are created, a huge amount of heap memory will
be required.  It will not be a memory leak, though.

You've got an early Java 8 release.  There have been some memory leaks
in Java itself fixed in later releases.  Consider upgrading to the
latest Java 8.

The only thing I can say about the container (tomcat) is that it is an
untested environment.  The only container that actually gets tested is
Jetty.  It's not very likely that running in Tomcat is the problem, though.

> * A screen shot of a process list sorted by memory usage. 
> <http://lucene.472066.n3.nabble.com/file/n4322362/20170226_102812.jpg> 

The display for htop is NOT the same as top.  If I had wanted htop, that
would have been what I mentioned.  The standard top utility shows
everything I was wanting to see.  The display for htop can be useful,
and has answered one question, but doesn't contain everything that I was
after.

Can you share a screenshot of the Solr dashboard, and one of the
standard top utility sorted by memory usage?

> * A screenshot showing total system memory allocations
> <http://lucene.472066.n3.nabble.com/file/n4322362/20170226_102007.jpg> 

This file is not available.  Nabble says "file not found."

Thanks,
Shawn



Re: maxwarmingSearchers and memory leak

2017-02-26 Thread SOLR4189
Shawn, you are right.
* OS vendor and version 
CentosOS 6.5

* Java vendor and version
OpenJDK version 1.8.0_20
OpenJDK 64-bit Server VM (build 25.20-b23)

* Servlet container used to start Solr. 
Catalina(tomcat7)

* Total amount of memory in the server. 
30 GB

* Max heap size for Solr. 
8GB(JVM)

* An idea of exactly what is running on the server. 
On our servers runs solr service only and splunk forwarder

* Total index size and document count being handled by Solr (add up all 
indexes). 
60GB and 2.6 million on one shard

* A screen shot of a process list sorted by memory usage. 
<http://lucene.472066.n3.nabble.com/file/n4322362/20170226_102812.jpg> 

* A screenshot showing total system memory allocations
<http://lucene.472066.n3.nabble.com/file/n4322362/20170226_102007.jpg> 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/maxwarmingSearchers-and-memory-leak-tp4321937p4322362.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: maxwarmingSearchers and memory leak

2017-02-24 Thread Shawn Heisey
On 2/23/2017 1:51 AM, SOLR4189 wrote:
> We have maxwarmingSearchers set to 2 and field value cache set to an
> initial size of 64. We saw, by taking a heap dump, that our caches
> consume 70% of the heap size; by looking into the dump we saw that
> fieldValueCache has 6 occurrences of
> org.apache.solr.util.concurrentCache. When we have
> maxWarmingSearchers=2 we would expect to have only 3 (maybe 4 before GC
> has been launched). What can it be? We use Solr 4.10.1

There are no specific details here about your setup except for
maxWarmingSearchers (which is at the typical default value) and
fieldValueCache, which is not explicitly configured in any example, and
probably should not be explicitly configured.  The version number is not
provided.  Heap size is not provided.  Installation method is not
provided.  Other details about your OS, Solr install, configuration, and
index are not available.

Version 4.10.x can be installed in a variety of ways, most of which are
outside the project's control.  5.0 and later are typically only
installed using the built-in container (jetty) and scripts.

Here's a starting list for additional info that we may need:

* OS vendor and version
* Java vendor and version
* Servlet container used to start Solr.
* Total amount of memory in the server.
* Max heap size for Solr.
* An idea of exactly what is running on the server.
* Total index size and document count being handled by Solr (add up all
indexes).
* A screen shot of a process list sorted by memory usage.
* A screenshot showing total system memory allocations.

Those last two are typically available with one screen on most operating
systems that have the "top" utility, if it allows sorting by memory by
pressing shift-M.

Although a memory leak is certainly possible, I am not aware of any
known problems with memory leaks in 4.10 or any of the current code
branches.  Very likely you simply need a larger heap for Solr to do the
work that it has been asked to do.  This link may be helpful:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

There is no "ConcurrentCache" class in Solr.  There is
ConcurrentLRUCache, though.  I have no idea how large cache entries are
in the fieldValueCache, though I would expect them to be fairly small. 
One cache that typically has very large entry sizes is filterCache. 
Each entry in that cache has a byte size equal to the maxDoc value of
the index divided by eight.  So an index with one million documents in
it has filterCache entries which are each 125000 bytes.  An index with
100 million documents has filterCache entries which are each 12.5
million bytes.
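
As a quick back-of-the-envelope check of that arithmetic (plain Java, just
illustrating the maxDoc / 8 rule of thumb, not Solr code):

public class FilterCacheEntrySize {
    public static void main(String[] args) {
        // one filterCache entry is a bitset with one bit per document: maxDoc / 8 bytes
        long[] maxDocs = {1_000_000L, 2_600_000L, 100_000_000L};
        for (long maxDoc : maxDocs) {
            System.out.println(maxDoc + " docs -> " + (maxDoc / 8) + " bytes per entry");
        }
    }
}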

Thanks,
Shawn



maxwarmingSearchers and memory leak

2017-02-23 Thread SOLR4189
We have maxwarmingSearchers set to 2 and field value cache set to an initial
size of 64. We saw, by taking a heap dump, that our caches consume 70% of
the heap size; by looking into the dump we saw that fieldValueCache has 6
occurrences of org.apache.solr.util.concurrentCache.
When we have maxWarmingSearchers=2 we would expect to have only 3 (maybe 4
before GC has been launched).
What can it be? We use Solr 4.10.1



--
View this message in context: 
http://lucene.472066.n3.nabble.com/maxwarmingSearchers-and-memory-leak-tp4321937.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Memory leak in Solr

2016-12-07 Thread William Bell
What do you mean by JVM level? Run Solr on different ports on the same
machine? If you have a 32 core box would you run 2,3,4 JVMs?

On Sun, Dec 4, 2016 at 8:46 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

>
> Here’s an earlier post where I mentioned some GC investigation tools:
> https://mail-archives.apache.org/mod_mbox/lucene-solr-user/
> 201604.mbox/%3c8f8fa32d-ec0e-4352-86f7-4b2d8a906...@whitepages.com%3E
>
> In my experience, there are many aspects of the Solr/Lucene memory
> allocation model that scale with things other than documents returned.
> (such as cardinality, or simply index size) A single query on a large index
> might consume dozens of megabytes of heap to complete. But that heap should
> also be released quickly after the query finishes.
> The key characteristic of a memory leak is that the software is allocating
> memory that it cannot reclaim. If it’s a leak, you ought to be able to
> reproduce it at any query rate - have you tried this? A run with, say, half
> the rate, over twice the duration?
>
> I’m inclined to agree with others here, that although you’ve correctly
> attributed the cause to GC, it’s probably less an indication of a leak, and
> more an indication of simply allocating memory faster than it can be
> reclaimed, combined with the long pauses that are increasingly unavoidable
> as heap size goes up.
> Note that in the case of a CMS allocation failure, the fallback full-GC is
> *single threaded*, which means it’ll usually take considerably longer than
> a normal GC - even for a comparable amount of garbage.
>
> In addition to GC tuning, you can address these by sharding more, both at
> the core and jvm level.
>
>
> On 12/4/16, 3:46 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:
>
> On 12/3/2016 9:46 PM, S G wrote:
> > The symptom we see is that the java clients querying Solr see
> response
> > times in 10s of seconds (not milliseconds).
> 
> > Some numbers for the Solr Cloud:
> >
> > *Overall infrastructure:*
> > - Only one collection
> > - 16 VMs used
> > - 8 shards (1 leader and 1 replica per shard - each core on separate
> VM)
> >
> > *Overview from one core:*
> > - Num Docs:193,623,388
> > - Max Doc:230,577,696
> > - Heap Memory Usage:231,217,880
> > - Deleted Docs:36,954,308
> > - Version:2,357,757
> > - Segment Count:37
>
> The heap memory usage number isn't useful.  It doesn't cover all the
> memory used.
>
> > *Stats from QueryHandler/select*
> > - requests:78,557
> > - errors:358
> > - timeouts:0
> > - totalTime:1,639,975.27
> > - avgRequestsPerSecond:2.62
> > - 5minRateReqsPerSecond:1.39
> > - 15minRateReqsPerSecond:1.64
> > - avgTimePerRequest:20.87
> > - medianRequestTime:0.70
> > - 75thPcRequestTime:1.11
> > - 95thPcRequestTime:191.76
>
> These times are in *milliseconds*, not seconds .. and these are even
> better numbers than you showed before.  Where are you seeing 10 plus
> second query times?  Solr is not showing numbers like that.
>
> If your VM host has 16 VMs on it and each one has a total memory size
> of
> 92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
> oversubscribed, and this is going to lead to terrible performance...
> but
> the numbers you've shown here do not show terrible performance.
>
> > Plus, on every server, we are seeing lots of exceptions.
> > For example:
> >
> > Between 8:06:55 PM and 8:21:36 PM, exceptions are:
> >
> > 1) Request says it is coming from leader, but we are the leader:
> > update.distrib=FROMLEADER=HOSTB_ca_1_
> 1456430020/=javabin=2
> >
> > 2) org.apache.solr.common.SolrException: Request says it is coming
> from
> > leader, but we are the leader
> >
> > 3) org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: Tried one server
> for read
> > operation and it timed out, so failing fast
> >
> > 4) null:org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: Tried one server
> for read
> > operation and it timed out, so failing fast
> >
> > 5) org.apache.solr.common.SolrException:
> > org.apache.solr.client.solrj.SolrServerException: Tried one server
> for read
> > operation and it timed out, so failing fast
> >
> > 6) null:org.apache.solr.common.SolrException:
> &g

Re: Memory leak in Solr

2016-12-04 Thread Jeff Wartes

Here’s an earlier post where I mentioned some GC investigation tools:
https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201604.mbox/%3c8f8fa32d-ec0e-4352-86f7-4b2d8a906...@whitepages.com%3E

In my experience, there are many aspects of the Solr/Lucene memory allocation 
model that scale with things other than documents returned. (such as 
cardinality, or simply index size) A single query on a large index might 
consume dozens of megabytes of heap to complete. But that heap should also be 
released quickly after the query finishes.
The key characteristic of a memory leak is that the software is allocating 
memory that it cannot reclaim. If it’s a leak, you ought to be able to 
reproduce it at any query rate - have you tried this? A run with, say, half the 
rate, over twice the duration?

I’m inclined to agree with others here, that although you’ve correctly 
attributed the cause to GC, it’s probably less an indication of a leak, and 
more an indication of simply allocating memory faster than it can be reclaimed, 
combined with the long pauses that are increasingly unavoidable as heap size 
goes up.
Note that in the case of a CMS allocation failure, the fallback full-GC is 
*single threaded*, which means it’ll usually take considerably longer than a 
normal GC - even for a comparable amount of garbage.

In addition to GC tuning, you can address these by sharding more, both at the 
core and jvm level.


On 12/4/16, 3:46 PM, "Shawn Heisey" <apa...@elyograg.org> wrote:

On 12/3/2016 9:46 PM, S G wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).

> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37

The heap memory usage number isn't useful.  It doesn't cover all the
memory used.

> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76

These times are in *milliseconds*, not seconds .. and these are even
better numbers than you showed before.  Where are you seeing 10 plus
second query times?  Solr is not showing numbers like that.

If your VM host has 16 VMs on it and each one has a total memory size of
92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
oversubscribed, and this is going to lead to terrible performance... but
the numbers you've shown here do not show terrible performance.

> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> 
update.distrib=FROMLEADER=HOSTB_ca_1_1456430020/=javabin=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for 
read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 8) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 9) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerEx

Re: Memory leak in Solr

2016-12-04 Thread Shawn Heisey
On 12/3/2016 9:46 PM, S G wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).

> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37

The heap memory usage number isn't useful.  It doesn't cover all the
memory used.

> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76

These times are in *milliseconds*, not seconds .. and these are even
better numbers than you showed before.  Where are you seeing 10 plus
second query times?  Solr is not showing numbers like that.

If your VM host has 16 VMs on it and each one has a total memory size of
92GB, then if that machine doesn't have 1.5 terabytes of memory, you're
oversubscribed, and this is going to lead to terrible performance... but
the numbers you've shown here do not show terrible performance.

> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> update.distrib=FROMLEADER=HOSTB_ca_1_1456430020/=javabin=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 8) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: No live SolrServers
> available to handle this request. Zombie server list:
> [HOSTA_ca_1_1456429897]
>
> 9) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 10) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 11) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 12) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast

These errors sound like timeouts, possibly caused by long GC pauses ...
but as already mentioned, the query handler statistics do not indicate
long query times.  If a long GC were to happen during a query, then the
query time would be long as well.

The core information above doesn't include the size of the index on
disk.  That number would be useful for telling you whether there's
enough memory.

As I said at the beginning of the thread, I haven't seen anything here
to indicate a memory leak, and others are using version 4.10 without any
problems.  If there were a memory leak in a released version of Solr,
many people would have run into problems with it.

Thanks,
Shawn



Re: Memory leak in Solr

2016-12-04 Thread Walter Underwood
That is a huge heap.

Once you have enough heap memory to hold a Java program’s working set,
more memory doesn’t make it faster. I just makes the GC take longer.

If you have GC monitoring, look at how much memory is in use after a full GC.
Add the space for new generation (eden, whatever), then a bit more for 
burst memory usage. Set the heap to that.

I recommend fairly large new generation memory allocation. An HTTP service
has a fair amount of allocation that has a lifetime of one HTTP request. Those
allocations should never be promoted to tenured space.

We run with an 8G heap and a 2G new generation with 4.10.4.

Of course, make sure you are running some sort of parallel GC. You can use
G1 or use CMS with ParNew, your choice. We are running CMS/ParNew, but
will be experimenting with G1 soon.
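
Expressed as JVM options, that kind of setup looks roughly like the following (a
sketch, not our exact startup script; it assumes a Java 8 JVM started through a
Tomcat-style JAVA_OPTS variable, and the sizes are just the ones mentioned above,
not a recommendation for every install):

# 8G fixed-size heap, 2G new generation, CMS for the old generation,
# ParNew for the new generation
JAVA_OPTS="$JAVA_OPTS -Xms8g -Xmx8g -Xmn2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"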

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 4, 2016, at 11:07 AM, S G <sg.online.em...@gmail.com> wrote:
> 
> Thank you Eric.
> Our Solr version is 4.10 and we are not doing any sorting or faceting.
> 
> I am trying to find some ways of investigating this problem.
> Hence asking a few more questions to see what are the normal steps taken in
> such situations.
> (I did search a few of them on the Internet but could not find anything
> good).
> Any pointers provided here will help us resolve a little more quickly.
> 
> 
> 1) Is there a conclusive way to know about the memory leaks?
>  How does Solr ensure with each release that there are no memory leaks?
>  With a heap 24gb (-Xmx parameter), I sometimes see GC pauses of about 1
> second now.
>  Looks like we will need to scale it down.
>  Total VM memory is 92gb and Solr is the only process running on it.
> 
> 
> 2) How can I know that the zookeeper connectivity to Solr is not good?
>  What commands/steps are normally used to resolve this?
>   Does Solr have some metrics that share the zookeeper interaction
> statistics?
> 
> 
> 3) In a span of 9 hours, I see:
>  4 times: java.net.SocketException: Connection reset
>  32 times: java.net.SocketTimeoutException: Read timed out
> 
> And several other exceptions that ultimately bring a whole shard down
> (leader is recovery-failed and replica is down).
> 
> I understand that the above information might not be sufficient to get the
> full picture.
> But just in case, someone has resolved or debugged these issues before,
> please share your experience.
> It would be of great help to me.
> 
> Thanks,
> SG
> 
> 
> 
> 
> 
> On Sun, Dec 4, 2016 at 8:59 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> All of this is consistent with not having a properly
>> tuned Solr instance wrt # documents, usage
>> pattern, memory allocated to the JVM, GC
>> settings and the like.
>> 
>> Your leader issues can be explained by long
>> GC pauses too. Zookeeper periodically pings
>> each replica it knows about and if the response
>> times out (due to GC in this case) then Zookeeper
>> thinks the node has gone away and marks
>> it as "down". Similarly when a leader forwards
>> an update to a follower and the request times
>> out, the leader will mark the follower as down.
>> Do this enough and the state of the cluster gets
>> "interesting".
>> 
>> You still haven't told us what version of Solr
>> you're using, the "Version" you took from
>> the core stats is the version of the _index_,
>> not Solr.
>> 
>> You have almost 200M documents on
>> a single core. That's definitely on the high side,
>> although I've seen that work. Assuming
>> you aren't doing things like faceting and
>> sorting and the like on non docValues fields.
>> 
>> As others have pointed out, the link you
>> provided doesn't provide much in the way of
>> any "smoking guns" as far as a memory
>> leak is concerned.
>> 
>> I've certainly seen situations where memory
>> required by Solr is close to the total memory
>> allocated to the JVM for instance. Then the GC
>> cycle kicks in and recovers just enough to
>> go on for a very brief time before going into another
>> GC cycle resulting in very poor performance.
>> 
>> So overall this looks like you need to do some
>> serious tuning of your Solr instances, take a
>> hard look at how you're using your physical
>> machines. You specify that these are VMs,
>> but how many VMs are you running per box?
>> How much JVM have you allocated for each?
>> How much total physical memory do you have
>> to work with per box?
>> 
>> Even if you provide the answers to the above
>>

Re: Memory leak in Solr

2016-12-04 Thread S G
Thank you Eric.
Our Solr version is 4.10 and we are not doing any sorting or faceting.

I am trying to find some ways of investigating this problem.
Hence asking a few more questions to see what are the normal steps taken in
such situations.
(I did search a few of them on the Internet but could not find anything
good).
Any pointers provided here will help us resolve a little more quickly.


1) Is there a conclusive way to know about the memory leaks?
  How does Solr ensure with each release that there are no memory leaks?
  With a heap 24gb (-Xmx parameter), I sometimes see GC pauses of about 1
second now.
  Looks like we will need to scale it down.
  Total VM memory is 92gb and Solr is the only process running on it.


2) How can I know that the zookeeper connectivity to Solr is not good?
  What commands/steps are normally used to resolve this?
   Does Solr have some metrics that share the zookeeper interaction
statistics?


3) In a span of 9 hours, I see:
  4 times: java.net.SocketException: Connection reset
  32 times: java.net.SocketTimeoutException: Read timed out

And several other exceptions that ultimately bring a whole shard down
(leader is recovery-failed and replica is down).

I understand that the above information might not be sufficient to get the
full picture.
But just in case, someone has resolved or debugged these issues before,
please share your experience.
It would be of great help to me.

Thanks,
SG





On Sun, Dec 4, 2016 at 8:59 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> All of this is consistent with not having a properly
> tuned Solr instance wrt # documents, usage
> pattern, memory allocated to the JVM, GC
> settings and the like.
>
> Your leader issues can be explained by long
> GC pauses too. Zookeeper periodically pings
> each replica it knows about and if the response
> times out (due to GC in this case) then Zookeeper
> thinks the node has gone away and marks
> it as "down". Similarly when a leader forwards
> an update to a follower and the request times
> out, the leader will mark the follower as down.
> Do this enough and the state of the cluster gets
> "interesting".
>
> You still haven't told us what version of Solr
> you're using, the "Version" you took from
> the core stats is the version of the _index_,
> not Solr.
>
> You have almost 200M documents on
> a single core. That's definitely on the high side,
> although I've seen that work. Assuming
> you aren't doing things like faceting and
> sorting and the like on non docValues fields.
>
> As others have pointed out, the link you
> provided doesn't provide much in the way of
> any "smoking guns" as far as a memory
> leak is concerned.
>
> I've certainly seen situations where memory
> required by Solr is close to the total memory
> allocated to the JVM for instance. Then the GC
> cycle kicks in and recovers just enough to
> go on for a very brief time before going into another
> GC cycle resulting in very poor performance.
>
> So overall this looks like you need to do some
> serious tuning of your Solr instances, take a
> hard look at how you're using your physical
> machines. You specify that these are VMs,
> but how many VMs are you running per box?
> How much JVM have you allocated for each?
> How much total physical memory do you have
> to work with per box?
>
> Even if you provide the answers to the above
> questions, there's not much we can do to
> help you resolve your issues assuming it's
> simply inappropriate sizing. I'd really recommend
> you create a stress environment so you can
> test different scenarios to become confident about
> your expected performance, here's a blog on the
> subject:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-
> the-abstract-why-we-dont-have-a-definitive-answer/
>
> Best,
> Erick
>
> On Sat, Dec 3, 2016 at 8:46 PM, S G <sg.online.em...@gmail.com> wrote:
> > The symptom we see is that the java clients querying Solr see response
> > times in 10s of seconds (not milliseconds).
> > And on the tomcat's gc.log file (where Solr is running), we see very bad
> GC
> > pauses - threads being paused for 0.5 seconds per second approximately.
> >
> > Some numbers for the Solr Cloud:
> >
> > *Overall infrastructure:*
> > - Only one collection
> > - 16 VMs used
> > - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
> >
> > *Overview from one core:*
> > - Num Docs:193,623,388
> > - Max Doc:230,577,696
> > - Heap Memory Usage:231,217,880
> > - Deleted Docs:36,954,308
> > - Version:2,357,757
> > - Segment Count:37
> >
> > *Stats from QueryHandler/select*
> > - requests:78,

Re: Memory leak in Solr

2016-12-04 Thread Erick Erickson
All of this is consistent with not having a properly
tuned Solr instance wrt # documents, usage
pattern, memory allocated to the JVM, GC
settings and the like.

Your leader issues can be explained by long
GC pauses too. Zookeeper periodically pings
each replica it knows about and if the response
times out (due to GC in this case) then Zookeeper
thinks the node has gone away and marks
it as "down". Similarly when a leader forwards
an update to a follower and the request times
out, the leader will mark the follower as down.
Do this enough and the state of the cluster gets
"interesting".

You still haven't told us what version of Solr
you're using, the "Version" you took from
the core stats is the version of the _index_,
not Solr.

You have almost 200M documents on
a single core. That's definitely on the high side,
although I've seen that work. Assuming
you aren't doing things like faceting and
sorting and the like on non docValues fields.
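
(If those fields aren't on docValues yet, it's a schema.xml change along these
lines -- the field name and type below are made up for illustration, and the
collection has to be reindexed after the change:)

<!-- hypothetical field that gets faceted/sorted on; docValues keeps those
     operations out of the heap-resident fieldCache -->
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>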

As others have pointed out, the link you
provided doesn't provide much in the way of
any "smoking guns" as far as a memory
leak is concerned.

I've certainly seen situations where memory
required by Solr is close to the total memory
allocated to the JVM for instance. Then the GC
cycle kicks in and recovers just enough to
go on for a very brief time before going into another
GC cycle resulting in very poor performance.

So overall this looks like you need to do some
serious tuning of your Solr instances, take a
hard look at how you're using your physical
machines. You specify that these are VMs,
but how many VMs are you running per box?
How much JVM have you allocated for each?
How much total physical memory do you have
to work with per box?

Even if you provide the answers to the above
questions, there's not much we can do to
help you resolve your issues assuming it's
simply inappropriate sizing. I'd really recommend
you create a stress environment so you can
test different scenarios to become confident about
your expected performance, here's a blog on the
subject:

https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Sat, Dec 3, 2016 at 8:46 PM, S G <sg.online.em...@gmail.com> wrote:
> The symptom we see is that the java clients querying Solr see response
> times in 10s of seconds (not milliseconds).
> And on the tomcat's gc.log file (where Solr is running), we see very bad GC
> pauses - threads being paused for 0.5 seconds per second approximately.
>
> Some numbers for the Solr Cloud:
>
> *Overall infrastructure:*
> - Only one collection
> - 16 VMs used
> - 8 shards (1 leader and 1 replica per shard - each core on separate VM)
>
> *Overview from one core:*
> - Num Docs:193,623,388
> - Max Doc:230,577,696
> - Heap Memory Usage:231,217,880
> - Deleted Docs:36,954,308
> - Version:2,357,757
> - Segment Count:37
>
> *Stats from QueryHandler/select*
> - requests:78,557
> - errors:358
> - timeouts:0
> - totalTime:1,639,975.27
> - avgRequestsPerSecond:2.62
> - 5minRateReqsPerSecond:1.39
> - 15minRateReqsPerSecond:1.64
> - avgTimePerRequest:20.87
> - medianRequestTime:0.70
> - 75thPcRequestTime:1.11
> - 95thPcRequestTime:191.76
>
> *Stats from QueryHandler/update*
> - requests:33,555
> - errors:0
> - timeouts:0
> - totalTime:227,870.58
> - avgRequestsPerSecond:1.12
> - 5minRateReqsPerSecond:1.16
> - 15minRateReqsPerSecond:1.23
> - avgTimePerRequest:6.79
> - medianRequestTime:3.16
> - 75thPcRequestTime:5.27
> - 95thPcRequestTime:9.33
>
> And yet the Solr clients are reporting timeouts and very long read times.
>
> Plus, on every server, we are seeing lots of exceptions.
> For example:
>
> Between 8:06:55 PM and 8:21:36 PM, exceptions are:
>
> 1) Request says it is coming from leader, but we are the leader:
> update.distrib=FROMLEADER=HOSTB_ca_1_1456430020/=javabin=2
>
> 2) org.apache.solr.common.SolrException: Request says it is coming from
> leader, but we are the leader
>
> 3) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 4) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 5) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 6) null:org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerException: Tried one server for read
> operation and it timed out, so failing fast
>
> 7) org.apache.solr.common.SolrException:
> org.apache.solr.client.solrj.SolrServerExcepti

Re: Memory leak in Solr

2016-12-03 Thread S G
The symptom we see is that the java clients querying Solr see response
times in 10s of seconds (not milliseconds).
And on the tomcat's gc.log file (where Solr is running), we see very bad GC
pauses - threads being paused for 0.5 seconds per second approximately.

Some numbers for the Solr Cloud:

*Overall infrastructure:*
- Only one collection
- 16 VMs used
- 8 shards (1 leader and 1 replica per shard - each core on separate VM)

*Overview from one core:*
- Num Docs:193,623,388
- Max Doc:230,577,696
- Heap Memory Usage:231,217,880
- Deleted Docs:36,954,308
- Version:2,357,757
- Segment Count:37

*Stats from QueryHandler/select*
- requests:78,557
- errors:358
- timeouts:0
- totalTime:1,639,975.27
- avgRequestsPerSecond:2.62
- 5minRateReqsPerSecond:1.39
- 15minRateReqsPerSecond:1.64
- avgTimePerRequest:20.87
- medianRequestTime:0.70
- 75thPcRequestTime:1.11
- 95thPcRequestTime:191.76

*Stats from QueryHandler/update*
- requests:33,555
- errors:0
- timeouts:0
- totalTime:227,870.58
- avgRequestsPerSecond:1.12
- 5minRateReqsPerSecond:1.16
- 15minRateReqsPerSecond:1.23
- avgTimePerRequest:6.79
- medianRequestTime:3.16
- 75thPcRequestTime:5.27
- 95thPcRequestTime:9.33

And yet the Solr clients are reporting timeouts and very long read times.

Plus, on every server, we are seeing lots of exceptions.
For example:

Between 8:06:55 PM and 8:21:36 PM, exceptions are:

1) Request says it is coming from leader, but we are the leader:
update.distrib=FROMLEADER=HOSTB_ca_1_1456430020/=javabin=2

2) org.apache.solr.common.SolrException: Request says it is coming from
leader, but we are the leader

3) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

4) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

5) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

6) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

7) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request. Zombie server list:
[HOSTA_ca_1_1456429897]

8) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers
available to handle this request. Zombie server list:
[HOSTA_ca_1_1456429897]

9) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

10) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

11) org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

12) null:org.apache.solr.common.SolrException:
org.apache.solr.client.solrj.SolrServerException: Tried one server for read
operation and it timed out, so failing fast

Why are we seeing so many timeouts, then, and why such huge response times on
the client?

Thanks
SG



On Sat, Dec 3, 2016 at 4:19 PM, <billnb...@gmail.com> wrote:

> What tool is that? I would like to run those stats on my Solr instance.
>
> Bill Bell
> Sent from mobile
>
>
> > On Dec 2, 2016, at 4:49 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> >> On 12/2/2016 12:01 PM, S G wrote:
> >> This post shows some stats on Solr which indicate that there might be a
> >> memory leak in there.
> >>
> >> http://stackoverflow.com/questions/40939166/is-this-a-
> memory-leak-in-solr
> >>
> >> Can someone please help to debug this?
> >> It might be a very good step in making Solr stable if we can fix this.
> >
> > +1 to what Walter said.
> >
> > I replied earlier on the stackoverflow question.
> >
> > FYI -- your 95th percentile request time of about 16 milliseconds is NOT
> > something that I would characterize as "very high."  I would *love* to
> > have statistics that good.
> >
> > Even your 99th percentile request time is not much more than a full
> > second.  If a search takes a couple of seconds, most users will not
> > really care, and some might not even notice.  It's when a large
> > percentage of queries start taking several seconds that complaints start
> > coming in.  On your system, 99 percent of your queries are completing in
> > 1.3 seconds or less, and 95 percent of them are less than 17
> > milliseconds.  That sounds quite good to me

Re: Memory leak in Solr

2016-12-03 Thread billnbell
What tool is that? I would like to run those stats on my Solr instance.

Bill Bell
Sent from mobile


> On Dec 2, 2016, at 4:49 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> 
>> On 12/2/2016 12:01 PM, S G wrote:
>> This post shows some stats on Solr which indicate that there might be a
>> memory leak in there.
>> 
>> http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr
>> 
>> Can someone please help to debug this?
>> It might be a very good step in making Solr stable if we can fix this.
> 
> +1 to what Walter said.
> 
> I replied earlier on the stackoverflow question.
> 
> FYI -- your 95th percentile request time of about 16 milliseconds is NOT
> something that I would characterize as "very high."  I would *love* to
> have statistics that good.
> 
> Even your 99th percentile request time is not much more than a full
> second.  If a search takes a couple of seconds, most users will not
> really care, and some might not even notice.  It's when a large
> percentage of queries start taking several seconds that complaints start
> coming in.  On your system, 99 percent of your queries are completing in
> 1.3 seconds or less, and 95 percent of them are less than 17
> milliseconds.  That sounds quite good to me.
> 
> In my experience, the time it takes for the browser to receive the
> search result page and render it is a significant part of the total time
> to see results, and often dwarfs the time spent getting info from Solr.
> 
> Here's some numbers from Solr in my organization:
> 
> requests:   4102054
> errors: 364894
> timeouts:   49
> totalTime:  799446287.45041
> avgRequestsPerSecond:   1.2375565828793849
> 5minRateReqsPerSecond:  0.8444329508327961
> 15minRateReqsPerSecond: 0.8631197328073346
> avgTimePerRequest:  194.88926460997587
> medianRequestTime:  20.8566605
> 75thPcRequestTime:  85.5132884999
> 95thPcRequestTime:  2202.27746654
> 99thPcRequestTime:  5280.375381280002
> 999thPcRequestTime: 6866.020122961001
> 
> The numbers above come from a distributed index that contains 167
> million documents and takes up about 200GB of disk space across two
> machines.
> 
> requests:   192683
> errors: 124
> timeouts:   0
> totalTime:  199380421.985073
> avgRequestsPerSecond:   0.04722771354554
> 5minRateReqsPerSecond:  0.00800545427600684
> 15minRateReqsPerSecond: 0.017521222412364163
> avgTimePerRequest:  1034.7587591280653
> medianRequestTime:  541.591858
> 75thPcRequestTime:  1683.83246125
> 95thPcRequestTime:  5644.542019949997
> 99thPcRequestTime:  9445.592394760004
> 999thPcRequestTime: 14602.166640771007
> 
> These numbers are from an index with about 394 million documents, taking
> up nearly 500GB of disk space.  This index is also distributed on
> multiple machines.
> 
> Are you experiencing any problems other than what you perceive as slow
> queries?  I asked some other questions on stackoverflow.  In particular,
> I'd like to know the total memory on the server, the total number of
> documents (maxDoc and numDoc) you're handling with this server, as well
> as the total index size.  What do your queries look like?  What version
> and vendor of Java are you using?  Can you share your config/schema?
> 
> A memory leak is very unlikely, unless your Java or your operating
> system is broken.  I can't say for sure that it's not happening, but
> it's just not something we see around here.
> 
> Here's what I have collected on performance issues in Solr.  This page
> does mostly concern itself with memory, though it touches briefly on
> other topics:
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems
> 
> Thanks,
> Shawn
> 


Re: Memory leak in Solr

2016-12-03 Thread Greg Harris
Hi,

All your stats show is that Solr has large memory requirements. There is no
direct mapping from number of documents and queries to memory requirements, as
was asked for in that article. Different Solr projects can yield extremely,
extremely different requirements. If you want to understand your memory
usage better, you need to take a heap dump and analyze it with something
like Eclipse Memory Analyzer or YourKit. It's stop-the-world, so you will have
a little bit of downtime. In 4.10 I'd almost guess right away that your culprit
is not using docValues for things being faceted, grouped, or sorted on, leaving
you with a large fieldCache and large memory requirements which will not be
cleaned up on a GC, as those entries are still "live objects". While I couldn't
say that's true for sure without more analysis, it's, in my experience, pretty common.
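
If you haven't taken a heap dump before, the JDK's jmap tool will grab one from a
running Solr process, roughly like this (the pid and output path are
placeholders):

# "live" forces a full GC first, so the dump contains only reachable objects
jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>

Then open the resulting .hprof file in Eclipse Memory Analyzer or YourKit and
look at the biggest retained objects (MAT's dominator tree is a good starting
point).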

Greg


On Dec 2, 2016 11:01 AM, "S G" <sg.online.em...@gmail.com> wrote:

Hi,

This post shows some stats on Solr which indicate that there might be a
memory leak in there.

http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr

Can someone please help to debug this?
It might be a very good step in making Solr stable if we can fix this.

Thanks
SG


Re: Memory leak in Solr

2016-12-02 Thread Shawn Heisey
On 12/2/2016 12:01 PM, S G wrote:
> This post shows some stats on Solr which indicate that there might be a
> memory leak in there.
>
> http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr
>
> Can someone please help to debug this?
> It might be a very good step in making Solr stable if we can fix this.

+1 to what Walter said.

I replied earlier on the stackoverflow question.

FYI -- your 95th percentile request time of about 16 milliseconds is NOT
something that I would characterize as "very high."  I would *love* to
have statistics that good.

Even your 99th percentile request time is not much more than a full
second.  If a search takes a couple of seconds, most users will not
really care, and some might not even notice.  It's when a large
percentage of queries start taking several seconds that complaints start
coming in.  On your system, 99 percent of your queries are completing in
1.3 seconds or less, and 95 percent of them are less than 17
milliseconds.  That sounds quite good to me.

In my experience, the time it takes for the browser to receive the
search result page and render it is a significant part of the total time
to see results, and often dwarfs the time spent getting info from Solr.

Here's some numbers from Solr in my organization:

requests:   4102054
errors: 364894
timeouts:   49
totalTime:  799446287.45041
avgRequestsPerSecond:   1.2375565828793849
5minRateReqsPerSecond:  0.8444329508327961
15minRateReqsPerSecond: 0.8631197328073346
avgTimePerRequest:  194.88926460997587
medianRequestTime:  20.8566605
75thPcRequestTime:  85.5132884999
95thPcRequestTime:  2202.27746654
99thPcRequestTime:  5280.375381280002
999thPcRequestTime: 6866.020122961001

The numbers above come from a distributed index that contains 167
million documents and takes up about 200GB of disk space across two
machines.

requests:   192683
errors: 124
timeouts:   0
totalTime:  199380421.985073
avgRequestsPerSecond:   0.04722771354554
5minRateReqsPerSecond:  0.00800545427600684
15minRateReqsPerSecond: 0.017521222412364163
avgTimePerRequest:  1034.7587591280653
medianRequestTime:  541.591858
75thPcRequestTime:  1683.83246125
95thPcRequestTime:  5644.542019949997
99thPcRequestTime:  9445.592394760004
999thPcRequestTime: 14602.166640771007

These numbers are from an index with about 394 million documents, taking
up nearly 500GB of disk space.  This index is also distributed on
multiple machines.

Are you experiencing any problems other than what you perceive as slow
queries?  I asked some other questions on stackoverflow.  In particular,
I'd like to know the total memory on the server, the total number of
documents (maxDoc and numDoc) you're handling with this server, as well
as the total index size.  What do your queries look like?  What version
and vendor of Java are you using?  Can you share your config/schema?

A memory leak is very unlikely, unless your Java or your operating
system is broken.  I can't say for sure that it's not happening, but
it's just not something we see around here.

Here's what I have collected on performance issues in Solr.  This page
does mostly concern itself with memory, though it touches briefly on
other topics:

https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Memory leak in Solr

2016-12-02 Thread Scott Blum
Are you sure it's an actual leak, not just memory pinned by caches?

Related: https://issues.apache.org/jira/browse/SOLR-9810

On Fri, Dec 2, 2016 at 2:01 PM, S G <sg.online.em...@gmail.com> wrote:

> Hi,
>
> This post shows some stats on Solr which indicate that there might be a
> memory leak in there.
>
> http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr
>
> Can someone please help to debug this?
> It might be a very good step in making Solr stable if we can fix this.
>
> Thanks
> SG
>


Re: Memory leak in Solr

2016-12-02 Thread Walter Underwood
We’ve been running Solr 4.10.4 in prod for a couple of years. There aren’t any 
obvious
memory leaks in it. It stays up for months.

Objects ejected from the cache will almost always be tenured, so that tends to 
cause 
full GCs.

If there are very few repeats in your query load, you’ll see a lot of cache 
ejections. 
This can also happen if you have an HTTP cache in front of the Solr hosts.
What are the hit rates on the Solr caches?

Also, are you using “NOW” in your queries? That will cause a very low hit rate
on the query result cache.
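
To illustrate why, a small SolrJ sketch (the field name "timestamp_dt" and the 7-day window are invented for the example): a bare NOW resolves to the current millisecond, so every request becomes a new cache key, while date-math rounding keeps the query string stable and cacheable:

    import org.apache.solr.client.solrj.SolrQuery;

    // Cache-unfriendly: NOW changes every millisecond, so the queryResultCache
    // and filterCache almost never see the same entry twice.
    SolrQuery uncached = new SolrQuery("*:*");
    uncached.addFilterQuery("timestamp_dt:[NOW-7DAYS TO NOW]");

    // Cache-friendly: rounded to the day, the filter string is identical for a
    // whole day and can actually be reused from the caches.
    SolrQuery cached = new SolrQuery("*:*");
    cached.addFilterQuery("timestamp_dt:[NOW/DAY-7DAYS TO NOW/DAY+1DAY]");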

We can’t help without a lot more information, like your search architecture, 
the 
search collections, the query load, cache sizes, etc.

Finally, this is not a question for the dev list. This belongs on solr-user, so 
I’m
dropping the reply to the dev list.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Dec 2, 2016, at 11:01 AM, S G <sg.online.em...@gmail.com> wrote:
> 
> Hi,
> 
> This post shows some stats on Solr which indicate that there might be a 
> memory leak in there.
> 
> http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr 
> <http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr>
> 
> Can someone please help to debug this?
> It might be a very good step in making Solr stable if we can fix this.
> 
> Thanks
> SG



Memory leak in Solr

2016-12-02 Thread S G
Hi,

This post shows some stats on Solr which indicate that there might be a
memory leak in there.

http://stackoverflow.com/questions/40939166/is-this-a-memory-leak-in-solr

Can someone please help to debug this?
It might be a very good step in making Solr stable if we can fix this.

Thanks
SG


Re: File Descriptor/Memory Leak

2016-07-10 Thread Alexandre Rafalovitch
If this is reproducible, I would run the comparison under Wireshark
(formerly called Ethereal) https://www.wireshark.org/ . It would
capture the full network traffic and can even be run on a machine separate
from either client or server (in promiscuous mode).

Then, I would look at the difference in the number of connections between HTTP
and HTTPS for the same test. Perhaps HTTP is doing request pipelining
and HTTPS is not. That would lead to more sockets (and more
CLOSE_WAITs) for the same content.

If the number of connections is the same, then I would pick a similar
transaction and look at the delays between the packets of the closing sequence
(FIN/ACK/whatever). If, after the server sends its closing
packet, the client does not reply as quickly with its own closing packet
under HTTPS, then the problem is in the socket-closing code. Obviously, SSL
establishment of a connection is more painful/expensive than
non-SSL, but the issue here is the closing of one.

This is the way I troubleshot these scenarios many years ago as
WebLogic senior tech support. I still think approaching this from
the network up is the most viable approach.

Regards,
   Alex.


Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 10 July 2016 at 17:05, Shai Erera <ser...@gmail.com> wrote:
> There is no firewall and the CLOSE_WAITs are between Solr-to-Solr nodes
> (the origin and destination IP:PORT belong to Solr).
>
> Also, note that the same test runs fine on 5.4.1, even though there are
> still few hundreds of CLOSE_WAITs. I'm looking at what has changed in the
> code between 5.4.1 and 5.5.1. It's also only reproducible when Solr is run
> in SSL mode, so the problem might lie in HttpClient/Jetty too.
>
> Shai
>
> On Fri, Jul 8, 2016 at 11:59 AM Alexandre Rafalovitch <arafa...@gmail.com>
> wrote:
>
>> Is there a firewall between a client and a server by any chance?
>>
>> CLOSE_WAIT is not a leak, but standard TCP step at the end. So the question
>> is why sockets are reopened that often or why the other side does not
>> acknowledge TCP termination packet fast.
>>
>> I would run Ethereal to troubleshoot that. And truss/strace.
>>
>> Regards,
>> Alex
>> On 8 Jul 2016 4:56 PM, "Mads Tomasgård Bjørgan" <m...@dips.no> wrote:
>>
>> FYI - we're using Solr-6.1.0, and the leak seems to be consequent (occurs
>> every single time when running with SSL).
>>
>> -Original Message-
>> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
>> Sent: torsdag 7. juli 2016 18.14
>> To: solr-user@lucene.apache.org
>> Subject: Re: File Descriptor/Memory Leak
>>
>> I've created a JIRA to track this:
>> https://issues.apache.org/jira/browse/SOLR-9290
>>
>> On Thu, Jul 7, 2016 at 8:00 AM, Shai Erera <ser...@gmail.com> wrote:
>>
>> > Shalin, we're seeing that issue too (and actually actively debugging
>> > it these days). So far I can confirm the following (on a 2-node cluster):
>> >
>> > 1) It consistently reproduces on 5.5.1, but *does not* reproduce on
>> > 5.4.1
>> > 2) It does not reproduce when SSL is disabled
>> > 3) Restarting the Solr process (sometimes both need to be restarted),
>> > the count drops to 0, but if indexing continues, they climb up again
>> >
>> > When it does happen, Solr seems stuck. The leader cannot talk to the
>> > replica, or vice versa, the replica is usually put in DOWN state and
>> > there's no way to fix it besides restarting the JVM.
>> >
>> > Reviewing the changes from 5.4.1 to 5.5.1 I tried reverting some that
>> > looked suspicious (SOLR-8451 and SOLR-8578), even though the changes
>> > look legit. That did not help, and honestly I've done that before we
>> > suspected it might be the SSL. Therefore I think those are "safe", but
>> just FYI.
>> >
>> > When it does happen, the number of CLOSE_WAITS climb very high, to the
>> > order of 30K+ entries in 'netstat'.
>> >
>> > When I say it does not reproduce on 5.4.1 I really mean the numbers
>> > don't go as high as they do in 5.5.1. Meaning, when running without
>> > SSL, the number of CLOSE_WAITs is smallish, usually less than a 10 (I
>> > would separately like to understand why we have any in that state at
>> > all). When running with SSL and 5.4.1, they stay low at the order of
>> > hundreds the most.
>> >
>> > Unfortunately running without SSL is not an option for us. We will
>> > likely roll back to 5.4.1, even if the problem exists there, but to a
>> > lesser degree.
>> >
>> > I will post back here when/if we have more info about this.

Re: File Descriptor/Memory Leak

2016-07-10 Thread Shai Erera
There is no firewall and the CLOSE_WAITs are between Solr-to-Solr nodes
(the origin and destination IP:PORT belong to Solr).

Also, note that the same test runs fine on 5.4.1, even though there are
still few hundreds of CLOSE_WAITs. I'm looking at what has changed in the
code between 5.4.1 and 5.5.1. It's also only reproducible when Solr is run
in SSL mode, so the problem might lie in HttpClient/Jetty too.

Shai

On Fri, Jul 8, 2016 at 11:59 AM Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Is there a firewall between a client and a server by any chance?
>
> CLOSE_WAIT is not a leak, but standard TCP step at the end. So the question
> is why sockets are reopened that often or why the other side does not
> acknowledge TCP termination packet fast.
>
> I would run Ethereal to troubleshoot that. And truss/strace.
>
> Regards,
> Alex
> On 8 Jul 2016 4:56 PM, "Mads Tomasgård Bjørgan" <m...@dips.no> wrote:
>
> FYI - we're using Solr-6.1.0, and the leak seems to be consequent (occurs
> every single time when running with SSL).
>
> -Original Message-
> From: Anshum Gupta [mailto:ans...@anshumgupta.net]
> Sent: torsdag 7. juli 2016 18.14
> To: solr-user@lucene.apache.org
> Subject: Re: File Descriptor/Memory Leak
>
> I've created a JIRA to track this:
> https://issues.apache.org/jira/browse/SOLR-9290
>
> On Thu, Jul 7, 2016 at 8:00 AM, Shai Erera <ser...@gmail.com> wrote:
>
> > Shalin, we're seeing that issue too (and actually actively debugging
> > it these days). So far I can confirm the following (on a 2-node cluster):
> >
> > 1) It consistently reproduces on 5.5.1, but *does not* reproduce on
> > 5.4.1
> > 2) It does not reproduce when SSL is disabled
> > 3) Restarting the Solr process (sometimes both need to be restarted),
> > the count drops to 0, but if indexing continues, they climb up again
> >
> > When it does happen, Solr seems stuck. The leader cannot talk to the
> > replica, or vice versa, the replica is usually put in DOWN state and
> > there's no way to fix it besides restarting the JVM.
> >
> > Reviewing the changes from 5.4.1 to 5.5.1 I tried reverting some that
> > looked suspicious (SOLR-8451 and SOLR-8578), even though the changes
> > look legit. That did not help, and honestly I've done that before we
> > suspected it might be the SSL. Therefore I think those are "safe", but
> just FYI.
> >
> > When it does happen, the number of CLOSE_WAITS climb very high, to the
> > order of 30K+ entries in 'netstat'.
> >
> > When I say it does not reproduce on 5.4.1 I really mean the numbers
> > don't go as high as they do in 5.5.1. Meaning, when running without
> > SSL, the number of CLOSE_WAITs is smallish, usually less than a 10 (I
> > would separately like to understand why we have any in that state at
> > all). When running with SSL and 5.4.1, they stay low at the order of
> > hundreds the most.
> >
> > Unfortunately running without SSL is not an option for us. We will
> > likely roll back to 5.4.1, even if the problem exists there, but to a
> > lesser degree.
> >
> > I will post back here when/if we have more info about this.
> >
> > Shai
> >
> > On Thu, Jul 7, 2016 at 5:32 PM Shalin Shekhar Mangar <
> > shalinman...@gmail.com>
> > wrote:
> >
> > > I have myself seen this CLOSE_WAIT issue at a customer. I am running
> > > some tests with different versions trying to pinpoint the cause of this
> leak.
> > > Once I have some more information and a reproducible test, I'll open
> > > a
> > jira
> > > issue. I'll keep you posted.
> > >
> > > On Thu, Jul 7, 2016 at 5:13 PM, Mads Tomasgård Bjørgan <m...@dips.no>
> > > wrote:
> > >
> > > > Hello there,
> > > > Our SolrCloud is experiencing a FD leak while running with SSL.
> > > > This is occurring on the one machine that our program is sending
> > > > data too. We
> > > have
> > > > a total of three servers running as an ensemble.
> > > >
> > > > While running without SSL does the FD Count remain quite constant
> > > > at around 180 while indexing. Performing a garbage collection also
> > > > clears almost the entire JVM-memory.
> > > >
> > > > However - when indexing with SSL does the FDC grow polynomial. The
> > count
> > > > increases with a few hundred every five seconds or so, but reaches
> > easily
> > > > 50 000 within three to four minutes. Performing a GC swipes most
> > > > of the memory on the two machines our program isn't transmitting
> > > > the data
> > > directly
> > > > to. The last machine is unaffected by the GC, and both memory nor
> > > > FDC doesn't reset before Solr is restarted on that machine.
> > > >
> > > > Performing a netstat reveals that the FDC mostly consists of
> > > > TCP-connections in the state of "CLOSE_WAIT".
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Regards,
> > > Shalin Shekhar Mangar.
> > >
> >
>
>
>
> --
> Anshum Gupta
>


RE: File Descriptor/Memory Leak

2016-07-08 Thread Alexandre Rafalovitch
Is there a firewall between a client and a server by any chance?

CLOSE_WAIT is not a leak, but standard TCP step at the end. So the question
is why sockets are reopened that often or why the other side does not
acknowledge TCP termination packet fast.

I would run Ethereal to troubleshoot that. And truss/strace.

Regards,
Alex
On 8 Jul 2016 4:56 PM, "Mads Tomasgård Bjørgan" <m...@dips.no> wrote:

FYI - we're using Solr-6.1.0, and the leak seems to be consequent (occurs
every single time when running with SSL).

-Original Message-
From: Anshum Gupta [mailto:ans...@anshumgupta.net]
Sent: torsdag 7. juli 2016 18.14
To: solr-user@lucene.apache.org
Subject: Re: File Descriptor/Memory Leak

I've created a JIRA to track this:
https://issues.apache.org/jira/browse/SOLR-9290

On Thu, Jul 7, 2016 at 8:00 AM, Shai Erera <ser...@gmail.com> wrote:

> Shalin, we're seeing that issue too (and actually actively debugging
> it these days). So far I can confirm the following (on a 2-node cluster):
>
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on
> 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted),
> the count drops to 0, but if indexing continues, they climb up again
>
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
>
> Reviewing the changes from 5.4.1 to 5.5.1 I tried reverting some that
> looked suspicious (SOLR-8451 and SOLR-8578), even though the changes
> look legit. That did not help, and honestly I've done that before we
> suspected it might be the SSL. Therefore I think those are "safe", but
just FYI.
>
> When it does happen, the number of CLOSE_WAITS climb very high, to the
> order of 30K+ entries in 'netstat'.
>
> When I say it does not reproduce on 5.4.1 I really mean the numbers
> don't go as high as they do in 5.5.1. Meaning, when running without
> SSL, the number of CLOSE_WAITs is smallish, usually less than a 10 (I
> would separately like to understand why we have any in that state at
> all). When running with SSL and 5.4.1, they stay low at the order of
> hundreds the most.
>
> Unfortunately running without SSL is not an option for us. We will
> likely roll back to 5.4.1, even if the problem exists there, but to a
> lesser degree.
>
> I will post back here when/if we have more info about this.
>
> Shai
>
> On Thu, Jul 7, 2016 at 5:32 PM Shalin Shekhar Mangar <
> shalinman...@gmail.com>
> wrote:
>
> > I have myself seen this CLOSE_WAIT issue at a customer. I am running
> > some tests with different versions trying to pinpoint the cause of this
leak.
> > Once I have some more information and a reproducible test, I'll open
> > a
> jira
> > issue. I'll keep you posted.
> >
> > On Thu, Jul 7, 2016 at 5:13 PM, Mads Tomasgård Bjørgan <m...@dips.no>
> > wrote:
> >
> > > Hello there,
> > > Our SolrCloud is experiencing a FD leak while running with SSL.
> > > This is occurring on the one machine that our program is sending
> > > data too. We
> > have
> > > a total of three servers running as an ensemble.
> > >
> > > While running without SSL does the FD Count remain quite constant
> > > at around 180 while indexing. Performing a garbage collection also
> > > clears almost the entire JVM-memory.
> > >
> > > However - when indexing with SSL does the FDC grow polynomial. The
> count
> > > increases with a few hundred every five seconds or so, but reaches
> easily
> > > 50 000 within three to four minutes. Performing a GC swipes most
> > > of the memory on the two machines our program isn't transmitting
> > > the data
> > directly
> > > to. The last machine is unaffected by the GC, and both memory nor
> > > FDC doesn't reset before Solr is restarted on that machine.
> > >
> > > Performing a netstat reveals that the FDC mostly consists of
> > > TCP-connections in the state of "CLOSE_WAIT".
> > >
> > >
> > >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



--
Anshum Gupta


RE: File Descriptor/Memory Leak

2016-07-08 Thread Mads Tomasgård Bjørgan
FYI - we're using Solr-6.1.0, and the leak seems to be consistent (occurs every
single time when running with SSL).

-Original Message-
From: Anshum Gupta [mailto:ans...@anshumgupta.net] 
Sent: torsdag 7. juli 2016 18.14
To: solr-user@lucene.apache.org
Subject: Re: File Descriptor/Memory Leak

I've created a JIRA to track this:
https://issues.apache.org/jira/browse/SOLR-9290

On Thu, Jul 7, 2016 at 8:00 AM, Shai Erera <ser...@gmail.com> wrote:

> Shalin, we're seeing that issue too (and actually actively debugging 
> it these days). So far I can confirm the following (on a 2-node cluster):
>
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 
> 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), 
> the count drops to 0, but if indexing continues, they climb up again
>
> When it does happen, Solr seems stuck. The leader cannot talk to the 
> replica, or vice versa, the replica is usually put in DOWN state and 
> there's no way to fix it besides restarting the JVM.
>
> Reviewing the changes from 5.4.1 to 5.5.1 I tried reverting some that 
> looked suspicious (SOLR-8451 and SOLR-8578), even though the changes 
> look legit. That did not help, and honestly I've done that before we 
> suspected it might be the SSL. Therefore I think those are "safe", but just 
> FYI.
>
> When it does happen, the number of CLOSE_WAITS climb very high, to the 
> order of 30K+ entries in 'netstat'.
>
> When I say it does not reproduce on 5.4.1 I really mean the numbers 
> don't go as high as they do in 5.5.1. Meaning, when running without 
> SSL, the number of CLOSE_WAITs is smallish, usually less than a 10 (I 
> would separately like to understand why we have any in that state at 
> all). When running with SSL and 5.4.1, they stay low at the order of 
> hundreds the most.
>
> Unfortunately running without SSL is not an option for us. We will 
> likely roll back to 5.4.1, even if the problem exists there, but to a 
> lesser degree.
>
> I will post back here when/if we have more info about this.
>
> Shai
>
> On Thu, Jul 7, 2016 at 5:32 PM Shalin Shekhar Mangar < 
> shalinman...@gmail.com>
> wrote:
>
> > I have myself seen this CLOSE_WAIT issue at a customer. I am running 
> > some tests with different versions trying to pinpoint the cause of this 
> > leak.
> > Once I have some more information and a reproducible test, I'll open 
> > a
> jira
> > issue. I'll keep you posted.
> >
> > On Thu, Jul 7, 2016 at 5:13 PM, Mads Tomasgård Bjørgan <m...@dips.no>
> > wrote:
> >
> > > Hello there,
> > > Our SolrCloud is experiencing a FD leak while running with SSL. 
> > > This is occurring on the one machine that our program is sending 
> > > data too. We
> > have
> > > a total of three servers running as an ensemble.
> > >
> > > While running without SSL does the FD Count remain quite constant 
> > > at around 180 while indexing. Performing a garbage collection also 
> > > clears almost the entire JVM-memory.
> > >
> > > However - when indexing with SSL does the FDC grow polynomial. The
> count
> > > increases with a few hundred every five seconds or so, but reaches
> easily
> > > 50 000 within three to four minutes. Performing a GC swipes most 
> > > of the memory on the two machines our program isn't transmitting 
> > > the data
> > directly
> > > to. The last machine is unaffected by the GC, and both memory nor 
> > > FDC doesn't reset before Solr is restarted on that machine.
> > >
> > > Performing a netstat reveals that the FDC mostly consists of 
> > > TCP-connections in the state of "CLOSE_WAIT".
> > >
> > >
> > >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



--
Anshum Gupta


Re: File Descriptor/Memory Leak

2016-07-07 Thread Anshum Gupta
I've created a JIRA to track this:
https://issues.apache.org/jira/browse/SOLR-9290

On Thu, Jul 7, 2016 at 8:00 AM, Shai Erera  wrote:

> Shalin, we're seeing that issue too (and actually actively debugging it
> these days). So far I can confirm the following (on a 2-node cluster):
>
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
>
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
>
> Reviewing the changes from 5.4.1 to 5.5.1 I tried reverting some that
> looked suspicious (SOLR-8451 and SOLR-8578), even though the changes look
> legit. That did not help, and honestly I've done that before we suspected
> it might be the SSL. Therefore I think those are "safe", but just FYI.
>
> When it does happen, the number of CLOSE_WAITS climb very high, to the
> order of 30K+ entries in 'netstat'.
>
> When I say it does not reproduce on 5.4.1 I really mean the numbers don't
> go as high as they do in 5.5.1. Meaning, when running without SSL, the
> number of CLOSE_WAITs is smallish, usually less than a 10 (I would
> separately like to understand why we have any in that state at all). When
> running with SSL and 5.4.1, they stay low at the order of hundreds the
> most.
>
> Unfortunately running without SSL is not an option for us. We will likely
> roll back to 5.4.1, even if the problem exists there, but to a lesser
> degree.
>
> I will post back here when/if we have more info about this.
>
> Shai
>
> On Thu, Jul 7, 2016 at 5:32 PM Shalin Shekhar Mangar <
> shalinman...@gmail.com>
> wrote:
>
> > I have myself seen this CLOSE_WAIT issue at a customer. I am running some
> > tests with different versions trying to pinpoint the cause of this leak.
> > Once I have some more information and a reproducible test, I'll open a
> jira
> > issue. I'll keep you posted.
> >
> > On Thu, Jul 7, 2016 at 5:13 PM, Mads Tomasgård Bjørgan 
> > wrote:
> >
> > > Hello there,
> > > Our SolrCloud is experiencing a FD leak while running with SSL. This is
> > > occurring on the one machine that our program is sending data too. We
> > have
> > > a total of three servers running as an ensemble.
> > >
> > > While running without SSL does the FD Count remain quite constant at
> > > around 180 while indexing. Performing a garbage collection also clears
> > > almost the entire JVM-memory.
> > >
> > > However - when indexing with SSL does the FDC grow polynomial. The
> count
> > > increases with a few hundred every five seconds or so, but reaches
> easily
> > > 50 000 within three to four minutes. Performing a GC swipes most of the
> > > memory on the two machines our program isn't transmitting the data
> > directly
> > > to. The last machine is unaffected by the GC, and both memory nor FDC
> > > doesn't reset before Solr is restarted on that machine.
> > >
> > > Performing a netstat reveals that the FDC mostly consists of
> > > TCP-connections in the state of "CLOSE_WAIT".
> > >
> > >
> > >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



-- 
Anshum Gupta


Re: File Descriptor/Memory Leak

2016-07-07 Thread Shai Erera
Shalin, we're seeing that issue too (and actually actively debugging it
these days). So far I can confirm the following (on a 2-node cluster):

1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
2) It does not reproduce when SSL is disabled
3) Restarting the Solr process (sometimes both need to be restarted), the
count drops to 0, but if indexing continues, they climb up again

When it does happen, Solr seems stuck. The leader cannot talk to the
replica, or vice versa, the replica is usually put in DOWN state and
there's no way to fix it besides restarting the JVM.

Reviewing the changes from 5.4.1 to 5.5.1 I tried reverting some that
looked suspicious (SOLR-8451 and SOLR-8578), even though the changes look
legit. That did not help, and honestly I've done that before we suspected
it might be the SSL. Therefore I think those are "safe", but just FYI.

When it does happen, the number of CLOSE_WAITS climb very high, to the
order of 30K+ entries in 'netstat'.

When I say it does not reproduce on 5.4.1 I really mean the numbers don't
go as high as they do in 5.5.1. Meaning, when running without SSL, the
number of CLOSE_WAITs is smallish, usually less than a 10 (I would
separately like to understand why we have any in that state at all). When
running with SSL and 5.4.1, they stay low at the order of hundreds the most.

Unfortunately running without SSL is not an option for us. We will likely
roll back to 5.4.1, even if the problem exists there, but to a lesser
degree.

I will post back here when/if we have more info about this.

Shai

On Thu, Jul 7, 2016 at 5:32 PM Shalin Shekhar Mangar 
wrote:

> I have myself seen this CLOSE_WAIT issue at a customer. I am running some
> tests with different versions trying to pinpoint the cause of this leak.
> Once I have some more information and a reproducible test, I'll open a jira
> issue. I'll keep you posted.
>
> On Thu, Jul 7, 2016 at 5:13 PM, Mads Tomasgård Bjørgan 
> wrote:
>
> > Hello there,
> > Our SolrCloud is experiencing a FD leak while running with SSL. This is
> > occurring on the one machine that our program is sending data too. We
> have
> > a total of three servers running as an ensemble.
> >
> > While running without SSL does the FD Count remain quite constant at
> > around 180 while indexing. Performing a garbage collection also clears
> > almost the entire JVM-memory.
> >
> > However - when indexing with SSL does the FDC grow polynomial. The count
> > increases with a few hundred every five seconds or so, but reaches easily
> > 50 000 within three to four minutes. Performing a GC swipes most of the
> > memory on the two machines our program isn't transmitting the data
> directly
> > to. The last machine is unaffected by the GC, and both memory nor FDC
> > doesn't reset before Solr is restarted on that machine.
> >
> > Performing a netstat reveals that the FDC mostly consists of
> > TCP-connections in the state of "CLOSE_WAIT".
> >
> >
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: File Descriptor/Memory Leak

2016-07-07 Thread Shalin Shekhar Mangar
I have myself seen this CLOSE_WAIT issue at a customer. I am running some
tests with different versions trying to pinpoint the cause of this leak.
Once I have some more information and a reproducible test, I'll open a jira
issue. I'll keep you posted.

On Thu, Jul 7, 2016 at 5:13 PM, Mads Tomasgård Bjørgan  wrote:

> Hello there,
> Our SolrCloud is experiencing a FD leak while running with SSL. This is
> occurring on the one machine that our program is sending data too. We have
> a total of three servers running as an ensemble.
>
> While running without SSL does the FD Count remain quite constant at
> around 180 while indexing. Performing a garbage collection also clears
> almost the entire JVM-memory.
>
> However - when indexing with SSL does the FDC grow polynomial. The count
> increases with a few hundred every five seconds or so, but reaches easily
> 50 000 within three to four minutes. Performing a GC swipes most of the
> memory on the two machines our program isn't transmitting the data directly
> to. The last machine is unaffected by the GC, and both memory nor FDC
> doesn't reset before Solr is restarted on that machine.
>
> Performing a netstat reveals that the FDC mostly consists of
> TCP-connections in the state of "CLOSE_WAIT".
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.


File Descriptor/Memory Leak

2016-07-07 Thread Mads Tomasgård Bjørgan
Hello there,
Our SolrCloud is experiencing an FD leak while running with SSL. This is
occurring on the one machine that our program is sending data to. We have a
total of three servers running as an ensemble.

While running without SSL, the FD count remains quite constant at around 180
while indexing. Performing a garbage collection also clears almost the entire
JVM memory.

However, when indexing with SSL the FDC grows polynomially. The count
increases by a few hundred every five seconds or so, and easily reaches 50
000 within three to four minutes. Performing a GC sweeps away most of the memory on
the two machines our program isn't transmitting the data directly to. The last
machine is unaffected by the GC, and neither memory nor FDC resets before
Solr is restarted on that machine.

Performing a netstat reveals that the FDC mostly consists of TCP connections in
the state of "CLOSE_WAIT".
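
As a rough way to put numbers on that growth over time, a small watcher like the sketch below can log the process's descriptor count. It uses the com.sun.management extension, so it assumes a HotSpot/OpenJDK JVM on a Unix-like OS, and it reports on the JVM it runs in -- to watch Solr itself you would read the same OperatingSystem MBean attributes over remote JMX instead. The class name and the interval are invented for the example:

    import java.lang.management.ManagementFactory;
    import com.sun.management.UnixOperatingSystemMXBean;

    public class FdWatcher {
        public static void main(String[] args) throws InterruptedException {
            // The cast is an assumption: it only works on HotSpot/OpenJDK on Unix.
            UnixOperatingSystemMXBean os =
                    (UnixOperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
            while (true) {
                System.out.println("open FDs: " + os.getOpenFileDescriptorCount()
                        + " (max " + os.getMaxFileDescriptorCount() + ")");
                Thread.sleep(5000L);   // sample every 5 seconds
            }
        }
    }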




Re: Memory leak defect or misssuse of SolrJ API?

2016-02-01 Thread Shawn Heisey
On 1/30/2016 6:15 AM, Steven White wrote:
> I'm getting memory leak in my code.  I narrowed the code to the following
> minimal to cause the leak.
>
> while (true) {
> HttpSolrClient client = new HttpSolrClient(" 
> http://192.168.202.129:8983/solr/core1");
> client.close();
> }
>
> Is this a defect or an issue in the way I'm using HttpSolrClient?

As mentioned by others, you are indeed using HttpSolrClient
incorrectly.  Even so, the fact that this code causes OOM does indicate
that *something* is leaking in your environment.

I could not reproduce the leak.  I tried the above code loop in some
test code (as a testcase in the branch_5x code) and could not get it to
OOM orshow any evidence of a leak.  I let it run for ten minutes on a
512MB heap, which produced this jconsole memory graph:

https://www.dropbox.com/s/em392mx1gr6af67/client-loop-memory-graph.png?dl=0

That memory graph does not look like a program with a memory leak. 
Here's the test code that I was running -- specifically, the
testFullClient() method:

https://www.dropbox.com/s/dooy5bayv4hu6jk/TestHttpSolrClientMemoryLeak.java?dl=0

What versions of the dependent jars do you have in your project?  There
might be something leaking in a dependency rather than within SolrJ.

I also set up a test program using SolrJ 5.2.1, with updated
dependencies beyond the versions included with SolrJ, and could not get
that to show a leak either.

Thanks,
Shawn
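
For completeness, here is a minimal sketch of the reuse pattern recommended throughout this thread: one HttpSolrClient per Solr server, created once and closed once at shutdown. The URL, loop bound, and sleep interval are placeholders, and the single-argument constructor is the SolrJ 5.x API used in this thread (newer SolrJ versions use a Builder instead):

    import org.apache.solr.client.solrj.impl.HttpSolrClient;

    public class Crawler {
        public static void main(String[] args) throws Exception {
            // One client per Solr server, reused for every request; it keeps its
            // own pool of persistent connections internally.
            HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/core1");
            try {
                for (int i = 0; i < 100; i++) {          // placeholder for the 24x7 crawl loop
                    // ... check for new data and send it with the same client ...
                    Thread.sleep(5 * 60 * 1000L);        // placeholder interval
                }
            } finally {
                client.close();                          // close exactly once, at shutdown
            }
        }
    }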



Re: Memory leak defect or misssuse of SolrJ API?

2016-01-31 Thread Walter Underwood
I already answered this.

Move the creation of the HttpSolrClient outside the loop. Your code will run 
much faster, because it will be able to reuse the connections.

Put another way, your program should have exactly as many HttpSolrClient 
objects as there are servers it talks to. If there is one Solr server, you have 
one object.

There is no leak in HttpSolrClient, you are misusing the class, massively.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 31, 2016, at 2:10 PM, Steven White <swhite4...@gmail.com> wrote:
> 
> Thank you all for your feedback.
> 
> This is code that I inherited and the example i gave is intended to
> demonstrate the memory leak which based on YourKit is
> on java/util/LinkedHashMap$Entry.  In short, I'm getting core dumps with
> "Detail "java/lang/OutOfMemoryError" "Java heap space" received "
> 
> Here is a more detailed layout of the code.  This is a crawler that runs
> 24x7 without any recycle logic in place:
> 
>init_data()
> 
>while (true)
>{
>HttpSolrClient client = new HttpSolrClient("
> http://localhost:8983/solr/core1 <http://192.168.202.129:8983/solr/core1>/");
> <<<< this is real code
> 
>see_if_we_have_new_data();
> 
>send_new_data_to_solr();
> 
>client.close();<<<< this is real code
> 
>sleep_for_a_bit(N);<<<< 'N' can be any positive int
>}
> 
> By default, our Java program is given 4gb of ram "-Xmx4g" and N is set for
> 5 min.  We had a customer set N to 10 second and we started seeing core
> dumps with OOM.  As I started to debug, I narrowed the OOM to
> HttpSolrClient per my original email.
> 
> The follow up answers I got suggest that I move the construction of
> HttpSolrClient object outside the while loop which I did (but I also had to
> move "client.close()" outside the loop) and the leak is gone.
> 
> Give this, is this how HttpSolrClient is suppose to be used?  If so, what's
> the point of HttpSolrClient.close()?
> 
> Another side question.  I noticed HttpSolrClient has a setBaseUrl().  Now,
> if I call it and give it "http://localhost:8983/solr/core1
> <http://192.168.202.129:8983/solr/core1>/" (ntoice the "/" at the end) next
> time I use HttpSolrClient to send Solr data, I get back 404. The fix is to
> remove the ending "/".  This is not how the constructor of HttpSolrClient
> behaves; HttpSolrClient will take the URL with or without "/".
> 
> In summary, it would be good if someone can confirm f we have a memory leak
> in HttpSolrClient if used per my example; if so this is a defect.  Also,
> can someone confirm the fix I used for this issue: move the constructor of
> HttpSolrClient outside the loop and reuse the existing object "client".
> 
> Again, thank you all for the quick response it is much appreciated.
> 
> Steve
> 
> 
> 
> On Sat, Jan 30, 2016 at 1:24 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
> 
>> Assuming you're not really using code like above and it's a test case
>> 
>> What's your evidence that memory consumption goes up? Are you sure
>> you're not just seeing uncollected garbage?
>> 
>> When I attached Java Mission Control to this program it looked pretty
>> scary at first, but the heap allocated after old generation garbage
>> collections leveled out to a steady state.
>> 
>> 
>> On Sat, Jan 30, 2016 at 9:29 AM, Walter Underwood <wun...@wunderwood.org>
>> wrote:
>>> Create one HttpSolrClient object for each Solr server you are talking
>> to. Reuse it for all requests to that Solr server.
>>> 
>>> It will manage a pool of connections and keep them alive for faster
>> communication.
>>> 
>>> I took a look at the JavaDoc and the wiki doc, neither one explains this
>> well. I don’t think they even point out what is thread safe.
>>> 
>>> wunder
>>> Walter Underwood
>>> wun...@wunderwood.org
>>> http://observer.wunderwood.org/  (my blog)
>>> 
>>> 
>>>> On Jan 30, 2016, at 7:42 AM, Susheel Kumar <susheel2...@gmail.com>
>> wrote:
>>>> 
>>>> Hi Steve,
>>>> 
>>>> Can you please elaborate what error you are getting and i didn't
>> understand
>>>> your code above, that why initiating Solr client object  is in loop.  In
>>>> general  creating client instance should be outside the loop and a one
>> time
>>>> activity during the complete execution of program.
>>>> 
>>>> Thanks,
>>>> Susheel
>>>> 
>>>> On Sat, Jan 30, 2016 at 8:15 AM, Steven White <swhite4...@gmail.com>
>> wrote:
>>>> 
>>>>> Hi folks,
>>>>> 
>>>>> I'm getting memory leak in my code.  I narrowed the code to the
>> following
>>>>> minimal to cause the leak.
>>>>> 
>>>>>   while (true) {
>>>>>   HttpSolrClient client = new HttpSolrClient("
>>>>> http://192.168.202.129:8983/solr/core1");
>>>>>   client.close();
>>>>>   }
>>>>> 
>>>>> Is this a defect or an issue in the way I'm using HttpSolrClient?
>>>>> 
>>>>> I'm on Solr 5.2.1
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> Steve
>>>>> 
>>> 
>> 



Re: Memory leak defect or misssuse of SolrJ API?

2016-01-31 Thread Steven White
Thanks Walter.  Yes, I saw your answer and fixed the issue per your
suggestion.

The JavaDoc needs to make this clear.  The fact that there is a close() on this
class and the JavaDoc does not say "your program should have exactly as
many HttpSolrClient objects as there are servers it talks to" is a prime
candidate for misuse of the class.

Steve


On Sun, Jan 31, 2016 at 5:20 PM, Walter Underwood <wun...@wunderwood.org>
wrote:

> I already answered this.
>
> Move the creation of the HttpSolrClient outside the loop. Your code will
> run much fast, because it will be able to reuse the connections.
>
> Put another way, your program should have exactly as many HttpSolrClient
> objects as there are servers it talks to. If there is one Solr server, you
> have one object.
>
> There is no leak in HttpSolrClient, you are misusing the class, massively.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Jan 31, 2016, at 2:10 PM, Steven White <swhite4...@gmail.com> wrote:
> >
> > Thank you all for your feedback.
> >
> > This is code that I inherited and the example i gave is intended to
> > demonstrate the memory leak which based on YourKit is
> > on java/util/LinkedHashMap$Entry.  In short, I'm getting core dumps with
> > "Detail "java/lang/OutOfMemoryError" "Java heap space" received "
> >
> > Here is a more detailed layout of the code.  This is a crawler that runs
> > 24x7 without any recycle logic in place:
> >
> >init_data()
> >
> >while (true)
> >{
> >HttpSolrClient client = new HttpSolrClient("
> > http://localhost:8983/solr/core1 <http://192.168.202.129:8983/solr/core1
> >/");
> > <<<< this is real code
> >
> >see_if_we_have_new_data();
> >
> >send_new_data_to_solr();
> >
> >client.close();<<<< this is real code
> >
> >sleep_for_a_bit(N);<<<< 'N' can be any positive int
> >}
> >
> > By default, our Java program is given 4gb of ram "-Xmx4g" and N is set
> for
> > 5 min.  We had a customer set N to 10 second and we started seeing core
> > dumps with OOM.  As I started to debug, I narrowed the OOM to
> > HttpSolrClient per my original email.
> >
> > The follow up answers I got suggest that I move the construction of
> > HttpSolrClient object outside the while loop which I did (but I also had
> to
> > move "client.close()" outside the loop) and the leak is gone.
> >
> > Give this, is this how HttpSolrClient is suppose to be used?  If so,
> what's
> > the point of HttpSolrClient.close()?
> >
> > Another side question.  I noticed HttpSolrClient has a setBaseUrl().
> Now,
> > if I call it and give it "http://localhost:8983/solr/core1
> > <http://192.168.202.129:8983/solr/core1>/" (ntoice the "/" at the end)
> next
> > time I use HttpSolrClient to send Solr data, I get back 404. The fix is
> to
> > remove the ending "/".  This is not how the constructor of HttpSolrClient
> > behaves; HttpSolrClient will take the URL with or without "/".
> >
> > In summary, it would be good if someone can confirm f we have a memory
> leak
> > in HttpSolrClient if used per my example; if so this is a defect.  Also,
> > can someone confirm the fix I used for this issue: move the constructor
> of
> > HttpSolrClient outside the loop and reuse the existing object "client".
> >
> > Again, thank you all for the quick response it is much appreciated.
> >
> > Steve
> >
> >
> >
> > On Sat, Jan 30, 2016 at 1:24 PM, Erick Erickson <erickerick...@gmail.com
> >
> > wrote:
> >
> >> Assuming you're not really using code like above and it's a test
> case
> >>
> >> What's your evidence that memory consumption goes up? Are you sure
> >> you're not just seeing uncollected garbage?
> >>
> >> When I attached Java Mission Control to this program it looked pretty
> >> scary at first, but the heap allocated after old generation garbage
> >> collections leveled out to a steady state.
> >>
> >>
> >> On Sat, Jan 30, 2016 at 9:29 AM, Walter Underwood <
> wun...@wunderwood.org>
> >> wrote:
> >>> Create one HttpSolrClient object for each Solr server you are talking
> >> to. Reuse it for all requests to that Solr server.
> >>>
> >>> It will manage a pool of connections and keep them alive for faster communication.

Re: Memory leak defect or misssuse of SolrJ API?

2016-01-31 Thread Walter Underwood
The JavaDoc needs a lot more information. As I remember it, SolrJ started as a 
thin layer over Apache HttpClient, so the authors may have assumed that 
programmers were familiar with that library. HttpClient makes a shared object 
that manages a pool of connections to the target server. HttpClient is 
seriously awesome—I first used it in the late 1990’s when I hit the limitations 
of the URL classes written by Sun.

I looked at the JavaDoc and various examples and none of them make this clear. 
Not your fault, we need a serious upgrade on those docs.

On the plus side, your program should be a lot faster after you reuse the 
client class.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 31, 2016, at 3:46 PM, Steven White <swhite4...@gmail.com> wrote:
> 
> Thanks Walter.  Yes, I saw your answer and fixed the issue per your
> suggestion.
> 
> The JavaDoc need to make this clear.  The fact there is a close() on this
> class and the JavaDoc does not say "your program should have exactly as
> many HttpSolrClient objects as there are servers it talks to" is a prime
> candidate for missuses of the class.
> 
> Steve
> 
> 
> On Sun, Jan 31, 2016 at 5:20 PM, Walter Underwood <wun...@wunderwood.org>
> wrote:
> 
>> I already answered this.
>> 
>> Move the creation of the HttpSolrClient outside the loop. Your code will
>> run much fast, because it will be able to reuse the connections.
>> 
>> Put another way, your program should have exactly as many HttpSolrClient
>> objects as there are servers it talks to. If there is one Solr server, you
>> have one object.
>> 
>> There is no leak in HttpSolrClient, you are misusing the class, massively.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
>>> On Jan 31, 2016, at 2:10 PM, Steven White <swhite4...@gmail.com> wrote:
>>> 
>>> Thank you all for your feedback.
>>> 
>>> This is code that I inherited and the example i gave is intended to
>>> demonstrate the memory leak which based on YourKit is
>>> on java/util/LinkedHashMap$Entry.  In short, I'm getting core dumps with
>>> "Detail "java/lang/OutOfMemoryError" "Java heap space" received "
>>> 
>>> Here is a more detailed layout of the code.  This is a crawler that runs
>>> 24x7 without any recycle logic in place:
>>> 
>>>   init_data()
>>> 
>>>   while (true)
>>>   {
>>>   HttpSolrClient client = new HttpSolrClient("
>>> http://localhost:8983/solr/core1 <http://192.168.202.129:8983/solr/core1
>>> /");
>>> <<<< this is real code
>>> 
>>>   see_if_we_have_new_data();
>>> 
>>>   send_new_data_to_solr();
>>> 
>>>   client.close();<<<< this is real code
>>> 
>>>   sleep_for_a_bit(N);<<<< 'N' can be any positive int
>>>   }
>>> 
>>> By default, our Java program is given 4gb of ram "-Xmx4g" and N is set
>> for
>>> 5 min.  We had a customer set N to 10 second and we started seeing core
>>> dumps with OOM.  As I started to debug, I narrowed the OOM to
>>> HttpSolrClient per my original email.
>>> 
>>> The follow up answers I got suggest that I move the construction of
>>> HttpSolrClient object outside the while loop which I did (but I also had
>> to
>>> move "client.close()" outside the loop) and the leak is gone.
>>> 
>>> Give this, is this how HttpSolrClient is suppose to be used?  If so,
>> what's
>>> the point of HttpSolrClient.close()?
>>> 
>>> Another side question.  I noticed HttpSolrClient has a setBaseUrl().
>> Now,
>>> if I call it and give it "http://localhost:8983/solr/core1
>>> <http://192.168.202.129:8983/solr/core1>/" (ntoice the "/" at the end)
>> next
>>> time I use HttpSolrClient to send Solr data, I get back 404. The fix is
>> to
>>> remove the ending "/".  This is not how the constructor of HttpSolrClient
>>> behaves; HttpSolrClient will take the URL with or without "/".
>>> 
>>> In summary, it would be good if someone can confirm f we have a memory
>> leak
>>> in HttpSolrClient if used per my example; if so this is a defect.  Also,
>>> can someone confirm the fix I used for this issue: move the constructor
>> of
>>> HttpSolrClient outside the loop and reuse the existing object "client".

Re: Memory leak defect or misssuse of SolrJ API?

2016-01-31 Thread Steven White
Thank you all for your feedback.

This is code that I inherited and the example I gave is intended to
demonstrate the memory leak which based on YourKit is
on java/util/LinkedHashMap$Entry.  In short, I'm getting core dumps with
"Detail "java/lang/OutOfMemoryError" "Java heap space" received "

Here is a more detailed layout of the code.  This is a crawler that runs
24x7 without any recycle logic in place:

init_data()

while (true)
{
HttpSolrClient client = new HttpSolrClient("
http://localhost:8983/solr/core1 <http://192.168.202.129:8983/solr/core1>/");
 <<<< this is real code

see_if_we_have_new_data();

send_new_data_to_solr();

client.close();<<<< this is real code

sleep_for_a_bit(N);<<<< 'N' can be any positive int
}

By default, our Java program is given 4gb of ram "-Xmx4g" and N is set for
5 min.  We had a customer set N to 10 second and we started seeing core
dumps with OOM.  As I started to debug, I narrowed the OOM to
HttpSolrClient per my original email.

The follow up answers I got suggest that I move the construction of
HttpSolrClient object outside the while loop which I did (but I also had to
move "client.close()" outside the loop) and the leak is gone.

Given this, is this how HttpSolrClient is supposed to be used?  If so, what's
the point of HttpSolrClient.close()?

Another side question.  I noticed HttpSolrClient has a setBaseUrl().  Now,
if I call it and give it "http://localhost:8983/solr/core1
<http://192.168.202.129:8983/solr/core1>/" (notice the "/" at the end) next
time I use HttpSolrClient to send Solr data, I get back 404. The fix is to
remove the ending "/".  This is not how the constructor of HttpSolrClient
behaves; HttpSolrClient will take the URL with or without "/".

In summary, it would be good if someone can confirm if we have a memory leak
in HttpSolrClient if used per my example; if so this is a defect.  Also,
can someone confirm the fix I used for this issue: move the constructor of
HttpSolrClient outside the loop and reuse the existing object "client".

Again, thank you all for the quick response it is much appreciated.

Steve



On Sat, Jan 30, 2016 at 1:24 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Assuming you're not really using code like above and it's a test case
>
> What's your evidence that memory consumption goes up? Are you sure
> you're not just seeing uncollected garbage?
>
> When I attached Java Mission Control to this program it looked pretty
> scary at first, but the heap allocated after old generation garbage
> collections leveled out to a steady state.
>
>
> On Sat, Jan 30, 2016 at 9:29 AM, Walter Underwood <wun...@wunderwood.org>
> wrote:
> > Create one HttpSolrClient object for each Solr server you are talking
> to. Reuse it for all requests to that Solr server.
> >
> > It will manage a pool of connections and keep them alive for faster
> communication.
> >
> > I took a look at the JavaDoc and the wiki doc, neither one explains this
> well. I don’t think they even point out what is thread safe.
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >
> >> On Jan 30, 2016, at 7:42 AM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
> >>
> >> Hi Steve,
> >>
> >> Can you please elaborate what error you are getting and i didn't
> understand
> >> your code above, that why initiating Solr client object  is in loop.  In
> >> general  creating client instance should be outside the loop and a one
> time
> >> activity during the complete execution of program.
> >>
> >> Thanks,
> >> Susheel
> >>
> >> On Sat, Jan 30, 2016 at 8:15 AM, Steven White <swhite4...@gmail.com>
> wrote:
> >>
> >>> Hi folks,
> >>>
> >>> I'm getting memory leak in my code.  I narrowed the code to the
> following
> >>> minimal to cause the leak.
> >>>
> >>>while (true) {
> >>>HttpSolrClient client = new HttpSolrClient("
> >>> http://192.168.202.129:8983/solr/core1");
> >>>client.close();
> >>>}
> >>>
> >>> Is this a defect or an issue in the way I'm using HttpSolrClient?
> >>>
> >>> I'm on Solr 5.2.1
> >>>
> >>> Thanks.
> >>>
> >>> Steve
> >>>
> >
>


Re: Memory leak defect or misssuse of SolrJ API?

2016-01-30 Thread Walter Underwood
Create one HttpSolrClient object for each Solr server you are talking to. Reuse 
it for all requests to that Solr server.

It will manage a pool of connections and keep them alive for faster 
communication.

I took a look at the JavaDoc and the wiki doc, neither one explains this well. 
I don’t think they even point out what is thread safe.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 30, 2016, at 7:42 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
> 
> Hi Steve,
> 
> Can you please elaborate what error you are getting and i didn't understand
> your code above, that why initiating Solr client object  is in loop.  In
> general  creating client instance should be outside the loop and a one time
> activity during the complete execution of program.
> 
> Thanks,
> Susheel
> 
> On Sat, Jan 30, 2016 at 8:15 AM, Steven White <swhite4...@gmail.com> wrote:
> 
>> Hi folks,
>> 
>> I'm getting memory leak in my code.  I narrowed the code to the following
>> minimal to cause the leak.
>> 
>>while (true) {
>>HttpSolrClient client = new HttpSolrClient("
>> http://192.168.202.129:8983/solr/core1");
>>client.close();
>>}
>> 
>> Is this a defect or an issue in the way I'm using HttpSolrClient?
>> 
>> I'm on Solr 5.2.1
>> 
>> Thanks.
>> 
>> Steve
>> 



Re: Memory leak defect or misssuse of SolrJ API?

2016-01-30 Thread Erick Erickson
Assuming you're not really using code like above and it's a test case

What's your evidence that memory consumption goes up? Are you sure
you're not just seeing uncollected garbage?

When I attached Java Mission Control to this program it looked pretty
scary at first, but the heap allocated after old generation garbage
collections leveled out to a steady state.


On Sat, Jan 30, 2016 at 9:29 AM, Walter Underwood <wun...@wunderwood.org> wrote:
> Create one HttpSolrClient object for each Solr server you are talking to. 
> Reuse it for all requests to that Solr server.
>
> It will manage a pool of connections and keep them alive for faster 
> communication.
>
> I took a look at the JavaDoc and the wiki doc, neither one explains this 
> well. I don’t think they even point out what is thread safe.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
>> On Jan 30, 2016, at 7:42 AM, Susheel Kumar <susheel2...@gmail.com> wrote:
>>
>> Hi Steve,
>>
>> Can you please elaborate what error you are getting and i didn't understand
>> your code above, that why initiating Solr client object  is in loop.  In
>> general  creating client instance should be outside the loop and a one time
>> activity during the complete execution of program.
>>
>> Thanks,
>> Susheel
>>
>> On Sat, Jan 30, 2016 at 8:15 AM, Steven White <swhite4...@gmail.com> wrote:
>>
>>> Hi folks,
>>>
>>> I'm getting memory leak in my code.  I narrowed the code to the following
>>> minimal to cause the leak.
>>>
>>>while (true) {
>>>HttpSolrClient client = new HttpSolrClient("
>>> http://192.168.202.129:8983/solr/core1");
>>>client.close();
>>>}
>>>
>>> Is this a defect or an issue in the way I'm using HttpSolrClient?
>>>
>>> I'm on Solr 5.2.1
>>>
>>> Thanks.
>>>
>>> Steve
>>>
>


Memory leak defect or misssuse of SolrJ API?

2016-01-30 Thread Steven White
Hi folks,

I'm getting memory leak in my code.  I narrowed the code to the following
minimal to cause the leak.

while (true) {
HttpSolrClient client = new HttpSolrClient("
http://192.168.202.129:8983/solr/core1");
client.close();
}

Is this a defect or an issue in the way I'm using HttpSolrClient?

I'm on Solr 5.2.1

Thanks.

Steve


Re: Memory leak defect or misssuse of SolrJ API?

2016-01-30 Thread Susheel Kumar
Hi Steve,

Can you please elaborate on what error you are getting? I also didn't understand
your code above -- why is the Solr client object initialized inside the loop?  In
general, creating the client instance should be outside the loop, a one-time
activity for the complete execution of the program.

Thanks,
Susheel

On Sat, Jan 30, 2016 at 8:15 AM, Steven White <swhite4...@gmail.com> wrote:

> Hi folks,
>
> I'm getting memory leak in my code.  I narrowed the code to the following
> minimal to cause the leak.
>
> while (true) {
> HttpSolrClient client = new HttpSolrClient("
> http://192.168.202.129:8983/solr/core1");
> client.close();
> }
>
> Is this a defect or an issue in the way I'm using HttpSolrClient?
>
> I'm on Solr 5.2.1
>
> Thanks.
>
> Steve
>


Re: Memory leak in SolrCloud 4.6

2015-12-15 Thread Emir Arnautovic

Hi Mark,
Can you tell us a bit more about your index and load? Why do you think
there is a leak? If you give that memory to the JVM it will use it, and you
gave most of it to the JVM. Only 4GB is left for the OS and disk caches. Since
swap is enabled, it might swap some JVM pages. It seems to me like a
completely valid scenario. Try running Solr with a smaller heap and set
swappiness to 1. What OS do you use?


Thanks,
Emir

On 15.12.2015 06:37, Mark Houts wrote:

I am running a SolrCloud 4.6 cluster with three solr nodes and three
external zookeeper nodes. Each Solr node has 12GB RAM. 8GB RAM dedicated to
the JVM.

When solr is started it consumes barely 1GB but over the course of 36 to 48
hours physical memory will be consumed and swap will be used. The i/o
latency of using swap will soon make the machine so slow that it will
become unresponsive.

Has anyone had experience with memory leaks in this version?

Regards,

M Houts



--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Memory leak in SolrCloud 4.6

2015-12-14 Thread Mark Houts
I am running a SolrCloud 4.6 cluster with three solr nodes and three
external zookeeper nodes. Each Solr node has 12GB RAM. 8GB RAM dedicated to
the JVM.

When solr is started it consumes barely 1GB but over the course of 36 to 48
hours physical memory will be consumed and swap will be used. The i/o
latency of using swap will soon make the machine so slow that it will
become unresponsive.

Has anyone had experience with memory leaks in this version?

Regards,

M Houts


Re: Possible memory leak? Help!

2015-07-15 Thread Timothy Potter
What are your cache sizes? Max doc?

Also, what GC settings are you using? 6GB isn't all that much for a
memory-intensive app like Solr, esp. given the number of facet fields
you have. Lastly, are you using docvalues for your facet fields? That
should help reduce the amount of heap needed to compute facets.
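
For reference, docValues is enabled per field in schema.xml and requires a full reindex afterwards. A minimal sketch, with a hypothetical facet field name:

<field name="category_facet" type="string" indexed="true" stored="false" docValues="true"/>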

On Tue, Jul 14, 2015 at 2:33 PM, Yael Gurevich yae...@gmail.com wrote:
 Hi,

 We're running Solr 4.10.1 on Linux using Tomcat. Distributed environment,
 40 virtual servers with high resources. Concurrent queries that are quite
 complex (may be hundreds of terms), NRT indexing and a few hundreds of
 facet fields which might have many (hundreds of thousands) distinct values.

 We've configured a 6GB JVM heap, and after quite a bit of work, it seems to
 be pretty well configured GC parameter-wise (we're using CMS and ParNew).

 The following problem occurs -
 Once every couple of hours, one or more servers suddenly start getting
 concurrent-mode-failure; the memory keeps climbing further and further and the
 concurrent-mode-failure continues.
 Naturally, during this time, SOLR is unresponsive and the queries are
 timed-out. Eventually it might pass (GC will succeed), after 5-10 minutes.
 Sometimes this phenomenon can occur for a great deal of time, one server
 goes up and then another and so forth.

 Memory dumps point to ConcurrentLRUCache (used in filterCache and
 fieldValueCache). Mathematically speaking, the sizes I see in the dumps do
 not make sense. The configured sizes shouldn't take up more than a few
 hundreds of MBs.

 Any ideas? Anyone seen this kind of problem?


Possible memory leak? Help!

2015-07-14 Thread Yael Gurevich
Hi,

We're running Solr 4.10.1 on Linux using Tomcat. Distributed environment,
40 virtual servers with high resources. Concurrent queries that are quite
complex (may be hundreds of terms), NRT indexing and a few hundreds of
facet fields which might have many (hundreds of thousands) distinct values.

We've configured a 6GB JVM heap, and after quite a bit of work, it seems to
be pretty well configured GC parameter-wise (we're using CMS and ParNew).

The following problem occurs -
Once every couple of hours, one or more servers suddenly start getting
concurrent-mode-failure; the memory keeps climbing further and further and the
concurrent-mode-failure continues.
Naturally, during this time, SOLR is unresponsive and the queries are
timed-out. Eventually it might pass (GC will succeed), after 5-10 minutes.
Sometimes this phenomenon can occur for a great deal of time, one server
goes up and then another and so forth.

Memory dumps point to ConcurrentLRUCache (used in filterCache and
fieldValueCache). Mathematically speaking, the sizes I see in the dumps do
not make sense. The configured sizes shouldn't take up more than a few
hundreds of MBs.

Any ideas? Anyone seen this kind of problem?


Re: Memory Leak in solr 4.8.1

2015-04-09 Thread Toke Eskildsen
On Wed, 2015-04-08 at 14:00 -0700, pras.venkatesh wrote:
 1. 8 nodes, 4 shards(2 nodes per shard)
 2. each node having about 55 GB of Data, in total there is 450 million
 documents in the collection, so the document size is not huge.

So ~120M docs/shard.

 3. The schema has 42 fields, it gets reloaded every 15 mins with about
 50,000 documents. Now we have primary Key for the index, so when there are
 any duplicates the document gets re-written.
 4. The GC policy is CMS, with heap size min and max = 8 gb and perm size =
 512 mb and RAM on the VM is 24 gb.

Do you have a large and active filter cache? Each entry is 30MB, so it
does not take many entries to fill an 8GB heap. That would match the
description of ever-running GC.

- Toke Eskildsen, State and University Library, Denmark
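
For context on the estimate above: a cached filter that matches many documents is kept as a bitset with one bit per document in the core (small result sets are stored more compactly), so roughly:

  entry size ≈ maxDoc / 8 bytes
  8 GB heap / 30 MB per entry ≈ 270 entries to exhaust the heap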




Re: Memory Leak in solr 4.8.1

2015-04-09 Thread pras.venkatesh
I don't have a filter cache; I have completely disabled the filter cache since
I am not using filter queries.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Memory-Leak-in-solr-4-8-1-tp4198488p4198716.html
Sent from the Solr - User mailing list archive at Nabble.com.


Memory Leak in solr 4.8.1

2015-04-08 Thread pras.venkatesh
I have a SolrCloud instance with 8 nodes and 4 shards, and I am facing a memory
leak on the JVMs.

here are the details of the instance.


1. 8 nodes, 4 shards(2 nodes per shard)
2. each node having about 55 GB of Data, in total there is 450 million
documents in the collection, so the document size is not huge.
3. The schema has 42 fields, it gets reloaded every 15 mins with about
50,000 documents. Now we have primary Key for the index, so when there are
any duplicates the document gets re-written.
4. The GC policy is CMS, with heap size min and max = 8 gb and perm size =
512 mb and RAM on the VM is 24 gb.


When users start searching in Solr, often (though not always) the heap keeps
growing and the GC cycles are not clearing it up. I see GC running for
almost 100,000 ms and still not clearing the heap.

Appreciate any advice on this.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Memory-Leak-in-solr-4-8-1-tp4198488.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Memory leak for debugQuery?

2014-07-17 Thread Umesh Prasad
A histogram by itself isn't sufficient to root-cause the JVM heap issue.
We have found JVM heap memory issues multiple times in our system, and each
time it was due to a different reason.  I would recommend taking heap
dumps at regular intervals (using jmap/VisualVM) and analyzing those heap
dumps. That will give a definite answer to memory issues.

I have regularly analyzed heap dumps of size 32 GB with Eclipse Memory
Analyzer. The Linux version comes with a command line script
ParseHeapDump.sh inside the mat directory.

# Usage: ParseHeapDump.sh path/to/dump.hprof [report]*
#
# The leak report has the id org.eclipse.mat.api:suspects
# The top component report has the id org.eclipse.mat.api:top_components
Increase the memory by setting the Xmx and Xms params in MemoryAnalyzer.ini (in
the same directory).

The leak suspects report is quite good. For checking detailed allocation
patterns etc., you can copy the index files generated from parsing and open
them in the GUI.
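
As a concrete sketch of that workflow (the PID and file paths are placeholders; jmap ships with the JDK):

# take a heap dump of the running Solr JVM ("live" triggers a full GC first)
jmap -dump:live,format=b,file=/tmp/solr-heap.hprof <solr-pid>
# parse it headlessly with Eclipse MAT and generate the leak suspects report
./ParseHeapDump.sh /tmp/solr-heap.hprof org.eclipse.mat.api:suspects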




On 17 July 2014 05:36, Tomás Fernández Löbbe tomasflo...@gmail.com wrote:

 Also, is this trunk? Solr 4.x? Single shard, right?


 On Wed, Jul 16, 2014 at 2:24 PM, Erik Hatcher erik.hatc...@gmail.com
 wrote:

  Tom -
 
  You could maybe isolate it a little further by using the "debug"
  parameter with values of timing|query|results
 
  Erik
 
  On May 15, 2014, at 5:50 PM, Tom Burton-West tburt...@umich.edu wrote:
 
   Hello all,
  
   I'm trying to get relevance scoring information for each of 1,000 docs
  returned for each of 250 queries.  If I run the query (appended below)
  without debugQuery=on, I have no problem with getting all the results
 with
  under 4GB of memory use.  If I add the parameter debugQuery=on, memory
 use
  goes up continuously and after about 20 queries (with 1,000 results
 each),
  memory use reaches about 29.1 GB and the garbage collector gives up:
  
org.apache.solr.common.SolrException;
 null:java.lang.RuntimeException:
  java.lang.OutOfMemoryError: GC overhead limit exceeded
  
   I've attached a jmap -histo, excerpt below.
  
   Is this a known issue with debugQuery?
  
   Tom
   
   query:
  
  
 
  q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2&debugQuery=on
  
   without debugQuery=on:
  
  
 
  q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2
  
   num   #instances   #bytes   Class description
  
 
 --
   1:  585,559 10,292,067,456  byte[]
   2:  743,639 18,874,349,592  char[]
   3:  53,821  91,936,328  long[]
   4:  70,430  69,234,400  int[]
   5:  51,348  27,111,744
   org.apache.lucene.util.fst.FST$Arc[]
   6:  286,357 20,617,704
   org.apache.lucene.util.fst.FST$Arc
   7:  715,364 17,168,736  java.lang.String
   8:  79,561  12,547,792  * ConstMethodKlass
   9:  18,909  11,404,696  short[]
   10: 345,854 11,067,328  java.util.HashMap$Entry
   11: 8,823   10,351,024  * ConstantPoolKlass
   12: 79,561  10,193,328  * MethodKlass
   13: 228,587 9,143,480
  org.apache.lucene.document.FieldType
   14: 228,584 9,143,360
 org.apache.lucene.document.Field
   15: 368,423 8,842,152   org.apache.lucene.util.BytesRef
   16: 210,342 8,413,680   java.util.TreeMap$Entry
   17: 81,576  8,204,648   java.util.HashMap$Entry[]
   18: 107,921 7,770,312
  org.apache.lucene.util.fst.FST$Arc
   19: 13,020  6,874,560
  org.apache.lucene.util.fst.FST$Arc[]
  
   debugQuery_jmap.txt
 
 




-- 
---
Thanks & Regards
Umesh Prasad


solr-4.9.0 : [OverseerExitThread] but has failed to stop it. This is very likely to create a memory leak

2014-07-16 Thread Vijayakumar Ramdoss
Hi,

When I am starting the SolrCloud (4.9) version on top of Tomcat, it's
throwing the below error messages about a memory leak from the Java
runtime.

 

Summary of error message,

[OverseerExitThread] but has failed to stop it. This is very likely to
create a memory leak

 

 

Detailed error message here,

 

16-Jul-2014 15:14:01.044 INFO [Thread-5]
com.springsource.tcserver.licensing.LicensingLifecycleListener.setComponentS
tate ComponentState to off

16-Jul-2014 15:14:01.049 INFO [Thread-5]
org.apache.coyote.AbstractProtocol.pause Pausing ProtocolHandler
[http-bio-8080]

16-Jul-2014 15:14:01.049 INFO [Thread-5]
org.apache.catalina.core.StandardService.stopInternal Stopping service
Catalina

16-Jul-2014 15:14:01.091 SEVERE [localhost-startStop-2]
org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web
application [/solr-4.9.0] appears to have started a thread named
[localhost-startStop-1-SendThread(cpsslrsbx01:2181)] but has failed to stop
it. This is very likely to create a memory leak.

16-Jul-2014 15:14:01.091 SEVERE [localhost-startStop-2]
org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web
application [/solr-4.9.0] appears to have started a thread named
[localhost-startStop-1-EventThread] but has failed to stop it. This is very
likely to create a memory leak.

16-Jul-2014 15:14:01.091 SEVERE [localhost-startStop-2]
org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web
application [/solr-4.9.0] appears to have started a thread named
[OverseerExitThread] but has failed to stop it. This is very likely to
create a memory leak.

16-Jul-2014 15:14:01.093 INFO [Thread-5]
org.apache.coyote.AbstractProtocol.stop Stopping ProtocolHandler
[http-bio-8080]

16-Jul-2014 15:14:01.094 INFO [Thread-5]
org.apache.coyote.AbstractProtocol.destroy Destroying ProtocolHandler
[http-bio-8080]

16-Jul-2014 15:31:40.834 INFO [main]
com.springsource.tcserver.security.PropertyDecoder.init tc Runtime
property decoder using memory-based key

16-Jul-2014 15:31:41.131 INFO [main]
com.springsource.tcserver.security.PropertyDecoder.init tcServer Runtime
property decoder has been initialized in 301 ms

16-Jul-2014 15:31:43.978 INFO [main] org.apache.coyote.AbstractProtocol.init
Initializing ProtocolHandler [http-bio-8080]

16-Jul-2014 15:31:45.141 INFO [main]
com.springsource.tcserver.licensing.LicensingLifecycleListener.setComponentS
tate ComponentState to on

16-Jul-2014 15:31:45.345 INFO [main]
com.springsource.tcserver.serviceability.rmi.JmxSocketListener.init Started
up JMX registry on 127.0.0.1:6969 in 187 ms

16-Jul-2014 15:31:45.370 INFO [main]
org.apache.catalina.core.StandardService.startInternal Starting service
Catalina

16-Jul-2014 15:31:45.370 INFO [main]
org.apache.catalina.core.StandardEngine.startInternal Starting Servlet
Engine: VMware vFabric tc Runtime 2.9.2.RELEASE/7.0.39.B.RELEASE

16-Jul-2014 15:31:45.384 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployWAR Deploying web application
archive
/apps/ecps/vfabric-tc-server-standard-2.9.2.RELEASE/cps_8080/webapps/solr-4.
9.0.war

16-Jul-2014 15:31:48.204 INFO [localhost-startStop-1]
org.apache.catalina.startup.HostConfig.deployDirectory Deploying web
application directory
/apps/ecps/vfabric-tc-server-standard-2.9.2.RELEASE/cps_8080/webapps/ROOT

16-Jul-2014 15:31:48.349 INFO [main]
org.apache.coyote.AbstractProtocol.start Starting ProtocolHandler
[http-bio-8080]



Re: Memory leak for debugQuery?

2014-07-16 Thread Erik Hatcher
Tom -

You could maybe isolate it a little further by using the "debug"
parameter with values of timing|query|results

Erik
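
For example, a sketch of Tom's query with only the timing section of the debug output requested (swap in debug=query or debug=results to isolate the other sections):

q=Abraham+Lincoln&fl=id,score&wt=json&start=0&rows=1000&debug=timing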

On May 15, 2014, at 5:50 PM, Tom Burton-West tburt...@umich.edu wrote:

 Hello all,
 
 I'm trying to get relevance scoring information for each of 1,000 docs 
 returned for each of 250 queries.  If I run the query (appended below) 
 without debugQuery=on, I have no problem with getting all the results with 
 under 4GB of memory use.  If I add the parameter debugQuery=on, memory use 
 goes up continuously and after about 20 queries (with 1,000 results each), 
 memory use reaches about 29.1 GB and the garbage collector gives up:
 
  org.apache.solr.common.SolrException; null:java.lang.RuntimeException: 
 java.lang.OutOfMemoryError: GC overhead limit exceeded
 
 I've attached a jmap -histo, excerpt below.
 
 Is this a known issue with debugQuery?
 
 Tom
 
 query: 
 
 q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2&debugQuery=on
 
 without debugQuery=on:
 
 q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2
 
 num   #instances   #bytes   Class description
 --
 1:  585,559 10,292,067,456  byte[]
 2:  743,639 18,874,349,592  char[]
 3:  53,821  91,936,328  long[]
 4:  70,430  69,234,400  int[]
 5:  51,348  27,111,744  org.apache.lucene.util.fst.FST$Arc[]
 6:  286,357 20,617,704  org.apache.lucene.util.fst.FST$Arc
 7:  715,364 17,168,736  java.lang.String
 8:  79,561  12,547,792  * ConstMethodKlass
 9:  18,909  11,404,696  short[]
 10: 345,854 11,067,328  java.util.HashMap$Entry
 11: 8,823   10,351,024  * ConstantPoolKlass
 12: 79,561  10,193,328  * MethodKlass
 13: 228,587 9,143,480   org.apache.lucene.document.FieldType
 14: 228,584 9,143,360   org.apache.lucene.document.Field
 15: 368,423 8,842,152   org.apache.lucene.util.BytesRef
 16: 210,342 8,413,680   java.util.TreeMap$Entry
 17: 81,576  8,204,648   java.util.HashMap$Entry[]
 18: 107,921 7,770,312   org.apache.lucene.util.fst.FST$Arc
 19: 13,020  6,874,560   org.apache.lucene.util.fst.FST$Arc[]
 
 debugQuery_jmap.txt



Re: Memory leak for debugQuery?

2014-07-16 Thread Tomás Fernández Löbbe
Also, is this trunk? Solr 4.x? Single shard, right?


On Wed, Jul 16, 2014 at 2:24 PM, Erik Hatcher erik.hatc...@gmail.com
wrote:

 Tom -

 You could maybe isolate it a little further by using the "debug"
 parameter with values of timing|query|results

 Erik

 On May 15, 2014, at 5:50 PM, Tom Burton-West tburt...@umich.edu wrote:

  Hello all,
 
  I'm trying to get relevance scoring information for each of 1,000 docs
 returned for each of 250 queries.  If I run the query (appended below)
 without debugQuery=on, I have no problem with getting all the results with
 under 4GB of memory use.  If I add the parameter debugQuery=on, memory use
 goes up continuously and after about 20 queries (with 1,000 results each),
 memory use reaches about 29.1 GB and the garbage collector gives up:
 
   org.apache.solr.common.SolrException; null:java.lang.RuntimeException:
 java.lang.OutOfMemoryError: GC overhead limit exceeded
 
  I've attached a jmap -histo, excerpt below.
 
  Is this a known issue with debugQuery?
 
  Tom
  
  query:
 
 
  q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2&debugQuery=on
 
  without debugQuery=on:
 
 
  q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2
 
  num   #instances   #bytes   Class description
 
 --
  1:  585,559 10,292,067,456  byte[]
  2:  743,639 18,874,349,592  char[]
  3:  53,821  91,936,328  long[]
  4:  70,430  69,234,400  int[]
  5:  51,348  27,111,744
  org.apache.lucene.util.fst.FST$Arc[]
  6:  286,357 20,617,704
  org.apache.lucene.util.fst.FST$Arc
  7:  715,364 17,168,736  java.lang.String
  8:  79,561  12,547,792  * ConstMethodKlass
  9:  18,909  11,404,696  short[]
  10: 345,854 11,067,328  java.util.HashMap$Entry
  11: 8,823   10,351,024  * ConstantPoolKlass
  12: 79,561  10,193,328  * MethodKlass
  13: 228,587 9,143,480
 org.apache.lucene.document.FieldType
  14: 228,584 9,143,360   org.apache.lucene.document.Field
  15: 368,423 8,842,152   org.apache.lucene.util.BytesRef
  16: 210,342 8,413,680   java.util.TreeMap$Entry
  17: 81,576  8,204,648   java.util.HashMap$Entry[]
  18: 107,921 7,770,312
 org.apache.lucene.util.fst.FST$Arc
  19: 13,020  6,874,560
 org.apache.lucene.util.fst.FST$Arc[]
 
  debugQuery_jmap.txt




Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-31 Thread Michael McCandless
On Mon, Dec 30, 2013 at 1:22 PM, Greg Preston
gpres...@marinsoftware.com wrote:
 That was it.  Setting omitNorms=true on all fields fixed my problem.
  I left it indexing all weekend, and heap usage still looks great.

Good!

 I'm still not clear why bouncing the solr instance freed up memory,
 unless the in-memory structure for this norms data is lazily loaded
 somehow.

In fact it is lazily loaded, the first time a search (well,
Similarity) needs to load the norms for scoring.

 Anyway, thank you very much for the suggestion.

You're welcome.

Mike McCandless

http://blog.mikemccandless.com


Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-30 Thread Greg Preston
That was it.  Setting omitNorms=true on all fields fixed my problem.
 I left it indexing all weekend, and heap usage still looks great.

I'm still not clear why bouncing the solr instance freed up memory,
unless the in-memory structure for this norms data is lazily loaded
somehow.

Anyway, thank you very much for the suggestion.

-Greg
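
For anyone finding this thread later: omitNorms is set per field (or per field type) in schema.xml and needs a reindex to take effect; it mainly matters for text fields, since most primitive types omit norms by default. A minimal sketch with a hypothetical field name:

<field name="description" type="text_general" indexed="true" stored="true" omitNorms="true"/>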


On Fri, Dec 27, 2013 at 4:25 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 Likely this is for field norms, which use doc values under the hood.

 Mike McCandless

 http://blog.mikemccandless.com


 On Thu, Dec 26, 2013 at 5:03 PM, Greg Preston
 gpres...@marinsoftware.com wrote:
 Does anybody with knowledge of solr internals know why I'm seeing
 instances of Lucene42DocValuesProducer when I don't have any fields
 that are using DocValues?  Or am I misunderstanding what this class is
 for?

 -Greg


 On Mon, Dec 23, 2013 at 12:07 PM, Greg Preston
 gpres...@marinsoftware.com wrote:
 Hello,

 I'm loading up our solr cloud with data (from a solrj client) and
 running into a weird memory issue.  I can reliably reproduce the
 problem.

 - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
 - 24 solr nodes (one shard each), spread across 3 physical hosts, each
 host has 256G of memory
 - index and tlogs on ssd
 - Xmx=7G, G1GC
 - Java 1.7.0_25
 - schema and solrconfig.xml attached

 I'm using composite routing to route documents with the same clientId
 to the same shard.  After several hours of indexing, I occasionally
 see an IndexWriter go OOM.  I think that's a symptom.  When that
 happens, indexing continues, and that node's tlog starts to grow.
 When I notice this, I stop indexing, and bounce the problem node.
 That's where it gets interesting.

 Upon bouncing, the tlog replays, and then segments merge.  Once the
 merging is complete, the heap is fairly full, and forced full GC only
 helps a little.  But if I then bounce the node again, the heap usage
 goes way down, and stays low until the next segment merge.  I believe
 segment merges are also what causes the original OOM.

 More details:

 Index on disk for this node is ~13G, tlog is ~2.5G.
 See attached mem1.png.  This is a jconsole view of the heap during the
 following:

 (Solr cloud node started at the left edge of this graph)

 A) One CPU core pegged at 100%.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
 at 
 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
 at 
 org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
 at 
 org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
 at 
 org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
 at 
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
 at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
 memory freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
 at 
 org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
 at 
 org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
 at 
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
 at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
 freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at 
 

Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-27 Thread Michael McCandless
Likely this is for field norms, which use doc values under the hood.

Mike McCandless

http://blog.mikemccandless.com


On Thu, Dec 26, 2013 at 5:03 PM, Greg Preston
gpres...@marinsoftware.com wrote:
 Does anybody with knowledge of solr internals know why I'm seeing
 instances of Lucene42DocValuesProducer when I don't have any fields
 that are using DocValues?  Or am I misunderstanding what this class is
 for?

 -Greg


 On Mon, Dec 23, 2013 at 12:07 PM, Greg Preston
 gpres...@marinsoftware.com wrote:
 Hello,

 I'm loading up our solr cloud with data (from a solrj client) and
 running into a weird memory issue.  I can reliably reproduce the
 problem.

 - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
 - 24 solr nodes (one shard each), spread across 3 physical hosts, each
 host has 256G of memory
 - index and tlogs on ssd
 - Xmx=7G, G1GC
 - Java 1.7.0_25
 - schema and solrconfig.xml attached

 I'm using composite routing to route documents with the same clientId
 to the same shard.  After several hours of indexing, I occasionally
 see an IndexWriter go OOM.  I think that's a symptom.  When that
 happens, indexing continues, and that node's tlog starts to grow.
 When I notice this, I stop indexing, and bounce the problem node.
 That's where it gets interesting.

 Upon bouncing, the tlog replays, and then segments merge.  Once the
 merging is complete, the heap is fairly full, and forced full GC only
 helps a little.  But if I then bounce the node again, the heap usage
 goes way down, and stays low until the next segment merge.  I believe
 segment merges are also what causes the original OOM.

 More details:

 Index on disk for this node is ~13G, tlog is ~2.5G.
 See attached mem1.png.  This is a jconsole view of the heap during the
 following:

 (Solr cloud node started at the left edge of this graph)

 A) One CPU core pegged at 100%.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
 at 
 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
 at 
 org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
 at 
 org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
 at 
 org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
 at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
 memory freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
 at 
 org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
 at 
 org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
 at 
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
 at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
 freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
 at 
 org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
 at 
 org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
 at 
 

Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-27 Thread Greg Preston
Interesting.  I'm not using score at all (all searches have an
explicit sort defined).  I'll try setting omit norms on all my fields
and see if I can reproduce.

Thanks.

-Greg


On Fri, Dec 27, 2013 at 4:25 AM, Michael McCandless
luc...@mikemccandless.com wrote:
 Likely this is for field norms, which use doc values under the hood.

 Mike McCandless

 http://blog.mikemccandless.com


 On Thu, Dec 26, 2013 at 5:03 PM, Greg Preston
 gpres...@marinsoftware.com wrote:
 Does anybody with knowledge of solr internals know why I'm seeing
 instances of Lucene42DocValuesProducer when I don't have any fields
 that are using DocValues?  Or am I misunderstanding what this class is
 for?

 -Greg


 On Mon, Dec 23, 2013 at 12:07 PM, Greg Preston
 gpres...@marinsoftware.com wrote:
 Hello,

 I'm loading up our solr cloud with data (from a solrj client) and
 running into a weird memory issue.  I can reliably reproduce the
 problem.

 - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
 - 24 solr nodes (one shard each), spread across 3 physical hosts, each
 host has 256G of memory
 - index and tlogs on ssd
 - Xmx=7G, G1GC
 - Java 1.7.0_25
 - schema and solrconfig.xml attached

 I'm using composite routing to route documents with the same clientId
 to the same shard.  After several hours of indexing, I occasionally
 see an IndexWriter go OOM.  I think that's a symptom.  When that
 happens, indexing continues, and that node's tlog starts to grow.
 When I notice this, I stop indexing, and bounce the problem node.
 That's where it gets interesting.

 Upon bouncing, the tlog replays, and then segments merge.  Once the
 merging is complete, the heap is fairly full, and forced full GC only
 helps a little.  But if I then bounce the node again, the heap usage
 goes way down, and stays low until the next segment merge.  I believe
 segment merges are also what causes the original OOM.

 More details:

 Index on disk for this node is ~13G, tlog is ~2.5G.
 See attached mem1.png.  This is a jconsole view of the heap during the
 following:

 (Solr cloud node started at the left edge of this graph)

 A) One CPU core pegged at 100%.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
 at 
 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
 at 
 org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
 at 
 org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
 at 
 org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
 at 
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
 at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
 memory freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
 at 
 org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
 at 
 org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
 at 
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
 at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
 freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108)
 at 
 

Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-26 Thread Greg Preston
Does anybody with knowledge of solr internals know why I'm seeing
instances of Lucene42DocValuesProducer when I don't have any fields
that are using DocValues?  Or am I misunderstanding what this class is
for?

-Greg


On Mon, Dec 23, 2013 at 12:07 PM, Greg Preston
gpres...@marinsoftware.com wrote:
 Hello,

 I'm loading up our solr cloud with data (from a solrj client) and
 running into a weird memory issue.  I can reliably reproduce the
 problem.

 - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
 - 24 solr nodes (one shard each), spread across 3 physical hosts, each
 host has 256G of memory
 - index and tlogs on ssd
 - Xmx=7G, G1GC
 - Java 1.7.0_25
 - schema and solrconfig.xml attached

 I'm using composite routing to route documents with the same clientId
 to the same shard.  After several hours of indexing, I occasionally
 see an IndexWriter go OOM.  I think that's a symptom.  When that
 happens, indexing continues, and that node's tlog starts to grow.
 When I notice this, I stop indexing, and bounce the problem node.
 That's where it gets interesting.

 Upon bouncing, the tlog replays, and then segments merge.  Once the
 merging is complete, the heap is fairly full, and forced full GC only
 helps a little.  But if I then bounce the node again, the heap usage
 goes way down, and stays low until the next segment merge.  I believe
 segment merges are also what causes the original OOM.

 More details:

 Index on disk for this node is ~13G, tlog is ~2.5G.
 See attached mem1.png.  This is a jconsole view of the heap during the
 following:

 (Solr cloud node started at the left edge of this graph)

 A) One CPU core pegged at 100%.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
 at 
 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
 at 
 org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
 at 
 org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
 at 
 org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
 at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
 memory freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
 at 
 org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
 at 
 org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
 at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
 freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at 
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108)
 at 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
 at 
 org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
 at 
 org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
 at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
 at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at 
 

Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Greg Preston
Hello,

I'm loading up our solr cloud with data (from a solrj client) and
running into a weird memory issue.  I can reliably reproduce the
problem.

- Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
- 24 solr nodes (one shard each), spread across 3 physical hosts, each
host has 256G of memory
- index and tlogs on ssd
- Xmx=7G, G1GC
- Java 1.7.0_25
- schema and solrconfig.xml attached

I'm using composite routing to route documents with the same clientId
to the same shard.  After several hours of indexing, I occasionally
see an IndexWriter go OOM.  I think that's a symptom.  When that
happens, indexing continues, and that node's tlog starts to grow.
When I notice this, I stop indexing, and bounce the problem node.
That's where it gets interesting.

Upon bouncing, the tlog replays, and then segments merge.  Once the
merging is complete, the heap is fairly full, and forced full GC only
helps a little.  But if I then bounce the node again, the heap usage
goes way down, and stays low until the next segment merge.  I believe
segment merges are also what causes the original OOM.

More details:

Index on disk for this node is ~13G, tlog is ~2.5G.
See attached mem1.png.  This is a jconsole view of the heap during the
following:

(Solr cloud node started at the left edge of this graph)

A) One CPU core pegged at 100%.  Thread dump shows:
Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
at 
org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
at org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
memory freed.  Thread dump shows:
Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
at 
org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
at 
org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
freed.  Thread dump shows:
Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108)
at 
org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
at 
org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
at 
org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

D) One CPU core pegged at 100%.  Thread dump shows:
Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
nid=0x7a74 runnable [0x7f5a41c5f000]
   java.lang.Thread.State: RUNNABLE
 

Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Joel Bernstein
Hi Greg,

I have a suspicion that the problem might be related to, or exacerbated by,
overly large tlogs. Can you adjust your autoCommit to 15 seconds? Leave
openSearcher = false. I would remove the maxDocs trigger as well. If you try
rerunning under those commit settings it's possible the OOM errors will stop
occurring.

Joel

Joel Bernstein
Search Engineer at Heliosearch
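
A sketch of the corresponding autoCommit block in solrconfig.xml under the settings suggested above (values illustrative; note there is no maxDocs trigger):

<autoCommit>
  <maxTime>15000</maxTime>            <!-- hard commit every 15 seconds -->
  <openSearcher>false</openSearcher>  <!-- keep these commits from opening a new searcher -->
</autoCommit>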


On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston gpres...@marinsoftware.comwrote:

 Hello,

 I'm loading up our solr cloud with data (from a solrj client) and
 running into a weird memory issue.  I can reliably reproduce the
 problem.

 - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
 - 24 solr nodes (one shard each), spread across 3 physical hosts, each
 host has 256G of memory
 - index and tlogs on ssd
 - Xmx=7G, G1GC
 - Java 1.7.0_25
 - schema and solrconfig.xml attached

 I'm using composite routing to route documents with the same clientId
 to the same shard.  After several hours of indexing, I occasionally
 see an IndexWriter go OOM.  I think that's a symptom.  When that
 happens, indexing continues, and that node's tlog starts to grow.
 When I notice this, I stop indexing, and bounce the problem node.
 That's where it gets interesting.

 Upon bouncing, the tlog replays, and then segments merge.  Once the
 merging is complete, the heap is fairly full, and forced full GC only
 helps a little.  But if I then bounce the node again, the heap usage
 goes way down, and stays low until the next segment merge.  I believe
 segment merges are also what causes the original OOM.

 More details:

 Index on disk for this node is ~13G, tlog is ~2.5G.
 See attached mem1.png.  This is a jconsole view of the heap during the
 following:

 (Solr cloud node started at the left edge of this graph)

 A) One CPU core pegged at 100%.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
 at
 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
 at
 org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
 at
 org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
 at
 org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
 at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
 at
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
 memory freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
 at
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
 at
 org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
 at
 org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
 at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
 at
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
 freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:108)
 at
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
 at
 org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
 at
 org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
 at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
 at
 

Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Greg Preston
Hi Joel,

Thanks for the suggestion.  I could see how decreasing autoCommit time
would reduce tlog size, and how that could possibly be related to the
original OOM error.  I'm not seeing how that would make any difference
once a tlog exists, though?

I have a saved off copy of my data dir that has the 13G index and 2.5G
tlog.  So I can reproduce the replay - merge - memory usage issue
very quickly.  Changing the autoCommit to possibly avoid the initial
OOM will take a good bit longer to try to reproduce.  I may try that
later in the week.

-Greg


On Mon, Dec 23, 2013 at 12:20 PM, Joel Bernstein joels...@gmail.com wrote:
 Hi Greg,

 I have a suspicion that the problem might be related or exacerbated be
 overly large tlogs. Can you adjust your autoCommits to 15 seconds. Leave
 openSearcher = false. I would remove the maxDoc as well. If you try
 rerunning under those commit setting it's possible the OOM errors will stop
 occurring.

 Joel

 Joel Bernstein
 Search Engineer at Heliosearch


 On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston 
 gpres...@marinsoftware.comwrote:

 Hello,

 I'm loading up our solr cloud with data (from a solrj client) and
 running into a weird memory issue.  I can reliably reproduce the
 problem.

 - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
 - 24 solr nodes (one shard each), spread across 3 physical hosts, each
 host has 256G of memory
 - index and tlogs on ssd
 - Xmx=7G, G1GC
 - Java 1.7.0_25
 - schema and solrconfig.xml attached

 I'm using composite routing to route documents with the same clientId
 to the same shard.  After several hours of indexing, I occasionally
 see an IndexWriter go OOM.  I think that's a symptom.  When that
 happens, indexing continues, and that node's tlog starts to grow.
 When I notice this, I stop indexing, and bounce the problem node.
 That's where it gets interesting.

 Upon bouncing, the tlog replays, and then segments merge.  Once the
 merging is complete, the heap is fairly full, and forced full GC only
 helps a little.  But if I then bounce the node again, the heap usage
 goes way down, and stays low until the next segment merge.  I believe
 segment merges are also what causes the original OOM.

 More details:

 Index on disk for this node is ~13G, tlog is ~2.5G.
 See attached mem1.png.  This is a jconsole view of the heap during the
 following:

 (Solr cloud node started at the left edge of this graph)

 A) One CPU core pegged at 100%.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
 at
 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
 at
 org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
 at
 org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
 at
 org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
 at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
 at
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
 memory freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
 at
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
 at
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
 at
 org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
 at
 org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
 at
 org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
 at
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
 at
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
 at
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)

 C) One CPU core pegged at 100%.  Manually triggered GC.  No memory
 freed.  Thread dump shows:
 Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
 nid=0x7a74 runnable [0x7f5a41c5f000]
java.lang.Thread.State: RUNNABLE
 at
 

Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Joel Bernstein
Greg,

There is a memory component to the tlog, which supports realtime gets. This
memory component grows until there is a commit, so it will appear like a
leak. I suspect that replaying a tlog that was big enough to possibly cause
OOM is also problematic.

One thing you might want to try is going to 15 second commits, and then
kill the Solr instance between the commits. Then watch the memory as the
replaying occurs with the smaller tlog.

Joel




Joel Bernstein
Search Engineer at Heliosearch


On Mon, Dec 23, 2013 at 4:17 PM, Greg Preston gpres...@marinsoftware.comwrote:

 Hi Joel,

 Thanks for the suggestion.  I could see how decreasing autoCommit time
 would reduce tlog size, and how that could possibly be related to the
 original OOM error.  I'm not seeing how that would make any difference
 once a tlog exists, though?

 I have a saved off copy of my data dir that has the 13G index and 2.5G
 tlog.  So I can reproduce the replay - merge - memory usage issue
 very quickly.  Changing the autoCommit to possibly avoid the initial
 OOM will take a good bit longer to try to reproduce.  I may try that
 later in the week.

 -Greg


 On Mon, Dec 23, 2013 at 12:20 PM, Joel Bernstein joels...@gmail.com
 wrote:
  Hi Greg,
 
  I have a suspicion that the problem might be related or exacerbated be
  overly large tlogs. Can you adjust your autoCommits to 15 seconds. Leave
  openSearcher = false. I would remove the maxDoc as well. If you try
  rerunning under those commit setting it's possible the OOM errors will
 stop
  occurring.
 
  Joel
 
  Joel Bernstein
  Search Engineer at Heliosearch
 
 
  On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston 
 gpres...@marinsoftware.comwrote:
 
  Hello,
 
  I'm loading up our solr cloud with data (from a solrj client) and
  running into a weird memory issue.  I can reliably reproduce the
  problem.
 
  - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
  - 24 solr nodes (one shard each), spread across 3 physical hosts, each
  host has 256G of memory
  - index and tlogs on ssd
  - Xmx=7G, G1GC
  - Java 1.7.0_25
  - schema and solrconfig.xml attached
 
  I'm using composite routing to route documents with the same clientId
  to the same shard.  After several hours of indexing, I occasionally
  see an IndexWriter go OOM.  I think that's a symptom.  When that
  happens, indexing continues, and that node's tlog starts to grow.
  When I notice this, I stop indexing, and bounce the problem node.
  That's where it gets interesting.
 
  Upon bouncing, the tlog replays, and then segments merge.  Once the
  merging is complete, the heap is fairly full, and forced full GC only
  helps a little.  But if I then bounce the node again, the heap usage
  goes way down, and stays low until the next segment merge.  I believe
  segment merges are also what causes the original OOM.
 
  More details:
 
  Index on disk for this node is ~13G, tlog is ~2.5G.
  See attached mem1.png.  This is a jconsole view of the heap during the
  following:
 
  (Solr cloud node started at the left edge of this graph)
 
  A) One CPU core pegged at 100%.  Thread dump shows:
  Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
  nid=0x7a74 runnable [0x7f5a41c5f000]
 java.lang.Thread.State: RUNNABLE
  at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
  at
 
 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
  at
  org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
  at
  org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
  at
  org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
  at
  org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
  at
  org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
  at
 org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
  at
 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
  at
 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
 
  B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
  memory freed.  Thread dump shows:
  Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
  nid=0x7a74 runnable [0x7f5a41c5f000]
 java.lang.Thread.State: RUNNABLE
  at
 
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
  at
 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
  at
 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:92)
  at
 
 org.apache.lucene.codecs.DocValuesConsumer.mergeNumericField(DocValuesConsumer.java:112)
  at
  org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:221)
  at
  

Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Greg Preston
Interesting.  In my original post, the memory growth (during restart)
occurs after the tlog is done replaying, but during the merge.

-Greg


On Mon, Dec 23, 2013 at 2:06 PM, Joel Bernstein joels...@gmail.com wrote:
 Greg,

 There is a memory component to the tlog, which supports realtime gets. This
 memory component grows until there is a commit, so it will appear like a
 leak. I suspect that replaying a tlog that was big enough to possibly cause
 OOM is also problematic.

 One thing you might want to try is going to 15 second commits, and then
 kill the Solr instance between the commits. Then watch the memory as the
 replaying occurs with the smaller tlog.

 Joel




 Joel Bernstein
 Search Engineer at Heliosearch


 On Mon, Dec 23, 2013 at 4:17 PM, Greg Preston 
 gpres...@marinsoftware.com wrote:

 Hi Joel,

 Thanks for the suggestion.  I could see how decreasing autoCommit time
 would reduce tlog size, and how that could possibly be related to the
 original OOM error.  I'm not seeing how that would make any difference
 once a tlog exists, though?

 I have a saved off copy of my data dir that has the 13G index and 2.5G
 tlog.  So I can reproduce the replay - merge - memory usage issue
 very quickly.  Changing the autoCommit to possibly avoid the initial
 OOM will take a good bit longer to try to reproduce.  I may try that
 later in the week.

 -Greg


 On Mon, Dec 23, 2013 at 12:20 PM, Joel Bernstein joels...@gmail.com
 wrote:
  Hi Greg,
 
  I have a suspicion that the problem might be related or exacerbated by
  overly large tlogs. Can you adjust your autoCommits to 15 seconds. Leave
  openSearcher = false. I would remove the maxDoc as well. If you try
  rerunning under those commit settings it's possible the OOM errors will
 stop
  occurring.
 
  Joel
 
  Joel Bernstein
  Search Engineer at Heliosearch
 
 
  On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston 
  gpres...@marinsoftware.com wrote:
 
  Hello,
 
  I'm loading up our solr cloud with data (from a solrj client) and
  running into a weird memory issue.  I can reliably reproduce the
  problem.
 
  - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
  - 24 solr nodes (one shard each), spread across 3 physical hosts, each
  host has 256G of memory
  - index and tlogs on ssd
  - Xmx=7G, G1GC
  - Java 1.7.0_25
  - schema and solrconfig.xml attached
 
  I'm using composite routing to route documents with the same clientId
  to the same shard.  After several hours of indexing, I occasionally
  see an IndexWriter go OOM.  I think that's a symptom.  When that
  happens, indexing continues, and that node's tlog starts to grow.
  When I notice this, I stop indexing, and bounce the problem node.
  That's where it gets interesting.
 
  Upon bouncing, the tlog replays, and then segments merge.  Once the
  merging is complete, the heap is fairly full, and forced full GC only
  helps a little.  But if I then bounce the node again, the heap usage
  goes way down, and stays low until the next segment merge.  I believe
  segment merges are also what causes the original OOM.
 
  More details:
 
  Index on disk for this node is ~13G, tlog is ~2.5G.
  See attached mem1.png.  This is a jconsole view of the heap during the
  following:
 
  (Solr cloud node started at the left edge of this graph)
 
  A) One CPU core pegged at 100%.  Thread dump shows:
  Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
  nid=0x7a74 runnable [0x7f5a41c5f000]
 java.lang.Thread.State: RUNNABLE
  at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
  at
 
 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
  at
  org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
  at
  org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
  at
  org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
  at
  org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
  at
  org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
  at
 org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
  at
 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
  at
 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
 
  B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
  memory freed.  Thread dump shows:
  Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
  nid=0x7a74 runnable [0x7f5a41c5f000]
 java.lang.Thread.State: RUNNABLE
  at
 
 org.apache.lucene.codecs.DocValuesConsumer$1$1.hasNext(DocValuesConsumer.java:127)
  at
 
 org.apache.lucene.codecs.lucene42.Lucene42DocValuesConsumer.addNumericField(Lucene42DocValuesConsumer.java:144)
  at
 
 

Re: Possible memory leak after segment merge? (related to DocValues?)

2013-12-23 Thread Joel Bernstein
Yeah, sounds like a leak might be there. Having the huge tlog might have
just magnified its importance.

Joel Bernstein
Search Engineer at Heliosearch
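
Joel's suggestion quoted further down (autoCommit every 15 seconds with
openSearcher=false in solrconfig.xml) is a server-side change. Roughly the same
effect can be approximated from the indexing side by issuing explicit hard
commits at a similar interval, so the tlog gets rolled often and a crash never
leaves gigabytes to replay. A minimal SolrJ 4.x-style sketch; the URL, document
count and field names are placeholders, not anything from this thread:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BoundedTlogIndexer {
    public static void main(String[] args) throws Exception {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        long lastCommit = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("clientId", i % 500);
            solr.add(doc);
            // Hard commit roughly every 15s so the transaction log stays small
            // and a restart never has a multi-GB tlog to replay.
            if (System.currentTimeMillis() - lastCommit > 15000L) {
                solr.commit();
                lastCommit = System.currentTimeMillis();
            }
        }
        solr.commit();
        solr.shutdown();
    }
}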


On Mon, Dec 23, 2013 at 5:17 PM, Greg Preston gpres...@marinsoftware.com wrote:

 Interesting.  In my original post, the memory growth (during restart)
 occurs after the tlog is done replaying, but during the merge.

 -Greg


 On Mon, Dec 23, 2013 at 2:06 PM, Joel Bernstein joels...@gmail.com
 wrote:
  Greg,
 
  There is a memory component to the tlog, which supports realtime gets.
 This
  memory component grows until there is a commit, so it will appear like a
  leak. I suspect that replaying a tlog that was big enough to possibly
 cause
  OOM is also problematic.
 
  One thing you might want to try is going to 15 second commits, and then
  kill the Solr instance between the commits. Then watch the memory as the
  replaying occurs with the smaller tlog.
 
  Joel
 
 
 
 
  Joel Bernstein
  Search Engineer at Heliosearch
 
 
  On Mon, Dec 23, 2013 at 4:17 PM, Greg Preston 
  gpres...@marinsoftware.com wrote:
 
  Hi Joel,
 
  Thanks for the suggestion.  I could see how decreasing autoCommit time
  would reduce tlog size, and how that could possibly be related to the
  original OOM error.  I'm not seeing how that would make any difference
  once a tlog exists, though?
 
  I have a saved off copy of my data dir that has the 13G index and 2.5G
  tlog.  So I can reproduce the replay - merge - memory usage issue
  very quickly.  Changing the autoCommit to possibly avoid the initial
  OOM will take a good bit longer to try to reproduce.  I may try that
  later in the week.
 
  -Greg
 
 
  On Mon, Dec 23, 2013 at 12:20 PM, Joel Bernstein joels...@gmail.com
  wrote:
   Hi Greg,
  
   I have a suspicion that the problem might be related or exacerbated by
   overly large tlogs. Can you adjust your autoCommits to 15 seconds.
 Leave
   openSearcher = false. I would remove the maxDoc as well. If you try
   rerunning under those commit settings it's possible the OOM errors will
  stop
   occurring.
  
   Joel
  
   Joel Bernstein
   Search Engineer at Heliosearch
  
  
   On Mon, Dec 23, 2013 at 3:07 PM, Greg Preston 
   gpres...@marinsoftware.com wrote:
  
   Hello,
  
   I'm loading up our solr cloud with data (from a solrj client) and
   running into a weird memory issue.  I can reliably reproduce the
   problem.
  
   - Using Solr Cloud 4.4.0 (also replicated with 4.6.0)
   - 24 solr nodes (one shard each), spread across 3 physical hosts,
 each
   host has 256G of memory
   - index and tlogs on ssd
   - Xmx=7G, G1GC
   - Java 1.7.0_25
   - schema and solrconfig.xml attached
  
   I'm using composite routing to route documents with the same clientId
   to the same shard.  After several hours of indexing, I occasionally
   see an IndexWriter go OOM.  I think that's a symptom.  When that
   happens, indexing continues, and that node's tlog starts to grow.
   When I notice this, I stop indexing, and bounce the problem node.
   That's where it gets interesting.
  
   Upon bouncing, the tlog replays, and then segments merge.  Once the
   merging is complete, the heap is fairly full, and forced full GC only
   helps a little.  But if I then bounce the node again, the heap usage
   goes way down, and stays low until the next segment merge.  I believe
   segment merges are also what causes the original OOM.
  
   More details:
  
   Index on disk for this node is ~13G, tlog is ~2.5G.
   See attached mem1.png.  This is a jconsole view of the heap during
 the
   following:
  
   (Solr cloud node started at the left edge of this graph)
  
   A) One CPU core pegged at 100%.  Thread dump shows:
   Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
   nid=0x7a74 runnable [0x7f5a41c5f000]
  java.lang.Thread.State: RUNNABLE
   at org.apache.lucene.util.fst.Builder.add(Builder.java:397)
   at
  
 
 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter.finishTerm(BlockTreeTermsWriter.java:1000)
   at
   org.apache.lucene.codecs.TermsConsumer.merge(TermsConsumer.java:112)
   at
   org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:72)
   at
  
 org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:365)
   at
   org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:98)
   at
  
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
   at
  org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
   at
  
 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
   at
  
 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
  
   B) One CPU core pegged at 100%.  Manually triggered GC.  Lots of
   memory freed.  Thread dump shows:
   Lucene Merge Thread #0 daemon prio=10 tid=0x7f5a3c064800
   nid=0x7a74 runnable [0x7f5a41c5f000]
  

Re: Solr Core Reload causing JVM Memory Leak through FieldCache/LRUCache/LFUCache

2013-11-15 Thread Umesh Prasad
The mailing list removes attachments by default, so I uploaded it to Google
Drive:

https://drive.google.com/file/d/0B-RnB4e-vaJhX280NVllMUdHYWs/edit?usp=sharing



On Fri, Nov 15, 2013 at 2:28 PM, Umesh Prasad umesh.i...@gmail.com wrote:

 Hi All,
 We are seeing memory leaks in our Search application whenever core
 reload happens after replication.
We are using Solr 3.6.2 and I have observed this consistently on all
 servers.

 The leak suspect analysis from MAT is attached with the mail.

  Problem Suspect 1

 One instance of *org.apache.lucene.search.FieldCacheImpl* loaded by
 *org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30* occupies
 *8,726,099,312 (35.49%)* bytes. The memory is accumulated in one instance of
 *java.util.HashMap$Entry[]* loaded by *system class loader*.

 *Keywords*
 org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
 java.util.HashMap$Entry[]
 org.apache.lucene.search.FieldCacheImpl

 Problem Suspect 2

 69 instances of *org.apache.solr.util.ConcurrentLRUCache*, loaded by 
 *org.apache.catalina.loader.WebappClassLoader
 @ 0x7f7b0a5b8b30* occupy *6,309,187,392 (25.66%)* bytes.

 Biggest instances:

- org.apache.solr.util.ConcurrentLRUCache @ 0x7f7fe74ef120 - 755,575,672 (3.07%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @ 0x7f7e74b7a068 - 728,731,344 (2.96%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @ 0x7f7d0a6bd1b8 - 711,828,392 (2.90%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @ 0x7f7c6c12e800 - 708,657,624 (2.88%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @ 0x7f7fcb092058 - 568,473,352 (2.31%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @ 0x7f7f268cb2f0 - 568,400,040 (2.31%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @ 0x7f7e31b60c58 - 544,078,600 (2.21%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @ 0x7f7e65c2b2d8 - 489,578,480 (1.99%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @ 0x7f7d81ea8538 - 467,833,720 (1.90%) bytes.
- org.apache.solr.util.ConcurrentLRUCache @ 0x7f7f31996508 - 444,383,992 (1.81%) bytes.



 *Keywords*
 org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
 org.apache.solr.util.ConcurrentLRUCache

 194 instances of *org.apache.solr.util.ConcurrentLFUCache*, loaded by 
 *org.apache.catalina.loader.WebappClassLoader
 @ 0x7f7b0a5b8b30* occupy *4,583,727,104 (18.64%)* bytes.

 Biggest instances:

- org.apache.solr.util.ConcurrentLFUCache @ 0x7f7cdd4735a0 - 410,628,176 (1.67%) bytes.
- org.apache.solr.util.ConcurrentLFUCache @ 0x7f7c7d48e180 - 390,690,864 (1.59%) bytes.
- org.apache.solr.util.ConcurrentLFUCache @ 0x7f7f1edfd008 - 348,193,312 (1.42%) bytes.
- org.apache.solr.util.ConcurrentLFUCache @ 0x7f7f37b01990 - 340,595,920 (1.39%) bytes.
- org.apache.solr.util.ConcurrentLFUCache @ 0x7f7fe02d8dd8 - 274,611,632 (1.12%) bytes.
- org.apache.solr.util.ConcurrentLFUCache @ 0x7f7fa9dcfb20 - 253,848,232 (1.03%) bytes.



 *Keywords*
 org.apache.catalina.loader.WebappClassLoader @ 0x7f7b0a5b8b30
 org.apache.solr.util.ConcurrentLFUCache


 ---
 Thanks & Regards
 Umesh Prasad

 SDE @ Flipkart  : The Online Megastore at your doorstep ..




-- 
---
Thanks & Regards
Umesh Prasad
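
For anyone who wants to produce a comparable MAT snapshot when the heap grows
after a core reload: a dump of the live heap can be triggered from inside the
affected JVM through the HotSpot diagnostic MXBean (the file path below is
arbitrary). Note this has to run in the Solr/Tomcat JVM itself, e.g. from a
small debug servlet; from outside the process, the JDK's jmap tool does the
same job.

import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    /** Writes a heap dump of the current JVM that MAT can open directly. */
    public static void dump(String path) throws IOException {
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // live=true keeps only reachable objects, which is what the
        // "leak suspects" report is based on.
        diag.dumpHeap(path, true);
    }
}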


Re: solr 3.4: memory leak?

2013-04-13 Thread Dmitry Kan
Hi André,

Thanks a lot for your response and the relevant information.

Indeed, we have noticed similar behavior when hot reloading a web-app
with solr after changing some of the classes. The only bad consequence of
this, which luckily does not happen too often, is that the web app becomes
stale. So we prefer actually (re)deploying via a tomcat restart.

Thanks,

Dmitry

On Thu, Apr 11, 2013 at 6:01 PM, Andre Bois-Crettez
 andre.b...@kelkoo.com wrote:

 On 04/11/2013 08:49 AM, Dmitry Kan wrote:

 SEVERE: The web application [/solr] appears to have started a thread named
 [MultiThreadedHttpConnectionManager cleanup] but has failed to stop
 it.
 This is very likely to create a memory leak.
 Apr 11, 2013 6:38:14 AM org.apache.catalina.loader.WebappClassLoader
 clearThreadLocalMap


 To my understanding this kind of leak is only a problem if the Java code
 is *reloaded* while the tomcat JVM is not stopped.
 For example when reloadable=true in the Context of the web application
 and you change files in WEB-INF or .war : what would happen is that each
 existing threadlocal would continue to live (potentially holding
 references to other stuff and preventing GC) while new threadlocals are
 created.

 http://wiki.apache.org/tomcat/MemoryLeakProtection

 If you stop tomcat entirely each time, you should be safe.


 --
 André Bois-Crettez

 Search technology, Kelkoo
 http://www.kelkoo.com/


 Kelkoo SAS
 Société par Actions Simplifiée
 Au capital de € 4.168.964,30
 Siège social : 8, rue du Sentier 75002 Paris
 425 093 069 RCS Paris

 Ce message et les pièces jointes sont confidentiels et établis à
 l'attention exclusive de leurs destinataires. Si vous n'êtes pas le
 destinataire de ce message, merci de le détruire et d'en avertir
 l'expéditeur.



solr 3.4: memory leak?

2013-04-11 Thread Dmitry Kan
Hello list,

We are on solr 3.4. After running a shard for some time (1w or so), we
sometimes need to shut it down to change the schema or move it around.

We have noticed the following memory-leak-related messages in the tomcat
logs. Could this be a sign of us doing something wrong, or of an issue in
Solr? Or can they be safely ignored?

If not, does the memory leak happen while the shard is running and manifest
itself in the logs only on shutdown? It is clear that if the JVM with tomcat
has been shut down, it shouldn't leak anything as such.

INFO: Closing Searcher@1e58744d main
fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Apr 11, 2013 6:38:14 AM org.apache.catalina.loader.WebappClassLoader
clearReferencesThreads
SEVERE: The web application [/solr] appears to have started a thread named
[MultiThreadedHttpConnectionManager cleanup] but has failed to stop it.
This is very likely to create a memory leak.
Apr 11, 2013 6:38:14 AM org.apache.catalina.loader.WebappClassLoader
clearThreadLocalMap
SEVERE: The web application [/solr] created a ThreadLocal with key of type
[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@deae877]) and a
value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
(value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. This is very
likely to create a memory leak.
Apr 11, 2013 6:38:14 AM org.apache.catalina.loader.WebappClassLoader
clearThreadLocalMap
SEVERE: The web application [/solr] created a ThreadLocal with key of type
[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@deae877]) and a
value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
(value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. This is very
likely to create a memory leak.
Apr 11, 2013 6:38:14 AM org.apache.catalina.loader.WebappClassLoader
clearThreadLocalMap
SEVERE: The web application [/solr] created a ThreadLocal with key of type
[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@deae877]) and a
value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
(value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. This is very
likely to create a memory leak.
Apr 11, 2013 6:38:14 AM org.apache.catalina.loader.WebappClassLoader
clearThreadLocalMap
SEVERE: The web application [/solr] created a ThreadLocal with key of type
[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@deae877]) and a
value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
(value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. This is very
likely to create a memory leak.
Apr 11, 2013 6:38:14 AM org.apache.catalina.loader.WebappClassLoader
clearThreadLocalMap
SEVERE: The web application [/solr] created a ThreadLocal with key of type
[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@deae877]) and a
value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
(value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. This is very
likely to create a memory leak.
Apr 11, 2013 6:38:14 AM org.apache.coyote.http11.Http11Protocol destroy
INFO: Stopping Coyote HTTP/1.1 on http-port_number
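
The first SEVERE line above is Tomcat noticing that something in the /solr
webapp started commons-httpclient's connection-manager cleanup thread and never
stopped it, which only matters in the reload-without-restart case described in
the reply below. Purely as an illustration of what "stopping it" looks like
(this cleanup normally belongs in Solr or in whatever custom code created the
HttpClient, not in the deployer's hands), assuming commons-httpclient 3.x and
the servlet API on the classpath:

import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;

// Illustrative only: the kind of cleanup Tomcat is complaining is missing.
public class HttpClientCleanupListener implements ServletContextListener {

    public void contextInitialized(ServletContextEvent sce) {
        // nothing to do on startup
    }

    public void contextDestroyed(ServletContextEvent sce) {
        // Stops the "MultiThreadedHttpConnectionManager cleanup" daemon thread
        // so it no longer pins the webapp's classloader after an undeploy/reload.
        MultiThreadedHttpConnectionManager.shutdownAll();
    }
}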


Re: solr 3.4: memory leak?

2013-04-11 Thread Andre Bois-Crettez

On 04/11/2013 08:49 AM, Dmitry Kan wrote:

SEVERE: The web application [/solr] appears to have started a thread named
[MultiThreadedHttpConnectionManager cleanup] but has failed to stop it.
This is very likely to create a memory leak.
Apr 11, 2013 6:38:14 AM org.apache.catalina.loader.WebappClassLoader
clearThreadLocalMap


To my understanding this kind of leak is only a problem if the Java code
is *reloaded* while the tomcat JVM is not stopped.
For example when reloadable=true in the Context of the web application
and you change files in WEB-INF or .war : what would happen is that each
existing threadlocal would continue to live (potentially holding
references to other stuff and preventing GC) while new threadlocals are
created.

http://wiki.apache.org/tomcat/MemoryLeakProtection

If you stop tomcat entirely each time, you should be safe.
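
To make the mechanism above concrete, here is a stripped-down illustration with
invented class names: the per-thread value is an instance of a webapp-loaded
class, so any pooled container thread that still holds it keeps the old
WebappClassLoader reachable across a reload.

// Both classes are hypothetical and would be loaded by the WebappClassLoader.
class ScratchBuffer {
    final char[] data = new char[1024];   // stands in for e.g. a per-thread date formatter
}

public class RequestScratch {

    private static final ThreadLocal<ScratchBuffer> SCRATCH = new ThreadLocal<ScratchBuffer>() {
        @Override
        protected ScratchBuffer initialValue() {
            return new ScratchBuffer();
        }
    };

    public static ScratchBuffer get() {
        // Once a pooled worker thread calls this, its ThreadLocalMap holds a
        // ScratchBuffer. The instance references its class, the class references
        // the WebappClassLoader, and the old webapp can no longer be collected
        // after a context reload -- the pattern behind the SEVERE messages.
        return SCRATCH.get();
    }

    public static void done() {
        // Same-thread cleanup; when it is missing, Tomcat's
        // clearThreadLocalMap protection is the only thing left to help.
        SCRATCH.remove();
    }
}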


--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

Ce message et les pièces jointes sont confidentiels et établis à l'attention 
exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce 
message, merci de le détruire et d'en avertir l'expéditeur.


Re: memory leak - multiple cores

2013-02-12 Thread Michael Della Bitta
Marcos,

You could consider using the CoreAdminHandler instead:

http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler

It works extremely well.

Otherwise, you should periodically restart Tomcat. I'm not sure how
much memory would be leaked, but it's likely not going to have much of
an impact for a few iterations.


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game
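
For reference, the CoreAdminHandler on that wiki page is just an HTTP endpoint,
so a new or changed core can be created or reloaded without redeploying the
webapp at all. A minimal sketch; the host, port and core name are placeholders,
not anything from Marcos' setup:

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CoreAdminReload {
    public static void main(String[] args) throws Exception {
        // RELOAD re-opens an existing core in place; CREATE (with name and
        // instanceDir parameters) registers a new one. Neither requires a
        // container restart.
        URL url = new URL("http://localhost:8080/solr/admin/cores?action=RELOAD&core=core0");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        System.out.println("CoreAdmin returned HTTP " + conn.getResponseCode());
        InputStream body = conn.getInputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = body.read(buf)) != -1; ) {
            System.out.write(buf, 0, n);      // echo the XML status response
        }
        body.close();
        conn.disconnect();
    }
}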


On Mon, Feb 11, 2013 at 8:45 PM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi Michael,

 Yes, we do intend to reload Solr when deploying new cores. So we deploy it, 
 update solr.xml and then restart Solr only. So this will happen sometimes in 
 production, but mostly testing. Which means it will be a real pain. Any way 
 to fix this?

 Also, I'm running geronimo with -Xmx1024m -XX:MaxPermSize=256m.

 Regards,
 Marcos

 On Feb 6, 2013, at 10:54 AM, Michael Della Bitta wrote:

 Marcos,

 The latter 3 errors are common and won't pose a problem unless you
 intend to reload the Solr application without restarting Geronimo
 often.

 The first error, however, shouldn't happen. Have you changed the size
 of PermGen at all? I noticed this error while testing Solr 4.0 in
 Tomcat, but haven't seen it with Solr 4.1 (yet), so if you're on 4.0,
 you might want to try upgrading.


 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Wed, Feb 6, 2013 at 6:09 AM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi,

 I'm deploying the SOLR war in Geronimo, with multiple cores. I'm seeing the
 following issue and it eats up a lot of memory when shutting down. Has
 anyone seen this and have an idea how to solve it?

 Exception in thread DefaultThreadPool 196 java.lang.OutOfMemoryError:
 PermGen space
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not
 shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
 instance=2080324477

 Regards,
 Marcos



Re: memory leak - multiple cores

2013-02-12 Thread Michael Della Bitta
I should also say that there can easily be memory leaked from permgen
space when reloading webapps in Tomcat regardless of what resources
the app creates because class references from the context classloader
to the parent classloader can't be collected appropriately, so
restarting Tomcat periodically when you reload webapps is a good
practice either way.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Tue, Feb 12, 2013 at 9:03 AM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Marcos,

 You could consider using the CoreAdminHandler instead:

 http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler

 It works extremely well.

 Otherwise, you should periodically restart Tomcat. I'm not sure how
 much memory would be leaked, but it's likely not going to have much of
 an impact for a few iterations.


 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Mon, Feb 11, 2013 at 8:45 PM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi Michael,

 Yes, we do intend to reload Solr when deploying new cores. So we deploy it, 
 update solr.xml and then restart Solr only. So this will happen sometimes in 
 production, but mostly testing. Which means it will be a real pain. Any way 
 to fix this?

 Also, I'm running geronimo with -Xmx1024m -XX:MaxPermSize=256m.

 Regards,
 Marcos

 On Feb 6, 2013, at 10:54 AM, Michael Della Bitta wrote:

 Marcos,

 The latter 3 errors are common and won't pose a problem unless you
 intend to reload the Solr application without restarting Geronimo
 often.

 The first error, however, shouldn't happen. Have you changed the size
 of PermGen at all? I noticed this error while testing Solr 4.0 in
 Tomcat, but haven't seen it with Solr 4.1 (yet), so if you're on 4.0,
 you might want to try upgrading.


 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Wed, Feb 6, 2013 at 6:09 AM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi,

 I'm deploying the SOLR war in Geronimo, with multiple cores. I'm seeing the
 following issue and it eats up a lot of memory when shutting down. Has
 anyone seen this and have an idea how to solve it?

 Exception in thread DefaultThreadPool 196 java.lang.OutOfMemoryError:
 PermGen space
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not
 shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
 instance=2080324477

 Regards,
 Marcos



Re: memory leak - multiple cores

2013-02-12 Thread Marcos Mendez
Many thanks! I will try to use the CoreAdminHandler and see if that solves the 
issue!

On Feb 12, 2013, at 9:05 AM, Michael Della Bitta wrote:

 I should also say that there can easily be memory leaked from permgen
 space when reloading webapps in Tomcat regardless of what resources
 the app creates because class references from the context classloader
 to the parent classloader can't be collected appropriately, so
 restarting Tomcat periodically when you reload webapps is a good
 practice either way.
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game
 
 
 On Tue, Feb 12, 2013 at 9:03 AM, Michael Della Bitta
 michael.della.bi...@appinions.com wrote:
 Marcos,
 
 You could consider using the CoreAdminHandler instead:
 
 http://wiki.apache.org/solr/CoreAdmin#CoreAdminHandler
 
 It works extremely well.
 
 Otherwise, you should periodically restart Tomcat. I'm not sure how
 much memory would be leaked, but it's likely not going to have much of
 an impact for a few iterations.
 
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game
 
 
 On Mon, Feb 11, 2013 at 8:45 PM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi Michael,
 
 Yes, we do intend to reload Solr when deploying new cores. So we deploy it, 
 update solr.xml and then restart Solr only. So this will happen sometimes 
 in production, but mostly testing. Which means it will be a real pain. Any 
 way to fix this?
 
 Also, I'm running geronimo with -Xmx1024m -XX:MaxPermSize=256m.
 
 Regards,
 Marcos
 
 On Feb 6, 2013, at 10:54 AM, Michael Della Bitta wrote:
 
 Marcos,
 
  The latter 3 errors are common and won't pose a problem unless you
 intend to reload the Solr application without restarting Geronimo
 often.
 
 The first error, however, shouldn't happen. Have you changed the size
 of PermGen at all? I noticed this error while testing Solr 4.0 in
 Tomcat, but haven't seen it with Solr 4.1 (yet), so if you're on 4.0,
 you might want to try upgrading.
 
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game
 
 
 On Wed, Feb 6, 2013 at 6:09 AM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi,
 
 I'm deploying the SOLR war in Geronimo, with multiple cores. I'm seeing 
 the
 following issue and it eats up a lot of memory when shutting down. Has
 anyone seen this and have an idea how to solve it?
 
 Exception in thread DefaultThreadPool 196 java.lang.OutOfMemoryError:
 PermGen space
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not
 shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
 instance=2080324477
 
 Regards,
 Marcos
 



Re: memory leak - multiple cores

2013-02-11 Thread Marcos Mendez
Hi Michael,

Yes, we do intend to reload Solr when deploying new cores. So we deploy it, 
update solr.xml and then restart Solr only. So this will happen sometimes in 
production, but mostly testing. Which means it will be a real pain. Any way to 
fix this?

Also, I'm running geronimo with -Xmx1024m -XX:MaxPermSize=256m. 

Regards,
Marcos

On Feb 6, 2013, at 10:54 AM, Michael Della Bitta wrote:

 Marcos,
 
 The latter 3 errors are common and won't pose a problem unless you
 intend to reload the Solr application without restarting Geronimo
 often.
 
 The first error, however, shouldn't happen. Have you changed the size
 of PermGen at all? I noticed this error while testing Solr 4.0 in
 Tomcat, but haven't seen it with Solr 4.1 (yet), so if you're on 4.0,
 you might want to try upgrading.
 
 
 Michael Della Bitta
 
 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271
 
 www.appinions.com
 
 Where Influence Isn’t a Game
 
 
 On Wed, Feb 6, 2013 at 6:09 AM, Marcos Mendez mar...@jitisoft.com wrote:
 Hi,
 
 I'm deploying the SOLR war in Geronimo, with multiple cores. I'm seeing the
 following issue and it eats up a lot of memory when shutting down. Has
 anyone seen this and have an idea how to solve it?
 
 Exception in thread DefaultThreadPool 196 java.lang.OutOfMemoryError:
 PermGen space
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [ConcurrentLRUCache] ConcurrentLRUCache was
 not destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE
 LEAK!!!
 2013-02-05 20:13:34,747 ERROR [CoreContainer] CoreContainer was not
 shutdown prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
 instance=2080324477
 
 Regards,
 Marcos


