Re: load balancer for solr

2016-11-06 Thread Rallavagu

Hi Shawn,

Curious: I suppose you have haproxy in front of Solr in a Master/Slave
configuration? Thanks.


On 11/6/16 9:33 AM, Shawn Heisey wrote:

On 11/6/2016 4:08 AM, Mugeesh Husain wrote:

Please suggest a load balancer name?


I use haproxy.  It is a software load balancer with pretty impressive
performance characteristics.

My haproxy setup for Solr has been running without problems for years
now.  I'm using pacemaker to provide redundancy for haproxy with two
servers.

http://www.haproxy.org/

Thanks,
Shawn



Re: indexing - offline

2016-10-20 Thread Rallavagu

Thanks Tom for the quick response.

On 10/20/16 10:19 AM, Tom Evans wrote:

On Thu, Oct 20, 2016 at 5:38 PM, Rallavagu <rallav...@gmail.com> wrote:

Solr 5.4.1 cloud with embedded jetty

Looking for some ideas around offline indexing, where an independent node
would be indexed offline (not in the cloud) and then added to the cloud to
become leader, so the other cloud nodes would replicate from it. Wondering if
this is possible without interrupting the live service. Thanks.


How we do this, to reindex collection "foo":

1) First, collection "foo" should be an alias to the real collection,
eg "foo_1" aliased to "foo"
2) Have a node "node_i" in the cluster that is used for indexing. It
doesn't hold any shards of any collections
So, a node is part of the cluster but holds no collections? How can we add a
node to the cloud without it actively participating?



3) Use collections API to create collection "foo_2", with however many
shards required, but all placed on "node_i"
4) Index "foo_2" with new data with DIH or direct indexing to "node_i".
5) Use collections API to expand "foo_2" to all the nodes/replicas
that it should be on
Could you please point me to documentation on how to do this? I am 
referring to this doc: 
https://cwiki.apache.org/confluence/display/solr/Collections+API. But it 
has many options and, honestly, I am not sure which one would be useful in 
this case.


Thanks


6) Remove "foo_2" from "node_i"
7) Verify contents of "foo_2" are correct
8) Use collections API to change alias for "foo" to "foo_2"
9) Remove "foo_1" collection once happy

This avoids indexing overwhelming the performance of the cluster (or
any nodes in the cluster that receive queries), and can be performed
with zero downtime or config changes on the clients.
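
For illustration, those steps map onto Collections API calls roughly like
the following sketch; the hosts, shard counts, node names and replica names
below are placeholders you would need to adapt to your own cluster:

# 1) alias "foo" to the live collection (if not already done)
curl 'http://host:8983/solr/admin/collections?action=CREATEALIAS&name=foo&collections=foo_1'

# 3) create "foo_2" with its shards placed only on the indexing node
curl 'http://host:8983/solr/admin/collections?action=CREATE&name=foo_2&numShards=2&replicationFactor=1&createNodeSet=node_i:8983_solr'

# 5) expand "foo_2" onto the serving nodes (one ADDREPLICA per shard/node)
curl 'http://host:8983/solr/admin/collections?action=ADDREPLICA&collection=foo_2&shard=shard1&node=node1:8983_solr'

# 6) remove the copies on the indexing node (replica names come from CLUSTERSTATUS)
curl 'http://host:8983/solr/admin/collections?action=DELETEREPLICA&collection=foo_2&shard=shard1&replica=core_node1'

# 8) switch the alias, then 9) drop the old collection once happy
curl 'http://host:8983/solr/admin/collections?action=CREATEALIAS&name=foo&collections=foo_2'
curl 'http://host:8983/solr/admin/collections?action=DELETE&name=foo_1'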

Cheers

Tom



indexing - offline

2016-10-20 Thread Rallavagu

Solr 5.4.1 cloud with embedded jetty

Looking for some ideas around offline indexing, where an independent node 
would be indexed offline (not in the cloud) and then added to the cloud to 
become leader, so the other cloud nodes would replicate from it. Wondering 
if this is possible without interrupting the live service. Thanks.


Queries to help warm up (mmap)

2016-10-06 Thread Rallavagu
Looking for clues/recommendations to help warm up during startup - not 
necessarily just the Solr caches, but the mmap as well. I have used queries 
like "q=<field>:[* TO *]" for various fields and it seems to help with mmap 
population to around 40-50%. Is there anything else that could help 
achieve 90% or more? Thanks.
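
For context, a sketch of how such queries can be wired into a "firstSearcher"
listener in solrconfig.xml; the field names, sort and facet below are
hypothetical and should be replaced with whatever your real queries touch:

<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- open range query to touch the index structures for a given field -->
    <lst><str name="q">title:[* TO *]</str><str name="rows">10</str></lst>
    <!-- include a sort and a facet so sorting/faceting data gets loaded too -->
    <lst><str name="q">*:*</str><str name="sort">timestamp desc</str></lst>
    <lst><str name="q">*:*</str><str name="facet">true</str><str name="facet.field">category</str></lst>
  </arr>
</listener>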


Re: QuerySenderListener

2016-10-05 Thread Rallavagu

Not sure if this is related.

https://issues.apache.org/jira/browse/SOLR-7035

firstSearcher has a few queries that run long (~3 min)

On 10/5/16 6:58 PM, Erick Erickson wrote:

How many cores? Is it possible you're seeing these from two different cores?

Erick

On Wed, Oct 5, 2016 at 11:44 AM, Rallavagu <rallav...@gmail.com> wrote:

Solr Cloud 5.4.1 with embedded jetty, jdk8

At the time of startup it appears that "QuerySenderListener" is run twice
and this is causing "firstSearcher" and "newSearcher" to run twice as well.
Any clues as to why QuerySenderListener is triggered twice? Thanks.


Re: QuerySenderListener

2016-10-05 Thread Rallavagu

It is a single core.

On 10/5/16 6:58 PM, Erick Erickson wrote:

How many cores? Is it possible you're seeing these from two different cores?

Erick

On Wed, Oct 5, 2016 at 11:44 AM, Rallavagu <rallav...@gmail.com> wrote:

Solr Cloud 5.4.1 with embedded jetty, jdk8

At the time of startup it appears that "QuerySenderListener" is run twice
and this is causing "firstSearcher" and "newSearcher" to run twice as well.
Any clues as to why QuerySenderListener is triggered twice? Thanks.


QuerySenderListener

2016-10-05 Thread Rallavagu

Solr Cloud 5.4.1 with embedded jetty, jdk8

At the time of startup it appears that "QuerySenderListener" is run 
twice and this is causing "firstSearcher" and "newSearcher" to run twice 
as well. Any clues as to why QuerySenderListener is triggered twice? Thanks.


disable updates during startup

2016-10-04 Thread Rallavagu

Solr Cloud 5.4.1 with embedded Jetty - jdk 8

Is there a way to disable incoming updates (from the leader) during startup 
until the "firstSearcher" queries have finished? I am noticing that 
firstSearcher queries keep running at startup time while the node shows up 
as "Recovering".


Thanks


Re: slow updates/searches

2016-09-30 Thread Rallavagu

Hi Erick,

Yes. Apparently, there is work to do with phrase queries. As I continue 
to debug, I noticed that a multi-word phrase query is CPU bound, as it 
certainly works "hard". Are there any optimizations to consider?


On 9/29/16 8:14 AM, Erick Erickson wrote:

bq: The QTimes increase as the number of words in a phrase increase

Well, there's more work to do as the # of words increases, and if you
have large slops there's more work yet.

Best,
Erick

On Wed, Sep 28, 2016 at 5:54 PM, Rallavagu <rallav...@gmail.com> wrote:

Thanks Erick.

I have added queries for "firstSearcher" and "newSearcher". After startup,
pmap shows well-populated mmap entries and QTimes are better than before.

However, phrase queries (edismax with pf2) are still sluggish. The QTimes
increase as the number of words in a phrase increases. None of the mmap
"warming" seems to have any impact on this. Am I missing anything? Thanks.

On 9/24/16 5:20 PM, Erick Erickson wrote:


Hmm..

About <1>: Yep, GC is one of the "more art than science" bits of
Java/Solr. Siiigh.

About <2>: that's what autowarming is about. Particularly the
filterCache and queryResultCache. My guess is that you have the
autowarm count on those two caches set to zero. Try setting it to some
modest number like 16 or 32. The whole _point_ of those parameters is
to smooth out these kinds of spikes. Additionally, the newSearcher
event (also in solrconfig.xml) is explicitly intended to allow you to
hard-code queries that fill the internal caches as well as the mmap OS
memory from disk, people include facets, sorts and the like in that
event. It's fired every time a new searcher is opened (i.e. whenever
you commit and open a new searcher)...

FirstSearcher is for restarts. The difference is that newSearcher
presumes Solr has been running for a while and the autowarm counts
have something to work from. OTOH, when you start Solr there's no
history to autowarm, so firstSearcher can be quite a bit more complex
than newSearcher. Practically, most people just copy newSearcher into
firstSearcher on the assumption that restarting Solr is pretty
rare.
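
To make that concrete, a sketch of the relevant solrconfig.xml pieces; the
cache sizes, autowarm counts and the warming query below are illustrative
only, not recommendations:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="32"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="32"/>

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- facet/sort queries here re-fill the caches and OS page cache after each commit -->
    <lst><str name="q">*:*</str><str name="facet">true</str><str name="facet.field">category</str></lst>
  </arr>
</listener>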

about <3> MMap stuff will be controlled by the OS I think. I actually
worked with a much more primitive system at one point that would be
dog-slow during off-hours. Someone wrote an equivalent of a cron job
to tickle the app upon occasion to prevent periodic slowness.

for a nauseating set of details about hard and soft commits, see:

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


On Sat, Sep 24, 2016 at 11:35 AM, Rallavagu <rallav...@gmail.com> wrote:




On 9/22/16 5:59 AM, Shawn Heisey wrote:



On 9/22/2016 5:46 AM, Muhammad Zahid Iqbal wrote:



Did you find any solution to the slow searches? As far as I know, the jetty
container default configuration is a bit slow for large production
environments.




This might be true for the default configuration that comes with a
completely stock jetty downloaded from eclipse.org, but the jetty
configuration that *Solr* ships with is adequate for just about any Solr
installation.  The Solr configuration may require adjustment as the
query load increases, but the jetty configuration usually doesn't.

Thanks,
Shawn



It turned out to be a "sequence of performance testing sessions" in order to
locate the slowness. Though I am not completely done with it, here are my
findings so far. We are using an NRT configuration (warmup count of 0 for
caches and NRTCachingDirectoryFactory for the index directory).

1. Essentially, Solr searches (particularly with edismax and relevance)
generate a lot of "garbage", which makes GC activity kick in more often. This
becomes even more pronounced when facets are included. It has a huge impact
on QTimes (I have a 12g heap with 6g configured for NewSize).

2. After a fresh restart (or core reload), when searches are performed, Solr
will initially "populate" the mmap entries, and this adds to the total QTimes
(I have made sure that the index files are cached at the filesystem layer
using vmtouch - https://hoytech.com/vmtouch). When I run the same test again
with the mmap entries populated from the previous test, it shows improved
QTimes relative to the previous run.

3. It seems the populated mmap entries are flushed away after a certain idle
time (not sure if this is controlled by Solr or the underlying OS). This makes
subsequent searches fetch from "disk" (even though the disk items are cached
by the OS).

So, what I am going to try next is to tune the field(s) used for facets to
reduce the index size if possible. Though I am not sure it will have an
impact, I will also attempt to change the "caches", even though they will be
invalidated after a softCommit (every 10 minutes in my case).

Any other tips/clues/suggestions are welcome. Thanks.





Re: slow updates/searches

2016-09-28 Thread Rallavagu

Thanks Erick.

I have added queries for "firstSearcher" and "newSearcher". After 
startup, pmap shows well-populated mmap entries and QTimes are better 
than before.

However, phrase queries (edismax with pf2) are still sluggish. The 
QTimes increase as the number of words in a phrase increases. None of the 
mmap "warming" seems to have any impact on this. Am I missing anything? 
Thanks.


On 9/24/16 5:20 PM, Erick Erickson wrote:

Hmm..

About <1>: Yep, GC is one of the "more art than science" bits of
Java/Solr. Siiigh.

About <2>: that's what autowarming is about. Particularly the
filterCache and queryResultCache. My guess is that you have the
autowarm count on those two caches set to zero. Try setting it to some
modest number like 16 or 32. The whole _point_ of those parameters is
to smooth out these kinds of spikes. Additionally, the newSearcher
event (also in solrconfig.xml) is explicitly intended to allow you to
hard-code queries that fill the internal caches as well as the mmap OS
memory from disk, people include facets, sorts and the like in that
event. It's fired every time a new searcher is opened (i.e. whenever
you commit and open a new searcher)...

FirstSearcher is for restarts. The difference is that newSearcher
presumes Solr has been running for a while and the autowarm counts
have something to work from. OTOH, when you start Solr there's no
history to autowarm, so firstSearcher can be quite a bit more complex
than newSearcher. Practically, most people just copy newSearcher into
firstSearcher on the assumption that restarting Solr is pretty
rare.

about <3> MMap stuff will be controlled by the OS I think. I actually
worked with a much more primitive system at one point that would be
dog-slow during off-hours. Someone wrote an equivalent of a cron job
to tickle the app upon occasion to prevent periodic slowness.

for a nauseating set of details about hard and soft commits, see:
https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


On Sat, Sep 24, 2016 at 11:35 AM, Rallavagu <rallav...@gmail.com> wrote:



On 9/22/16 5:59 AM, Shawn Heisey wrote:


On 9/22/2016 5:46 AM, Muhammad Zahid Iqbal wrote:


Did you find any solution to the slow searches? As far as I know, the jetty
container default configuration is a bit slow for large production
environments.



This might be true for the default configuration that comes with a
completely stock jetty downloaded from eclipse.org, but the jetty
configuration that *Solr* ships with is adequate for just about any Solr
installation.  The Solr configuration may require adjustment as the
query load increases, but the jetty configuration usually doesn't.

Thanks,
Shawn



It turned out to be a "sequence of performance testing sessions" in order to
locate the slowness. Though I am not completely done with it, here are my
findings so far. We are using an NRT configuration (warmup count of 0 for
caches and NRTCachingDirectoryFactory for the index directory).

1. Essentially, Solr searches (particularly with edismax and relevance)
generate a lot of "garbage", which makes GC activity kick in more often. This
becomes even more pronounced when facets are included. It has a huge impact
on QTimes (I have a 12g heap with 6g configured for NewSize).

2. After a fresh restart (or core reload), when searches are performed, Solr
will initially "populate" the mmap entries, and this adds to the total QTimes
(I have made sure that the index files are cached at the filesystem layer
using vmtouch - https://hoytech.com/vmtouch). When I run the same test again
with the mmap entries populated from the previous test, it shows improved
QTimes relative to the previous run.

3. It seems the populated mmap entries are flushed away after a certain idle
time (not sure if this is controlled by Solr or the underlying OS). This makes
subsequent searches fetch from "disk" (even though the disk items are cached
by the OS).

So, what I am going to try next is to tune the field(s) used for facets to
reduce the index size if possible. Though I am not sure it will have an
impact, I will also attempt to change the "caches", even though they will be
invalidated after a softCommit (every 10 minutes in my case).

Any other tips/clues/suggestions are welcome. Thanks.



Re: slow updates/searches

2016-09-24 Thread Rallavagu



On 9/22/16 5:59 AM, Shawn Heisey wrote:

On 9/22/2016 5:46 AM, Muhammad Zahid Iqbal wrote:

Did you find any solution to the slow searches? As far as I know, the jetty
container default configuration is a bit slow for large production
environments.


This might be true for the default configuration that comes with a
completely stock jetty downloaded from eclipse.org, but the jetty
configuration that *Solr* ships with is adequate for just about any Solr
installation.  The Solr configuration may require adjustment as the
query load increases, but the jetty configuration usually doesn't.

Thanks,
Shawn



It turned out to be a "sequence of performance testing sessions" in 
order to locate the slowness. Though I am not completely done with it, here 
are my findings so far. We are using an NRT configuration (warmup count of 
0 for caches and NRTCachingDirectoryFactory for the index directory).

1. Essentially, Solr searches (particularly with edismax and relevance) 
generate a lot of "garbage", which makes GC activity kick in more often. 
This becomes even more pronounced when facets are included. It has a huge 
impact on QTimes (I have a 12g heap with 6g configured for NewSize).

2. After a fresh restart (or core reload), when searches are performed, 
Solr will initially "populate" the mmap entries, and this adds to the total 
QTimes (I have made sure that the index files are cached at the filesystem 
layer using vmtouch - https://hoytech.com/vmtouch). When I run the same test 
again with the mmap entries populated from the previous test, it shows 
improved QTimes relative to the previous run.

3. It seems the populated mmap entries are flushed away after a certain idle 
time (not sure if this is controlled by Solr or the underlying OS). This 
makes subsequent searches fetch from "disk" (even though the disk items are 
cached by the OS).

So, what I am going to try next is to tune the field(s) used for facets to 
reduce the index size if possible. Though I am not sure it will have an 
impact, I will also attempt to change the "caches", even though they will be 
invalidated after a softCommit (every 10 minutes in my case).

Any other tips/clues/suggestions are welcome. Thanks.
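
For anyone curious about the vmtouch usage in point 2, a minimal sketch of
the commands (the path is a placeholder for the core's data/index directory):

# report how much of the index is currently resident in the OS page cache
vmtouch -v /path/to/solr/data/index

# pre-fault ("touch") the index files into the page cache, e.g. after a restart
vmtouch -t /path/to/solr/data/index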



Re: slow updates/searches

2016-09-19 Thread Rallavagu

Hi Erick,

Would increasing (or adjusting) the update threads help, as per this JIRA 
("Allow the number of threads ConcurrentUpdateSolrClient 
StreamingSolrClients configurable by a system property")?


https://issues.apache.org/jira/browse/SOLR-8500

Thanks


On 9/19/16 8:30 AM, Erick Erickson wrote:

Hmmm, not sure, and also not sure what to suggest next. QTimes
measure only the search time, not, say, time waiting for the request to get
serviced.

I'm afraid the next suggestion is to throw a profiler at it 'cause nothing jumps
out at me..'

Best,
Erick

On Fri, Sep 16, 2016 at 10:23 AM, Rallavagu <rallav...@gmail.com> wrote:

Comments in line...

On 9/16/16 10:15 AM, Erick Erickson wrote:


Well, the next thing I'd look at is CPU activity. If you're flooding the
system with updates there'll be CPU contention.



Monitoring does not suggest any high CPU, but as you can see from the vmstat
output, "user" CPU is a bit high during updates that are taking time (34
user, 65 idle).



And there are a number of things you can do that make updates in particular
much less efficient, from committing very frequently (sometimes combined
with excessive autowarm parameters) and the like.



softCommit is set to 10 minutes, autowarm count is set to 0 and commit is
set to 15 sec for NRT.



There are a series of ideas that might trigger an "aha" moment:
https://wiki.apache.org/solr/SolrPerformanceFactors



Reviewed this document and made a few changes accordingly a while ago.



But the crude measure is just to look at CPU usage when updates happen, or
just before. Are you running hot with queries alone, and then adding an
update burden?



Essentially, it was the high QTimes for queries that got me looking into the
logs, the system, etc., and I could correlate the update slowness with the
search slowness. The other times QTimes go high are right after a softCommit,
which is expected.

Wondering what causes the update threads to wait and whether it has any impact
on search at all. I had a couple more CPUs added, but I still see similar
behavior.

Thanks.




Best,
Erick

On Fri, Sep 16, 2016 at 9:19 AM, Rallavagu <rallav...@gmail.com> wrote:


Erick,

Was monitoring GC activity and couldn't align GC pauses with this behavior.
Also, vmstat shows no swapping or CPU I/O wait. However, whenever I see high
update response times (and correspondingly high QTimes for searches), vmstat
shows a series of "waiting to runnable" processes in the "r" column of the
"procs" section.


https://dl.dropboxusercontent.com/u/39813705/Screen%20Shot%202016-09-16%20at%209.05.51%20AM.png

procs -----------memory----------- ---swap-- -----io---- --system--- -------cpu------- ------timestamp------
 r  b   swpd     free    inact    active   si   so   bi    bo    in    cs  us sy id wa st        CDT
 2  0  71068 18688496  2526604  24204440    0    0    0     0  1433   462  27  1 73  0  0 2016-09-16 11:02:32
 1  0  71068 18688180  2526600  24204568    0    0    0     0  1388   404  26  1 74  0  0 2016-09-16 11:02:33
 1  0  71068 18687928  2526600  24204568    0    0    0     0  1354   401  25  0 75  0  0 2016-09-16 11:02:34
 1  0  71068 18687800  2526600  24204572    0    0    0     0  1311   397  25  0 74  0  0 2016-09-16 11:02:35
 1  0  71068 18687164  2527116  24204844    0    0    0     0  1770   702  31  1 69  0  0 2016-09-16 11:02:36
 1  0  71068 18686944  2527108  24204908    0    0    0    52  1266   421  26  0 74  0  0 2016-09-16 11:02:37
12  1  71068 18682676  2528560  24207116    0    0    0   280  2388   934  34  1 65  0  0 2016-09-16 11:02:38
 2  1  71068 18651340  2530820  24233368    0    0    0  1052 10258  5696  82  5 13  0  0 2016-09-16 11:02:39
 5  0  71068 18648600  2530112  24235060    0    0    0  1988  7261  3644  84  2 13  1  0 2016-09-16 11:02:40
 9  1  71068 18647804  2530580  24236076    0    0    0  1688  7031  3575  84  2 13  1  0 2016-09-16 11:02:41
 1  0  71068 18647628  2530364  24236256    0    0    0   680  7065  4463  61  3 35  1  0 2016-09-16 11:02:42
 1  0  71068 18646344  2531204  24236536    0    0    0    44  6422  4922  35  3 63  0  0 2016-09-16 11:02:43
 2  0  71068 18644460  2532196  24237440    0    0    0     0  6561  5056  25  3 72  0  0 2016-09-16 11:02:44
 0  0  71068 18661900  2531724  24218764    0    0    0     0  7312 10050  11  3 86  0  0 2016-09-16 11:02:45
 2  0  71068 18649400  2532228  24229800    0    0    0     0  7211  6222  34  3 63  0  0 2016-09-16 11:02:46
 0  0  71068 18648280  2533440  24230300    0    0    0   108  3936  3381  20  1 79  0  0 2016-09-16 11:02:47
 0  0  71068 18648156  2533212  24230684    0    0    0    12  1279  1681   2  0 97  0  0 2016-09-16 11:02:48


Captu

Re: slow updates/searches

2016-09-16 Thread Rallavagu

Comments in line...

On 9/16/16 10:15 AM, Erick Erickson wrote:

Well, the next thing I'd look at is CPU activity. If you're flooding the system
with updates there'll be CPU contention.


Monitoring does not suggest any high CPU, but as you can see from the vmstat 
output, "user" CPU is a bit high during updates that are taking time (34 
user, 65 idle).




And there are a number of things you can do that make updates in particular
much less efficient, from committing very frequently (sometimes combined
with excessive autowarm parameters) and the like.


softCommit is set to 10 minutes, autowarm count is set to 0 and commit 
is set to 15 sec for NRT.




There are a series of ideas that might trigger an "aha" moment:
https://wiki.apache.org/solr/SolrPerformanceFactors


Reviewed this document and made a few changes accordingly a while ago.


But the crude measure is just to look at CPU usage when updates happen, or
just before. Are you running hot with queries alone, and then adding an update burden?


Essentially, it was the high QTimes for queries that got me looking into the 
logs, the system, etc., and I could correlate the update slowness with the 
search slowness. The other times QTimes go high are right after a softCommit, 
which is expected.

Wondering what causes the update threads to wait and whether it has any 
impact on search at all. I had a couple more CPUs added, but I still see 
similar behavior.


Thanks.



Best,
Erick

On Fri, Sep 16, 2016 at 9:19 AM, Rallavagu <rallav...@gmail.com> wrote:

Erick,

Was monitoring GC activity and couldn't align GC pauses with this behavior.
Also, vmstat shows no swapping or CPU I/O wait. However, whenever I see high
update response times (and correspondingly high QTimes for searches), vmstat
shows a series of "waiting to runnable" processes in the "r" column of the
"procs" section.

https://dl.dropboxusercontent.com/u/39813705/Screen%20Shot%202016-09-16%20at%209.05.51%20AM.png

procs -----------memory----------- ---swap-- -----io---- --system--- -------cpu------- ------timestamp------
 r  b   swpd     free    inact    active   si   so   bi    bo    in    cs  us sy id wa st        CDT
 2  0  71068 18688496  2526604  24204440    0    0    0     0  1433   462  27  1 73  0  0 2016-09-16 11:02:32
 1  0  71068 18688180  2526600  24204568    0    0    0     0  1388   404  26  1 74  0  0 2016-09-16 11:02:33
 1  0  71068 18687928  2526600  24204568    0    0    0     0  1354   401  25  0 75  0  0 2016-09-16 11:02:34
 1  0  71068 18687800  2526600  24204572    0    0    0     0  1311   397  25  0 74  0  0 2016-09-16 11:02:35
 1  0  71068 18687164  2527116  24204844    0    0    0     0  1770   702  31  1 69  0  0 2016-09-16 11:02:36
 1  0  71068 18686944  2527108  24204908    0    0    0    52  1266   421  26  0 74  0  0 2016-09-16 11:02:37
12  1  71068 18682676  2528560  24207116    0    0    0   280  2388   934  34  1 65  0  0 2016-09-16 11:02:38
 2  1  71068 18651340  2530820  24233368    0    0    0  1052 10258  5696  82  5 13  0  0 2016-09-16 11:02:39
 5  0  71068 18648600  2530112  24235060    0    0    0  1988  7261  3644  84  2 13  1  0 2016-09-16 11:02:40
 9  1  71068 18647804  2530580  24236076    0    0    0  1688  7031  3575  84  2 13  1  0 2016-09-16 11:02:41
 1  0  71068 18647628  2530364  24236256    0    0    0   680  7065  4463  61  3 35  1  0 2016-09-16 11:02:42
 1  0  71068 18646344  2531204  24236536    0    0    0    44  6422  4922  35  3 63  0  0 2016-09-16 11:02:43
 2  0  71068 18644460  2532196  24237440    0    0    0     0  6561  5056  25  3 72  0  0 2016-09-16 11:02:44
 0  0  71068 18661900  2531724  24218764    0    0    0     0  7312 10050  11  3 86  0  0 2016-09-16 11:02:45
 2  0  71068 18649400  2532228  24229800    0    0    0     0  7211  6222  34  3 63  0  0 2016-09-16 11:02:46
 0  0  71068 18648280  2533440  24230300    0    0    0   108  3936  3381  20  1 79  0  0 2016-09-16 11:02:47
 0  0  71068 18648156  2533212  24230684    0    0    0    12  1279  1681   2  0 97  0  0 2016-09-16 11:02:48


Captured stack trace including timing for one of the update threads.


org.eclipse.jetty.server.handler.ContextHandler:doHandle (method time = 15 ms, total time = 30782 ms)
 Filter - SolrDispatchFilter:doFilter:181 (method time = 0 ms, total time = 30767 ms)
  Filter - SolrDispatchFilter:doFilter:223 (method time = 0 ms, total time = 30767 ms)
   org.apache.solr.servlet.HttpSolrCall:call:457 (method time = 0 ms, total time = 30767 ms)
    org.apache.solr.servlet.HttpSolrCall:execute:658 (method time = 0 ms, total time = 30767 ms)
     org.apache.solr.core.SolrCore:execute:2073 (method time = 0 ms, total time = 30767 ms)

Re: slow updates/searches

2016-09-16 Thread Rallavagu
rocessAdd:69 (method time = 0 ms, total time = 23426 ms)
 org.apache.solr.update.DirectUpdateHandler2:addDoc:169 (method time = 0 ms, total time = 23426 ms)
  org.apache.solr.update.DirectUpdateHandler2:addDoc0:207 (method time = 0 ms, total time = 23426 ms)
   org.apache.solr.update.DirectUpdateHandler2:doNormalUpdate:275 (method time = 0 ms, total time = 23426 ms)
    org.apache.lucene.index.IndexWriter:updateDocument:1477 (method time = 0 ms, total time = 8551 ms)
     org.apache.lucene.index.DocumentsWriter:updateDocument:450 (method time = 0 ms, total time = 8551 ms)
      org.apache.lucene.index.DocumentsWriterPerThread:updateDocument:234 (method time = 0 ms, total time = 8551 ms)
       org.apache.lucene.index.DefaultIndexingChain:processDocument:300 (method time = 0 ms, total time = 8551 ms)
        org.apache.lucene.index.DefaultIndexingChain:processField:344 (method time = 0 ms, total time = 8551 ms)
         org.apache.lucene.index.DefaultIndexingChain$PerField:invert:613 (method time = 0 ms, total time = 4098 ms)
          org.apache.lucene.analysis.util.FilteringTokenFilter:incrementToken:51 (method time = 0 ms, total time = 4098 ms)
           org.apache.lucene.analysis.synonym.SynonymFilter:incrementToken:627 (method time = 0 ms, total time = 4098 ms)
            org.apache.lucene.analysis.synonym.SynonymFilter:parse:396 (method time = 0 ms, total time = 4098 ms)
             org.apache.lucene.util.fst.FST:findTargetArc:1186 (method time = 0 ms, total time = 4098 ms)
              org.apache.lucene.util.fst.FST:findTargetArc:1270 (method time = 0 ms, total time = 4098 ms)
               org.apache.lucene.util.fst.FST:readFirstRealTargetArc:992 (method time = 0 ms, total time = 4098 ms)
                org.apache.lucene.util.fst.FST:readNextRealArc:1085 (method time = 0 ms, total time = 4098 ms)
                 org.apache.lucene.util.fst.FST:readLabel:636 (method time = 0 ms, total time = 4098 ms)
                  org.apache.lucene.store.DataInput:readVInt:125 (method time = 4098 ms, total time = 4098 ms)
         org.apache.lucene.index.DefaultIndexingChain:getOrAddField:484 (method time = 0 ms, total time = 4453 ms)
          org.apache.lucene.index.FieldInfos$Builder:getOrAdd:317 (method time = 0 ms, total time = 4453 ms)
           org.apache.lucene.index.FieldInfos$FieldNumbers:addOrGet:218 (method time = 4453 ms, total time = 4453 ms)
    org.apache.solr.update.UpdateLog:add:412 (method time = 0 ms, total time = 14875 ms)
     org.apache.solr.update.UpdateLog:add:421 (method time = 14875 ms, total time = 14875 ms)
 org.apache.solr.update.SolrCmdDistributor:distribAdd:207 (method time = 0 ms, total time = 260 ms)
  org.apache.solr.update.SolrCmdDistributor:submit:289 (method time = 0 ms, total time = 260 ms)
   org.apache.solr.update.SolrCmdDistributor:doRequest:296 (method time = 0 ms, total time = 260 ms)
    org.apache.solr.client.solrj.SolrClient:request:1220 (method time = 0 ms, total time = 260 ms)
     org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient:request:382 (method time = 0 ms, total time = 260 ms)
      org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient:addRunner:324 (method time = 0 ms, total time = 260 ms)
       org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor:execute:215 (method time = 0 ms, total time = 260 ms)
        org.apache.solr.common.util.SolrjNamedThreadFactory:newThread:40 (method time = 260 ms, total time = 260 ms)
 org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor:finish:183 (method time = 0 ms, total time = 7030 ms)
  org.apache.solr.update.processor.DistributedUpdateProcessor:finish:1626 (method time = 0 ms, total time = 7030 ms)
   org.apache.solr.update.processor.DistributedUpdateProcessor:doFinish:778 (method time = 0 ms, total time = 7030 ms)
    org.apache.solr.update.SolrCmdDistributor:finish:90 (method time = 0 ms, total time = 7030 ms)
     org.apache.solr.update.SolrCmdDistributor:blockAndDoRetries:232 (method time = 0 ms, total time = 7030 ms)
      org.apache.solr.update.StreamingSolrClients:blockUntilFinished:107 (method time = 0 ms, total time = 7030 ms)
       org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient:blockUntilFinished:429 (method time = 0 ms, total time = 7030 ms)
        java.lang.Object:wait (method time = 7030 ms, total time = 7030 ms)


https://dl.dropboxusercontent.com/u/39813705/Screen%20Shot%202016-09-16%20at%209.18.52%20AM.png

It appears there could be threads waiting, but I am not sure how this would 
impact searching.


Thanks

On 9/16/16 8:42 AM, Erick Erickson wrote:

First thing I'd look at is whether you're _also_ seeing stop-the-world GC pauses.
In that case there are a number of JVM options that can be tuned.

Best,
Erick

On Fri, Sep 16, 2016 at 8:40 AM, Rallavagu <rallav...@gmail.com> wrote:

Solr 5.4.1 with embedded jetty single shard - NRT

Looking in the logs, I noticed that there are high QTimes for queries and,
around the same time, high response times for updates. These ar

slow updates/searches

2016-09-16 Thread Rallavagu

Solr 5.4.1 with embedded jetty single shard - NRT

Looking in the logs, I noticed that there are high QTimes for queries and, 
around the same time, high response times for updates. These are not during a 
"commit" or "softCommit", but while the client application is sending updates. 
Wondering how updates could impact query performance. What are the options 
for tuning? Thanks.


Re: How to enable JMX to monitor Jetty

2016-09-12 Thread Rallavagu
I have modified modules/http.mod as follows (for Solr 5.4.1, Jetty 9). 
As you can see, I have referenced jetty-jmx.xml.


#
# Jetty HTTP Connector
#

[depend]
server

[xml]
etc/jetty-http.xml
etc/jetty-jmx.xml



On 5/21/16 3:59 AM, Georg Sorst wrote:

Hi list,

how do I correctly enable JMX in Solr 6 so that I can monitor Jetty's
thread pool?

The first step is to set ENABLE_REMOTE_JMX_OPTS="true" in bin/solr.in.sh.
This will give me JMX access to JVM properties (garbage collection, class
loading etc.) and works fine. However, this will not give me any Jetty
specific properties.

I've tried manually adding jetty-jmx.xml from the jetty 9 distribution to
server/etc/ and then starting Solr with 'java ... start.jar
etc/jetty-jmx.xml'. This works fine and gives me access to the right
properties, but seems wrong. I could similarly copy the contents of
jetty-jmx.xml into jetty.xml but this is not much better either.

Is there a correct way for this?

Thanks!
Georg



Re: ConcurrentUpdateSolrClient threads

2016-09-12 Thread Rallavagu

Any takers?

On 9/9/16 9:03 AM, Rallavagu wrote:

All,

Running Solr 5.4.1 with embedded Jetty, with frequent updates coming in and
softCommit set to 10 min. What I am noticing is occasional "slow" updates
(taking 8 to 15 seconds sometimes) and, at about the same time, slow QTimes.
Upon investigating, it appears that
"ConcurrentUpdateSolrClient:blockUntilFinished:429" is waiting for a thread to
become free. Looking at https://issues.apache.org/jira/browse/SOLR-8500, it
appears to offer an option to increase the number of threads, which might
help with handling more updates without having to wait (though that needs an
update of Solr to 5.5). I could not figure out the default number of threads
for the ConcurrentUpdateSolrClient class. Before I try increasing the number
of threads, I am wondering if there are any "gotchas" in doing so, and what a
reasonable number of threads would be.


org.apache.solr.update.SolrCmdDistributor:finish:90 (method time = 0 ms,
total time = 7489 ms)
 org.apache.solr.update.SolrCmdDistributor:blockAndDoRetries:232 (method
time = 0 ms, total time = 7489 ms)
  org.apache.solr.update.StreamingSolrClients:blockUntilFinished:107
(method time = 0 ms, total time = 7489 ms)

org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient:blockUntilFinished:429
(method time = 0 ms, total time = 7489 ms)
java.lang.Object:wait (method time = 7489 ms, total time = 7489 ms)


Thanks in advance


ConcurrentUpdateSolrClient threads

2016-09-09 Thread Rallavagu

All,

Running Solr 5.4.1 with embedded Jetty, with frequent updates coming in and 
softCommit set to 10 min. What I am noticing is occasional "slow" updates 
(taking 8 to 15 seconds sometimes) and, at about the same time, slow QTimes. 
Upon investigating, it appears that 
"ConcurrentUpdateSolrClient:blockUntilFinished:429" is waiting for a thread 
to become free. Looking at https://issues.apache.org/jira/browse/SOLR-8500, 
it appears to offer an option to increase the number of threads, which might 
help with handling more updates without having to wait (though that needs an 
update of Solr to 5.5). I could not figure out the default number of threads 
for the ConcurrentUpdateSolrClient class. Before I try increasing the number 
of threads, I am wondering if there are any "gotchas" in doing so, and what 
a reasonable number of threads would be.



org.apache.solr.update.SolrCmdDistributor:finish:90 (method time = 0 ms, total time = 7489 ms)
 org.apache.solr.update.SolrCmdDistributor:blockAndDoRetries:232 (method time = 0 ms, total time = 7489 ms)
  org.apache.solr.update.StreamingSolrClients:blockUntilFinished:107 (method time = 0 ms, total time = 7489 ms)
   org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient:blockUntilFinished:429 (method time = 0 ms, total time = 7489 ms)
    java.lang.Object:wait (method time = 7489 ms, total time = 7489 ms)


Thanks in advance
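
As a side note, on the SolrJ client side the runner thread count is an
explicit constructor argument; a minimal sketch (the URL, queue size and
thread count below are illustrative values only, not a recommendation):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ConcurrentUpdateExample {
  public static void main(String[] args) throws Exception {
    // buffer up to 1000 documents and drain the queue with 4 runner threads
    try (ConcurrentUpdateSolrClient client =
             new ConcurrentUpdateSolrClient("http://localhost:8983/solr/collection1", 1000, 4)) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "example-1");
      client.add(doc);
      client.blockUntilFinished(); // wait until queued updates have been sent
      client.commit();
    }
  }
}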


Re: Default Field Cache

2016-09-01 Thread Rallavagu

Yes. Thanks.

On 9/1/16 4:53 AM, Alessandro Benedetti wrote:

Are you looking for this ?

org/apache/solr/core/SolrConfig.java:243

CacheConfig conf = CacheConfig.getConfig(this, "query/fieldValueCache");
if (conf == null) {
  Map<String, String> args = new HashMap<>();
  args.put(NAME, "fieldValueCache");
  args.put("size", "1");
  args.put("initialSize", "10");
  args.put("showItems", "-1");
  conf = new CacheConfig(FastLRUCache.class, args, null);
}
fieldValueCacheConfig = conf;
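
Expressed as solrconfig.xml, those defaults would correspond to roughly the
following (a sketch derived from the code above, not copied from any shipped
config):

<fieldValueCache class="solr.FastLRUCache"
                 size="10000"
                 initialSize="10"
                 showItems="-1"/>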


Cheers


On Thu, Sep 1, 2016 at 2:41 AM, Rallavagu <rallav...@gmail.com> wrote:


But the configuration is commented out (disabled). As the comments section
mentions:

"The fieldValueCache is created by default even if not configured here"

I would like to know what the configuration of the fieldValueCache created
by default would be.


On 8/31/16 6:37 PM, Zheng Lin Edwin Yeo wrote:


If I didn't get your question wrong, what you have listed is already the
default configuration that comes with your version of Solr.

Regards,
Edwin

On 30 August 2016 at 07:49, Rallavagu <rallav...@gmail.com> wrote:

Solr 5.4.1





Wondering what is the default configuration for "fieldValueCache".









Re: Default Field Cache

2016-08-31 Thread Rallavagu
But the configuration is commented out (disabled). As the comments section 
mentions:

"The fieldValueCache is created by default even if not configured here"

I would like to know what the configuration of the fieldValueCache created 
by default would be.


On 8/31/16 6:37 PM, Zheng Lin Edwin Yeo wrote:

If I didn't get your question wrong, what you have listed is already the
default configuration that comes with your version of Solr.

Regards,
Edwin

On 30 August 2016 at 07:49, Rallavagu <rallav...@gmail.com> wrote:


Solr 5.4.1




Wondering what is the default configuration for "fieldValueCache".





Default Field Cache

2016-08-29 Thread Rallavagu

Solr 5.4.1




Wondering what is the default configuration for "fieldValueCache".


Re: Solr embedded jetty jstack

2016-08-29 Thread Rallavagu

Responding to my own query.

I got this fixed. The Solr startup was managed by a systemd script that was 
configured with "PrivateTmp=true". I changed that to "PrivateTmp=false", so 
"/tmp/hsperfdata_<user>/" is no longer removed after server startup, and 
jstack now works.
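
For anyone hitting the same issue, a sketch of applying that override as a
systemd drop-in instead of editing the unit file directly (the unit name
"solr.service" is an assumption; adjust it to your installation):

# /etc/systemd/system/solr.service.d/override.conf
[Service]
PrivateTmp=false

followed by a "systemctl daemon-reload" and a restart of the service.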


On 8/29/16 11:31 AM, Rallavagu wrote:

I have run into a strange issue where "jstack -l <pid>" does not work. I
have tried this as the user that Solr (5.4.1) is running as. I get the
following error.

$ jstack -l 24064
24064: Unable to open socket file: target process not responding or
HotSpot VM not loaded
The -F option can be used when the target process is not responding

I am running Solr 5.4.1, JDK 8 with the latest updates.

I have also downloaded Jetty separately, installed it and started the
server. However, jstack on the directly downloaded jetty (not solr bundled)
works just fine. After some research, I have found that the
/tmp/hsperfdata_<user>/<pid> file is not created by the bundled Solr,
while a similar file is created by the standalone jetty server. After some
more debugging, it appears that the solr startup process creates the
file (/tmp/hsperfdata_<user>/<pid>) and then removes it. I have tried
the "-F" option but it did not help. I have also set "-XX:+UsePerfData"
explicitly, to no avail. I have enabled JMX and connected via visualvm to
get thread dumps for now. But, for me, jstack is more convenient for
triggering a series of thread dumps. Any ideas? Thanks.


Solr embedded jetty jstack

2016-08-29 Thread Rallavagu
I have run into a strange issue where "jstack -l <pid>" does not work. I 
have tried this as the user that Solr (5.4.1) is running as. I get the 
following error.

$ jstack -l 24064
24064: Unable to open socket file: target process not responding or 
HotSpot VM not loaded

The -F option can be used when the target process is not responding

I am running Solr 5.4.1, JDK 8 with the latest updates.

I have also downloaded Jetty separately, installed it and started the 
server. However, jstack on the directly downloaded jetty (not solr bundled) 
works just fine. After some research, I have found that the 
/tmp/hsperfdata_<user>/<pid> file is not created by the bundled Solr, 
while a similar file is created by the standalone jetty server. After some 
more debugging, it appears that the solr startup process creates the 
file (/tmp/hsperfdata_<user>/<pid>) and then removes it. I have tried 
the "-F" option but it did not help. I have also set "-XX:+UsePerfData" 
explicitly, to no avail. I have enabled JMX and connected via visualvm to 
get thread dumps for now. But, for me, jstack is more convenient for 
triggering a series of thread dumps. Any ideas? Thanks.


Re: solr.NRTCachingDirectoryFactory

2016-08-26 Thread Rallavagu

Thanks Mikhail.

I am unable to locate the bottleneck so far. Will try jstack and other tools.

On 8/25/16 11:40 PM, Mikhail Khludnev wrote:

Rough sampling under load makes sense, as usual. JMC is one of the suitable
tools for this. Sometimes even just jstack <pid>, or looking at
SolrAdmin/Threads, is enough. If only a small ratio of documents is updated
and the bottleneck is the filterCache, you can experiment with segmented
filters, which suit NRT better.
http://blog-archive.griddynamics.com/2014/01/segmented-filter-cache-in-solr.html


On Fri, Aug 26, 2016 at 2:56 AM, Rallavagu <rallav...@gmail.com> wrote:


Follow up update ...

Set the autowarm count to zero for the caches for NRT, and I could negotiate
the latency from 2 min to 5 min :)

However, I am still seeing high QTimes and wondering where else I can look.
Should I debug the code or run some tools to isolate the bottleneck (disk I/O,
CPU, or the query itself)? Looking for some tuning advice. Thanks.


On 7/26/16 9:42 AM, Erick Erickson wrote:


And, I might add, you should look through your old logs
and see how long it takes to open a searcher. Let's
say Shawn's lower bound is what you see, i.e.
it takes a minute each to execute all the autowarming
in filterCache and queryResultCache... So your current
latency is _at least_ 2 minutes between the time something
is indexed and it's available for search, just for autowarming.

Plus up to another 2 minutes for your soft commit interval
to expire.

So if your business people haven't noticed a 4 minute
latency yet, tell them they don't know what they're talking
about when they insist on the NRT interval being a few
seconds ;).

Best,
Erick

On Tue, Jul 26, 2016 at 7:20 AM, Rallavagu <rallav...@gmail.com> wrote:




On 7/26/16 5:46 AM, Shawn Heisey wrote:



On 7/22/2016 10:15 AM, Rallavagu wrote:








 size="2"
 initialSize="2"
 autowarmCount="500"/>




As Erick indicated, these settings are incompatible with Near Real Time
updates.

With those settings, every time you commit and create a new searcher,
Solr will execute up to 1000 queries (potentially 500 for each of the
caches above) before that new searcher will begin returning new results.

I do not know how fast your filter queries execute when they aren't
cached... but even if they only take 100 milliseconds each, that could
take up to a minute for filterCache warming.  If each one takes two
seconds and there are 500 entries in the cache, then autowarming the
filterCache would take nearly 17 minutes. You would also need to wait
for the warming queries on queryResultCache.

The autowarmCount on my filterCache is 4, and warming that cache *still*
sometimes takes ten or more seconds to complete.

If you want true NRT, you need to set all your autowarmCount values to
zero.  The tradeoff with NRT is that your caches are ineffective
immediately after a new searcher is created.



Will look into this and make changes as suggested.



Looking at the "top" screenshot ... you have plenty of memory to cache
the entire index.  Unless your queries are extreme, this is usually
enough for good performance.

One possible problem is that cache warming is taking far longer than
your autoSoftCommit interval, and the server is constantly busy making
thousands of warming queries.  Reducing autowarmCount, possibly to zero,
*might* fix that. I would expect higher CPU load than what your
screenshot shows if this were happening, but it still might be the
problem.



Great point. Thanks for the help.



Thanks,
Shawn









Re: solr.NRTCachingDirectoryFactory

2016-08-25 Thread Rallavagu

Follow up update ...

Set the autowarm count to zero for the caches for NRT, and I could negotiate 
the latency from 2 min to 5 min :)

However, I am still seeing high QTimes and wondering where else I can look. 
Should I debug the code or run some tools to isolate the bottleneck (disk 
I/O, CPU, or the query itself)? Looking for some tuning advice. Thanks.



On 7/26/16 9:42 AM, Erick Erickson wrote:

And, I might add, you should look through your old logs
and see how long it takes to open a searcher. Let's
say Shawn's lower bound is what you see, i.e.
it takes a minute each to execute all the autowarming
in filterCache and queryResultCache... So your current
latency is _at least_ 2 minutes between the time something
is indexed and it's available for search, just for autowarming.

Plus up to another 2 minutes for your soft commit interval
to expire.

So if your business people haven't noticed a 4 minute
latency yet, tell them they don't know what they're talking
about when they insist on the NRT interval being a few
seconds ;).

Best,
Erick

On Tue, Jul 26, 2016 at 7:20 AM, Rallavagu <rallav...@gmail.com> wrote:



On 7/26/16 5:46 AM, Shawn Heisey wrote:


On 7/22/2016 10:15 AM, Rallavagu wrote:











As Erick indicated, these settings are incompatible with Near Real Time
updates.

With those settings, every time you commit and create a new searcher,
Solr will execute up to 1000 queries (potentially 500 for each of the
caches above) before that new searcher will begin returning new results.

I do not know how fast your filter queries execute when they aren't
cached... but even if they only take 100 milliseconds each, that could
take up to a minute for filterCache warming.  If each one takes two
seconds and there are 500 entries in the cache, then autowarming the
filterCache would take nearly 17 minutes. You would also need to wait
for the warming queries on queryResultCache.

The autowarmCount on my filterCache is 4, and warming that cache *still*
sometimes takes ten or more seconds to complete.

If you want true NRT, you need to set all your autowarmCount values to
zero.  The tradeoff with NRT is that your caches are ineffective
immediately after a new searcher is created.


Will look into this and make changes as suggested.



Looking at the "top" screenshot ... you have plenty of memory to cache
the entire index.  Unless your queries are extreme, this is usually
enough for good performance.

One possible problem is that cache warming is taking far longer than
your autoSoftCommit interval, and the server is constantly busy making
thousands of warming queries.  Reducing autowarmCount, possibly to zero,
*might* fix that. I would expect higher CPU load than what your
screenshot shows if this were happening, but it still might be the
problem.


Great point. Thanks for the help.



Thanks,
Shawn





Re: solr.NRTCachingDirectoryFactory

2016-07-26 Thread Rallavagu



On 7/26/16 5:46 AM, Shawn Heisey wrote:

On 7/22/2016 10:15 AM, Rallavagu wrote:









As Erick indicated, these settings are incompatible with Near Real Time
updates.

With those settings, every time you commit and create a new searcher,
Solr will execute up to 1000 queries (potentially 500 for each of the
caches above) before that new searcher will begin returning new results.

I do not know how fast your filter queries execute when they aren't
cached... but even if they only take 100 milliseconds each, that could
take up to a minute for filterCache warming.  If each one takes two
seconds and there are 500 entries in the cache, then autowarming the
filterCache would take nearly 17 minutes. You would also need to wait
for the warming queries on queryResultCache.

The autowarmCount on my filterCache is 4, and warming that cache *still*
sometimes takes ten or more seconds to complete.

If you want true NRT, you need to set all your autowarmCount values to
zero.  The tradeoff with NRT is that your caches are ineffective
immediately after a new searcher is created.

Will look into this and make changes as suggested.
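
For reference, in solrconfig.xml that boils down to something like the
following sketch; the cache classes and sizes are illustrative, the point is
the autowarmCount="0":

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>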



Looking at the "top" screenshot ... you have plenty of memory to cache
the entire index.  Unless your queries are extreme, this is usually
enough for good performance.

One possible problem is that cache warming is taking far longer than
your autoSoftCommit interval, and the server is constantly busy making
thousands of warming queries.  Reducing autowarmCount, possibly to zero,
*might* fix that. I would expect higher CPU load than what your
screenshot shows if this were happening, but it still might be the problem.

Great point. Thanks for the help.



Thanks,
Shawn



Re: solr.NRTCachingDirectoryFactory

2016-07-22 Thread Rallavagu



On 7/22/16 9:56 AM, Erick Erickson wrote:

OK, scratch autowarming. In fact your autowarm counts
are quite high, I suspect far past "diminishing returns".
I usually see autowarm counts < 64, but YMMV.

Are you seeing actual hit ratios that are decent on
those caches (admin UI>>plugins/stats>>cache>>...)
And your cache sizes are also quite high in my experience,
it's probably worth measuring the utilization there as well.
And, BTW, your filterCache can occupy up to 2G of your heap.
That's probably not your central problem, but it's something
to consider.

Will look into it.


So I don't know why your queries are taking that long; my
assumption is that they may simply be very complex queries,
or that you have grouping on.

Queries are a bit complex for sure.


I guess the next thing I'd do is start trying to characterize
what queries are slow. Grouping? Pivot Faceting? 'cause
from everything you've said so far it's surprising that you're
seeing queries take this long, something doesn't feel right
but what it is I don't have a clue.


Thanks



Best,
Erick

On Fri, Jul 22, 2016 at 9:15 AM, Rallavagu <rallav...@gmail.com> wrote:



On 7/22/16 8:34 AM, Erick Erickson wrote:


Mostly this sounds like a problem that could be cured with
autowarming. But two things are conflicting here:
1> you say "We have a requirement to have updates available immediately
(NRT)"
2> your docs aren't available for 120 seconds given your autoSoftCommit
settings unless you're specifying
-Dsolr.autoSoftCommit.maxTime=some_other_interval
as a startup parameter.


Yes. We have 120 seconds available.


So assuming you really do have a 120 second autocommit time, you should be
able to smooth out the spikes by appropriate autowarming. You also haven't
indicated what your filterCache and queryResultCache settings are. They
come with a default of 0 for autowarm. But what is their size? And do you
see a correlation between longer queries and the 2 minute intervals? And
do you have some test harness in place (jmeter works well) to demonstrate
whether differences in your configuration help or hurt? I can't
over-emphasize the importance of this; otherwise, if you rely on somebody
simply saying "it's slow", you have no way to know what effect changes have.



Here is the cache configuration.









We have run load tests using JMeter pointing directly at Solr, and also tests
pointing at the application that queries Solr. In both cases, we have noticed
the slow results.

Thanks



Best,
Erick


On Thu, Jul 21, 2016 at 11:22 PM, Shawn Heisey <apa...@elyograg.org>
wrote:


On 7/21/2016 11:25 PM, Rallavagu wrote:


There is no other software running on the system and it is completely
dedicated to Solr. It is running on Linux. Here is the full version.

Linux version 3.8.13-55.1.6.el7uek.x86_64
(mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red
Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015



Run the top program, press shift-M to sort by memory usage, and then
grab a screenshot of the terminal window.  Share it with a site like
dropbox, imgur, or something similar, and send the URL.  You'll end up
with something like this:

https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0

If you know what to look for, you can figure out all the relevant memory
details from that.

Thanks,
Shawn





Re: solr.NRTCachingDirectoryFactory

2016-07-22 Thread Rallavagu

Also, here is the link to screenshot.

https://dl.dropboxusercontent.com/u/39813705/Screen%20Shot%202016-07-22%20at%2010.40.21%20AM.png

Thanks

On 7/21/16 11:22 PM, Shawn Heisey wrote:

On 7/21/2016 11:25 PM, Rallavagu wrote:

There is no other software running on the system and it is completely
dedicated to Solr. It is running on Linux. Here is the full version.

Linux version 3.8.13-55.1.6.el7uek.x86_64
(mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red
Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015


Run the top program, press shift-M to sort by memory usage, and then
grab a screenshot of the terminal window.  Share it with a site like
dropbox, imgur, or something similar, and send the URL.  You'll end up
with something like this:

https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0

If you know what to look for, you can figure out all the relevant memory
details from that.

Thanks,
Shawn



Re: solr.NRTCachingDirectoryFactory

2016-07-22 Thread Rallavagu
Here is the snapshot of memory usage from "top" as you mentioned. The first 
row is the "solr" process. Thanks.


  PID USER    PR  NI    VIRT     RES    SHR S  %CPU %MEM     TIME+ COMMAND
29468 solr    20   0 27.536g  0.013t 3.297g S  45.7 27.6   4251:45 java
21366 root    20   0 14.499g  217824  12952 S   1.0  0.4 192:11.54 java
 2077 root    20   0 14.049g  190824   9980 S   0.7  0.4  62:44.00 java
  511 root    20   0  125792   56848  56616 S   0.0  0.1   9:33.23 systemd-journal
  316 splunk  20   0  232056   44284  11804 S   0.7  0.1  84:52.74 splunkd
 1045 root    20   0  257680   39956   6836 S   0.3  0.1   7:05.78 puppet
32631 root    20   0  360956   39292   4788 S   0.0  0.1   4:55.37 mcollectived
  703 root    20   0  250372    9000    976 S   0.0  0.0   1:35.52 rsyslogd
 1058 nslcd   20   0  454192    6004   2996 S   0.0  0.0  15:08.87 nslcd

On 7/21/16 11:22 PM, Shawn Heisey wrote:

On 7/21/2016 11:25 PM, Rallavagu wrote:

There is no other software running on the system and it is completely
dedicated to Solr. It is running on Linux. Here is the full version.

Linux version 3.8.13-55.1.6.el7uek.x86_64
(mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red
Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015


Run the top program, press shift-M to sort by memory usage, and then
grab a screenshot of the terminal window.  Share it with a site like
dropbox, imgur, or something similar, and send the URL.  You'll end up
with something like this:

https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0

If you know what to look for, you can figure out all the relevant memory
details from that.

Thanks,
Shawn



Re: solr.NRTCachingDirectoryFactory

2016-07-22 Thread Rallavagu



On 7/22/16 8:34 AM, Erick Erickson wrote:

Mostly this sounds like a problem that could be cured with
autowarming. But two things are conflicting here:
1> you say "We have a requirement to have updates available immediately (NRT)"
2> your docs aren't available for 120 seconds given your autoSoftCommit
settings unless you're specifying
-Dsolr.autoSoftCommit.maxTime=some_other_interval
as a startup parameter.


Yes. We have 120 seconds available.


So assuming you really do have a 120 second autocommit time, you should be
able to smooth out the spikes by appropriate autowarming. You also haven't
indicated what your filterCache and queryResultCache settings are. They
come with a default of 0 for autowarm. But what is their size? And do you
see a correlation between longer queries and the 2 minute intervals? And
do you have some test harness in place (jmeter works well) to demonstrate
whether differences in your configuration help or hurt? I can't
over-emphasize the importance of this; otherwise, if you rely on somebody
simply saying "it's slow", you have no way to know what effect changes have.


Here is the cache configuration.









We have run load tests using JMeter pointing directly at Solr, and also 
tests pointing at the application that queries Solr. In both cases, we have 
noticed the slow results.


Thanks



Best,
Erick


On Thu, Jul 21, 2016 at 11:22 PM, Shawn Heisey <apa...@elyograg.org> wrote:

On 7/21/2016 11:25 PM, Rallavagu wrote:

There is no other software running on the system and it is completely
dedicated to Solr. It is running on Linux. Here is the full version.

Linux version 3.8.13-55.1.6.el7uek.x86_64
(mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red
Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015


Run the top program, press shift-M to sort by memory usage, and then
grab a screenshot of the terminal window.  Share it with a site like
dropbox, imgur, or something similar, and send the URL.  You'll end up
with something like this:

https://www.dropbox.com/s/zlvpvd0rrr14yit/linux-solr-top.png?dl=0

If you know what to look for, you can figure out all the relevant memory
details from that.

Thanks,
Shawn



Re: solr.NRTCachingDirectoryFactory

2016-07-21 Thread Rallavagu



On 7/21/16 9:16 PM, Shawn Heisey wrote:

On 7/21/2016 9:37 AM, Rallavagu wrote:

I suspect swapping as well. But, for my understanding - are the index
files on disk memory-mapped automatically at startup time?


They are *mapped* at startup time, but they are not *read* at startup.
The mapping just sets up a virtual address space for the entire file,
but until something actually reads the data from the disk, it will not
be in memory.  Getting the data in memory is what makes mmap fast.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html


We are not performing "commit" after every update and here is the
configuration for softCommit and hardCommit.


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:120000}</maxTime>
</autoSoftCommit>


I am seeing QTimes (for searches) swing between 2 and 10 seconds. Some
queries were showing slowness due to faceting (debug=true). Since we
adjusted the indexing, facet times have improved, but the basic query QTime
is still high, so I am wondering where I can look. Is there a way to debug
(instrument) a query on a Solr node?


Assuming you have not defined the maxTime system properties mentioned in
those configs, that config means you will potentially be creating a new
searcher every two minutes ... but if you are sending explicit commits
or using commitWithin on your updates, then the true situation may be
very different than what's configured here.


We have allocated significant amount of RAM (48G total
physical memory, 12G heap, Total index disk size is 15G)


Assuming there's no other software on the system besides the one
instance of Solr with a 12GB heap, this would mean that you have enough
room to cache the entire index.  What OS are you running on? With that
information, I may be able to relay some instructions that will help
determine what the complete memory situation is on your server.


There is no other software running on the system and it is completely 
dedicated to Solr. It is running on Linux. Here is the full version.


Linux version 3.8.13-55.1.6.el7uek.x86_64 
(mockbu...@ca-build56.us.oracle.com) (gcc version 4.8.3 20140911 (Red 
Hat 4.8.3-9) (GCC) ) #2 SMP Wed Feb 11 14:18:22 PST 2015


Thanks



Thanks,
Shawn



Re: solr.NRTCachingDirectoryFactory

2016-07-21 Thread Rallavagu

Thanks Erick.

On 7/21/16 8:25 AM, Erick Erickson wrote:

bq: map index files so "reading from disk" will be as simple and quick
as reading from memory hence would not incur any significant
performance degradation.

Well, if
1> the read has already been done. First time a page of the file is
accessed, it must be read from disk.
2> You have enough physical memory that _all_ of the files can be held
in memory at once.

<2> is a little tricky since the big slowdown comes from swapping
eventually. But in an LRU scheme, that may be OK if the oldest pages
are the stored=true data which are only accessed to return the top N,
not to satisfy the search.
I suspect swapping as well. But, for my understanding - are the index 
files from disk memory mapped automatically at the startup time?


What are your QTimes anyway? Define "optimal"

I'd really push back on this statement: "We have a requirement to have
updates available immediately (NRT)". Truly? You can't set
expectations that 5 seconds will be needed (or 10?). Often this is an
artificial requirement that does no real service to the user, it's
just something people think they want. If this means you're sending a
commit after every document, it's actually a really bad practice
that'll get you into trouble eventually. Plus you won't be able to do
any autowarming which will read data from disk into the OS memory and
smooth out any spikes
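
(For illustration only: rather than an explicit commit per document, an update message can carry a commitWithin hint and let Solr fold it into a scheduled commit. The document id and the 120000 ms window below are arbitrary examples, not recommendations for this setup.)

<add commitWithin="120000">
  <doc>
    <field name="id">example-doc-1</field>
  </doc>
</add>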


We are not performing "commit" after every update and here is the 
configuration for softCommit and hardCommit.



<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:12}</maxTime>
</autoSoftCommit>


I am seeing QTimes (for searches) swing between 2 and 10 seconds. Some 
queries were showing slowness caused by faceting (per debug=true). Since 
we have adjusted indexing, facet times have improved, but basic query 
QTime is still high, so I am wondering where I can look. Is there a way 
to debug (instrument) a query on a Solr node?




FWIW,
Erick

On Thu, Jul 21, 2016 at 8:18 AM, Rallavagu <rallav...@gmail.com> wrote:

Solr 5.4.1 with embedded jetty with cloud enabled

We have a Solr deployment (approximately 3 million documents) with both
write and search operations happening. We have a requirement to have updates
available immediately (NRT). Configured with default
"solr.NRTCachingDirectoryFactory" for directory factory. Considering the
fact that every time there is an update, caches are invalidated and re-built
I assume that "solr.NRTCachingDirectoryFactory" would memory map index files
so "reading from disk" will be as simple and quick as reading from memory
hence would not incur any significant performance degradation. Am I right in
my assumption? We have allocated significant amount of RAM (48G total
physical memory, 12G heap, Total index disk size is 15G) but not sure if I
am seeing the optimal QTimes (for searches). Any inputs are welcome. Thanks
in advance.


solr.NRTCachingDirectoryFactory

2016-07-21 Thread Rallavagu

Solr 5.4.1 with embedded jetty with cloud enabled

We have a Solr deployment (approximately 3 million documents) with both 
write and search operations happening. We have a requirement to have 
updates available immediately (NRT). Configured with default 
"solr.NRTCachingDirectoryFactory" for directory factory. Considering the 
fact that every time there is an update, caches are invalidated and 
re-built I assume that "solr.NRTCachingDirectoryFactory" would memory 
map index files so "reading from disk" will be as simple and quick as 
reading from memory hence would not incur any significant performance 
degradation. Am I right in my assumption? We have allocated significant 
amount of RAM (48G total physical memory, 12G heap, Total index disk 
size is 15G) but not sure if I am seeing the optimal QTimes (for 
searches). Any inputs are welcome. Thanks in advance.


Re: Document Cache

2016-03-19 Thread Rallavagu

comments in line...

On 3/17/16 2:16 PM, Erick Erickson wrote:

First, I want to make sure when you say "TTL", you're talking about
documents being evicted from the documentCache and not the "Time To Live"
option whereby documents are removed completely from the index.


Maybe TTL was not the right word to use here. I wanted to learn the 
criteria for an entry to be evicted.




The time varies with the number of new documents fetched. This is an LRU
cache whose size is configured in solrconfig.xml. It's pretty much
unpredictable. If for some odd reason every request gets the same document
it'll never be aged out. If no two queries return the same document, it
will be aged out once "cache size" docs have been fetched by subsequent requests.

The entire thing is thrown out whenever a new searcher is opened (i.e.
softCommit or hardCommit with openSearcher=true)




But maybe this is an XY problem. Why do you care? Is there something you're
seeing that you're trying to understand or is this just a general interest
question?

I have the following configuration:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:12}</maxTime>
</autoSoftCommit>

As you can see, openSearcher is set to "false". What I am seeing (from a 
heap dump taken after an OutOfMemory error) is that the LRUCache backing 
the "Document Cache" occupies around 85% of the available heap, and that is 
causing OOM errors. So I am trying to understand the behavior to address the 
OOM issues.


Thanks



Best,
Erick

On Thu, Mar 17, 2016 at 1:40 PM, Rallavagu <rallav...@gmail.com> wrote:


Solr 5.4 embedded Jetty

Is it the right assumption that whenever a document is returned as a
response to a query, it is cached in the "Document Cache"?

Essentially, if I request for any entry like /select?q=id:
will it be cached in "Document Cache"? If yes, what is the TTL?

Thanks in advance





Solr5 Optimize

2016-03-19 Thread Rallavagu

All,

Solr 5.4 with embedded Jetty (4G heap)

Trying to understand behavior of "optimize" operation if not run 
explicitly. What is the frequency at which this operation is run, what 
are the storage requirements and how do we schedule it? Any 
comments/pointers would greatly help.


Thanks in advance


Re: Solr5 Optimize

2016-03-19 Thread Rallavagu

Thanks Erick. This helps.

On 3/16/16 10:11 AM, Erick Erickson wrote:

First of all, "optimize-like" does _not_ happen
"every time a commit happens". What _does_ happen
is the current state of the index is examined and if
certain conditions are met _then_ segment
merges happen. Think of these as "partial optimizes".

This is under control of the TieredMergePolicy by
default.

There are limits placed on the number of simultaneous
merges that can happen, and they're all done in
background threads so you should see lots of I/O,
but the priority of those threads is low so it shouldn't
have  much impact on query perf.

It's theoretically possible that the background merge
will merge down to one segment, so you still need at
least as much free space on your disk as your index occupies.

Best,
Erick


On Wed, Mar 16, 2016 at 10:07 AM, Rallavagu <rallav...@gmail.com> wrote:

Erick, Thanks for the response. Comments in line...

On 3/16/16 9:56 AM, Erick Erickson wrote:


In general, don't bother with optimize unless the index is quite static,
i.e. there are very few adds/updates or those updates are done in
batches and rarely (i.e. once a day or less frequently).

As far as space, this will require that you have at _least_ as much
free space on your disks as your index occupies. Shouldn't require
much in the way of RAM though.

Optimize, also referred to as "Force Merge" will merge all the segments
down to one, and in the process reclaim data from deleted (or updated)
documents.
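
(For reference, an explicit optimize can be triggered through the update handler, for example with a request along these lines; the host, port and collection name are placeholders:)

http://localhost:8983/solr/collection1/update?optimize=true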

The thing is, this is also accomplished by "background merging" which
happens automatically. Every time you do a hard commit, Lucene
figures out if any segments need to be merged and does that automatically.
During that process, any information associated with deleted docs is
reclaimed.


If "optimize" like operation happening automatically every time a hard
commit happens, with following settings (15 seconds for hard commit) what
would be impact on performance particularly on disk space?


${solr.autoCommit.maxTime:15000}
false
  

Thanks.



The third video down here:

http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
is Mike's visualization of the automatic merging process.

Best,
Erick

On Wed, Mar 16, 2016 at 9:40 AM, Rallavagu <rallav...@gmail.com> wrote:


All,

Solr 5.4 with embedded Jetty (4G heap)

Trying to understand behavior of "optimize" operation if not run
explicitly.
What is the frequency at which this operation is run, what are the
storage
requirements and how do we schedule it? Any comments/pointers would
greatly
help.

Thanks in advance


Re: Document Cache

2016-03-19 Thread Rallavagu



On 3/18/16 9:27 AM, Emir Arnautovic wrote:

Running single query that returns all docs and all fields will actually
load as many document as queryResultWindowSize is.
What you need to do is run multiple queries that will return different
documents. In case your id is numeric, you can run something like id:[1
TO 100] and then id:[100 TO 200] etc. Make sure that it is done within
those two minute period if there is any indexing activities.
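
(For illustration, such warm-up requests could look like the following; the field name and ranges are just examples, and the spaces would be URL-encoded in a real request:)

/select?q=id:[1 TO 100]&rows=100
/select?q=id:[100 TO 200]&rows=100
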
Would the existing cache be cleared while an active thread is 
performing/receiving a query?




Your index is relatively small so filter cache of initial size of 1000
entries should take around 20MB (assuming single shard)
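
(Rough arithmetic behind that estimate, assuming each cached filter is stored as a bitset of maxDoc bits: 161115 / 8 is roughly 20 KB per entry, so 1000 entries come to roughly 20 MB.)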

Thanks,
Emir

On 18.03.2016 17:02, Rallavagu wrote:



On 3/18/16 8:56 AM, Emir Arnautovic wrote:

Problem starts with autowarmCount="5000" - that executes 5000 queries
when new searcher is created and as queries are executed, document cache
is filled. If you have large queryResultWindowSize and queries return
big number of documents, that will eat up memory before new search is
executed. It probably takes some time as well.

This is also combined with filter cache. How big is your index?


Index is not very large.


numDocs:
85933

maxDoc:
161115

deletedDocs:
75182

Size
1.08 GB

I have run a query to return all documents with all fields. I could
not reproduce OOM. I understand that I need to reduce cache sizes but
wondering what conditions could have caused OOM so I can keep a watch.

Thanks



Thanks,
Emir

On 18.03.2016 15:43, Rallavagu wrote:

Thanks for the recommendations Shawn. Those are the lines I am
thinking as well. I am reviewing application also.

Going with the note on cache invalidation for every two minutes due to
soft commit, wonder how would it go OOM in simply two minutes or is it
likely that a thread is holding the searcher due to long running query
that might be potentially causing OOM? Was trying to reproduce but
could not so far.

Here is the filter cache config



Query Results cache



On 3/18/16 7:31 AM, Shawn Heisey wrote:

On 3/18/2016 8:22 AM, Rallavagu wrote:

So, each soft commit would create a new searcher that would
invalidate
the old cache?

Here is the configuration for Document Cache



true


In an earlier message, you indicated you're running into OOM.  I think
we can see why with this cache definition.

There are exactly two ways to deal with OOM.  One is to increase the
heap size.  The other is to reduce the amount of memory that the
program
requires by changing something -- that might be the code, the
config, or
how you're using it.

Start by reducing that cache size to 4096 or 1024.

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

If you've also got a very large filterCache, reduce that size too. The
filterCache typically eats up a LOT of memory, because each entry
in the
cache is very large.

Thanks,
Shawn







Re: Document Cache

2016-03-19 Thread Rallavagu



On 3/18/16 8:56 AM, Emir Arnautovic wrote:

Problem starts with autowarmCount="5000" - that executes 5000 queries
when new searcher is created and as queries are executed, document cache
is filled. If you have large queryResultWindowSize and queries return
big number of documents, that will eat up memory before new search is
executed. It probably takes some time as well.

This is also combined with filter cache. How big is your index?


Index is not very large.


numDocs:
85933

maxDoc:
161115

deletedDocs:
75182

Size
1.08 GB

I have run a query to return all documents with all fields. I could not 
reproduce OOM. I understand that I need to reduce cache sizes but 
wondering what conditions could have caused OOM so I can keep a watch.


Thanks



Thanks,
Emir

On 18.03.2016 15:43, Rallavagu wrote:

Thanks for the recommendations Shawn. Those are the lines I am
thinking as well. I am reviewing application also.

Going with the note on cache invalidation for every two minutes due to
soft commit, wonder how would it go OOM in simply two minutes or is it
likely that a thread is holding the searcher due to long running query
that might be potentially causing OOM? Was trying to reproduce but
could not so far.

Here is the filter cache config



Query Results cache



On 3/18/16 7:31 AM, Shawn Heisey wrote:

On 3/18/2016 8:22 AM, Rallavagu wrote:

So, each soft commit would create a new searcher that would invalidate
the old cache?

Here is the configuration for Document Cache



true


In an earlier message, you indicated you're running into OOM.  I think
we can see why with this cache definition.

There are exactly two ways to deal with OOM.  One is to increase the
heap size.  The other is to reduce the amount of memory that the program
requires by changing something -- that might be the code, the config, or
how you're using it.

Start by reducing that cache size to 4096 or 1024.

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

If you've also got a very large filterCache, reduce that size too.  The
filterCache typically eats up a LOT of memory, because each entry in the
cache is very large.

Thanks,
Shawn





Document Cache

2016-03-19 Thread Rallavagu

Solr 5.4 embedded Jetty

Is it the right assumption that whenever a document is returned as a 
response to a query, it is cached in the "Document Cache"?


Essentially, if I request for any entry like /select?q=id: 
will it be cached in "Document Cache"? If yes, what is the TTL?


Thanks in advance


Re: Document Cache

2016-03-19 Thread Rallavagu
Thanks for the recommendations Shawn. Those are the lines I am thinking 
as well. I am reviewing application also.


Going with the note on cache invalidation every two minutes due to 
soft commit, I wonder how it would go OOM in just two minutes, or whether 
a thread holding the searcher for a long-running query might be causing 
the OOM. I was trying to reproduce it but could not so far.


Here is the filter cache config

<filterCache ... autowarmCount="1000"/>

Query Results cache

<queryResultCache ... autowarmCount="5000"/>


On 3/18/16 7:31 AM, Shawn Heisey wrote:

On 3/18/2016 8:22 AM, Rallavagu wrote:

So, each soft commit would create a new searcher that would invalidate
the old cache?

Here is the configuration for Document Cache



true


In an earlier message, you indicated you're running into OOM.  I think
we can see why with this cache definition.

There are exactly two ways to deal with OOM.  One is to increase the
heap size.  The other is to reduce the amount of memory that the program
requires by changing something -- that might be the code, the config, or
how you're using it.

Start by reducing that cache size to 4096 or 1024.
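
(For example, a trimmed-down definition along these lines; the class name and sizes are illustrative, not prescriptive:)

<documentCache class="solr.LRUCache" size="1024" initialSize="1024" autowarmCount="0"/>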

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

If you've also got a very large filterCache, reduce that size too.  The
filterCache typically eats up a LOT of memory, because each entry in the
cache is very large.

Thanks,
Shawn



Re: Solr5 Optimize

2016-03-19 Thread Rallavagu

Erick, Thanks for the response. Comments in line...

On 3/16/16 9:56 AM, Erick Erickson wrote:

In general, don't bother with optimize unless the index is quite static,
i.e. there are very few adds/updates or those updates are done in
batches and rarely (i.e. once a day or less frequently).

As far as space, this will require that you have at _least_ as much
free space on your disks as your index occupies. Shouldn't require
much in the way of RAM though.

Optimize, also referred to as "Force Merge" will merge all the segments
down to one, and in the process reclaim data from deleted (or updated)
documents.

The thing is, this is also accomplished by "background merging" which
happens automatically. Every time you do a hard commit, Lucene
figures out if any segments need to be merged and does that automatically.
During that process, any information associated with deleted docs is
reclaimed.
If "optimize" like operation happening automatically every time a hard 
commit happens, with following settings (15 seconds for hard commit) 
what would be impact on performance particularly on disk space?



   ${solr.autoCommit.maxTime:15000}
   false
 

Thanks.



The third video down here:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
is Mike's visualization of the automatic merging process.

Best,
Erick

On Wed, Mar 16, 2016 at 9:40 AM, Rallavagu <rallav...@gmail.com> wrote:

All,

Solr 5.4 with embedded Jetty (4G heap)

Trying to understand behavior of "optimize" operation if not run explicitly.
What is the frequency at which this operation is run, what are the storage
requirements and how do we schedule it? Any comments/pointers would greatly
help.

Thanks in advance


Re: Document Cache

2016-03-18 Thread Rallavagu
So, each soft commit would create a new searcher that would invalidate 
the old cache?


Here is the configuration for Document Cache

<documentCache ... autowarmCount="0"/>

true

Thanks

On 3/18/16 12:45 AM, Emir Arnautovic wrote:

Hi,
Your cache will be cleared on soft commits - every two minutes. It seems
that it is either configured to be huge, or you have big documents and are
retrieving all fields, or don't have lazy field loading set to true.

Can you please share your document cache config and heap settings.

Thanks,
Emir

On 17.03.2016 22:24, Rallavagu wrote:

comments in line...

On 3/17/16 2:16 PM, Erick Erickson wrote:

First, I want to make sure when you say "TTL", you're talking about
documents being evicted from the documentCache and not the "Time To
Live"
option whereby documents are removed completely from the index.


Maybe TTL was not the right word to use here. I wanted to learn the
criteria for an entry to be evicted.



The time varies with the number of new documents fetched. This is an LRU
cache whose size is configured in solrconfig.xml. It's pretty much
unpredictable. If for some odd reason every request gets the same
document
it'll never be aged out. If no two queries return the same document, it
will be aged out once "cache size" docs have been fetched by subsequent
requests.

The entire thing is thrown out whenever a new searcher is opened (i.e.
softCommit or hardCommit with openSearcher=true)




But maybe this is an XY problem. Why do you care? Is there something
you're
seeing that you're trying to understand or is this just a general
interest
question?

I have the following configuration:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:12}</maxTime>
</autoSoftCommit>


As you can see, openSearcher is set to "false". What I am seeing (from a
heap dump taken after an OutOfMemory error) is that the LRUCache backing
the "Document Cache" occupies around 85% of the available heap, and that
is causing OOM errors. So I am trying to understand the behavior to
address the OOM issues.

Thanks



Best,
Erick

On Thu, Mar 17, 2016 at 1:40 PM, Rallavagu <rallav...@gmail.com> wrote:


Solr 5.4 embedded Jetty

Is it the right assumption that whenever a document is returned as a
response to a query, it is cached in the "Document Cache"?

Essentially, if I request for any entry like /select?q=id:
will it be cached in "Document Cache"? If yes, what is the TTL?

Thanks in advance







Re: SolrCloud breaks and does not recover

2015-11-03 Thread Rallavagu
One other item to look into is increasing the zookeeper timeout in Solr's 
solr.xml. This would help with timeouts caused by long GC pauses.
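
For example, in the solrcloud section of solr.xml (the 60-second value here is only an illustration):

<solr>
  <solrcloud>
    <int name="zkClientTimeout">${zkClientTimeout:60000}</int>
  </solrcloud>
</solr>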


On 11/3/15 9:12 AM, Björn Häuser wrote:

Hi,

thank you for your answer.

1> No OOM hit, the log does not contain any hint of that. Also solr
wasn't restarted automatically. But the gc log has some pauses which
are longer than 15 seconds.

2> So, if we need to recover a system we need to stop ingesting data into it?

3> The JVMs currently use a little bit more then 1GB of Heap, with a
now changed max-heap of 3GB. Currently thinking of lowering the heap
to 1.5 / 2 GB (following Uwe's post).

Also the RES is 4.1gb and VIRT is 12.5gb. Swap is more or less not
used (40mb of 1GB assigned swap). According to our server monitoring
sometimes an io spike happens, but again not that much.

What I am going to do:

1.) make sure that in case of failure we stop ingesting data into solrcloud
2.) lower the heap to 2GB
3.) Make sure that zookeeper can fsync its write-ahead log fast enough (<1 sec)

Thanks
Björn

2015-11-03 16:27 GMT+01:00 Erick Erickson :

The GC logs don't really show anything interesting, there would
be 15+ second GC pauses. The Zookeeper log isn't actually very
interesting. As far as OOM errors, I was thinking of _solr_ logs.

As to why the cluster doesn't self-heal, a couple of things:

1> Once you hit an OOM, all bets are off. The JVM needs to be
bounced. Many installations have kill scripts that bounce the
JVM. So it's explainable if you have OOM errors.

2> The system may be _trying_ to recover, but if you're
still ingesting data it may get into a resource-starved
situation where it makes progress but never catches up.

Again, though, this seems like very little memory for the
situation you describe, I suspect you're memory-starved to
a point where you can't really run. But that's a guess.

When you run, how much JVM memory are you using? The admin
UI should show that.

But the pattern of 8G physical memory and 6G for Java is a red
flag as per Uwe's blog post, you may be swapping a lot (OS
memory) and that may be slowing things down enough to have
sessions drop. Grasping at straws here, but "top" or similar
should tell you what the system is doing.

Best,
Erick

On Tue, Nov 3, 2015 at 12:04 AM, Björn Häuser  wrote:

Hi!

Thank you for your super fast answer.

I can provide more data, the question is which data :-)

These are the config parameters solr runs with:
https://gist.github.com/bjoernhaeuser/24e7080b9ff2a8785740 (taken from
the admin ui)

These are the log files:

https://gist.github.com/bjoernhaeuser/a60c2319d71eb35e9f1b

I think your first observation is correct: SolrCloud loses the
connection to zookeeper, because the connection times out.

But why isn't solrcloud able to recover it self?

Thanks
Björn


2015-11-02 22:32 GMT+01:00 Erick Erickson :

Without more data, I'd guess one of two things:

1> you're seeing stop-the-world GC pauses that cause Zookeeper to
think the node is unresponsive, which puts a node into recovery and
things go bad from there.

2> Somewhere in your solr logs you'll see OutOfMemory errors which can
also cascade a bunch of problems.

In general it's an anti-pattern to allocate such a large portion of
our physical memory to the JVM, see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html



Best,
Erick



On Mon, Nov 2, 2015 at 1:21 PM, Björn Häuser  wrote:

Hey there,

we are running a SolrCloud, which has 4 nodes, same config. Each node
has 8gb memory, 6GB assigned to the JVM. This is maybe too much, but
worked for a long time.

We currently run with 2 shards, 2 replicas and 11 collections. The
complete data-dir is about 5.3 GB.
I think we should move some JVM heap back to the OS.

We are running Solr 5.2.1., as I could not see any related bugs to
SolrCloud in the release notes for 5.3.0 and 5.3.1, we did not bother
to upgrade first.

One of our nodes (node A) reports these errors:

org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
Error from server at http://10.41.199.201:9004/solr/catalogue: Invalid
version (expected 2, but 101) or the data in not in 'javabin' format

Stacktrace: https://gist.github.com/bjoernhaeuser/46ac851586a51f8ec171

And shortly after (4 seconds) this happens on a *different* node (Node B):

Stopping recovery for core=suggestion coreNodeName=core_node2

No Stacktrace for this, but happens for all 11 collections.

6 seconds after that Node C reports these errors:

org.apache.solr.common.SolrException:
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /configs/customers/params.json

Stacktrace: https://gist.github.com/bjoernhaeuser/45a244dc32d74ac989f8

This also happens for 11 collections.

And then different errors happen:

OverseerAutoReplicaFailoverThread had an error in its thread work

growth of tlog

2015-10-30 Thread Rallavagu

4.10.4 solr cloud, 3 zk quorum, jdk 8

autocommit: 15 sec, softcommit: 2 min

Under heavy indexing load with the above settings, I have seen the tlog 
growing (into GBs). After the updates stop coming in, it settles down and 
takes a while to recover before the cloud becomes "green".


With 15 second autocommit setting, what could potentially cause tlog to 
grow? What to look for?


Re: growth of tlog

2015-10-30 Thread Rallavagu



On 10/30/15 8:39 AM, Erick Erickson wrote:

I infer that this statement: "takes a while to recover before cloud
becomes green"
indicates that the node is in recovery or something while indexing. If you're
still indexing, the new documents will be written to the followers
tlog while the
follower is recovering, leading to it growing. I expect that after followers
all recover, the tlog shrinks after a few commits have gone by.


Correct. The recovery time is extended though. Also, this affects 
available physical memory as tlog continues to grow and it is memory mapped.




If that's all true, the question is why the follower goes into
recovery in the first
place. Prior to 5.2, there was a situation in which very heavy indexing
could cause a follower to go into Leader Initiated Recovery (LIR) (look for this
in both the leader and follower logs). Here's the blog Tim Potter wrote
on this subject:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/

The smoking gun here is
1> heavy indexing is required
2> the _leader_ stays up
3> the _follower_ goes into recovery for no readily apparent reason
4> the nail in the coffin for this particular issue is seeing that the follower
  went into LIR.
5> You'll also see a very large number of threads on the leader waiting
   on sending the updates to the follower.


If this is a problem, prior to 5.2 there are really only two solutions
1> throttle indexing
2> take all of the followers offline during indexing. When indexing is
  completed, bring the followers back up and let them replicate the
  full index down from the leader.
Other than shutting followers down, is there an elegant/graceful way of 
taking follower nodes offline? Also, to give you more of an idea, as per the 
following document I am testing the "Index heavy, Query heavy" situation.


https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks



Best,
Erick

On Fri, Oct 30, 2015 at 8:28 AM, Rallavagu <rallav...@gmail.com> wrote:

4.10.4 solr cloud, 3 zk quorum, jdk 8

autocommit: 15 sec, softcommit: 2 min

Under heavy indexing load with above settings, i have seen tlog growing
(into GB). After the updates stopped coming in, it settles down and takes a
while to recover before cloud becomes "green".

With 15 second autocommit setting, what could potentially cause tlog to
grow? What to look for?


Solr for Pictures

2015-10-29 Thread Rallavagu
In general, is there a built-in data handler to index pictures 
(essentially, EXIF and other data embedded in an image)? If not, what is 
the best practice to do so? Thanks.


Re: Solr for Pictures

2015-10-29 Thread Rallavagu
I was playing with exiftool (written in perl) and a custom java class 
built using the metadata-extractor project 
(https://github.com/drewnoakes/metadata-extractor) and wondering if 
there is anything built into Solr or are there any best practices 
(general practices) to index pictures.


On 10/29/15 1:56 PM, Daniel Valdivia wrote:

Some extra googling yield this Wiki from a integration between Tika and a 
EXIFTool

https://wiki.apache.org/tika/EXIFToolParser


On Oct 29, 2015, at 1:48 PM, Daniel Valdivia <h...@danielvaldivia.com> wrote:

I think you can look into Tika for this: https://tika.apache.org/

There’s handlers to integrate Tika and Solr, some context:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika




On Oct 29, 2015, at 1:47 PM, Rallavagu <rallav...@gmail.com> wrote:

In general, is there a built-in data handler to index pictures (essentially, 
EXIF and other data embedded in an image)? If not, what is the best practice to 
do so? Thanks.







Commit Error

2015-10-28 Thread Rallavagu

Solr 4.6.1, cloud

Seeing following commit errors.

[commitScheduler-19-thread-1] ERROR org.apache.solr.update.CommitTracker – auto commit error...:
java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
        at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
        at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
        at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
        at java.lang.Thread.run(Thread.java:682)


Looking at the code,

public final void prepareCommit() throws IOException {
  ensureOpen();
  prepareCommitInternal();
}

private void prepareCommitInternal() throws IOException {
  synchronized(commitLock) {
    ensureOpen(false);
    if (infoStream.isEnabled("IW")) {
      infoStream.message("IW", "prepareCommit: flush");
      infoStream.message("IW", "  index before flush " + segString());
    }

    if (hitOOM) {
      throw new IllegalStateException("this writer hit an OutOfMemoryError; cannot commit");
    }

Is it simply checking a flag to see whether it hit OOM? What causes that 
flag to be checked and set? What could the conditions be? Thanks.


Re: Commit Error

2015-10-28 Thread Rallavagu

Thanks Shawn for the response.

Seeing very high CPU during this time and very high warmup times. During 
this time, there were plenty of these errors logged, so I am trying to find 
out possible causes. Could it be disk I/O issues or something else, since 
it is related to commit (writing to disk)?


On 10/28/15 3:57 PM, Shawn Heisey wrote:

On 10/28/2015 2:06 PM, Rallavagu wrote:

Solr 4.6.1, cloud

Seeing following commit errors.

[commitScheduler-19-thread-1] ERROR
org.apache.solr.update.CommitTracker – auto commit
error...:java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440) at
java.util.concurrent.FutureTask.run(FutureTask.java:138) at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
at java.lang.Thread.run(Thread.java:682)

Looking at the code,

public final void prepareCommit() throws IOException {
 ensureOpen();
 prepareCommitInternal();
   }

   private void prepareCommitInternal() throws IOException {
 synchronized(commitLock) {
   ensureOpen(false);
   if (infoStream.isEnabled("IW")) {
 infoStream.message("IW", "prepareCommit: flush");
 infoStream.message("IW", "  index before flush " + segString());
   }

   if (hitOOM) {
 throw new IllegalStateException("this writer hit an
OutOfMemoryError; cannot commit");
   }

It simply checking a flag if it hit OOM? What is making to check and
set the flag? What could be the conditions? Thanks.


This exception handling was revamped in Lucene 4.10.1 (and therefore in
Solr 4.10.1) by this issue:

https://issues.apache.org/jira/browse/LUCENE-5958

The "hitOOM" variable was removed by the following specific commit --
this is the commit on the 4.10 branch, but it was also committed to
branch_4x and trunk as well.  Later commits on this same issue were made
to branch_5x -- the cutover to begin the 5.0 release process was made
while this issue was still being fixed.

https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_10/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java?r1=1626189=1626188=1626189

In the code before this fix, the hitOOM flag is set by other methods in
IndexWriter.  It is volatile to prevent problems with multiple threads
updating and accessing it.

Your message doesn't indicate what problems you're having besides an
error message in your log.  LUCENE-5958 indicates that the problems
could be as bad as a corrupt index.

The reason that IndexWriter swallows OOM exceptions is that this is the
only way Lucene can even *attempt* to avoid index corruption in every
error situation.  Lucene has had a very good track record at avoiding
index corruption, but every now and then a bug is found and a user
manages to get a corrupted index.

Thanks,
Shawn



Re: Commit Error

2015-10-28 Thread Rallavagu
Also, is this the thread that went OOM, and what could cause it? The heap was 
doing fine and the server was live and running.


On 10/28/15 3:57 PM, Shawn Heisey wrote:

On 10/28/2015 2:06 PM, Rallavagu wrote:

Solr 4.6.1, cloud

Seeing following commit errors.

[commitScheduler-19-thread-1] ERROR
org.apache.solr.update.CommitTracker – auto commit
error...:java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit at
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
at
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440) at
java.util.concurrent.FutureTask.run(FutureTask.java:138) at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
at java.lang.Thread.run(Thread.java:682)

Looking at the code,

public final void prepareCommit() throws IOException {
 ensureOpen();
 prepareCommitInternal();
   }

   private void prepareCommitInternal() throws IOException {
 synchronized(commitLock) {
   ensureOpen(false);
   if (infoStream.isEnabled("IW")) {
 infoStream.message("IW", "prepareCommit: flush");
 infoStream.message("IW", "  index before flush " + segString());
   }

   if (hitOOM) {
 throw new IllegalStateException("this writer hit an
OutOfMemoryError; cannot commit");
   }

It simply checking a flag if it hit OOM? What is making to check and
set the flag? What could be the conditions? Thanks.


This exception handling was revamped in Lucene 4.10.1 (and therefore in
Solr 4.10.1) by this issue:

https://issues.apache.org/jira/browse/LUCENE-5958

The "hitOOM" variable was removed by the following specific commit --
this is the commit on the 4.10 branch, but it was also committed to
branch_4x and trunk as well.  Later commits on this same issue were made
to branch_5x -- the cutover to begin the 5.0 release process was made
while this issue was still being fixed.

https://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_10/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java?r1=1626189=1626188=1626189

In the code before this fix, the hitOOM flag is set by other methods in
IndexWriter.  It is volatile to prevent problems with multiple threads
updating and accessing it.

Your message doesn't indicate what problems you're having besides an
error message in your log.  LUCENE-5958 indicates that the problems
could be as bad as a corrupt index.

The reason that IndexWriter swallows OOM exceptions is that this is the
only way Lucene can even *attempt* to avoid index corruption in every
error situation.  Lucene has had a very good track record at avoiding
index corruption, but every now and then a bug is found and a user
manages to get a corrupted index.

Thanks,
Shawn



Re: Commit Error

2015-10-28 Thread Rallavagu



On 10/28/15 5:41 PM, Shawn Heisey wrote:

On 10/28/2015 5:11 PM, Rallavagu wrote:

Seeing very high CPU during this time and very high warmup times. During
this time, there were plenty of these errors logged. So, trying to find
out possible causes for this to occur. Could it be disk I/O issues or
something else as it is related to commit (writing to disk).


Lucene is claiming that you're hitting the Out Of Memory exception.  I
pulled down the 4.6.1 source code to verify IndexWriter's behavior.  The
only time hitOOM can be set to true is when OutOfMemoryError is being
thrown, so unless you're running Solr built from modified source code,
Lucene's claim *is* what's happening.


This is very likely true as source is not modified.



In OOM situations, there's a good chance that Java is going to be
spending a lot of time doing garbage collection, which can cause CPU
usage to go high and make warm times long.


Again, I think this is the likely case. Even though there is no apparent 
OOM, the JVM can throw an OOM when an excessive number of full GCs fails 
to reclaim enough memory (e.g., "GC overhead limit exceeded").




The behavior of most Java programs is completely unpredictable when Java
actually runs out of memory.  As already mentioned, the parts of Lucene
that update the index are specifically programmed to deal with OOM
without causing index corruption.  Writing code that is predictable in
OOM situations is challenging, so only a subset of the code in
Lucene/Solr has been hardened in this way.  Most of it is as
unpredictable in OOM as any other Java program.


Thanks Shawn.



Thanks,
Shawn



Re: Solr hard commit

2015-10-27 Thread Rallavagu



On 10/27/15 8:43 AM, Erick Erickson wrote:

bq: So, the updated file(s) on the disk automatically read into memory
as they are Memory mapped?

Yes.


Not quite sure why you care, curiosity or is there something you're
trying to accomplish?
This is out of curiosity, so I can get a better understanding of Solr's 
memory usage (heap & mmap).




The contents of the index's segment files are read into virtual memory
by MMapDirectory as needed to satisfy queries. Which is the point of
autowarming BTW.
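
(For illustration, static warming queries can be wired into solrconfig.xml through a newSearcher listener; the query and rows value below are placeholders:)

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>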


Ok. But, I have noticed that even "tlog" files are memory mapped (output 
from "lsof") in addition to all other files under "data" directory.




commit in the following is either hard commit with openSearcher=true
or soft commit.


Hard commit is setup with openSearcher=false and softCommit is setup for 
every 2 min.




Segments that have been created (closed actually) after the last
commit  are _not_ read at all until the next searcher is opened via
another commit. Nothing is done with these new segments before the new
searcher is opened which you control with your commit strategy.


I see. Thanks for the insight.



Best,
Erick

On Mon, Oct 26, 2015 at 9:07 PM, Rallavagu <rallav...@gmail.com> wrote:

Erick, Thanks for clarification. I was under impression that MMapDirectory
is being used for both read/write operations. Now, I see how it is being
used. Essentially, it only reads from MMapDirectory and writes directly to
disk. So, the updated file(s) on the disk automatically read into memory as
they are Memory mapped?

On 10/26/15 8:43 PM, Erick Erickson wrote:


You're really looking at this backwards. The MMapDirectory stuff is
for Solr (Lucene, really) _reading_ data from closed segment files.

When indexing, there are internal memory structures that are flushed
to disk on commit, but these have nothing to do with MMapDirectory.

So the question is really moot ;)

Best,
Erick

On Mon, Oct 26, 2015 at 5:47 PM, Rallavagu <rallav...@gmail.com> wrote:


All,

Are memory mapped files (mmap) flushed to disk during "hard commit"? If
yes,
should we disable OS level (Linux for example) memory mapped flush?

I am referring to following for mmap files for Lucene/Solr

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Linux level flush

http://www.cyberciti.biz/faq/linux-stop-flushing-of-mmaped-pages-to-disk/

Solr's hard and soft commit


https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks in advance.


Re: Using books.json in solr

2015-10-27 Thread Rallavagu
Could you please share your query? You could use "wt=json" query 
parameter to receive JSON formatted results if that is what you are 
looking for.
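
For example, something along these lines (the core name and field name are guesses, since the original query was not shared):

/solr/books/select?q=genre_s:fantasy&wt=json&indent=true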


On 10/27/15 10:44 AM, Salonee Rege wrote:

Hello,
   We are trying to query the books.json that we have posted to Solr.
But when we try to specifically query it on genre, it does not return
complete JSON with valid key-value pairs. Kindly help.

/Salonee Rege/
USC Viterbi School of Engineering
University of Southern California
Master of Computer Science - Student
Computer Science - B.E
salon...@usc.edu | 619-709-6756


Re: Solr hard commit

2015-10-27 Thread Rallavagu

Is it related to this config?



Solr hard commit

2015-10-26 Thread Rallavagu

All,

Are memory mapped files (mmap) flushed to disk during "hard commit"? If 
yes, should we disable OS level (Linux for example) memory mapped flush?


I am referring to following for mmap files for Lucene/Solr

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Linux level flush

http://www.cyberciti.biz/faq/linux-stop-flushing-of-mmaped-pages-to-disk/

Solr's hard and soft commit

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks in advance.


Re: Solr hard commit

2015-10-26 Thread Rallavagu
Erick, Thanks for clarification. I was under impression that 
MMapDirectory is being used for both read/write operations. Now, I see 
how it is being used. Essentially, it only reads from MMapDirectory and 
writes directly to disk. So, the updated file(s) on the disk 
automatically read into memory as they are Memory mapped?


On 10/26/15 8:43 PM, Erick Erickson wrote:

You're really looking at this backwards. The MMapDirectory stuff is
for Solr (Lucene, really) _reading_ data from closed segment files.

When indexing, there are internal memory structures that are flushed
to disk on commit, but these have nothing to do with MMapDirectory.

So the question is really moot ;)

Best,
Erick

On Mon, Oct 26, 2015 at 5:47 PM, Rallavagu <rallav...@gmail.com> wrote:

All,

Are memory mapped files (mmap) flushed to disk during "hard commit"? If yes,
should we disable OS level (Linux for example) memory mapped flush?

I am referring to following for mmap files for Lucene/Solr

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Linux level flush

http://www.cyberciti.biz/faq/linux-stop-flushing-of-mmaped-pages-to-disk/

Solr's hard and soft commit

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks in advance.


Re: locks and high CPU

2015-10-22 Thread Rallavagu Kon
Erick,

Indexing is happening via the Solr cloud server. This thread was from the leader. Some 
followers show symptoms of high CPU during this time. Do you think this is from 
locking? What is the thread that is holding the lock doing? Also, we are unable 
to reproduce this issue in the load test environment. Any clues would help.

> On Oct 22, 2015, at 09:50, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Prior to Solr 5.2, there were several inefficiencies when distributing
> updates to replicas, see:
> https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/.
> 
> The symptom was that there was significantly higher CPU utilization on
> the followers
> compared to the leader.
> 
> The only real fix is to upgrade to 5.2+ assuming that's your issue.
> 
> How are you indexing? Using SolrJ with CloudSolrServer would help if
> you're not using
> them.
> 
> Best,
> Erick
> 
>> On Thu, Oct 22, 2015 at 9:43 AM, Rallavagu <rallav...@gmail.com> wrote:
>> Solr 4.6.1 cloud
>> 
>> Looking into thread dump 4-5 threads causing cpu to go very high and causing
>> issues. These are tomcat's http threads and are locking. Can anybody help me
>> understand what is going on here? I see that incoming connections coming in
>> for updates and they are being passed on to StreamingSolrServer and
>> subsequently ConcurrentUpdateSolrServer and they both have locks. Thanks.
>> 
>> 
>> "http-bio-8080-exec-4394" id=8774 idx=0x988 tid=14548 prio=5 alive,
>> native_blocked, daemon
>>at __lll_lock_wait+34(:0)@0x38caa0e262
>>at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7fc29b9c9138
>>at trapiNormalHandler+484(traps_posix.c:220)@0x7fc29b9fd745
>>at _L_unlock_16+44(:0)@0x38caa0f710
>>at
>> java/util/concurrent/locks/ReentrantLock.lock(ReentrantLock.java:262)[optimized]
>>at
>> org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:391)[inlined]
>>at
>> org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
>>at
>> org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
>>at
>> org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
>>at
>> org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
>>^-- Holding lock:
>> org/apache/solr/update/StreamingSolrServers$1@0x496cf6e50[biased lock]
>>^-- Holding lock:
>> org/apache/solr/update/StreamingSolrServers@0x49d32adc8[biased lock]
>>at
>> org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
>>at
>> org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
>>at org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
>>at
>> org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
>>at
>> org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
>>at
>> org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
>>at
>> org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
>>at
>> org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
>>at
>> org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:222)[optimized]
>>at
>> org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:123)[optimized]
>>at
>> org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171)[optimized]
>>at
>> org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:99)[optimized]
>>at
>> org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:953)[optimized]
>>at
>> org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:118)[optimized]
>>at
>> org/apache/catalina/connector/CoyoteAdapter.service(CoyoteAdapter.java:408)[optimized]
>>at
>> org/apache/coyote/http11/AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)[optimized]
>>at
>> org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)[optimized]
>>at
>> org/apache/tomcat/util/net/JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)[optimized]
>>^-- Holding lock:
>> org/apache/tomcat/util/net/SocketWrapper@0x496e58810[thin lock]
>>at
>> java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)[inlined]
>>at
>> java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)[optimized]
>>at java/lang/Thread.run(Thread.java:682)[optimized]
>>at jrockit/vm/RNI.c2java(J)V(Native Method)


Re: locks and high CPU

2015-10-22 Thread Rallavagu
Thanks Erick. Currently, we are migrating to 5.3 and it is taking a bit of 
time. Meanwhile, I looked at the JIRAs from the blog and the stack trace 
looks a bit different from what I see, but I am not sure if they are related. 
Also, as per the stack trace I included in my original email, it is 
the tomcat thread that is locking, not the recovery thread, which would 
be responsible for writing updates to followers. I agree that we might 
throttle updates, but what is annoying is that we are unable to reproduce 
the issue in the controlled load test environment.


Just to understand better, what is the tomcat thread doing in this case?

Thanks

On 10/22/15 12:53 PM, Erick Erickson wrote:

The details are in Tim's blog post and the linked JIRAs

Unfortunately, the only real solution I know of is to upgrade
to at least Solr 5.2. Meanwhile, throttling the indexing rate
will at least smooth out the issue. Not a great approach but
all there is for 4.6.

Best,
Erick

On Thu, Oct 22, 2015 at 10:48 AM, Rallavagu Kon <rallav...@gmail.com> wrote:

Erick,

Indexing happening via Solr cloud server. This thread was from the leader. Some 
followers show symptom of high cpu during this time. You think this is from 
locking? What is the thread that is holding the lock doing? Also, we are unable 
to reproduce this issue in load test environment. Any clues would help.


On Oct 22, 2015, at 09:50, Erick Erickson <erickerick...@gmail.com> wrote:

Prior to Solr 5.2, there were several inefficiencies when distributing
updates to replicas, see:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/.

The symptom was that there was significantly higher CPU utilization on
the followers
compared to the leader.

The only real fix is to upgrade to 5.2+ assuming that's your issue.

How are you indexing? Using SolrJ with CloudSolrServer would help if
you're not using
them.

Best,
Erick


On Thu, Oct 22, 2015 at 9:43 AM, Rallavagu <rallav...@gmail.com> wrote:
Solr 4.6.1 cloud

Looking into thread dump 4-5 threads causing cpu to go very high and causing
issues. These are tomcat's http threads and are locking. Can anybody help me
understand what is going on here? I see that incoming connections coming in
for updates and they are being passed on to StreamingSolrServer and
subsequently ConcurrentUpdateSolrServer and they both have locks. Thanks.


"http-bio-8080-exec-4394" id=8774 idx=0x988 tid=14548 prio=5 alive,
native_blocked, daemon
at __lll_lock_wait+34(:0)@0x38caa0e262
at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7fc29b9c9138
at trapiNormalHandler+484(traps_posix.c:220)@0x7fc29b9fd745
at _L_unlock_16+44(:0)@0x38caa0f710
at
java/util/concurrent/locks/ReentrantLock.lock(ReentrantLock.java:262)[optimized]
at
org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:391)[inlined]
at
org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
at
org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
at
org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
at
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x496cf6e50[biased lock]
^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x49d32adc8[biased lock]
at
org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
at
org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
at org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
at
org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
at
org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
at
org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
at
org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:222)[optimized]
at
org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:123)[optimized]
at
org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171)[optimized]
at
org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:99)[optimized]
at
org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:953)[optimized]
at
org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:118)[optimized]
at
org/apache/catalina/connector/CoyoteAdapter.service(Coyot

locks and high CPU

2015-10-22 Thread Rallavagu

Solr 4.6.1 cloud

Looking into the thread dump, 4-5 threads are causing CPU to go very high and 
causing issues. These are tomcat's http threads and they are locking. Can 
anybody help me understand what is going on here? I see that incoming 
connections are coming in for updates and they are being passed on to 
StreamingSolrServers and subsequently ConcurrentUpdateSolrServer, and they 
both hold locks. Thanks.



"http-bio-8080-exec-4394" id=8774 idx=0x988 tid=14548 prio=5 alive, 
native_blocked, daemon

at __lll_lock_wait+34(:0)@0x38caa0e262
at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7fc29b9c9138
at trapiNormalHandler+484(traps_posix.c:220)@0x7fc29b9fd745
at _L_unlock_16+44(:0)@0x38caa0f710
at 
java/util/concurrent/locks/ReentrantLock.lock(ReentrantLock.java:262)[optimized]
at 
org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:391)[inlined]
at 
org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
at 
org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
at 
org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
at 
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
^-- Holding lock: 
org/apache/solr/update/StreamingSolrServers$1@0x496cf6e50[biased lock]
^-- Holding lock: 
org/apache/solr/update/StreamingSolrServers@0x49d32adc8[biased lock]
at 
org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
at 
org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]

at org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
at 
org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
at 
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
at 
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
at 
org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
at 
org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
at 
org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:222)[optimized]
at 
org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:123)[optimized]
at 
org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171)[optimized]
at 
org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:99)[optimized]
at 
org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:953)[optimized]
at 
org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:118)[optimized]
at 
org/apache/catalina/connector/CoyoteAdapter.service(CoyoteAdapter.java:408)[optimized]
at 
org/apache/coyote/http11/AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)[optimized]
at 
org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)[optimized]
at 
org/apache/tomcat/util/net/JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)[optimized]
^-- Holding lock: 
org/apache/tomcat/util/net/SocketWrapper@0x496e58810[thin lock]
at 
java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)[inlined]
at 
java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)[optimized]

at java/lang/Thread.run(Thread.java:682)[optimized]
at jrockit/vm/RNI.c2java(J)V(Native Method)


coreZkRegister thread

2015-10-20 Thread Rallavagu

Solr 4.6.1, 4 node cloud with 3 zk

I see the following thread as blocked. Could somebody please help me 
understand what is going on here and how it will impact the Solr cloud? 
All four of these threads are blocked. Thanks.


"coreZkRegister-1-thread-1" id=74 idx=0x108 tid=32162 prio=5 alive, 
parked, native_blocked
-- Parking to wait for: 
java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject@0x11a61daf8

at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7f41a970aba8
at 
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7f41a989e0b2

at syncWaitForSignal+189(synchronization.c:85)@0x7f41a989e20e
at vmtPark+164(signaling.c:72)@0x7f41a987a165
at jrockit/vm/Locks.park0(J)V(Native Method)
at jrockit/vm/Locks.park(Locks.java:2230)
at sun/misc/Unsafe.park(ZJ)V(Native Method)
at java/util/concurrent/locks/LockSupport.park(LockSupport.java:156)
at 
java/util/concurrent/locks/AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at 
java/util/concurrent/LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at 
java/util/concurrent/ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:957)
at 
java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)

at java/lang/Thread.run(Thread.java:682)
at jrockit/vm/RNI.c2java(J)V(Native Method)
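
For context, a parked stack whose bottom frames are 
LinkedBlockingQueue.take() inside ThreadPoolExecutor.getTask() is what 
an idle executor worker normally looks like: it is waiting for work 
rather than stuck on application state. A toy illustration (plain JDK, 
not Solr code; names are made up):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy illustration: a pool worker with no task to run parks inside
// LinkedBlockingQueue.take(), producing the same kind of parked stack
// as the coreZkRegister thread above. It wakes up as soon as work is
// submitted to the pool.
public class IdleWorkerDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1, r -> {
            Thread t = new Thread(r, "coreZkRegister-like-worker");
            t.setDaemon(true);
            return t;
        });
        pool.submit(() -> System.out.println("one task, then idle"));
        Thread.sleep(5000); // take a jstack during this window to see the parked worker
        pool.shutdown();
    }
}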


Re: Help me read Thread

2015-10-16 Thread Rallavagu
One more observation: Tomcat's HTTP acceptor thread 
(http-bio-8080-acceptor) disappears, and because of this no incoming 
HTTP connections can be opened. During this time ZK apparently still 
thinks the node is up and it shows green from the leader.


On 10/13/15 9:17 AM, Erick Erickson wrote:

How heavy is heavy? The proverbial smoking gun here will be messages in any
logs referring to "leader initiated recovery". (note, that's the
message I remember seeing,
it may not be exact).

There's no particular work-around here except to back off the indexing
load. Certainly increasing the
thread pool size allowed this to surface. Also 5.2 has some
significant improvements in this area, see:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/

And a lot depends on how you're indexing, batching up updates is a
good thing. If you go to a
multi-shard setup, using SolrJ and CloudSolrServer (CloudSolrClient in
5.x) would help. More
shards would help as well,  but I'd first take a look at the indexing
process and be sure you're
batching up updates.

It's also possible if indexing is a once-a-day process and it fits
with your SLAs to shut off the replicas,
index to the leader, then turn the replicas back on. That's not all
that satisfactory, but I've seen it used.

But with a single shard setup, I really have to ask why indexing at
such a furious rate is
required that you're hitting this. Are you unable to reduce the indexing rate?

Best,
Erick

On Tue, Oct 13, 2015 at 9:08 AM, Rallavagu <rallav...@gmail.com> wrote:

Also, we have increased number of connections per host from default (20) to
100 for http thread pool to communicate with other nodes. Could this have
caused the issues as it can now spin many threads to send updates?


On 10/13/15 8:56 AM, Erick Erickson wrote:


Is this under a very heavy indexing load? There were some
inefficiencies that caused followers to work a lot harder than the
leader, but the leader had to spin off a bunch of threads to send
update to followers. That's fixed in the 5.2 release.

Best,
Erick

On Tue, Oct 13, 2015 at 8:40 AM, Rallavagu <rallav...@gmail.com> wrote:


Please help me understand what is going on with this thread.

Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat
with
500 threads.


There are 47 threads overall and designated leader becomes unresponsive
though shows "green" from cloud perspective. This is causing issues.

particularly,

"   at

org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
  ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"



"http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive,
native_blocked, daemon
  at __lll_lock_wait+34(:0)@0x382ba0e262
  at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
  at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
  at _L_unlock_16+44(:0)@0x382ba0f710
  at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
  at

org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
  at

org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
  at

org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
  at

org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
  at

org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
  ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
  at

org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
  at

org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
  at
org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
  at

org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
  at

org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
  at

org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
  at

org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
  at

org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
  at


Help me read Thread

2015-10-13 Thread Rallavagu

Please help me understand what is going on with this thread.

Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat 
with 500 threads.



There are 47 threads overall and the designated leader becomes 
unresponsive even though it shows "green" from the cloud perspective. 
This is causing issues.


In particular,

"   at 
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]

^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
^-- Holding lock: 
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
^-- Holding lock: 
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"




"http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive, 
native_blocked, daemon

at __lll_lock_wait+34(:0)@0x382ba0e262
at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
at _L_unlock_16+44(:0)@0x382ba0f710
at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
at 
org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
at 
org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
at 
org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
at 
org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
at 
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]

^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
^-- Holding lock: 
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
^-- Holding lock: 
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
at 
org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
at 
org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]

at org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
at 
org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
at 
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
at 
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
at 
org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
at 
org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
at 
org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:222)[optimized]
at 
org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:123)[optimized]
at 
org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171)[optimized]
at 
org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:99)[optimized]
at 
org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:953)[optimized]
at 
org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:118)[optimized]
at 
org/apache/catalina/connector/CoyoteAdapter.service(CoyoteAdapter.java:408)[optimized]
at 
org/apache/coyote/http11/AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)[optimized]
at 
org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)[optimized]
at 
org/apache/tomcat/util/net/JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)[optimized]
^-- Holding lock: 
org/apache/tomcat/util/net/SocketWrapper@0x2ee6e4aa8[thin lock]
at 
java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)[inlined]
at 
java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)[optimized]

at java/lang/Thread.run(Thread.java:682)[optimized]
at jrockit/vm/RNI.c2java(J)V(Native Method)


Re: Help me read Thread

2015-10-13 Thread Rallavagu
Yes, the indexing load is heavy. During this time, all other nodes are 
in "recovery" mode, so search queries get referred to the leader and 
time out. Is there a temporary workaround for this? Thanks.


On 10/13/15 8:56 AM, Erick Erickson wrote:

Is this under a very heavy indexing load? There were some
inefficiencies that caused followers to work a lot harder than the
leader, but the leader had to spin off a bunch of threads to send
update to followers. That's fixed in the 5.2 release.

Best,
Erick

On Tue, Oct 13, 2015 at 8:40 AM, Rallavagu <rallav...@gmail.com> wrote:

Please help me understand what is going on with this thread.

Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat with
500 threads.


There are 47 threads overall and designated leader becomes unresponsive
though shows "green" from cloud perspective. This is causing issues.

particularly,

"   at
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
 ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"



"http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive,
native_blocked, daemon
 at __lll_lock_wait+34(:0)@0x382ba0e262
 at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
 at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
 at _L_unlock_16+44(:0)@0x382ba0f710
 at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
 at
org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
 at
org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
 at
org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
 at
org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
 at
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
 ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
 at
org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
 at
org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
 at org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
 at
org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
 at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
 at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
 at
org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
 at
org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
 at
org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:222)[optimized]
 at
org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:123)[optimized]
 at
org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171)[optimized]
 at
org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:99)[optimized]
 at
org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:953)[optimized]
 at
org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:118)[optimized]
 at
org/apache/catalina/connector/CoyoteAdapter.service(CoyoteAdapter.java:408)[optimized]
 at
org/apache/coyote/http11/AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)[optimized]
 at
org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)[optimized]
 at
org/apache/tomcat/util/net/JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)[optimized]
 ^-- Holding lock:
org/apache/tomcat/util/net/SocketWrapper@0x2ee6e4aa8[thin lock]
 at
java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)[inlined]
 at
java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)[optimized]
 at java/lang/Thread.run(Thread.java:682)[optimized]
 at jrockit/vm/RNI.c2java(J)V(Native Method)


Re: Help me read Thread

2015-10-13 Thread Rallavagu
Also, we have increased the number of connections per host from the 
default (20) to 100 for the HTTP thread pool used to communicate with 
other nodes. Could this have caused the issues, since it can now spin up 
many threads to send updates?


On 10/13/15 8:56 AM, Erick Erickson wrote:

Is this under a very heavy indexing load? There were some
inefficiencies that caused followers to work a lot harder than the
leader, but the leader had to spin off a bunch of threads to send
update to followers. That's fixed in the 5.2 release.

Best,
Erick

On Tue, Oct 13, 2015 at 8:40 AM, Rallavagu <rallav...@gmail.com> wrote:

Please help me understand what is going on with this thread.

Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat with
500 threads.


There are 47 threads overall and designated leader becomes unresponsive
though shows "green" from cloud perspective. This is causing issues.

particularly,

"   at
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
 ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"



"http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive,
native_blocked, daemon
 at __lll_lock_wait+34(:0)@0x382ba0e262
 at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
 at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
 at _L_unlock_16+44(:0)@0x382ba0f710
 at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
 at
org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
 at
org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
 at
org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
 at
org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
 at
org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
 ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
 ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
 at
org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
 at
org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
 at org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
 at
org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
 at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
 at
org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
 at
org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
 at
org/apache/catalina/core/ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)[optimized]
 at
org/apache/catalina/core/StandardWrapperValve.invoke(StandardWrapperValve.java:222)[optimized]
 at
org/apache/catalina/core/StandardContextValve.invoke(StandardContextValve.java:123)[optimized]
 at
org/apache/catalina/core/StandardHostValve.invoke(StandardHostValve.java:171)[optimized]
 at
org/apache/catalina/valves/ErrorReportValve.invoke(ErrorReportValve.java:99)[optimized]
 at
org/apache/catalina/valves/AccessLogValve.invoke(AccessLogValve.java:953)[optimized]
 at
org/apache/catalina/core/StandardEngineValve.invoke(StandardEngineValve.java:118)[optimized]
 at
org/apache/catalina/connector/CoyoteAdapter.service(CoyoteAdapter.java:408)[optimized]
 at
org/apache/coyote/http11/AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)[optimized]
 at
org/apache/coyote/AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)[optimized]
 at
org/apache/tomcat/util/net/JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)[optimized]
 ^-- Holding lock:
org/apache/tomcat/util/net/SocketWrapper@0x2ee6e4aa8[thin lock]
 at
java/util/concurrent/ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)[inlined]
 at
java/util/concurrent/ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)[optimized]
 at java/lang/Thread.run(Thread.java:682)[optimized]
 at jrockit/vm/RNI.c2java(J)V(Native Method)


Re: Help me read Thread

2015-10-13 Thread Rallavagu
The main reason is that the updates are coming from some client 
applications, so it is not a controlled indexing process. The controlled 
indexing process works fine (after spending some time tuning it). Will 
definitely look into throttling incoming update requests and reducing 
the number of connections per host. Thanks for the insight.
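
For reference, a minimal sketch of the kind of batched SolrJ indexing 
Erick describes below (illustrative only: the ZK connect string, 
collection name, field names and batch size are made up; CloudSolrServer 
is the ZooKeeper-aware 4.x client):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Illustrative sketch: send documents in batches instead of one request
// per document. CloudSolrServer routes updates to the current leader via
// the ZooKeeper cluster state.
public class BatchedIndexer {
    public static void main(String[] args) throws Exception {
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");   // assumed collection name
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 100000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("title_t", "example " + i);
            batch.add(doc);
            if (batch.size() == 1000) {               // batch size is arbitrary
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.commit();                              // or rely on autoCommit
        server.shutdown();
    }
}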


On 10/13/15 9:17 AM, Erick Erickson wrote:

How heavy is heavy? The proverbial smoking gun here will be messages in any
logs referring to "leader initiated recovery". (note, that's the
message I remember seeing,
it may not be exact).

There's no particular work-around here except to back off the indexing
load. Certainly increasing the
thread pool size allowed this to surface. Also 5.2 has some
significant improvements in this area, see:
https://lucidworks.com/blog/2015/06/10/indexing-performance-solr-5-2-now-twice-fast/

And a lot depends on how you're indexing, batching up updates is a
good thing. If you go to a
multi-shard setup, using SolrJ and CloudSolrServer (CloudSolrClient in
5.x) would help. More
shards would help as well,  but I'd first take a look at the indexing
process and be sure you're
batching up updates.

It's also possible if indexing is a once-a-day process and it fits
with your SLAs to shut off the replicas,
index to the leader, then turn the replicas back on. That's not all
that satisfactory, but I've seen it used.

But with a single shard setup, I really have to ask why indexing at
such a furious rate is
required that you're hitting this. Are you unable to reduce the indexing rate?

Best,
Erick

On Tue, Oct 13, 2015 at 9:08 AM, Rallavagu <rallav...@gmail.com> wrote:

Also, we have increased number of connections per host from default (20) to
100 for http thread pool to communicate with other nodes. Could this have
caused the issues as it can now spin many threads to send updates?


On 10/13/15 8:56 AM, Erick Erickson wrote:


Is this under a very heavy indexing load? There were some
inefficiencies that caused followers to work a lot harder than the
leader, but the leader had to spin off a bunch of threads to send
update to followers. That's fixed in the 5.2 release.

Best,
Erick

On Tue, Oct 13, 2015 at 8:40 AM, Rallavagu <rallav...@gmail.com> wrote:


Please help me understand what is going on with this thread.

Solr 4.6.1, single shard, 4 node cluster, 3 node zk. Running on tomcat
with
500 threads.


There are 47 threads overall and designated leader becomes unresponsive
though shows "green" from cloud perspective. This is causing issues.

particularly,

"   at

org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
  ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]"



"http-bio-8080-exec-2878" id=5899 idx=0x30c tid=17132 prio=5 alive,
native_blocked, daemon
  at __lll_lock_wait+34(:0)@0x382ba0e262
  at safepointSyncOnPollAccess+167(safepoint.c:83)@0x7f83ae266138
  at trapiNormalHandler+484(traps_posix.c:220)@0x7f83ae29a745
  at _L_unlock_16+44(:0)@0x382ba0f710
  at java/util/LinkedList.peek(LinkedList.java:447)[optimized]
  at

org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrServer.blockUntilFinished(ConcurrentUpdateSolrServer.java:384)[inlined]
  at

org/apache/solr/update/StreamingSolrServers.blockUntilFinished(StreamingSolrServers.java:98)[inlined]
  at

org/apache/solr/update/SolrCmdDistributor.finish(SolrCmdDistributor.java:61)[inlined]
  at

org/apache/solr/update/processor/DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:501)[inlined]
  at

org/apache/solr/update/processor/DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1278)[optimized]
  ^-- Holding lock: java/util/LinkedList@0x2ee24e958[thin lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers$1@0x2ee24e9c0[biased lock]
  ^-- Holding lock:
org/apache/solr/update/StreamingSolrServers@0x2ee24ea90[biased lock]
  at

org/apache/solr/handler/ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:83)[optimized]
  at

org/apache/solr/handler/RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)[optimized]
  at
org/apache/solr/core/SolrCore.execute(SolrCore.java:1859)[optimized]
  at

org/apache/solr/servlet/SolrDispatchFilter.execute(SolrDispatchFilter.java:721)[inlined]
  at

org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)[inlined]
  at

org/apache/solr/servlet/SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)[optimized]
  at

org/apache/catalina/core/ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)[inlined]
  at

org/apache/catalina/core/Applica

solr cloud recovery and search

2015-10-13 Thread Rallavagu
It appears that when a node that is in "recovery" mode is queried, it 
defers the query to the leader instead of serving it locally. Is this 
the expected behavior? Thanks.


Re: solr cloud recovery and search

2015-10-13 Thread Rallavagu

Great. Thanks Erick.

On 10/13/15 5:39 PM, Erick Erickson wrote:

More than expected, guaranteed. As long as at least one replica in a
shard is active, all queries should succeed. Maybe more slowly, but
they should succeed.

Best,
Erick



On Tue, Oct 13, 2015 at 4:25 PM, Rallavagu <rallav...@gmail.com> wrote:

It appears that when a node that is in "recovery" mode queried it would
defer the query to leader instead of serving from locally. Is this the
expected behavior? Thanks.


Re: tlog replay

2015-10-08 Thread Rallavagu

As a follow up:

Eventually the tlog file disappeared (I could not track how long it 
took to clear out completely). However, the following message was 
noticed in the follower's log.


5120638 [recoveryExecutor-14-thread-2] WARN 
org.apache.solr.update.UpdateLog  – Starting log replay tlog


On 10/7/15 8:29 PM, Erick Erickson wrote:

The only way I can account for such a large file off the top of my
head is if, for some reason,
the Solr on the node somehow was failing to index documents and kept
adding them to the
log for a long time. But how that would happen without the
node being in recovery
mode I'm not sure. I mean the Solr instance would have to be healthy
otherwise but just not
able to index docs which makes no sense.

The usual question here is whether there were any messages in the solr
log file indicating
problems while this built up.

tlogs will build up to very large sizes if there are very long hard
commit intervals, but I don't
see how that interval would be different on the leader and follower.

So color me puzzled.

Best,
Erick

On Wed, Oct 7, 2015 at 8:09 PM, Rallavagu <rallav...@gmail.com> wrote:

Thanks Erick.

Eventually, followers caught up but the 14G tlog file still persists and
they are healthy. Is there anything to look for? Will monitor and see how
long will it take before it disappears.

Evaluating move to Solr 5.3.

On 10/7/15 7:51 PM, Erick Erickson wrote:


Uhm, that's very weird. Updates are not applied from the tlog. Rather the
raw doc is forwarded to the replica which both indexes the doc and
writes it to the local tlog. So having a 14G tlog on a follower but a
small
tlog on the leader is definitely strange, especially if it persists over
time.

I assume the follower is healthy? And does this very large tlog disappear
after a while? I'd expect it to be aged out after a few commits of > 100
docs.

All that said, there have been a LOT of improvements since 4.6, so it
might
be something that's been addressed in the intervening time.

Best,
Erick



On Wed, Oct 7, 2015 at 7:39 PM, Rallavagu <rallav...@gmail.com> wrote:


Solr 4.6.1, single shard, 4 node cloud, 3 node zk

Like to understand the behavior better when large number of updates
happen
on leader and it generates huge tlog (14G sometimes in my case) on other
nodes. At the same time leader's tlog is few KB. So, what is the rate at
which the changes from transaction log are applied at nodes? The
autocommit
interval is set to 15 seconds after going through

https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks


Re: tlog replay

2015-10-08 Thread Rallavagu

Erick,

Actually, autoCommit is configured to 15 seconds and openSearcher is 
set to false. Neither 2 nor 3 happened. However, softCommit is set to 
10 minutes.



 <autoCommit>
   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
   <openSearcher>false</openSearcher>
 </autoCommit>

Working on upgrading to 5.3, which will take a bit of time, and trying 
to get this under control until then.


On 10/8/15 5:28 PM, Erick Erickson wrote:

right, so the scenario is
1> somehow you didn't do a hard commit (openSearcher=true or false
doesn't matter) for a really long time while indexing.
2> Solr abnormally terminated.
3> When Solr started back up it replayed the entire log.

How <1> happened is the mystery though. With a hard commit
(autocommit) interval of 15 seconds that's weird.

The message indicates something like that happened. In very recent
Solr versions, the log will have
progress messages printed that'll help see this is happening.

Best,
Erick

On Thu, Oct 8, 2015 at 12:23 PM, Rallavagu <rallav...@gmail.com> wrote:

As a follow up.

Eventually the tlog file is disappeared (could not track the time it took to
clear out completely). However, following messages were noticed in
follower's log.

5120638 [recoveryExecutor-14-thread-2] WARN org.apache.solr.update.UpdateLog
– Starting log replay tlog

On 10/7/15 8:29 PM, Erick Erickson wrote:


The only way I can account for such a large file off the top of my
head is if, for some reason,
the Solr on the node somehow was failing to index documents and kept
adding them to the
log for a long time. But how that would happen without the
node being in recovery
mode I'm not sure. I mean the Solr instance would have to be healthy
otherwise but just not
able to index docs which makes no sense.

The usual question here is whether there were any messages in the solr
log file indicating
problems while this built up.

tlogs will build up to very large sizes if there are very long hard
commit intervals, but I don't
see how that interval would be different on the leader and follower.

So color me puzzled.

Best,
Erick

On Wed, Oct 7, 2015 at 8:09 PM, Rallavagu <rallav...@gmail.com> wrote:


Thanks Erick.

Eventually, followers caught up but the 14G tlog file still persists and
they are healthy. Is there anything to look for? Will monitor and see how
long will it take before it disappears.

Evaluating move to Solr 5.3.

On 10/7/15 7:51 PM, Erick Erickson wrote:



Uhm, that's very weird. Updates are not applied from the tlog. Rather
the
raw doc is forwarded to the replica which both indexes the doc and
writes it to the local tlog. So having a 14G tlog on a follower but a
small
tlog on the leader is definitely strange, especially if it persists over
time.

I assume the follower is healthy? And does this very large tlog
disappear
after a while? I'd expect it to be aged out after a few commits of > 100
docs.

All that said, there have been a LOT of improvements since 4.6, so it
might
be something that's been addressed in the intervening time.

Best,
Erick



On Wed, Oct 7, 2015 at 7:39 PM, Rallavagu <rallav...@gmail.com> wrote:



Solr 4.6.1, single shard, 4 node cloud, 3 node zk

Like to understand the behavior better when large number of updates
happen
on leader and it generates huge tlog (14G sometimes in my case) on
other
nodes. At the same time leader's tlog is few KB. So, what is the rate
at
which the changes from transaction log are applied at nodes? The
autocommit
interval is set to 15 seconds after going through


https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks


Re: tlog replay

2015-10-07 Thread Rallavagu

Thanks Erick.

Eventually, the followers caught up but the 14G tlog file still 
persists and they are healthy. Is there anything to look for? Will 
monitor and see how long it takes before it disappears.


Evaluating move to Solr 5.3.

On 10/7/15 7:51 PM, Erick Erickson wrote:

Uhm, that's very weird. Updates are not applied from the tlog. Rather the
raw doc is forwarded to the replica which both indexes the doc and
writes it to the local tlog. So having a 14G tlog on a follower but a small
tlog on the leader is definitely strange, especially if it persists over time.

I assume the follower is healthy? And does this very large tlog disappear
after a while? I'd expect it to be aged out after a few commits of > 100 docs.

All that said, there have been a LOT of improvements since 4.6, so it might
be something that's been addressed in the intervening time.

Best,
Erick



On Wed, Oct 7, 2015 at 7:39 PM, Rallavagu <rallav...@gmail.com> wrote:

Solr 4.6.1, single shard, 4 node cloud, 3 node zk

Like to understand the behavior better when large number of updates happen
on leader and it generates huge tlog (14G sometimes in my case) on other
nodes. At the same time leader's tlog is few KB. So, what is the rate at
which the changes from transaction log are applied at nodes? The autocommit
interval is set to 15 seconds after going through
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks


tlog replay

2015-10-07 Thread Rallavagu

Solr 4.6.1, single shard, 4 node cloud, 3 node zk

I'd like to understand the behavior better when a large number of 
updates happens on the leader and it generates a huge tlog (14G 
sometimes in my case) on the other nodes. At the same time the leader's 
tlog is a few KB. So, at what rate are the changes from the transaction 
log applied on the nodes? The autocommit interval is set to 15 seconds 
after going through 
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/


Thanks


Re: Recovery Thread Blocked

2015-10-06 Thread Rallavagu
Mark - currently 5.3 is being evaluated for upgrade purposes and 
hopefully we will get there soon. Meanwhile, the following exception is 
noted in the logs during updates:


ERROR org.apache.solr.update.CommitTracker  – auto commit 
error...:java.lang.IllegalStateException: this writer hit an 
OutOfMemoryError; cannot commit
at 
org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
at 
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)

at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)

at java.lang.Thread.run(Thread.java:682)

Considering that the machine is configured with 48G (24G for the JVM, 
which will be reduced in the future), I'm wondering how it would still 
go out of memory. For memory-mapped index files, the remaining 24G (or 
whatever portion of it is free) should be available. Looking at the 
lsof output, the memory-mapped files were around 10G.


Thanks.


On 10/5/15 5:41 PM, Mark Miller wrote:

I'd make two guess:

Looks like you are using JRockit? I don't think that is common or well
tested at this point.

There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace of
SolrCloud, you are dealing with something fairly ancient and so it will be
harder to find help with older issues most likely.

- Mark

On Mon, Oct 5, 2015 at 12:46 PM Rallavagu <rallav...@gmail.com> wrote:


Any takers on this? Any kinda clue would help. Thanks.

On 10/4/15 10:14 AM, Rallavagu wrote:

As there were no responses so far, I assume that this is not a very
common issue that folks come across. So, I went into source (4.6.1) to
see if I can figure out what could be the cause.


The thread that is locking is in this block of code

synchronized (recoveryLock) {
// to be air tight we must also check after lock
if (cc.isShutDown()) {
  log.warn("Skipping recovery because Solr is shutdown");
  return;
}
log.info("Running recovery - first canceling any ongoing

recovery");

cancelRecovery();

while (recoveryRunning) {
  try {
recoveryLock.wait(1000);
  } catch (InterruptedException e) {

  }
  // check again for those that were waiting
  if (cc.isShutDown()) {
log.warn("Skipping recovery because Solr is shutdown");
return;
  }
  if (closed) return;
}

Subsequently, the thread will get into cancelRecovery method as below,

public void cancelRecovery() {
  synchronized (recoveryLock) {
if (recoveryStrat != null && recoveryRunning) {
  recoveryStrat.close();
  while (true) {
try {
  recoveryStrat.join();
} catch (InterruptedException e) {
  // not interruptible - keep waiting
  continue;
}
break;
  }

  recoveryRunning = false;
  recoveryLock.notifyAll();
}
  }
}

As per the stack trace "recoveryStrat.join()" is where things are
holding up.

I wonder why/how cancelRecovery would take time so around 870 threads
would be waiting on. Is it possible that ZK is not responding or
something else like Operating System resources could cause this? Thanks.


On 10/2/15 4:17 PM, Rallavagu wrote:

Here is the stack trace of the thread that is holding the lock.


"Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
native_blocked, daemon
  -- Waiting for notification on:
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
  at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
  at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
  at
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
  at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
  at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
  at


RJNI_jrockit_vm_Threads_waitForNotifySignal+73(rnithreads.c:72)@0x7ff31351939a



  at
jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native
Method)
  at java/lang/Object.wait(J)V(Native Method)
  at java/lang/Thread.join(Thread.java:1206)
  ^-- Lock released while waiting:
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
  at ja

Re: Recovery Thread Blocked

2015-10-06 Thread Rallavagu
GC logging shows normal behavior. The "OutOfMemoryError" appears to 
pertain to a thread, not to the JVM heap.


On 10/6/15 1:07 PM, Mark Miller wrote:

That amount of RAM can easily be eaten up depending on your sorting,
faceting, data.

Do you have gc logging enabled? That should describe what is happening with
the heap.

- Mark

On Tue, Oct 6, 2015 at 4:04 PM Rallavagu <rallav...@gmail.com> wrote:


Mark - currently 5.3 is being evaluated for upgrade purposes and
hopefully get there sooner. Meanwhile, following exception is noted from
logs during updates

ERROR org.apache.solr.update.CommitTracker  – auto commit
error...:java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
  at

org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)
  at
org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)
  at

org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)
  at
org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
  at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
  at

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
  at

java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
  at

java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)
  at

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
  at java.lang.Thread.run(Thread.java:682)

Considering the fact that the machine is configured with 48G (24G for
JVM which will be reduced in future) wondering how would it still go out
of memory. For memory mapped index files the remaining 24G or what is
available off of it should be available. Looking at the lsof output the
memory mapped files were around 10G.

Thanks.


On 10/5/15 5:41 PM, Mark Miller wrote:

I'd make two guess:

Looks like you are using JRockit? I don't think that is common or well
tested at this point.

There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace

of

SolrCloud, you are dealing with something fairly ancient and so it will

be

harder to find help with older issues most likely.

- Mark

On Mon, Oct 5, 2015 at 12:46 PM Rallavagu <rallav...@gmail.com> wrote:


Any takers on this? Any kinda clue would help. Thanks.

On 10/4/15 10:14 AM, Rallavagu wrote:

As there were no responses so far, I assume that this is not a very
common issue that folks come across. So, I went into source (4.6.1) to
see if I can figure out what could be the cause.


The thread that is locking is in this block of code

synchronized (recoveryLock) {
 // to be air tight we must also check after lock
 if (cc.isShutDown()) {
   log.warn("Skipping recovery because Solr is shutdown");
   return;
 }
 log.info("Running recovery - first canceling any ongoing

recovery");

 cancelRecovery();

 while (recoveryRunning) {
   try {
 recoveryLock.wait(1000);
   } catch (InterruptedException e) {

   }
   // check again for those that were waiting
   if (cc.isShutDown()) {
 log.warn("Skipping recovery because Solr is shutdown");
 return;
   }
   if (closed) return;
 }

Subsequently, the thread will get into cancelRecovery method as below,

public void cancelRecovery() {
   synchronized (recoveryLock) {
 if (recoveryStrat != null && recoveryRunning) {
   recoveryStrat.close();
   while (true) {
 try {
   recoveryStrat.join();
 } catch (InterruptedException e) {
   // not interruptible - keep waiting
   continue;
 }
 break;
   }

   recoveryRunning = false;
   recoveryLock.notifyAll();
 }
   }
 }

As per the stack trace "recoveryStrat.join()" is where things are
holding up.

I wonder why/how cancelRecovery would take time so around 870 threads
would be waiting on. Is it possible that ZK is not responding or
something else like Operating System resources could cause this?

Thanks.



On 10/2/15 4:17 PM, Rallavagu wrote:

Here is the stack trace of the thread that is holding the lock.


"Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
native_blocked, daemon
   -- Waiting for notification on:
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
   at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
   at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
   at
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
   at syncWaitForSignal+189(synchronization.c:85)@0x7ff3

Re: Recovery Thread Blocked

2015-10-06 Thread Rallavagu

It is a Java thread though. Does that require increasing OS-level 
thread limits?
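
If it helps, a minimal sketch (plain JDK, nothing Solr-specific) that 
shows this flavor of OutOfMemoryError coming from thread/OS limits 
rather than from the heap; run it only on a disposable machine:

// Minimal sketch: keep starting parked daemon threads until the JVM
// throws java.lang.OutOfMemoryError (typically "unable to create new
// native thread" on HotSpot; the exact message can differ by JVM). The
// heap can be nearly empty when this happens; the ceiling comes from OS
// limits such as ulimit -u, not from -Xmx.
public class ThreadLimitProbe {
    public static void main(String[] args) {
        long count = 0;
        try {
            while (true) {
                Thread t = new Thread(new Runnable() {
                    public void run() {
                        try {
                            Thread.sleep(Long.MAX_VALUE);   // park forever
                        } catch (InterruptedException ignored) {
                        }
                    }
                });
                t.setDaemon(true);
                t.start();
                count++;
            }
        } catch (OutOfMemoryError e) {
            System.out.println("Created " + count + " threads before: " + e);
        }
    }
}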

On 10/6/15 6:21 PM, Mark Miller wrote:

If it's a thread and you have plenty of RAM and the heap is fine, have you
checked raising OS thread limits?

- Mark

On Tue, Oct 6, 2015 at 4:54 PM Rallavagu <rallav...@gmail.com> wrote:


GC logging shows normal. The "OutOfMemoryError" appears to be pertaining
to a thread but not to JVM.

On 10/6/15 1:07 PM, Mark Miller wrote:

That amount of RAM can easily be eaten up depending on your sorting,
faceting, data.

Do you have gc logging enabled? That should describe what is happening

with

the heap.

- Mark

On Tue, Oct 6, 2015 at 4:04 PM Rallavagu <rallav...@gmail.com> wrote:


Mark - currently 5.3 is being evaluated for upgrade purposes and
hopefully get there sooner. Meanwhile, following exception is noted from
logs during updates

ERROR org.apache.solr.update.CommitTracker  – auto commit
error...:java.lang.IllegalStateException: this writer hit an
OutOfMemoryError; cannot commit
   at



org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2807)

   at


org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2984)

   at



org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:559)

   at
org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
   at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:440)
   at



java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)

   at



java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)

   at



java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:896)

   at



java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)

   at java.lang.Thread.run(Thread.java:682)

Considering the fact that the machine is configured with 48G (24G for
JVM which will be reduced in future) wondering how would it still go out
of memory. For memory mapped index files the remaining 24G or what is
available off of it should be available. Looking at the lsof output the
memory mapped files were around 10G.

Thanks.


On 10/5/15 5:41 PM, Mark Miller wrote:

I'd make two guess:

Looks like you are using JRockit? I don't think that is common or well
tested at this point.

There are a billion or so bug fixes from 4.6.1 to 5.3.2. Given the pace

of

SolrCloud, you are dealing with something fairly ancient and so it will

be

harder to find help with older issues most likely.

- Mark

On Mon, Oct 5, 2015 at 12:46 PM Rallavagu <rallav...@gmail.com> wrote:


Any takers on this? Any kinda clue would help. Thanks.

On 10/4/15 10:14 AM, Rallavagu wrote:

As there were no responses so far, I assume that this is not a very
common issue that folks come across. So, I went into source (4.6.1)

to

see if I can figure out what could be the cause.


The thread that is locking is in this block of code

synchronized (recoveryLock) {
  // to be air tight we must also check after lock
  if (cc.isShutDown()) {
log.warn("Skipping recovery because Solr is shutdown");
return;
  }
  log.info("Running recovery - first canceling any ongoing

recovery");

  cancelRecovery();

  while (recoveryRunning) {
try {
  recoveryLock.wait(1000);
} catch (InterruptedException e) {

}
// check again for those that were waiting
if (cc.isShutDown()) {
  log.warn("Skipping recovery because Solr is shutdown");
  return;
}
if (closed) return;
  }

Subsequently, the thread will get into cancelRecovery method as

below,


public void cancelRecovery() {
synchronized (recoveryLock) {
  if (recoveryStrat != null && recoveryRunning) {
recoveryStrat.close();
while (true) {
  try {
recoveryStrat.join();
  } catch (InterruptedException e) {
// not interruptible - keep waiting
continue;
  }
  break;
}

recoveryRunning = false;
recoveryLock.notifyAll();
  }
}
  }

As per the stack trace "recoveryStrat.join()" is where things are
holding up.

I wonder why/how cancelRecovery would take time so around 870 threads
would be waiting on. Is it possible that ZK is not responding or
something else like Operating System resources could cause this?

Thanks.



On 10/2/15 4:17 PM, Rallavagu wrote:

Here is the stack trace of the thread that is holding the lock.


"Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
native_blocked,

Re: Recovery Thread Blocked

2015-10-05 Thread Rallavagu

Any takers on this? Any kind of clue would help. Thanks.

On 10/4/15 10:14 AM, Rallavagu wrote:

As there were no responses so far, I assume that this is not a very
common issue that folks come across. So, I went into source (4.6.1) to
see if I can figure out what could be the cause.


The thread that is locking is in this block of code

synchronized (recoveryLock) {
   // to be air tight we must also check after lock
   if (cc.isShutDown()) {
 log.warn("Skipping recovery because Solr is shutdown");
 return;
   }
   log.info("Running recovery - first canceling any ongoing recovery");
   cancelRecovery();

   while (recoveryRunning) {
 try {
   recoveryLock.wait(1000);
 } catch (InterruptedException e) {

 }
 // check again for those that were waiting
 if (cc.isShutDown()) {
   log.warn("Skipping recovery because Solr is shutdown");
   return;
 }
 if (closed) return;
   }

Subsequently, the thread will get into cancelRecovery method as below,

public void cancelRecovery() {
 synchronized (recoveryLock) {
   if (recoveryStrat != null && recoveryRunning) {
 recoveryStrat.close();
 while (true) {
   try {
 recoveryStrat.join();
   } catch (InterruptedException e) {
 // not interruptible - keep waiting
 continue;
   }
   break;
 }

 recoveryRunning = false;
 recoveryLock.notifyAll();
   }
 }
   }

As per the stack trace "recoveryStrat.join()" is where things are
holding up.

I wonder why/how cancelRecovery would take time so around 870 threads
would be waiting on. Is it possible that ZK is not responding or
something else like Operating System resources could cause this? Thanks.


On 10/2/15 4:17 PM, Rallavagu wrote:

Here is the stack trace of the thread that is holding the lock.


"Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
native_blocked, daemon
 -- Waiting for notification on:
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
 at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
 at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
 at
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
 at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
 at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
 at
RJNI_jrockit_vm_Threads_waitForNotifySignal+73(rnithreads.c:72)@0x7ff31351939a


 at
jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native
Method)
 at java/lang/Object.wait(J)V(Native Method)
 at java/lang/Thread.join(Thread.java:1206)
 ^-- Lock released while waiting:
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
 at java/lang/Thread.join(Thread.java:1259)
 at
org/apache/solr/update/DefaultSolrCoreState.cancelRecovery(DefaultSolrCoreState.java:331)


 ^-- Holding lock: java/lang/Object@0x114d8dd00[recursive]
 at
org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:297)


 ^-- Holding lock: java/lang/Object@0x114d8dd00[fat lock]
 at
org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)


 at jrockit/vm/RNI.c2java(J)V(Native Method)


Stack trace of one of the 870 threads that is waiting for the lock to be
released.

"Thread-55489" id=77520 idx=0xebc tid=1494 prio=5 alive, blocked,
native_blocked, daemon
 -- Blocked trying to get lock: java/lang/Object@0x114d8dd00[fat
lock]
 at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
 at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
 at
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
 at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
 at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
 at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
 at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
 at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized]
 at
jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized]
 at
jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
 at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized]
 at
org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:290)


 at
org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)


 at jrockit/vm/RNI.c2java(J)V(Native Method)

On 10/2/15 4:12 PM, Rallavagu wrote:

Solr 4.6.1 on Tomcat 7, single shard 4 node cloud with 3 node zookeeper

During updates, some nodes are going very high cpu and becomes
unavailable. The thread dump shows the following thread is blocked 870
threads which explains high CPU. Any clues on where to lo

Re: Recovery Thread Blocked

2015-10-04 Thread Rallavagu
As there have been no responses so far, I assume this is not a very 
common issue that folks come across. So I went into the source (4.6.1) 
to see if I can figure out what the cause could be.



The thread that is locking is in this block of code

synchronized (recoveryLock) {
  // to be air tight we must also check after lock
  if (cc.isShutDown()) {
log.warn("Skipping recovery because Solr is shutdown");
return;
  }
  log.info("Running recovery - first canceling any ongoing recovery");
  cancelRecovery();

  while (recoveryRunning) {
try {
  recoveryLock.wait(1000);
} catch (InterruptedException e) {

}
// check again for those that were waiting
if (cc.isShutDown()) {
  log.warn("Skipping recovery because Solr is shutdown");
  return;
}
if (closed) return;
  }

Subsequently, the thread will get into cancelRecovery method as below,

public void cancelRecovery() {
synchronized (recoveryLock) {
  if (recoveryStrat != null && recoveryRunning) {
recoveryStrat.close();
while (true) {
  try {
recoveryStrat.join();
  } catch (InterruptedException e) {
// not interruptible - keep waiting
continue;
  }
  break;
}

recoveryRunning = false;
recoveryLock.notifyAll();
  }
}
  }

As per the stack trace, "recoveryStrat.join()" is where things are 
holding up.


I wonder why/how cancelRecovery would take so long that around 870 
threads end up waiting on it. Is it possible that ZK is not responding, 
or could something else like operating system resources cause this? 
Thanks.
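
To make the pile-up pattern concrete, here is a toy sketch (not Solr 
code) of what the dump seems to show: one thread holds a monitor while 
join()ing a worker that never finishes, so every other thread that needs 
the same monitor stacks up as BLOCKED behind it:

// Toy reproduction of the pattern in the dump: the "canceller" holds
// recoveryLock while joining a slow worker, so the other threads that
// also synchronize on recoveryLock pile up in BLOCKED state (like the
// ~870 doRecovery threads above) until the worker exits.
public class JoinWhileHoldingLock {
    private static final Object recoveryLock = new Object();

    public static void main(String[] args) throws Exception {
        final Thread slowWorker = new Thread(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(60000);   // stands in for a recovery that will not finish
                } catch (InterruptedException ignored) {
                }
            }
        });
        slowWorker.start();

        new Thread(new Runnable() {
            public void run() {
                synchronized (recoveryLock) {     // holds the lock...
                    try {
                        slowWorker.join();        // ...while waiting on the worker
                    } catch (InterruptedException ignored) {
                    }
                }
            }
        }).start();

        Thread.sleep(200);                        // let the canceller grab the lock first

        for (int i = 0; i < 5; i++) {             // these all block on recoveryLock
            new Thread(new Runnable() {
                public void run() {
                    synchronized (recoveryLock) {
                        System.out.println(Thread.currentThread().getName()
                                + " finally got the lock");
                    }
                }
            }).start();
        }
    }
}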



On 10/2/15 4:17 PM, Rallavagu wrote:

Here is the stack trace of the thread that is holding the lock.


"Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting,
native_blocked, daemon
 -- Waiting for notification on:
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
 at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
 at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
 at
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
 at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
 at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
 at
RJNI_jrockit_vm_Threads_waitForNotifySignal+73(rnithreads.c:72)@0x7ff31351939a

 at
jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)
 at java/lang/Object.wait(J)V(Native Method)
 at java/lang/Thread.join(Thread.java:1206)
 ^-- Lock released while waiting:
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]
 at java/lang/Thread.join(Thread.java:1259)
 at
org/apache/solr/update/DefaultSolrCoreState.cancelRecovery(DefaultSolrCoreState.java:331)

 ^-- Holding lock: java/lang/Object@0x114d8dd00[recursive]
 at
org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:297)

 ^-- Holding lock: java/lang/Object@0x114d8dd00[fat lock]
 at
org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)

 at jrockit/vm/RNI.c2java(J)V(Native Method)


Stack trace of one of the 870 threads that is waiting for the lock to be
released.

"Thread-55489" id=77520 idx=0xebc tid=1494 prio=5 alive, blocked,
native_blocked, daemon
 -- Blocked trying to get lock: java/lang/Object@0x114d8dd00[fat lock]
 at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
 at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
 at
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
 at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
 at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
 at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
 at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
 at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized]
 at
jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized]
 at
jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
 at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized]
 at
org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:290)

 at
org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)

 at jrockit/vm/RNI.c2java(J)V(Native Method)

On 10/2/15 4:12 PM, Rallavagu wrote:

Solr 4.6.1 on Tomcat 7, single shard 4 node cloud with 3 node zookeeper

During updates, some nodes go to very high CPU and become unavailable.
The thread dump shows the following blocked thread; 870 threads are blocked
like this, which explains the high CPU. Any clues on where to look?

"Thread-56848" id=79207 idx=0x38 tid=3169 prio=5 alive, blocked,
native_blocked, daemon
 -- Blocked trying

Re: Zk and Solr Cloud

2015-10-02 Thread Rallavagu

Thanks Shawn.

Right. That is a great insight into the issue. We ended up clearing the 
overseer queue and then the cloud became normal.


We were running a Solr indexing process and wondering if that caused the 
queue to grow. Will Solr (the leader) add a work entry to ZooKeeper for 
every update? If not, what are those work entries?


Thanks

On 10/1/15 10:58 PM, Shawn Heisey wrote:

On 10/1/2015 1:26 PM, Rallavagu wrote:

Solr 4.6.1 single shard with 4 nodes. Zookeeper 3.4.5 ensemble of 3.

See following errors in ZK and Solr and they are connected.

When I see the following error in Zookeeper,

unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len11823809 is out of range!


This is usually caused by the overseer queue (stored in zookeeper)
becoming extraordinarily huge, because it's being flooded with work
entries far faster than the overseer can process them.  This causes the
znode where the queue is stored to become larger than the maximum size
for a znode, which defaults to about 1MB.  In this case (reading your
log message that says len11823809), something in zookeeper has gotten to
be 11MB in size, so the zookeeper client cannot read it.

I think the zookeeper server code must be handling the addition of
children to the queue znode through a code path that doesn't pay
attention to the maximum buffer size, just goes ahead and adds it,
probably by simply appending data.  I'm unfamiliar with how the ZK
database works, so I'm guessing here.

If I'm right about where the problem is, there are two workarounds to
your immediate issue.

1) Delete all the entries in your overseer queue using a zookeeper
client that lets you edit the DB directly.  If you haven't changed the
cloud structure and all your servers are working, this should be safe.

2) Set the jute.maxbuffer system property on the startup commandline for
all ZK servers and all ZK clients (Solr instances) to a size that's
large enough to accommodate the huge znode.  In order to do the deletion
mentioned in option 1 above, you might need to increase jute.maxbuffer on
the servers and the client you use for the deletion.

These are just workarounds.  Whatever caused the huge queue in the first
place must be addressed.  It is frequently a performance issue.  If you
go to the following link, you will see that jute.maxbuffer is considered
an unsafe option:

http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html#Unsafe+Options
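
A sketch of what those two workarounds can look like from the command line
(ZooKeeper 3.4's zkCli.sh; the ensemble address, buffer size, and startup
variables are illustrative, not taken from this setup):

  # 1) Clear the overseer queue (only if the cluster layout is otherwise
  #    stable); the client itself needs a bigger jute.maxbuffer just to
  #    list the oversized znode.
  CLIENT_JVMFLAGS="-Djute.maxbuffer=20000000" bin/zkCli.sh -server zk1:2181 rmr /overseer/queue

  # 2) Raise the limit on every ZK server and every Solr node (Tomcat here).
  export SERVER_JVMFLAGS="-Djute.maxbuffer=20000000"                # zkEnv.sh / zookeeper-env.sh
  export CATALINA_OPTS="$CATALINA_OPTS -Djute.maxbuffer=20000000"   # Tomcat setenv.sh for Solr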

In Jira issue SOLR-7191, I wrote the following in one of my comments:

"The giant queue I encountered was about 85 entries, and resulted in
a packet length of a little over 14 megabytes. If I divide 85 by 14,
I know that I can have about 6 overseer queue entries in one znode
before jute.maxbuffer needs to be increased."

https://issues.apache.org/jira/browse/SOLR-7191?focusedCommentId=14347834

Thanks,
Shawn



Re: Zk and Solr Cloud

2015-10-02 Thread Rallavagu

Thanks for the insight into this Erick. Thanks.

On 10/2/15 8:58 AM, Erick Erickson wrote:

Rallavagu:

Absent nodes going up and down or otherwise changing state, Zookeeper
isn't involved in the normal operations of Solr (adding docs,
querying, all that). That said, things that change the state of the
Solr nodes _do_ involve Zookeeper and the Overseer. The Overseer is
used to serialize and control changing information in the
clusterstate.json (or state.json) and others. If the nodes all tried
to write to Zk directly, it's hard to coordinate. That's a little
simplistic and counterintuitive, but maybe this will help.

When a Solr instance starts up it
1> registers itself as live with ZK
2> creates a listener that ZK pings when there's a state change (some
node goes up or down, goes into recovery, gets added, whatever).
3> gets the current cluster state from ZK.

Thereafter, this particular node doesn't need to ask ZK for anything.
It knows the current topology of cluster and can route requests (index
or query) to the correct Solr replica etc.

Now, let's claim that "something changes". Solr stops on one of the
nodes. Or someone adds a collection. Or. The overseer usually gets
involved in changing the state on ZK for this new action. Part of that
is that ZK sends an event to all the Solr nodes that have registered
themselves as listeners that causes them to ask ZK for the current
state of the cluster, and each Solr node adjusts its actions based on
this information. Note the kind of thing here that changes and
triggers this is that a whole replica becomes able or unable to carry
out its functions, NOT that the some collection gets another doc added
or answers a query.

Zk also periodically pings each Solr instance that's registered itself
and, if the node fails to respond may force it into recovery & etc.
Again, though, that has nothing to do with standard Solr operations.

So a massive overseer queue tends to indicate that there's a LOT of
state changes, lots of nodes going up and down etc. One implication of
the above is that if you turn on all your nodes in a large cluster at
the same time, there'll be a LOT of activity; they'll all register
themselves, try to elect leaders for shards, go into/out of recovery,
become active; all these are things that trigger overseer activity.

Or there are simply bugs in how the overseer works in the version
you're using, I know there's been a lot of effort to harden that area
over the various versions.

Two things that are "interesting".
1> Only one of your Solr instances hosts the overseer. If you're doing
a restart of _all_ your boxes, it's advisable to bounce the node
that's the overseer _last_. Otherwise you risk an odd situation: the
overseer is elected and starts to work, that node restarts which
causes the overseer role to switch to another node which immediately
is bounced and a new overseer is elected and

2> As of 5.x, there are two ZK formats
a> the "old" format where the entire clusterstate for all collections
is kept in a single ZK node (/clusterstate.json)
b> the "new" format where each collection has its own state.json that
only contains the state for that collection.

This is very helpful when you have many collections. In the a> case, any
time _any_ node changes, _all_ nodes have to get a new state. In b>,
only the nodes involved in a single collection need to get new
information when any node in _that_ collection changes.

FWIW,
Erick



On Fri, Oct 2, 2015 at 8:03 AM, Ravi Solr <ravis...@gmail.com> wrote:

Awesome nugget Shawn, I also faced a similar issue a while ago while I was
doing a full re-index. It would be great if such tips were added into
FAQ-type documentation on cwiki. I love the SOLR forum; every day I learn
something new :-)

Thanks

Ravi Kiran Bhaskar

On Fri, Oct 2, 2015 at 1:58 AM, Shawn Heisey <apa...@elyograg.org> wrote:


On 10/1/2015 1:26 PM, Rallavagu wrote:

Solr 4.6.1 single shard with 4 nodes. Zookeeper 3.4.5 ensemble of 3.

See following errors in ZK and Solr and they are connected.

When I see the following error in Zookeeper,

unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len11823809 is out of range!


This is usually caused by the overseer queue (stored in zookeeper)
becoming extraordinarily huge, because it's being flooded with work
entries far faster than the overseer can process them.  This causes the
znode where the queue is stored to become larger than the maximum size
for a znode, which defaults to about 1MB.  In this case (reading your
log message that says len11823809), something in zookeeper has gotten to
be 11MB in size, so the zookeeper client cannot read it.

I think the zookeeper server code must be handling the addition of
children to the queue znode through a code path that doesn't pay
attention to the maximum buffer size, just goes ahead and adds it,
probably by simply appending data.

Recovery Thread Blocked

2015-10-02 Thread Rallavagu

Solr 4.6.1 on Tomcat 7, single shard 4 node cloud with 3 node zookeeper

During updates, some nodes go to very high CPU and become unavailable. 
The thread dump shows the following blocked thread; 870 threads are blocked 
like this, which explains the high CPU. Any clues on where to look?


"Thread-56848" id=79207 idx=0x38 tid=3169 prio=5 alive, blocked, 
native_blocked, daemon

-- Blocked trying to get lock: java/lang/Object@0x114d8dd00[fat lock]
at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
at 
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2

at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized]
at 
jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized]

at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized]
at 
org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:290)
at 
org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)

at jrockit/vm/RNI.c2java(J)V(Native Method)


Re: Recovery Thread Blocked

2015-10-02 Thread Rallavagu

Here is the stack trace of the thread that is holding the lock.


"Thread-55266" id=77142 idx=0xc18 tid=992 prio=5 alive, waiting, 
native_blocked, daemon
-- Waiting for notification on: 
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]

at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
at 
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2

at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
at 
RJNI_jrockit_vm_Threads_waitForNotifySignal+73(rnithreads.c:72)@0x7ff31351939a
at 
jrockit/vm/Threads.waitForNotifySignal(JLjava/lang/Object;)Z(Native Method)

at java/lang/Object.wait(J)V(Native Method)
at java/lang/Thread.join(Thread.java:1206)
^-- Lock released while waiting: 
org/apache/solr/cloud/RecoveryStrategy@0x3f34e8480[fat lock]

at java/lang/Thread.join(Thread.java:1259)
at 
org/apache/solr/update/DefaultSolrCoreState.cancelRecovery(DefaultSolrCoreState.java:331)

^-- Holding lock: java/lang/Object@0x114d8dd00[recursive]
at 
org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:297)

^-- Holding lock: java/lang/Object@0x114d8dd00[fat lock]
at 
org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)

at jrockit/vm/RNI.c2java(J)V(Native Method)


Stack trace of one of the 870 threads that is waiting for the lock to be 
released.


"Thread-55489" id=77520 idx=0xebc tid=1494 prio=5 alive, blocked, 
native_blocked, daemon

-- Blocked trying to get lock: java/lang/Object@0x114d8dd00[fat lock]
at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
at 
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2

at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized]
at 
jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized]

at jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized]
at 
org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:290)
at 
org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)

at jrockit/vm/RNI.c2java(J)V(Native Method)

On 10/2/15 4:12 PM, Rallavagu wrote:

Solr 4.6.1 on Tomcat 7, single shard 4 node cloud with 3 node zookeeper

During updates, some nodes go to very high CPU and become unavailable.
The thread dump shows the following blocked thread; 870 threads are blocked
like this, which explains the high CPU. Any clues on where to look?

"Thread-56848" id=79207 idx=0x38 tid=3169 prio=5 alive, blocked,
native_blocked, daemon
 -- Blocked trying to get lock: java/lang/Object@0x114d8dd00[fat lock]
 at pthread_cond_wait@@GLIBC_2.3.2+202(:0)@0x3d4180b5ba
 at eventTimedWaitNoTransitionImpl+71(event.c:90)@0x7ff3133b6ba8
 at
syncWaitForSignalNoTransition+65(synchronization.c:28)@0x7ff31354a0b2
 at syncWaitForSignal+189(synchronization.c:85)@0x7ff31354a20e
 at syncWaitForJavaSignal+38(synchronization.c:93)@0x7ff31354a327
 at jrockit/vm/Threads.waitForUnblockSignal()V(Native Method)
 at jrockit/vm/Locks.fatLockBlockOrSpin(Locks.java:1411)[optimized]
 at jrockit/vm/Locks.lockFat(Locks.java:1512)[optimized]
 at
jrockit/vm/Locks.monitorEnterSecondStageHard(Locks.java:1054)[optimized]
 at
jrockit/vm/Locks.monitorEnterSecondStage(Locks.java:1005)[optimized]
 at jrockit/vm/Locks.monitorEnter(Locks.java:2179)[optimized]
 at
org/apache/solr/update/DefaultSolrCoreState.doRecovery(DefaultSolrCoreState.java:290)

 at
org/apache/solr/handler/admin/CoreAdminHandler$2.run(CoreAdminHandler.java:770)

 at jrockit/vm/RNI.c2java(J)V(Native Method)


PoolingClientConnectionManager

2015-10-01 Thread Rallavagu

Solr 4.6.1, single Shard, cloud with 4 nodes

Solr is running on Tomcat configured with 200 threads for its thread pool. 
As Solr uses "org.apache.http.impl.conn.PoolingClientConnectionManager" 
for replication, my question is: do the Solr threads use connections from 
the Tomcat thread pool, or do they create their own thread pool? I am trying 
to find out if it would be 200 + Solr threads or not. Thanks.


Re: PoolingClientConnectionManager

2015-10-01 Thread Rallavagu

Thanks for the response Andrea.

Assuming that Solr has its own thread pool, it appears that 
"PoolingClientConnectionManager" has a maximum of 20 connections per host 
by default. Is there a way to increase this to handle heavy update 
traffic? Thanks.




On 10/1/15 11:05 AM, Andrea Gazzarini wrote:

Hi,
Maybe I could be wrong as your question is related with Solr internals (I
believe the dev list is a better candidate for such questions).

Anyway, my thoughts: unless you're within a JCA inbound component (and Solr
isn't), the JEE specs say you shouldn' start new threads. For this  reason,
there's no a (standard) way to directly connect to and use the servlet
container threads.

As far as I know Solr 4.x is a standard and JEE compliant web application
so the answer to your question *should* be: "yes, it is using its own
threads"

Best,
Andrea
Solr 4.6.1, single Shard, cloud with 4 nodes

Solr is running on Tomcat configured with 200 threads for thread pool. As
Solr uses "org.apache.http.impl.conn.PoolingClientConnectionManager" for
replication, my question is does Solr threads use connections from tomcat
thread pool or they create their own thread pool? I am trying to find out
if it would be 200 + Solr threads or not. Thanks.



Re: PoolingClientConnectionManager

2015-10-01 Thread Rallavagu

Thanks Shawn. This is good data.

On 10/1/15 11:43 AM, Shawn Heisey wrote:

On 10/1/2015 11:50 AM, Rallavagu wrote:

Solr 4.6.1, single Shard, cloud with 4 nodes

Solr is running on Tomcat configured with 200 threads for thread pool.
As Solr uses
"org.apache.http.impl.conn.PoolingClientConnectionManager" for
replication, my question is does Solr threads use connections from
tomcat thread pool or they create their own thread pool? I am trying
to find out if it would be 200 + Solr threads or not. Thanks.


I don't know the answer to the actual question you have asked ... but I
do know that keeping the container maxThreads at 200 can cause serious
problems for Solr.  It does not take a very big installation to exceed
200 threads, and users have had problems fixed by increasing
maxThreads.  This implies that the container is able to control the
threads in Solr to some degree.

The Jetty included with all versions of Solr that I have actually
checked (back to 3.2.0) has maxThreads set to 10000, which effectively
removes the thread limit for any typical install.  Very large installs
might need it bumped higher than 10000.
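
For a Tomcat install, that limit lives on the HTTP Connector in
conf/server.xml; a minimal sketch where maxThreads is the only point
(port, protocol, and timeout are whatever the install already uses):

  <Connector port="8080" protocol="HTTP/1.1"
             maxThreads="10000"
             connectionTimeout="20000" />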

Thanks,
Shawn



Re: PoolingClientConnectionManager

2015-10-01 Thread Rallavagu

Awesome. This is what I was looking for. Will try these. Thanks.

On 10/1/15 1:31 PM, Shawn Heisey wrote:

On 10/1/2015 12:39 PM, Rallavagu wrote:

Thanks for the response Andrea.

Assuming that Solr has it's own thread pool, it appears that
"PoolingClientConnectionManager" has a maximum 20 threads per host as
default. Is there a way to changes this increase to handle heavy
update traffic? Thanks.


You can configure all ShardHandler instances with the solr.xml file.
The shard handler controls SolrJ (and HttpClient) within Solr.

https://cwiki.apache.org/confluence/display/solr/Moving+to+the+New+solr.xml+Format

That page does not go into all the shard handler options, though.  For
that, you need to look at the page for distributed requests ... but
don't configure it in solrconfig.xml as the following link shows,
configure it in solr.xml as shown by the earlier link.

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests#DistributedRequests-ConfiguringtheShardHandlerFactory
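
Putting the two links together, a fragment of the new-style solr.xml might
look roughly like this; the numbers are placeholders to tune, not
recommendations (maxConnectionsPerHost is the one that defaults to 20):

  <solr>
    <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
      <int name="maxConnectionsPerHost">100</int>
      <int name="socketTimeout">600000</int>
      <int name="connTimeout">60000</int>
    </shardHandlerFactory>
  </solr>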

Thanks,
Shawn



Zk and Solr Cloud

2015-10-01 Thread Rallavagu

Solr 4.6.1 single shard with 4 nodes. Zookeeper 3.4.5 ensemble of 3.

See following errors in ZK and Solr and they are connected.

When I see the following error in Zookeeper,

unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len11823809 is out of range!
at 
org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)



There is the following corresponding error in Solr

caught end of stream exception
EndOfStreamException: Unable to read additional data from client 
sessionid 0x25024c8ea0e, likely client has closed socket
at 
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at 
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)

at java.lang.Thread.run(Thread.java:744)

Any clues as to what is causing these errors. Thanks.


Re: Solr 4.6.1 Cloud Stops Replication

2015-08-31 Thread Rallavagu

Erick,

Apologies for missing an update on the status of the indexing (replication) 
issues, since I originally started this thread. After implementing 
CloudSolrServer instead of ConcurrentUpdateSolrServer things were much 
better. I simply wanted to follow up to understand the memory behavior 
better, though we tuned both heap and physical memory a while ago.


Thanks

On 8/24/15 9:09 AM, Erick Erickson wrote:

bq: As a follow up, the default is set to "NRTCachingDirectoryFactory"
for DirectoryFactory but not MMapDirectory. It is mentioned that
NRTCachingDirectoryFactory "caches small files in memory for better
NRT performance".

NRTCachingDirectoryFactory also uses MMapDirectory under the covers as
well as "caches small files in memory"
so you really can't separate out the two.

I didn't mention this explicitly, but your original problem should
_not_ be happening in a well-tuned
system. Why your nodes go into a down state needs to be understood.
The connection timeout is
the only clue so far, and the usual reason here is that very long GC
pauses are happening. If this
continually happens, you might try turning on GC reporting options.
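
For example, assuming a HotSpot JVM (the log path is a placeholder), GC
reporting can be switched on with flags like:

  -verbose:gc -Xloggc:/var/log/solr/gc.log
  -XX:+PrintGCDetails -XX:+PrintGCDateStamps
  -XX:+PrintGCApplicationStoppedTime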

Best,
Erick


On Mon, Aug 24, 2015 at 2:47 AM, Rallavagu <rallav...@gmail.com> wrote:

As a follow up, the default is set to "NRTCachingDirectoryFactory" for
DirectoryFactory but not MMapDirectory. It is mentioned that
NRTCachingDirectoryFactory "caches small files in memory for better NRT
performance".

Wondering if this would also consume physical memory to the same extent as
the MMap directory. Thoughts?

On 8/18/15 9:29 AM, Erick Erickson wrote:


Couple of things:

1> Here's an excellent backgrounder for MMapDirectory, which is
what makes it appear that Solr is consuming all the physical memory

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

2> It's possible that your transaction log was huge. Perhaps not likely,
but possible. If Solr abnormally terminates (kill -9 is a prime way to do
this),
then upon restart the transaction log is replayed. This log is rolled over
upon
every hard commit (openSearcher true or false doesn't matter). So, in the
scenario where you are indexing a whole lot of stuff without committing,
then
it can take a very long time to replay the log. Not only that, but as you
do
replay the log, any incoming updates are written to the end of the tlog..
That
said, nothing in your e-mails indicates this could be a problem and it's
frankly not consistent with the errors you _do_ report but I thought
I'd mention it.
See:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
You can avoid the possibility of this by configuring your autoCommit
interval
to be relatively short (say 60 seconds) with openSearcher=false

3> ConcurrentUpdateSolrServer isn't the best thing for bulk loading
SolrCloud,
CloudSolrServer (renamed CloudSolrClient in 5.x) is better. CUSS sends all
the docs to some node, and from there that node figures out which
shard each doc belongs on and forwards the doc (actually in batches) to
the
appropriate leader. So doing what you're doing creates a lot of cross
chatter
amongst nodes. CloudSolrServer/Client figures that out on the client side
and
only sends packets to each leader that consist of only the docs that belong
on that shard. You can get nearly linear throughput with increasing numbers
of shards this way.

Best,
Erick
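
For reference, Erick's point 2> about keeping the tlog small translates to a
solrconfig.xml fragment roughly like the following; the 60-second value is
just the example from his mail:

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    <autoCommit>
      <maxTime>60000</maxTime>           <!-- hard commit every 60s, rolls the tlog -->
      <openSearcher>false</openSearcher> <!-- don't open a new searcher on hard commit -->
    </autoCommit>
  </updateHandler>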

On Tue, Aug 18, 2015 at 9:03 AM, Rallavagu <rallav...@gmail.com> wrote:


Thanks Shawn.

All participating cloud nodes are running Tomcat and as you suggested
will
review the number of threads and increase them as needed.

Essentially, what I have noticed was that two of four nodes caught up
with
"bulk" updates instantly while other two nodes took almost 3 hours to
completely in sync with "leader". I have "tickled" other nodes by sending
an
update thinking that it would initiate the replication but not sure if
that
caused other two nodes to eventually catch up.

On similar note, I was using "CouncurrentUpdateSolrServer" directly
pointing
to leader to bulk load Solr cloud. I have configured the chunk size and
thread count for the same. Is this the right practice to bulk load
SolrCloud?

Also, the maximum number of connections per host parameter for
"HttpShardHandler" is in solrconfig.xml I suppose?

Thanks



On 8/18/15 8:28 AM, Shawn Heisey wrote:



On 8/18/2015 8:18 AM, Rallavagu wrote:



Thanks for the response. Does this cache behavior influence the delay
in catching up with cloud? How can we explain solr cloud replication
and what are the option to monitor and take proactive action (such as
initializing, pausing etc) if needed?




I don't know enough about your setup to speculate.

I did notice this exception in a previous reply:

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for
connection fro

Re: Solr 4.6.1 Cloud Stops Replication

2015-08-24 Thread Rallavagu
As a follow up, the default is set to NRTCachingDirectoryFactory for 
DirectoryFactory but not MMapDirectory. It is mentioned that 
NRTCachingDirectoryFactory caches small files in memory for better NRT 
performance.


Wondering if this would also consume physical memory to the same extent 
as the MMap directory. Thoughts?


On 8/18/15 9:29 AM, Erick Erickson wrote:

Couple of things:

1 Here's an excellent backgrounder for MMapDirectory, which is
what makes it appear that Solr is consuming all the physical memory

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

2 It's possible that your transaction log was huge. Perhaps not likely,
but possible. If Solr abnormally terminates (kill -9 is a prime way to do this),
then upon restart the transaction log is replayed. This log is rolled over upon
every hard commit (openSearcher true or false doesn't matter). So, in the
scenario where you are indexing a whole lot of stuff without committing, then
it can take a very long time to replay the log. Not only that, but as you do
replay the log, any incoming updates are written to the end of the tlog.. That
said, nothing in your e-mails indicates this could be a problem and it's
frankly not consistent with the errors you _do_ report but I thought
I'd mention it.
See: 
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
You can avoid the possibility of this by configuring your autoCommit interval
to be relatively short (say 60 seconds) with openSearcher=false

3 ConcurrentUpdateSolrServer isn't the best thing for bulk loading SolrCloud,
CloudSolrServer (renamed CloudSolrClient in 5.x) is better. CUSS sends all
the docs to some node, and from there that node figures out which
shard each doc belongs on and forwards the doc (actually in batches) to the
appropriate leader. So doing what you're doing creates a lot of cross chatter
amongst nodes. CloudSolrServer/Client figures that out on the client side and
only sends packets to each leader that consist of only the docs that belong on
that shard. You can get nearly linear throughput with increasing numbers of
shards this way.

Best,
Erick

On Tue, Aug 18, 2015 at 9:03 AM, Rallavagu rallav...@gmail.com wrote:

Thanks Shawn.

All participating cloud nodes are running Tomcat and as you suggested will
review the number of threads and increase them as needed.

Essentially, what I have noticed was that two of four nodes caught up with
bulk updates instantly while other two nodes took almost 3 hours to
completely in sync with leader. I have tickled other nodes by sending an
update thinking that it would initiate the replication but not sure if that
caused other two nodes to eventually catch up.

On similar note, I was using CouncurrentUpdateSolrServer directly pointing
to leader to bulk load Solr cloud. I have configured the chunk size and
thread count for the same. Is this the right practice to bulk load
SolrCloud?

Also, the maximum number of connections per host parameter for
HttpShardHandler is in solrconfig.xml I suppose?

Thanks



On 8/18/15 8:28 AM, Shawn Heisey wrote:


On 8/18/2015 8:18 AM, Rallavagu wrote:


Thanks for the response. Does this cache behavior influence the delay
in catching up with cloud? How can we explain solr cloud replication
and what are the option to monitor and take proactive action (such as
initializing, pausing etc) if needed?



I don't know enough about your setup to speculate.

I did notice this exception in a previous reply:

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for
connection from pool

I can think of two things that would cause this.

One cause is that your servlet container is limiting the number of
available threads.  A typical jetty or tomcat default for maxThreads is
200, which can easily be exceeded by a small Solr install, especially if
it's SolrCloud.  The jetty included with Solr sets maxThreads to 10000,
which is effectively unlimited except for extremely large installs.  If
you are providing your own container, this will almost certainly need to
be raised.

The other cause is that your install is extremely busy and you have run
out of available HttpClient connections.  The solution in this case is
to increase the maximum number of connections per host in the
HttpShardHandler config, which defaults to 20.


https://wiki.apache.org/solr/SolrConfigXml#Configuration_of_Shard_Handlers_for_Distributed_searches

There might be other causes for that exception, but I think those are
the most common causes.  Depending on how things are set up, you have
problems with both.

Thanks,
Shawn





Re: GC parameters tuning for core of 140M docs on 50G of heap memory

2015-08-24 Thread Rallavagu
One other item to check is non-heap memory usage. This can be monitored 
from the admin page.


On 8/23/15 11:48 PM, Pavel Hladik wrote:

Hi,

we have a Solr 5.2.1 with 9 cores and one of them has 140M docs. Can you
please recommend tuning of those GC parameters? Performance is not an
issue, but sometimes during peaks we have OOMs. We use 50G of heap memory and
the server has 64G of RAM.

GC_TUNE=-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled



--
View this message in context: 
http://lucene.472066.n3.nabble.com/GC-parameters-tuning-for-core-of-140M-docs-on-50G-of-heap-memory-tp4224813.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr 4.6.1 Cloud Stops Replication

2015-08-18 Thread Rallavagu
Thanks for the response. Does this cache behavior influence the delay in 
catching up with the cloud? How can we explain Solr cloud replication, and 
what are the options to monitor and take proactive action (such as 
initializing, pausing etc.) if needed?



On 8/18/15 5:57 AM, Shawn Heisey wrote:

On 8/17/2015 10:53 PM, Rallavagu wrote:

Also, I have noticed that the memory consumption goes very high. For
instance, each node is configured with 48G memory while java heap is
configured with 12G. The available physical memory is consumed almost
46G and the heap size is well within the limits (at this time it is at
8G). Is there a documentation or to understand this behavior? I suspect
it could be lucene related memory consumption but not sure.


This is completely normal.  Your total memory usage could have been
47.9GB instead of 46GB and I would still say the same thing.

Solr cannot consume more than the 12GB heap that you have assigned, plus
a little overhead (probably a few hundred MB) for the JVM itself.  The
rest of your memory (assuming Solr is the only significant software
installed on the system) is used by the operating system for caching
contents on your disk.  Solr *relies* on this behavior (and the
available RAM it requires) for good performance.

https://en.wikipedia.org/wiki/Page_cache
https://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Solr 4.6.1 Cloud Stops Replication

2015-08-18 Thread Rallavagu

Thanks Shawn.

All participating cloud nodes are running Tomcat and as you suggested 
will review the number of threads and increase them as needed.


Essentially, what I have noticed was that two of the four nodes caught up 
with bulk updates instantly while the other two nodes took almost 3 hours 
to get completely in sync with the leader. I have tickled the other nodes by 
sending an update, thinking that it would initiate the replication, but I am 
not sure if that caused the other two nodes to eventually catch up.


On a similar note, I was using ConcurrentUpdateSolrServer directly 
pointing to the leader to bulk load the Solr cloud. I have configured the 
chunk size and thread count for the same. Is this the right practice to bulk 
load SolrCloud?


Also, the maximum number of connections per host parameter for 
HttpShardHandler is in solrconfig.xml I suppose?


Thanks


On 8/18/15 8:28 AM, Shawn Heisey wrote:

On 8/18/2015 8:18 AM, Rallavagu wrote:

Thanks for the response. Does this cache behavior influence the delay
in catching up with cloud? How can we explain solr cloud replication
and what are the option to monitor and take proactive action (such as
initializing, pausing etc) if needed?


I don't know enough about your setup to speculate.

I did notice this exception in a previous reply:

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for
connection from pool

I can think of two things that would cause this.

One cause is that your servlet container is limiting the number of
available threads.  A typical jetty or tomcat default for maxThreads is
200, which can easily be exceeded by a small Solr install, especially if
it's SolrCloud.  The jetty included with Solr sets maxThreads to 10000,
which is effectively unlimited except for extremely large installs.  If
you are providing your own container, this will almost certainly need to
be raised.

The other cause is that your install is extremely busy and you have run
out of available HttpClient connections.  The solution in this case is
to increase the maximum number of connections per host in the
HttpShardHandler config, which defaults to 20.

https://wiki.apache.org/solr/SolrConfigXml#Configuration_of_Shard_Handlers_for_Distributed_searches

There might be other causes for that exception, but I think those are
the most common causes.  Depending on how things are set up, you have
problems with both.

Thanks,
Shawn



Re: Solr 4.6.1 Cloud Stops Replication

2015-08-18 Thread Rallavagu
Thanks for the response. Will take a look into using CloudSolrServer 
for updates and review the tlog mechanism.
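
For reference, a minimal SolrJ 4.x sketch of the CloudSolrServer approach
Erick describes below; the ZooKeeper ensemble, collection name, field names,
and batch size are placeholders, not taken from this setup:

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.impl.CloudSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkLoader {
    public static void main(String[] args) throws Exception {
      // Point at the ZK ensemble, not at any single Solr node.
      CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
      server.setDefaultCollection("collection1");

      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (int i = 0; i < 100000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-" + i);
        doc.addField("title", "title " + i);
        batch.add(doc);
        if (batch.size() == 1000) {   // send in batches; the client routes each
          server.add(batch);          // batch to the right shard leader
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        server.add(batch);
      }
      server.commit();                // or rely on autoCommit with openSearcher=false
      server.shutdown();
    }
  }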


On 8/18/15 9:29 AM, Erick Erickson wrote:

Couple of things:

1 Here's an excellent backgrounder for MMapDirectory, which is
what makes it appear that Solr is consuming all the physical memory

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

2 It's possible that your transaction log was huge. Perhaps not likely,
but possible. If Solr abnormally terminates (kill -9 is a prime way to do this),
then upon restart the transaction log is replayed. This log is rolled over upon
every hard commit (openSearcher true or false doesn't matter). So, in the
scenario where you are indexing a whole lot of stuff without committing, then
it can take a very long time to replay the log. Not only that, but as you do
replay the log, any incoming updates are written to the end of the tlog.. That
said, nothing in your e-mails indicates this could be a problem and it's
frankly not consistent with the errors you _do_ report but I thought
I'd mention it.
See: 
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
You can avoid the possibility of this by configuring your autoCommit interval
to be relatively short (say 60 seconds) with openSearcher=false

3 ConcurrentUpdateSolrServer isn't the best thing for bulk loading SolrCloud,
CloudSolrServer (renamed CloudSolrClient in 5.x) is better. CUSS sends all
the docs to some node, and from there that node figures out which
shard each doc belongs on and forwards the doc (actually in batches) to the
appropriate leader. So doing what you're doing creates a lot of cross chatter
amongst nodes. CloudSolrServer/Client figures that out on the client side and
only sends packets to each leader that consist of only the docs that belong on
that shard. You can get nearly linear throughput with increasing numbers of
shards this way.

Best,
Erick

On Tue, Aug 18, 2015 at 9:03 AM, Rallavagu rallav...@gmail.com wrote:

Thanks Shawn.

All participating cloud nodes are running Tomcat and as you suggested will
review the number of threads and increase them as needed.

Essentially, what I have noticed was that two of four nodes caught up with
bulk updates instantly while other two nodes took almost 3 hours to
completely in sync with leader. I have tickled other nodes by sending an
update thinking that it would initiate the replication but not sure if that
caused other two nodes to eventually catch up.

On similar note, I was using CouncurrentUpdateSolrServer directly pointing
to leader to bulk load Solr cloud. I have configured the chunk size and
thread count for the same. Is this the right practice to bulk load
SolrCloud?

Also, the maximum number of connections per host parameter for
HttpShardHandler is in solrconfig.xml I suppose?

Thanks



On 8/18/15 8:28 AM, Shawn Heisey wrote:


On 8/18/2015 8:18 AM, Rallavagu wrote:


Thanks for the response. Does this cache behavior influence the delay
in catching up with cloud? How can we explain solr cloud replication
and what are the option to monitor and take proactive action (such as
initializing, pausing etc) if needed?



I don't know enough about your setup to speculate.

I did notice this exception in a previous reply:

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for
connection from pool

I can think of two things that would cause this.

One cause is that your servlet container is limiting the number of
available threads.  A typical jetty or tomcat default for maxThreads is
200, which can easily be exceeded by a small Solr install, especially if
it's SolrCloud.  The jetty included with Solr sets maxThreads to 10000,
which is effectively unlimited except for extremely large installs.  If
you are providing your own container, this will almost certainly need to
be raised.

The other cause is that your install is extremely busy and you have run
out of available HttpClient connections.  The solution in this case is
to increase the maximum number of connections per host in the
HttpShardHandler config, which defaults to 20.


https://wiki.apache.org/solr/SolrConfigXml#Configuration_of_Shard_Handlers_for_Distributed_searches

There might be other causes for that exception, but I think those are
the most common causes.  Depending on how things are set up, you have
problems with both.

Thanks,
Shawn





Solr 4.6.1 Cloud Stops Replication

2015-08-17 Thread Rallavagu

Hello,

Have 4 nodes participating in a Solr cloud. After indexing about 2 mil 
documents, only two nodes are Active (green) while the other two are shown 
as down. How can I initiate replication from the leader so the other 
two nodes will receive updates?


Thanks


Re: Solr 4.6.1 Cloud Stops Replication

2015-08-17 Thread Rallavagu
By the time the last email was sent, the other node had also caught up. 
Makes me wonder what happened and how this works.


Thanks

On 8/17/15 9:53 PM, Rallavagu wrote:

response inline..

On 8/17/15 8:40 PM, Erick Erickson wrote:

Is this 4 shards? Two shards each with a leader and follower? Details
matter a lot


It is a single collection single shard.



What, if anything, is in the log file for the down nodes? I'm assuming
that when you
start, all the nodes are active


During the update process found following exceptions

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting 
for connection from pool
at 
org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
at 
org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:232)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)

at java.lang.Thread.run(Thread.java:682)

However, after couple of hours one of the nodes (out of two that were 
trailing) caught up with status Active. However, other node is still 
in state Down. It has following message.


Log replay finished. recoveryInfo=RecoveryInfo{adds=2009581 
deletes=148 deleteByQuery=0 errors=0 positionOfStart=0}


I am trying to understand the behavior and wondering is there a way to 
trigger the updates to other participating nodes in the cloud.


Also, I have noticed that the memory consumption goes very high. For 
instance, each node is configured with 48G memory while java heap is 
configured with 12G. The available physical memory is consumed almost 
46G and the heap size is well within the limits (at this time it is at 
8G). Is there a documentation or to understand this behavior? I 
suspect it could be lucene related memory consumption but not sure.





You might review:
http://wiki.apache.org/solr/UsingMailingLists


Sorry for not being very clear to start with. Hope the provided 
information would help.


Thanks



Best,
Erick

On Mon, Aug 17, 2015 at 6:19 PM, Rallavagu rallav...@gmail.com wrote:

Hello,

Have 4 nodes participating solr cloud. After indexing about 2 mil 
documents,
only two nodes are Active (green) while other two are shown as 
down. How

can I initialize the replication from leader so other two nodes would
receive updates?

Thanks




Re: Solr 4.6.1 Cloud Stops Replication

2015-08-17 Thread Rallavagu

response inline..

On 8/17/15 8:40 PM, Erick Erickson wrote:

Is this 4 shards? Two shards each with a leader and follower? Details
matter a lot


It is a single collection single shard.



What, if anything, is in the log file for the down nodes? I'm assuming
that when you
start, all the nodes are active


During the update process I found the following exceptions:

org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for 
connection from pool
	at 
org.apache.http.impl.conn.PoolingClientConnectionManager.leaseConnection(PoolingClientConnectionManager.java:232)
	at 
org.apache.http.impl.conn.PoolingClientConnectionManager$1.getConnection(PoolingClientConnectionManager.java:199)
	at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:456)
	at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
	at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:232)
	at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)

at java.lang.Thread.run(Thread.java:682)

However, after a couple of hours one of the nodes (out of the two that were 
trailing) caught up with status Active. However, the other node is still 
in state Down. It has the following message.


Log replay finished. recoveryInfo=RecoveryInfo{adds=2009581 deletes=148 
deleteByQuery=0 errors=0 positionOfStart=0}


I am trying to understand the behavior and wondering if there is a way to 
trigger the updates to the other participating nodes in the cloud.


Also, I have noticed that the memory consumption goes very high. For 
instance, each node is configured with 48G memory while java heap is 
configured with 12G. Almost 46G of the available physical memory is consumed 
while the heap size is well within its limit (at this time it is at 8G). 
Is there documentation to help understand this behavior? I suspect it could 
be Lucene-related memory consumption but am not sure.





You might review:
http://wiki.apache.org/solr/UsingMailingLists


Sorry for not being very clear to start with. Hope the provided 
information would help.


Thanks



Best,
Erick

On Mon, Aug 17, 2015 at 6:19 PM, Rallavagu rallav...@gmail.com wrote:

Hello,

Have 4 nodes participating solr cloud. After indexing about 2 mil documents,
only two nodes are Active (green) while other two are shown as down. How
can I initialize the replication from leader so other two nodes would
receive updates?

Thanks


Collections Design

2014-04-10 Thread Rallavagu

All,

What are the best practices or guidelines for deciding when to use multiple 
collections, particularly in a Solr cloud environment?


Thanks

Srikanth


No route to host

2014-04-09 Thread Rallavagu

All,

I see the following error in the log file. The host that it is trying to 
find is itself. Wondering if anybody has experienced this before; any 
other info would be helpful. Thanks.


709703139 [http-bio-8080-exec-43] ERROR 
org.apache.solr.update.SolrCmdDistributor  – 
org.apache.solr.client.solrj.SolrServerException: IOException occured 
when talking to server at: http://host:8080/solr/collection1
	at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:503)
	at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
	at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293)
	at 
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:212)
	at 
org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:181)
	at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1260)
	at 
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
	at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
	at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
	at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
	at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
	at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
	at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
	at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
	at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
	at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
	at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.NoRouteToHostException: No route to host
at java.net.PlainSocketImpl.socketConnect(Native Method)
	at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
	at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
	at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
	at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
	at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
	at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
	at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
	at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
	at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
	at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
	at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
	at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:393)


Re: No route to host

2014-04-09 Thread Rallavagu

Agreed. But it is failing to find a route to itself. Weird.

On 4/9/14, 1:34 PM, Greg Walters wrote:

This doesn't look like a Solr-specific issue. Be sure to check your routes and 
your firewall. I've seen firewalls refuse packets and return a special flag 
that results in a no route to host error.
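
A few quick checks from the node that logs NoRouteToHostException can narrow
that down; the IP, port, and path here are placeholders for the ones in the
error:

  ip route get 10.0.0.12                      # does the kernel actually have a route?
  curl -v http://10.0.0.12:8080/solr/         # can the Solr port be reached at all?
  sudo iptables -L -n | grep -i reject        # a REJECT rule (icmp-host-prohibited)
                                              # surfaces in Java as "No route to host"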

Thanks,
Greg

On Apr 9, 2014, at 3:28 PM, Rallavagu rallav...@gmail.com wrote:


All,

I see the following error in the log file. The host that it is trying to find 
is itself. Wondering if anybody experienced this before or any other info would 
helpful. Thanks.

709703139 [http-bio-8080-exec-43] ERROR org.apache.solr.update.SolrCmdDistributor  – 
org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to 
server at: http://host:8080/solr/collection1
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:503)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293)
at 
org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:212)
at 
org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:181)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1260)
at 
org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
at 
org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.NoRouteToHostException: No route to host
at java.net.PlainSocketImpl.socketConnect(Native Method)
at 
java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at 
java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at 
java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request

Re: No route to host

2014-04-09 Thread Rallavagu
Sorry. I should have mentioned earlier. I have removed the original host 
name on purpose. Thanks.


On 4/9/14, 1:42 PM, Siegfried Goeschl wrote:

Hi folks,

the URL looks wrong (misconfigured)

http://host:8080/solr/collection1

Cheers,

Siegfried Goeschl

On 09 Apr 2014, at 14:28, Rallavagu rallav...@gmail.com wrote:


All,

I see the following error in the log file. The host that it is trying to find 
is itself. Wondering if anybody experienced this before or any other info would 
helpful. Thanks.

709703139 [http-bio-8080-exec-43] ERROR org.apache.solr.update.SolrCmdDistributor  – org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://host:8080/solr/collection1
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:503)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293)
at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:212)
at org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:181)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1260)
at org.apache.solr.update.processor.LogUpdateProcessor.processCommit(LogUpdateProcessorFactory.java:157)
at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:710)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:197)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1023)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.NoRouteToHostException: No route to host
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:127)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:180)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:643)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:479)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:393)




Re: Indexing huge data

2014-03-08 Thread Rallavagu
Thanks for all the responses so far. Test runs so far do not suggest any 
bottleneck in Solr as I continue to work on different approaches. 
Collecting the data from the different sources seems to be consuming most of 
the time.


On 3/7/14, 5:53 PM, Erick Erickson wrote:

Kranti's and Susheel's approaches are certainly
reasonable, assuming my bet was right :).

Another strategy is to rack together N
indexing programs that simultaneously
feed Solr.

In any of these scenarios, the end goal is to get
Solr using up all the CPU cycles it can, _assuming_
that Solr isn't the bottleneck in the first place.
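
For what it's worth, a hedged sketch of the "rack together N indexing
programs" idea above as N worker threads in one JVM, each with its own
HttpSolrServer and its own slice of the source data; the URL, thread count
and the fetchPartition() helper are assumptions, not anything from the
original setup:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RackedIndexers {
    public static void main(String[] args) throws Exception {
        final int workers = 4;                     // N indexing "programs" as threads
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            final int partition = i;
            pool.submit(new Runnable() {
                public void run() {
                    // Each worker gets its own client and its own slice of the source data.
                    HttpSolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");
                    try {
                        for (List<SolrInputDocument> batch : fetchPartition(partition)) {
                            server.add(batch);     // no per-batch commit; commit separately once all workers finish
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        server.shutdown();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }

    // Hypothetical stand-in for reading one slice of the DB/other sources as batches of documents.
    private static Iterable<List<SolrInputDocument>> fetchPartition(int partition) {
        return new java.util.ArrayList<List<SolrInputDocument>>();
    }
}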

Best,
Erick

On Thu, Mar 6, 2014 at 6:38 PM, Kranti Parisa kranti.par...@gmail.com wrote:

That's what I do: pre-create JSONs following the schema and save them in
MongoDB; this is part of the ETL process. After that, just dump the JSONs
into Solr using batching etc. With this you can do full and incremental
indexing as well.
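
For illustration, a minimal SolrJ sketch of the "dump the JSONs into Solr"
step, assuming each batch file is JSON shaped for the /update/json handler;
the URL, directory and class name here are made up:

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class JsonDumpIndexer {
    public static void main(String[] args) throws Exception {
        // Hypothetical URL and location; in practice the JSONs come out of MongoDB.
        SolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");
        File[] batches = new File("/tmp/json-batches").listFiles();

        for (File batch : batches) {
            ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/json");
            req.addFile(batch, "application/json");
            req.process(server);    // no per-batch commit; one commit below
        }
        server.commit();
        server.shutdown();
    }
}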

Thanks,
Kranti K. Parisa
http://www.linkedin.com/in/krantiparisa



On Thu, Mar 6, 2014 at 9:57 AM, Rallavagu rallav...@gmail.com wrote:


Yeah, I have thought about spitting out JSON and running it against Solr using
parallel HTTP threads separately. Thanks.
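
One hedged way to get those parallel HTTP threads from a single client is
ConcurrentUpdateSolrServer, which queues the adds and streams them to Solr
over several connections; the URL, queue size, thread count, batch size and
the fetchFromSources() helper below are assumptions for the sketch:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelJsonPusher {
    public static void main(String[] args) throws Exception {
        // Queue size 10000 and 4 sender threads are just starting points to tune.
        ConcurrentUpdateSolrServer server =
                new ConcurrentUpdateSolrServer("http://localhost:8080/solr/collection1", 10000, 4);

        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (SolrInputDocument doc : fetchFromSources()) {   // hypothetical acquisition step
            batch.add(doc);
            if (batch.size() == 1000) {                      // hand off in batches of 1000
                server.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            server.add(batch);
        }
        server.blockUntilFinished();    // wait for the queued updates to drain
        server.commit();
        server.shutdown();
    }

    // Hypothetical stand-in for building documents from the collected JSON.
    private static Iterable<SolrInputDocument> fetchFromSources() {
        return new ArrayList<SolrInputDocument>();
    }
}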


On 3/5/14, 6:46 PM, Susheel Kumar wrote:


One more suggestion is to collect/prepare the data in CSV format (a 1-2
million document sample, depending on size) and then import the data directly
into Solr using the CSV handler & curl. This will give you the pure indexing
time & the differences.
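
A hedged SolrJ equivalent of that curl approach, mirroring the JSON sketch
above but streaming the prepared CSV sample to the CSV handler; the handler
path, file name and "header" parameter are assumptions:

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class CsvSampleLoader {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");

        // Stream the prepared CSV sample to the CSV handler and commit when done.
        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/csv");
        req.addFile(new File("/tmp/sample.csv"), "text/csv");
        req.setParam("header", "true");    // first row of the file carries the field names
        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        req.process(server);
        server.shutdown();
    }
}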

Thanks,
Susheel

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, March 05, 2014 8:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing huge data

Here's the easiest thing to try to figure out where to concentrate your
energies. Just comment out the server.add call in your SolrJ program.
Well, and any commits you're doing from SolrJ.

My bet: Your program will run at about the same speed it does when you
actually index the docs, indicating that your problem is in the data
acquisition side. Of course the older I get, the more times I've been wrong
:).
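
A rough sketch of that experiment, assuming a typical SolrJ loop; everything
here (URL, class name, the fetchDocuments() helper) is hypothetical and only
shows where the add call gets commented out:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AcquisitionTimer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");
        long start = System.currentTimeMillis();

        for (SolrInputDocument doc : fetchDocuments()) {
            // server.add(doc);   // commented out: time the acquisition side on its own
        }
        // server.commit();       // and no commits either

        System.out.println("acquisition took " + (System.currentTimeMillis() - start) + " ms");
        server.shutdown();
    }

    // Hypothetical stand-in for pulling and converting data from the DB and the other sources.
    private static Iterable<SolrInputDocument> fetchDocuments() {
        return new java.util.ArrayList<SolrInputDocument>();
    }
}

If this run takes roughly as long as a real indexing run, the time is going
into acquisition rather than Solr.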

You can also monitor the CPU usage on the box running Solr. I often see
it idling along < 30% when indexing, or even < 10%, again indicating that
the bottleneck is on the acquisition side.

Note I haven't mentioned any solutions, I'm a believer in identifying the
_problem_ before worrying about a solution.

Best,
Erick

On Wed, Mar 5, 2014 at 4:29 PM, Jack Krupansky j...@basetechnology.com
wrote:


Make sure you're not doing a commit on each individual document add.
Committing every few minutes, or every few hundred or few thousand
documents, is sufficient. You can set up autoCommit in solrconfig.xml.
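
If the client controls commits, commitWithin is one hedged alternative to
per-document commits; the field names and the 5-minute window in this sketch
are arbitrary:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinAdd {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8080/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");             // hypothetical fields
        doc.addField("title", "commitWithin example");

        // Ask Solr to make this visible within 5 minutes instead of committing per add;
        // autoCommit in solrconfig.xml is the server-side alternative.
        server.add(doc, 5 * 60 * 1000);

        server.shutdown();
    }
}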

-- Jack Krupansky

-Original Message- From: Rallavagu
Sent: Wednesday, March 5, 2014 2:37 PM
To: solr-user@lucene.apache.org
Subject: Indexing huge data


All,

Wondering about best/common practices for indexing/re-indexing a huge
amount of data in Solr. The data is about 6 million entries in the DB
and other sources (the data is not located in one resource). I am trying
a SolrJ-based solution to collect data from the different sources and
index it into Solr. It takes hours to index into Solr.

Thanks in advance





Re: Indexing huge data

2014-03-06 Thread Rallavagu

Erick,

That helps so I can focus on the problem areas. Thanks.

On 3/5/14, 6:03 PM, Erick Erickson wrote:

Here's the easiest thing to try to figure out where to
concentrate your energies. Just comment out the
server.add call in your SolrJ program. Well, and any
commits you're doing from SolrJ.

My bet: Your program will run at about the same speed
it does when you actually index the docs, indicating that
your problem is in the data acquisition side. Of course
the older I get, the more times I've been wrong :).

You can also monitor the CPU usage on the box running
Solr. I often see it idling along < 30% when indexing, or
even < 10%, again indicating that the bottleneck is on the
acquisition side.

Note I haven't mentioned any solutions, I'm a believer in
identifying the _problem_ before worrying about a solution.

Best,
Erick

On Wed, Mar 5, 2014 at 4:29 PM, Jack Krupansky j...@basetechnology.com wrote:

Make sure you're not doing a commit on each individual document add.
Committing every few minutes, or every few hundred or few thousand
documents, is sufficient. You can set up autoCommit in solrconfig.xml.

-- Jack Krupansky

-Original Message- From: Rallavagu
Sent: Wednesday, March 5, 2014 2:37 PM
To: solr-user@lucene.apache.org
Subject: Indexing huge data


All,

Wondering about best/common practices for indexing/re-indexing a huge
amount of data in Solr. The data is about 6 million entries in the DB
and other sources (the data is not located in one resource). I am trying
a SolrJ-based solution to collect data from the different sources and
index it into Solr. It takes hours to index into Solr.

Thanks in advance

