Re: Solr performance issue

2018-02-15 Thread Shawn Heisey
On 2/15/2018 2:00 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml, and I am using it for full-import only.
> At the beginning of my implementation I had written a delta-import query to
> index the modified changes, but my requirements grew and I now have 17
> child entities for a single parent entity. When doing a delta-import over
> huge data, the number of requests made to the datasource (database) grew
> and CPU utilization hit 100% when concurrent users started modifying the
> data. So instead of calling delta-import, which imports based on the last
> index time, I ran a full-import (with 'SortedMapBackedCache') based on the
> last index time.
>
> Though the parent entity query returns only the records that were modified,
> the child entity queries pull all the data from the database, and the
> indexing happens 'in-memory', which causes the JVM to run out of memory.

Can you provide your DIH config file (with passwords redacted) and the
precise URL you are using to initiate dataimport?  Also, I would like to
know what field you have defined as your uniqueKey.  I may have more
questions about the data in your system, depending on what I see.

That cache implementation should only cache entries from the database
that are actually requested.  If your query is correctly defined, it
should not pull all records from the DB table.

> Is there a way to specify in the child entity query that it should pull only
> the records related to the parent entity in full-import mode?

If I am understanding your question correctly, this is one of the fairly
basic things that DIH does.  Look at this config example in the
reference guide:

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#configuring-the-dih-configuration-file

In the entity named feature in that example config, the query string
uses ${item.ID} to reference the ID column from the parent entity, which
is item.
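
For reference, a minimal data-config.xml sketch of the two styles is below
(the ITEM/FEATURE table and column names are illustrative, taken from the
reference-guide example rather than from your config). The first child
entity issues one SQL query per parent row via ${item.ID}; the second
declares a cached child entity joined through cacheKey/cacheLookup with
SortedMapBackedCache:

<document>
  <entity name="item" query="SELECT ID, NAME FROM ITEM">
    <!-- per-parent lookup: re-queries the database for each parent row -->
    <entity name="feature"
            query="SELECT DESCRIPTION FROM FEATURE WHERE ITEM_ID='${item.ID}'"/>
    <!-- cached child: rows are held in a SortedMapBackedCache and looked up
         by ITEM_ID against the parent's ID -->
    <entity name="feature_cached"
            processor="SqlEntityProcessor"
            cacheImpl="SortedMapBackedCache"
            cacheKey="ITEM_ID"
            cacheLookup="item.ID"
            query="SELECT ITEM_ID, DESCRIPTION FROM FEATURE"/>
  </entity>
</document>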

I should warn you that a cached entity does not always improve
performance.  This is particularly true if the lookup into the cache is
the information that goes to your uniqueKey field.  When the lookup is
by uniqueKey, every single row requested from the database will be used
exactly once, so there's not really any point to caching it.

Thanks,
Shawn



Re: Solr performance issue

2018-02-15 Thread Erick Erickson
Srinivas:

Not an answer to your question, but when DIH starts getting this
complicated, I start to seriously think about SolrJ, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

In particular, it moves the heavy lifting of acquiring the data off the
Solr node (which I'm assuming also has to index docs) to "some
client". It also lets you play some tricks with the code to make
things faster.

Best,
Erick

On Thu, Feb 15, 2018 at 1:00 AM, Srinivas Kashyap
 wrote:
> Hi,
>
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml, and I am using it for full-import only.
> At the beginning of my implementation I had written a delta-import query to
> index the modified changes, but my requirements grew and I now have 17
> child entities for a single parent entity. When doing a delta-import over
> huge data, the number of requests made to the datasource (database) grew
> and CPU utilization hit 100% when concurrent users started modifying the
> data. So instead of calling delta-import, which imports based on the last
> index time, I ran a full-import (with 'SortedMapBackedCache') based on the
> last index time.
>
> Though the parent entity query returns only the records that were modified,
> the child entity queries pull all the data from the database, and the
> indexing happens 'in-memory', which causes the JVM to run out of memory.
>
> Is there a way to specify in the child entity query that it should pull only
> the records related to the parent entity in full-import mode?
>
> Thanks and Regards,
> Srinivas Kashyap
>


Solr performance issue

2018-02-15 Thread Srinivas Kashyap
Hi,

I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
child entities in data-config.xml, and I am using it for full-import only.
At the beginning of my implementation I had written a delta-import query to
index the modified changes, but my requirements grew and I now have 17 child
entities for a single parent entity. When doing a delta-import over huge data,
the number of requests made to the datasource (database) grew and CPU
utilization hit 100% when concurrent users started modifying the data. So
instead of calling delta-import, which imports based on the last index time,
I ran a full-import (with 'SortedMapBackedCache') based on the last index time.

Though the parent entity query returns only the records that were modified,
the child entity queries pull all the data from the database, and the indexing
happens 'in-memory', which causes the JVM to run out of memory.

Is there a way to specify in the child entity query that it should pull only
the records related to the parent entity in full-import mode?

Thanks and Regards,
Srinivas Kashyap

DISCLAIMER: 
E-mails and attachments from TradeStone Software, Inc. are confidential.
If you are not the intended recipient, please notify the sender immediately by
replying to the e-mail, and then delete it without making copies or using it
in any way. No representation is made that this email or any attachments are
free of viruses. Virus scanning is recommended and is the responsibility of
the recipient.

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-30 Thread sasarun
Hi Erick, 

As suggested, I did try a non-HDFS Solr Cloud instance and its response looks
much better. On the configuration side too, I am mostly using default
configurations, with block.cache.direct.memory.allocation set to false. On
analysis of the HDFS cache, evictions seem to be on the higher side.

Thanks, 
Arun



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Emir Arnautović
Hi Arun,
It is hard to measure something without affecting it, but we could use the
debug results and combine them with the QTime without debug: if we ignore the
merging of results, it seems that the majority of the time is spent retrieving
docs (~500ms). You should consider reducing the number of rows if you want
better response time (you can ask for rows=0 to see the maximum possible time).
Also, as Erick suggested, reducing the number of shards (to 1 if you do not
plan to have many more documents) will trim some of the overhead of merging
results.

Thanks,
Emir

I noticed that you removed bq - is time with bq acceptable as well?
> On 27 Sep 2017, at 12:34, sasarun  wrote:
> 
> Hi Emir, 
> 
> Please find the response without the bq parameter and with debugQuery set to
> true. It was also noted that QTime comes down drastically, to about 700-800,
> without the debug parameter.
> 
> 
> [response body; the XML markup was lost in the archive. Recoverable details
> (parameter names are approximate):
> responseHeader: status 0, QTime 3446.
> params: q=("hybrid electric powerplant" "hybrid electric powerplants"
> "Electric" "Electrical" "Electricity" "Engine" "fuel economy" "fuel
> efficiency" "Hybrid Electric Propulsion" "Power Systems" "Powerplant"
> "Propulsion" "hybrid" "hybrid electric" "electric powerplant"),
> defType=edismax, qf=host title url customContent contentSpecificSearch,
> fl=id,contentOntologyTagsCount, start=0, q.op=OR, an id value
> 3985d7e2-3e54-48d8-8336-229e85f5d9de (parameter name lost), rows=600,
> debugQuery=true.
> result: maxScore=56.74194.
> Per-shard GET_TOP_IDS requests (e.g. on
> solr-prd-cluster-m-GooglePatent_shard4_replica2-1506504238282-20) took
> roughly 35-466 ms with ~41000 hits each; GET_FIELDS,GET_DEBUG requests took
> roughly 1520-2970 ms.
> debug timing: total 10302.0; prepare 2.0; process 10288.0, of which one
> component (presumably the query) took 661.0 and another (presumably debug)
> took 9627.0; the component labels were lost.
> parsedquery: the edismax expansion of the phrases above into
> DisjunctionMaxQuery clauses over host, title, url, customContent and
> contentSpecificSearch (truncated in the archive).]

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread sasarun
Hi Emir, 

Please find the response without the bq parameter and with debugQuery set to
true. It was also noted that QTime comes down drastically, to about 700-800,
without the debug parameter.


[response body; the XML markup was lost in the archive. Recoverable details
(parameter names are approximate):
responseHeader: status 0, QTime 3446.
params: q=("hybrid electric powerplant" "hybrid electric powerplants"
"Electric" "Electrical" "Electricity" "Engine" "fuel economy" "fuel
efficiency" "Hybrid Electric Propulsion" "Power Systems" "Powerplant"
"Propulsion" "hybrid" "hybrid electric" "electric powerplant"),
defType=edismax, qf=host title url customContent contentSpecificSearch,
fl=id,contentOntologyTagsCount, start=0, q.op=OR, an id value
3985d7e2-3e54-48d8-8336-229e85f5d9de (parameter name lost), rows=600,
debugQuery=true.
Per-shard GET_TOP_IDS requests (e.g. on
solr-prd-cluster-m-GooglePatent_shard4_replica2-1506504238282-20) took
roughly 35-466 ms with ~41000 hits each; GET_FIELDS,GET_DEBUG requests took
roughly 1520-2970 ms.
debug timing: total 10302.0; prepare 2.0; process 10288.0, of which one
component (presumably the query) took 661.0 and another (presumably debug)
took 9627.0; the component labels were lost.
parsedquery / parsedquery_toString: the edismax expansion of the phrases
above into DisjunctionMaxQuery clauses over host, title, url, customContent
and contentSpecificSearch (truncated in the archive).]

Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread sasarun
Hi Erick, 

QTime comes down with rows set to 1. It was also noted that QTime comes down
to about 900 when the debug parameter is not added to the query.

Thanks, 
Arun 



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Toke Eskildsen
On Tue, 2017-09-26 at 07:43 -0700, sasarun wrote:
> Allocated heap size for young generation is about 8 gb and old 
> generation is about 24 gb. And gc analysis showed peak
> size utlisation is really low compared to these values.

That does not come as a surprise. Your collections would normally be
considered small, if not tiny, looking only at their size measured in
bytes. Again, if you expect them to grow significantly (more than 10x),
your allocation might make sense. If you do not expect such a growth in
the near future, you will be better off with a much smaller heap: The
peak heap utilization that you have logged (or twice that to err on the
cautious side) seems a good starting point.

And whatever you do, don't set Xmx to 32GB. Use <31GB or significantly
more than 32GB:
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/


Are you indexing while you search? If so, you need to set auto-warm or
state a few explicit warmup-queries. If not, your measuring will not be
representative as it will be on first-searches, which are always slower
than warmed-searches.
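
For reference, a minimal solrconfig.xml sketch of explicit warming queries
(the query strings here are placeholders, not taken from this setup); the
auto-warm alternative Toke mentions is the autowarmCount attribute on the
caches:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- runs when a new searcher is opened (e.g. after a commit),
         before it starts serving requests -->
    <lst><str name="q">hybrid electric</str><str name="rows">0</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- runs once when the first searcher is opened at core startup -->
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
  </arr>
</listener>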


- Toke Eskildsen, Royal Danish Library



Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-27 Thread Emir Arnautović
Hi Arun,
This is not the simplest query either - a dozen phrase queries on several
fields, plus the same query again as bq. Can you provide the debugQuery info?
I did not look much into the debug times and what includes what, but one thing
that is strange to me is that QTime is 4s while the query component in the
debug timings is 1.3s. Can you try running without bq? Can you include the
boost factors in the main query instead?

Thanks,
Emir

> On 26 Sep 2017, at 16:43, sasarun  wrote:
> 
> Hi All, 
> I have been using Solr for some time now, but mostly in standalone mode. My
> current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml has
> the following configuration. In the prod environment the query performance
> seems to be really slow. Can anyone help me with a few pointers on how to
> improve it?
> 
> [solrconfig.xml directoryFactory block for HdfsDirectoryFactory; tags were
> lost in the archive. Settings: solr.hdfs.home, blockcache.enabled=true,
> blockcache.slab.count=1, blockcache.direct.memory.allocation=false,
> blockcache.blocksperbank=16384, blockcache.read.enabled=true,
> blockcache.write.enabled=false, nrtcachingdirectory.enable=true,
> nrtcachingdirectory.maxmergesizemb=16, nrtcachingdirectory.maxcachedmb=192;
> lockType=hdfs.]
> It has 6 collections of the following sizes:
> Collection 1 --> 6.41 MB
> Collection 2 --> 634.51 KB
> Collection 3 --> 4.59 MB
> Collection 4 --> 1,020.56 MB
> Collection 5 --> 607.26 MB
> Collection 6 --> 102.4 KB
> Each collection has 5 shards. The allocated heap size for the young
> generation is about 8 GB and the old generation is about 24 GB, and GC
> analysis showed peak size utilisation is really low compared to these values.
> But querying Collection 4 and Collection 5 gives a really slow response even
> though we are not using any complex queries. Output of debug queries run with
> debug=timing is given below for reference. Can anyone suggest a way to
> improve the performance?
> 
> Response to query
> 
> 
> [response body; the XML markup was lost in the archive. Recoverable details
> (parameter names are approximate):
> responseHeader: status 0, QTime 3962.
> params: q=("hybrid electric powerplant" "hybrid electric powerplants"
> "Electric" "Electrical" "Electricity" "Engine" "fuel economy" "fuel
> efficiency" "Hybrid Electric Propulsion" "Power Systems" "Powerplant"
> "Propulsion" "hybrid" "hybrid electric" "electric powerplant"),
> defType=edismax, debug=timing, qf=host title url customContent
> contentSpecificSearch, fl=id,contentTagsCount, start=0, q.op=OR, an id value
> 3985d7e2-3e54-48d8-8336-229e85f5d9de (parameter name lost), rows=600,
> bq=("hybrid electric powerplant"^100.0 "hybrid electric powerplants"^100.0
> "Electric"^50.0 "Electrical"^50.0 "Electricity"^50.0 "Engine"^50.0 "fuel
> economy"^50.0 "fuel efficiency"^50.0 "Hybrid Electric Propulsion"^50.0
> "Power Systems"^50.0 "Powerplant"^50.0 "Propulsion"^50.0 "hybrid"^15.0
> "hybrid electric"^15.0 "electric powerplant"^15.0).
> debug timing: total 15374.0; prepare 2.0; process 15363.0, of which one
> component (presumably the query) took 1313.0 and another (presumably debug)
> took 14048.0; the component labels were lost.]
>
> Thanks,
> Arun
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread Erick Erickson
Well, 15 second responses are not what I'd expect either. But two
things (just looked again)

1> note that the time to assemble the debug information is a large
majority of your total time (14 of 15.3 seconds).
2> you're specifying 600 rows which is quite a lot as each one
requires that a 16K block of data be read from disk and decompressed
to assemble the "fl" list.

So one quick test would be to set rows=1 or something. All that said,
the QTime value returned does _not_ include <1> or <2> above, and even
4 seconds seems excessive.

Best,
Erick

On Tue, Sep 26, 2017 at 10:54 AM, sasarun  wrote:
> Hi Erick,
>
> Thank you for the quick response. Query time was relatively fast once the
> data was read from memory, but personally I always felt the response time
> could be far better. As suggested, we will try to set up a non-HDFS
> environment and update you on the results.
>
> Thanks,
> Arun
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread sasarun
Hi Erick, 

Thank you for the quick response. Query time was relatively fast once the
data was read from memory, but personally I always felt the response time
could be far better. As suggested, we will try to set up a non-HDFS
environment and update you on the results.

Thanks, 
Arun



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread Erick Erickson
Does the query time _stay_ low? Once the data is read from HDFS it
should pretty much stay in memory. So my question is whether, once
Solr warms up you see this kind of query response time.

Have you tried this on a non HDFS system? That would be useful to help
figure out where to look.

And given the sizes of your collections, unless you expect them to get
much larger, there's no reason to shard any of them. Sharding should
only really be used when the collections are too big for a single
shard as distributed searches inevitably have increased overhead. I
expect _at least_ 20M documents/shard, and have seen 200M docs/shard.
YMMV of course.

Best,
Erick

On Tue, Sep 26, 2017 at 7:43 AM, sasarun  wrote:
> Hi All,
> I have been using Solr for some time now, but mostly in standalone mode. My
> current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml has
> the following configuration. In the prod environment the query performance
> seems to be really slow. Can anyone help me with a few pointers on how to
> improve it?
>
> 
> [solrconfig.xml directoryFactory block for HdfsDirectoryFactory; tags were
> lost in the archive. Settings: solr.hdfs.home, blockcache.enabled=true,
> blockcache.slab.count=1, blockcache.direct.memory.allocation=false,
> blockcache.blocksperbank=16384, blockcache.read.enabled=true,
> blockcache.write.enabled=false, nrtcachingdirectory.enable=true,
> nrtcachingdirectory.maxmergesizemb=16, nrtcachingdirectory.maxcachedmb=192;
> lockType=hdfs.]
> It has 6 collections of the following sizes:
> Collection 1 --> 6.41 MB
> Collection 2 --> 634.51 KB
> Collection 3 --> 4.59 MB
> Collection 4 --> 1,020.56 MB
> Collection 5 --> 607.26 MB
> Collection 6 --> 102.4 KB
> Each collection has 5 shards. The allocated heap size for the young
> generation is about 8 GB and the old generation is about 24 GB, and GC
> analysis showed peak size utilisation is really low compared to these values.
> But querying Collection 4 and Collection 5 gives a really slow response even
> though we are not using any complex queries. Output of debug queries run with
> debug=timing is given below for reference. Can anyone suggest a way to
> improve the performance?
>
> Response to query
> 
> 
> [response body; the XML markup was lost in the archive. Recoverable details
> (parameter names are approximate):
> responseHeader: status 0, QTime 3962.
> params: q=("hybrid electric powerplant" "hybrid electric powerplants"
> "Electric" "Electrical" "Electricity" "Engine" "fuel economy" "fuel
> efficiency" "Hybrid Electric Propulsion" "Power Systems" "Powerplant"
> "Propulsion" "hybrid" "hybrid electric" "electric powerplant"),
> defType=edismax, debug=timing, qf=host title url customContent
> contentSpecificSearch, fl=id,contentTagsCount, start=0, q.op=OR, an id value
> 3985d7e2-3e54-48d8-8336-229e85f5d9de (parameter name lost), rows=600,
> bq=("hybrid electric powerplant"^100.0 "hybrid electric powerplants"^100.0
> "Electric"^50.0 "Electrical"^50.0 "Electricity"^50.0 "Engine"^50.0 "fuel
> economy"^50.0 "fuel efficiency"^50.0 "Hybrid Electric Propulsion"^50.0
> "Power Systems"^50.0 "Powerplant"^50.0 "Propulsion"^50.0 "hybrid"^15.0
> "hybrid electric"^15.0 "electric powerplant"^15.0).
> debug timing: total 15374.0; prepare 2.0; process 15363.0, of which one
> component (presumably the query) took 1313.0 and another (presumably debug)
> took 14048.0; the component labels were lost.]
>
> Thanks,
> Arun
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr performance issue on querying --> Solr 6.5.1

2017-09-26 Thread sasarun
Hi All, 
I have been using Solr for some time now, but mostly in standalone mode. My
current project is using Solr 6.5.1 hosted on Hadoop. My solrconfig.xml has
the following configuration. In the prod environment the query performance
seems to be really slow. Can anyone help me with a few pointers on how to
improve it?


<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
  <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
  <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
  <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
</directoryFactory>
<lockType>hdfs</lockType>
It has 6 collections of the following sizes:
Collection 1 --> 6.41 MB
Collection 2 --> 634.51 KB
Collection 3 --> 4.59 MB
Collection 4 --> 1,020.56 MB
Collection 5 --> 607.26 MB
Collection 6 --> 102.4 KB
Each collection has 5 shards. The allocated heap size for the young generation
is about 8 GB and the old generation is about 24 GB, and GC analysis showed
peak size utilisation is really low compared to these values.
But querying Collection 4 and Collection 5 gives a really slow response even
though we are not using any complex queries. Output of debug queries run with
debug=timing is given below for reference. Can anyone suggest a way to improve
the performance?

Response to query


[response body; the XML markup was lost in the archive. Recoverable details
(parameter names are approximate):
responseHeader: status 0, QTime 3962.
params: q=("hybrid electric powerplant" "hybrid electric powerplants"
"Electric" "Electrical" "Electricity" "Engine" "fuel economy" "fuel
efficiency" "Hybrid Electric Propulsion" "Power Systems" "Powerplant"
"Propulsion" "hybrid" "hybrid electric" "electric powerplant"),
defType=edismax, debug=timing, qf=host title url customContent
contentSpecificSearch, fl=id,contentTagsCount, start=0, q.op=OR, an id value
3985d7e2-3e54-48d8-8336-229e85f5d9de (parameter name lost), rows=600,
bq=("hybrid electric powerplant"^100.0 "hybrid electric powerplants"^100.0
"Electric"^50.0 "Electrical"^50.0 "Electricity"^50.0 "Engine"^50.0 "fuel
economy"^50.0 "fuel efficiency"^50.0 "Hybrid Electric Propulsion"^50.0
"Power Systems"^50.0 "Powerplant"^50.0 "Propulsion"^50.0 "hybrid"^15.0
"hybrid electric"^15.0 "electric powerplant"^15.0).
debug timing: total 15374.0; prepare 2.0; process 15363.0, of which one
component (presumably the query) took 1313.0 and another (presumably debug)
took 14048.0; the component labels were lost.]

Thanks,
Arun



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Solr performance issue on indexing

2017-04-04 Thread Allison, Timothy B.
>  Also, we will try to decouple Tika from Solr.
+1


-Original Message-
From: tstusr [mailto:ulfrhe...@gmail.com] 
Sent: Friday, March 31, 2017 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr performance issue on indexing

Hi, thanks for the feedback.

Yes, it is about OOM; indeed, the Solr instance even becomes unavailable. As I
was saying, I can't find any more relevant information in the logs.

We are able to increase the JVM memory, so the first thing we'll do will be
that.

As far as I know, all documents are bounded to that amount (14K), just the
processing could change. We are making some tests on indexing and it seems it
works without concurrent threads. Also, we will try to decouple Tika from Solr.

By the way, will making it available with SolrCloud improve performance? Or
will there be no perceptible improvement?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886p4327914.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance issue on indexing

2017-03-31 Thread Erick Erickson
If, by chance, the docs you're sending get routed to different Solr
nodes then all the processing is in parallel. I don't know if there's
a good way to insure that the docs get sent to different replicas on
different Solr instances. You could try addressing specific Solr
replicas, something like "blah
blah/solr/collection1_shard1_replica1/export" but I'm not totally sure
that'll do what you want either.

 But that still doesn't decouple Tika from the Solr instances running
those replicas. So if Tika has a problem it has the potential to bring
the Solr node down.

Best,
Erick

On Fri, Mar 31, 2017 at 1:31 PM, tstusr  wrote:
> Hi, thanks for the feedback.
>
> Yes, it is about OOM; indeed, the Solr instance even becomes unavailable. As
> I was saying, I can't find any more relevant information in the logs.
>
> We are able to increase the JVM memory, so the first thing we'll do will be
> that.
>
> As far as I know, all documents are bounded to that amount (14K), just the
> processing could change. We are making some tests on indexing and it seems
> it works without concurrent threads. Also, we will try to decouple Tika from
> Solr.
>
> By the way, will making it available with SolrCloud improve performance? Or
> will there be no perceptible improvement?
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886p4327914.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance issue on indexing

2017-03-31 Thread tstusr
Hi, thanks for the feedback.

Yes, it is about OOM; indeed, the Solr instance even becomes unavailable. As I
was saying, I can't find any more relevant information in the logs.

We are able to increase the JVM memory, so the first thing we'll do will be
that.

As far as I know, all documents are bounded to that amount (14K), just the
processing could change. We are making some tests on indexing and it seems
it works without concurrent threads. Also, we will try to decouple Tika from
Solr.

By the way, will making it available with SolrCloud improve performance? Or
will there be no perceptible improvement?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886p4327914.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance issue on indexing

2017-03-31 Thread Erick Erickson
First, running multiple threads with PDF files to a Solr running 4G of
JVM is...ambitious. You say it crashes; how? OOMs?

Second, while the extracting request handler is a fine way to get up
and running, any problems with Tika will affect Solr. Tika does a
great job of extraction, but there are so many variants of so many
file formats that this scenario isn't recommended for production.
Consider extracting the PDF on a client and sending the docs to Solr.
Tika can run as a server also so you aren't coupling Solr and Tika.

For a sample SolrJ program, see:
https://lucidworks.com/2012/02/14/indexing-with-solrj/

Best,
Erick

On Fri, Mar 31, 2017 at 10:44 AM, tstusr  wrote:
> Hi there.
>
> We are currently indexing some PDF files; the main handler we use to index
> is /extract, where we perform simple processing (extract relevant fields and
> store them in some fields).
>
> The PDF files are about 10MB~100MB in size and we need the extracted text to
> be available. Everything works correctly in the test stages, but when we try
> to index all 14K files (around 120GB) from a client application that only
> sends HTTP requests (curl) through 3-4 concurrent threads to the /extract
> handler, it crashes. I can't find any relevant information in the Solr logs
> (we checked server/logs and core_dir/tlog).
>
> My question is about performance. I think it is a small amount of data we
> are processing; the deployment scenario is a Docker container with 4GB of
> JVM memory and ~50GB of physical memory (reported through the dashboard),
> and we are using a single instance.
>
> I don't think it is normal behaviour for the handler to crash. So, what are
> some general tips for improving performance in this scenario?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Solr performance issue on indexing

2017-03-31 Thread tstusr
Hi there.

We are currently indexing some PDF files; the main handler we use to index is
/extract, where we perform simple processing (extract relevant fields and
store them in some fields).

The PDF files are about 10MB~100MB in size and we need the extracted text to
be available. Everything works correctly in the test stages, but when we try
to index all 14K files (around 120GB) from a client application that only
sends HTTP requests (curl) through 3-4 concurrent threads to the /extract
handler, it crashes. I can't find any relevant information in the Solr logs
(we checked server/logs and core_dir/tlog).

My question is about performance. I think it is a small amount of data we are
processing; the deployment scenario is a Docker container with 4GB of JVM
memory and ~50GB of physical memory (reported through the dashboard), and we
are using a single instance.

I don't think it is normal behaviour for the handler to crash. So, what are
some general tips for improving performance in this scenario?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr performance issue

2016-02-09 Thread Zheng Lin Edwin Yeo
1 million documents isn't considered big for Solr. How much RAM does your
machine have?

Regards,
Edwin

On 8 February 2016 at 23:45, Susheel Kumar  wrote:

> 1 million document shouldn't have any issues at all.  Something else is
> wrong with your hw/system configuration.
>
> Thanks,
> Susheel
>
> On Mon, Feb 8, 2016 at 6:45 AM, sara hajili  wrote:
>
> > On Mon, Feb 8, 2016 at 3:04 AM, sara hajili 
> wrote:
> >
> > > sorry i made a mistake i have a bout 1000 K doc.
> > > i mean about 100 doc.
> > >
> > > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> > > emir.arnauto...@sematext.com> wrote:
> > >
> > >> Hi Sara,
> > >> Not sure if I am reading this right, but I read it as you have 1000
> doc
> > >> index and issues? Can you tell us bit more about your setup: number of
> > >> servers, hw, index size, number of shards, queries that you run, do
> you
> > >> index at the same time...
> > >>
> > >> It seems to me that you are running Solr on server with limited RAM
> and
> > >> probably small heap. Swapping for sure will slow things down and GC is
> > most
> > >> likely reason for high CPU.
> > >>
> > >> You can use http://sematext.com/spm to collect Solr and host metrics
> > and
> > >> see where the issue is.
> > >>
> > >> Thanks,
> > >> Emir
> > >>
> > >> --
> > >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > >> Solr & Elasticsearch Support * http://sematext.com/
> > >>
> > >>
> > >>
> > >> On 08.02.2016 10:27, sara hajili wrote:
> > >>
> > >>> hi all.
> > >>> i have a problem with my solr performance and usage hardware like a
> > >>> ram,cup...
> > >>> i have a lot of document and so indexed file about 1000 doc in solr
> > that
> > >>> every doc has about 8 field in average.
> > >>> and each field has about 60 char.
> > >>> i set my field as a storedfield = "false" except of  1 field. // i
> read
> > >>> that this help performance.
> > >>> i used copy field and dynamic field if it was necessary . // i read
> > that
> > >>> this help performance.
> > >>> and now my question is that when i run a lot of query on solr i faced
> > >>> with
> > >>> a problem solr use more cpu and ram and after that filled ,it use a
> lot
> > >>>   swapped storage and then use hard,but doesn't create a system file!
> > >>> solr
> > >>> fill hard until i forced to restart server to release hard disk.
> > >>> and now my question is why solr treat in this way? and how i can
> avoid
> > >>> solr
> > >>> to use huge cpu space?
> > >>> any config need?!
> > >>>
> > >>>
> > >>
> > >
> >
>


Re: solr performance issue

2016-02-08 Thread Susheel Kumar
1 million documents shouldn't cause any issues at all. Something else is
wrong with your hardware/system configuration.

Thanks,
Susheel

On Mon, Feb 8, 2016 at 6:45 AM, sara hajili  wrote:

> On Mon, Feb 8, 2016 at 3:04 AM, sara hajili  wrote:
>
> > sorry i made a mistake i have a bout 1000 K doc.
> > i mean about 100 doc.
> >
> > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Sara,
> >> Not sure if I am reading this right, but I read it as you have 1000 doc
> >> index and issues? Can you tell us bit more about your setup: number of
> >> servers, hw, index size, number of shards, queries that you run, do you
> >> index at the same time...
> >>
> >> It seems to me that you are running Solr on server with limited RAM and
> >> probably small heap. Swapping for sure will slow things down and GC is
> most
> >> likely reason for high CPU.
> >>
> >> You can use http://sematext.com/spm to collect Solr and host metrics
> and
> >> see where the issue is.
> >>
> >> Thanks,
> >> Emir
> >>
> >> --
> >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> >> Solr & Elasticsearch Support * http://sematext.com/
> >>
> >>
> >>
> >> On 08.02.2016 10:27, sara hajili wrote:
> >>
> >>> hi all.
> >>> i have a problem with my solr performance and usage hardware like a
> >>> ram,cup...
> >>> i have a lot of document and so indexed file about 1000 doc in solr
> that
> >>> every doc has about 8 field in average.
> >>> and each field has about 60 char.
> >>> i set my field as a storedfield = "false" except of  1 field. // i read
> >>> that this help performance.
> >>> i used copy field and dynamic field if it was necessary . // i read
> that
> >>> this help performance.
> >>> and now my question is that when i run a lot of query on solr i faced
> >>> with
> >>> a problem solr use more cpu and ram and after that filled ,it use a lot
> >>>   swapped storage and then use hard,but doesn't create a system file!
> >>> solr
> >>> fill hard until i forced to restart server to release hard disk.
> >>> and now my question is why solr treat in this way? and how i can avoid
> >>> solr
> >>> to use huge cpu space?
> >>> any config need?!
> >>>
> >>>
> >>
> >
>


Re: solr performance issue

2016-02-08 Thread sara hajili
On Mon, Feb 8, 2016 at 3:04 AM, sara hajili  wrote:

> sorry i made a mistake i have a bout 1000 K doc.
> i mean about 100 doc.
>
> On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
> emir.arnauto...@sematext.com> wrote:
>
>> Hi Sara,
>> Not sure if I am reading this right, but I read it as you have 1000 doc
>> index and issues? Can you tell us bit more about your setup: number of
>> servers, hw, index size, number of shards, queries that you run, do you
>> index at the same time...
>>
>> It seems to me that you are running Solr on server with limited RAM and
>> probably small heap. Swapping for sure will slow things down and GC is most
>> likely reason for high CPU.
>>
>> You can use http://sematext.com/spm to collect Solr and host metrics and
>> see where the issue is.
>>
>> Thanks,
>> Emir
>>
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>>
>> On 08.02.2016 10:27, sara hajili wrote:
>>
>>> hi all.
>>> i have a problem with my solr performance and usage hardware like a
>>> ram,cup...
>>> i have a lot of document and so indexed file about 1000 doc in solr that
>>> every doc has about 8 field in average.
>>> and each field has about 60 char.
>>> i set my field as a storedfield = "false" except of  1 field. // i read
>>> that this help performance.
>>> i used copy field and dynamic field if it was necessary . // i read that
>>> this help performance.
>>> and now my question is that when i run a lot of query on solr i faced
>>> with
>>> a problem solr use more cpu and ram and after that filled ,it use a lot
>>>   swapped storage and then use hard,but doesn't create a system file!
>>> solr
>>> fill hard until i forced to restart server to release hard disk.
>>> and now my question is why solr treat in this way? and how i can avoid
>>> solr
>>> to use huge cpu space?
>>> any config need?!
>>>
>>>
>>
>


Re: solr performance issue

2016-02-08 Thread Emir Arnautovic

Hi Sara,
It is still considered to be a small index. Can you give us a bit more detail
about your setup?


Thanks,
Emir

On 08.02.2016 12:04, sara hajili wrote:

sorry i made a mistake i have a bout 1000 K doc.
i mean about 100 doc.

On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:


Hi Sara,
Not sure if I am reading this right, but I read it as you have 1000 doc
index and issues? Can you tell us bit more about your setup: number of
servers, hw, index size, number of shards, queries that you run, do you
index at the same time...

It seems to me that you are running Solr on server with limited RAM and
probably small heap. Swapping for sure will slow things down and GC is most
likely reason for high CPU.

You can use http://sematext.com/spm to collect Solr and host metrics and
see where the issue is.

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 08.02.2016 10:27, sara hajili wrote:


hi all.
i have a problem with my solr performance and usage hardware like a
ram,cup...
i have a lot of document and so indexed file about 1000 doc in solr that
every doc has about 8 field in average.
and each field has about 60 char.
i set my field as a storedfield = "false" except of  1 field. // i read
that this help performance.
i used copy field and dynamic field if it was necessary . // i read that
this help performance.
and now my question is that when i run a lot of query on solr i faced with
a problem solr use more cpu and ram and after that filled ,it use a lot
   swapped storage and then use hard,but doesn't create a system file! solr
fill hard until i forced to restart server to release hard disk.
and now my question is why solr treat in this way? and how i can avoid
solr
to use huge cpu space?
any config need?!




--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: solr performance issue

2016-02-08 Thread sara hajili
Sorry, I made a mistake: I have about 1000 K docs.
I mean about 100 doc.

On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

> Hi Sara,
> Not sure if I am reading this right, but I read it as you have 1000 doc
> index and issues? Can you tell us bit more about your setup: number of
> servers, hw, index size, number of shards, queries that you run, do you
> index at the same time...
>
> It seems to me that you are running Solr on server with limited RAM and
> probably small heap. Swapping for sure will slow things down and GC is most
> likely reason for high CPU.
>
> You can use http://sematext.com/spm to collect Solr and host metrics and
> see where the issue is.
>
> Thanks,
> Emir
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>
>
> On 08.02.2016 10:27, sara hajili wrote:
>
>> hi all.
>> i have a problem with my solr performance and usage hardware like a
>> ram,cup...
>> i have a lot of document and so indexed file about 1000 doc in solr that
>> every doc has about 8 field in average.
>> and each field has about 60 char.
>> i set my field as a storedfield = "false" except of  1 field. // i read
>> that this help performance.
>> i used copy field and dynamic field if it was necessary . // i read that
>> this help performance.
>> and now my question is that when i run a lot of query on solr i faced with
>> a problem solr use more cpu and ram and after that filled ,it use a lot
>>   swapped storage and then use hard,but doesn't create a system file! solr
>> fill hard until i forced to restart server to release hard disk.
>> and now my question is why solr treat in this way? and how i can avoid
>> solr
>> to use huge cpu space?
>> any config need?!
>>
>>
>


Re: solr performance issue

2016-02-08 Thread Emir Arnautovic

Hi Sara,
Not sure if I am reading this right, but I read it as you having a 1000-doc
index and issues? Can you tell us a bit more about your setup: number of
servers, hardware, index size, number of shards, the queries that you run,
whether you index at the same time...


It seems to me that you are running Solr on a server with limited RAM and
probably a small heap. Swapping will for sure slow things down, and GC is
most likely the reason for the high CPU.


You can use http://sematext.com/spm to collect Solr and host metrics and 
see where the issue is.


Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On 08.02.2016 10:27, sara hajili wrote:

hi all.
i have a problem with my solr performance and usage hardware like a
ram,cup...
i have a lot of document and so indexed file about 1000 doc in solr that
every doc has about 8 field in average.
and each field has about 60 char.
i set my field as a storedfield = "false" except of  1 field. // i read
that this help performance.
i used copy field and dynamic field if it was necessary . // i read that
this help performance.
and now my question is that when i run a lot of query on solr i faced with
a problem solr use more cpu and ram and after that filled ,it use a lot
  swapped storage and then use hard,but doesn't create a system file! solr
fill hard until i forced to restart server to release hard disk.
and now my question is why solr treat in this way? and how i can avoid solr
to use huge cpu space?
any config need?!





solr performance issue

2016-02-08 Thread sara hajili
Hi all,
I have a problem with my Solr performance and hardware usage (RAM, CPU, ...).
I have a lot of documents - an index of about 1000 docs in Solr, where every
doc has about 8 fields on average and each field has about 60 characters.
I set my fields to stored="false" except for 1 field. // I read that this
helps performance.
I used copy fields and dynamic fields only where necessary. // I read that
this helps performance.
My question is that when I run a lot of queries against Solr, it uses more
CPU and RAM, and after those fill up it uses a lot of swap space and then the
hard disk, but doesn't create a system file! Solr fills the hard disk until I
am forced to restart the server to release it.
Why does Solr behave this way, and how can I keep Solr from using so much CPU
and disk space? Is any configuration needed?!
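
For reference, the schema settings described above would look roughly like
this in schema.xml (the field names are placeholders, not taken from the
actual schema):

<!-- the one field that is stored and can be returned in results -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<!-- searchable but not stored fields -->
<field name="body" type="text_general" indexed="true" stored="false"/>
<dynamicField name="*_txt" type="text_general" indexed="true" stored="false"/>
<!-- copyField into a search-only, non-stored catch-all field -->
<field name="catchall" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="body" dest="catchall"/>
<copyField source="*_txt" dest="catchall"/>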


Re: Solr Performance Issue

2013-12-05 Thread Hien Luu
Thanks Furkan. Looking forward to seeing your test results.

Sent from Yahoo Mail on Android



Re: Solr Performance Issue

2013-12-05 Thread Furkan KAMACI
Hi Hien;

Actually, a high index rate is a relative concept. I could index that kind of
data within a few hours. I aim to index much, much more data within the same
time soon. I can share my test results when I do.

Thanks;
Furkan KAMACI

On Friday, 6 December 2013, Hien Luu  wrote:
> Hi Furkan,
>
> Just curious what was the index rate that you were able to achieve?
>
> Regards,
>
> Hien
>
>
>
> On Thursday, December 5, 2013 3:06 PM, Furkan KAMACI <
furkankam...@gmail.com> wrote:
>
> Hi;
>
> Erick and Shawn have explained that we need more information about your
> infrastructure. I should add that I had test data in my SolrCloud nearly
> as large as yours and I did not have any problems, except when indexing at
> a huge index rate, and that can be solved with tuning. You should optimize
> your parameters according to your system, so you should give us more
> information about your system.
>
> Thanks;
> Furkan KAMACI
>
> On Wednesday, 4 December 2013, Shawn Heisey  wrote:
>
>> On 12/4/2013 6:31 AM, kumar wrote:
>>> I am having almost 5 to 6 crores of indexed documents in solr. And when
> i am
>>> going to change anything in the configuration file solr server is going
>>> down.
>>
>> If you mean crore and not core, then you are talking about 50 to 60
>> million documents.  That's a lot.  Solr is perfectly capable of handling
>> that many documents, but you do need to have very good hardware.
>>
>> Even if they are small, your index is likely to be many gigabytes in
>> size.  If the documents are large, that might be measured in terabytes.
>>  Large indexes require a lot of memory for good performance.  This will
>> be discussed in more detail below.
>>
>>> As a new user to solr i can't able to find the exact reason for going
> server
>>> down.
>>>
>>> I am using cache's in the following way :
>>>
>>> >>  size="16384"
>>>  initialSize="4096"
>>>  autowarmCount="4096"/>
>>>  >>  size="16384"
>>>  initialSize="4096"
>>>  autowarmCount="1024"/>
>>>
>>> and i am not using any documentCache, fieldValueCahe's
>>
>> As Erick said, these cache sizes are HUGE.  In particular, your
>> autowarmCount values are extremely high.
>>
>>> Whether this can lead any performance issue means going server down.
>>
>> Another thing that Erick pointed out is that you haven't really told us
>> what's happening.  When you say that the server goes down, what EXACTLY
>> do you mean?
>>
>>> And i am seeing logging in the server it is showing exception in the
>>> following way
>>>
>>>
>>> Servlet.service() for servlet [default] in context with path [/solr]
> threw
>>> exception [java.lang.IllegalStateException: Cannot call sendError()
after
>>> the response has been committed] with root cause
>>
>> This message comes from your servlet container, not Solr.  You're
>> probably using Tomcat, not the included Jetty.  There is some indirect
>> evidence that this can be fixed by increasing the servlet container's
>> setting for the maximum number of request parameters.
>>
>> http://forums.adobe.com/message/4590864
>>
>> Here's what I can say without further information:
>>
>> You're likely having performance issues.  One potential problem is your
>> insanely high autowarmCount values.  Your cache configuration tells Solr
>> that every time you have a soft commit or a hard commit with
>> openSearcher=true, you're going to execute up to 1024 queries and up to
>> 4096 filters from the old caches, in order to warm the new caches.  Even
>> if you have an optimal setup, this takes a lot of time.  I suspect that
>> you don't have an optimal setup.
>>
>> Another potential problem is that you don't have enough memory for the
>> size of your index.  A number of potential performance problems are
>> discussed on this wiki page:
>>
>>


Re: Solr Performance Issue

2013-12-05 Thread Shawn Heisey

On 12/5/2013 4:08 PM, Hien Luu wrote:

Just curious what was the index rate that you were able to achieve?


What I've usually seen based on my experience and what people have said 
here and on IRC is that the data source is usually the bottleneck - Solr 
typically indexes VERY fast, as long as you have sized your hardware and 
configuration appropriately.


I import from MySQL.  By running dataimport handlers on all my shards at 
once and using two servers for the entire index, I can do a full 
re-index of 87 million documents on my production hardware in under 5 
hours.  On my single dev server, it takes about 8.5 hours.  I'm not
using SolrCloud.  I'm very confident that MySQL is the bottleneck here,
not Solr.


Thanks,
Shawn



Re: Solr Performance Issue

2013-12-05 Thread Hien Luu
Hi Furkan,

Just curious what was the index rate that you were able to achieve?
 
Regards, 

Hien



On Thursday, December 5, 2013 3:06 PM, Furkan KAMACI  
wrote:
 
Hi;

Erick and Shawn have explained that we need more information about your
infrastructure. I should add that I had test data in my SolrCloud nearly
as large as yours and I did not have any problems, except when indexing
at a huge index rate, and that can be solved with tuning. You should optimize
your parameters according to your system, so you should give us more
information about your system.

Thanks;
Furkan KAMACI

On Wednesday, 4 December 2013, Shawn Heisey  wrote:

> On 12/4/2013 6:31 AM, kumar wrote:
>> I am having almost 5 to 6 crores of indexed documents in solr. And when
i am
>> going to change anything in the configuration file solr server is going
>> down.
>
> If you mean crore and not core, then you are talking about 50 to 60
> million documents.  That's a lot.  Solr is perfectly capable of handling
> that many documents, but you do need to have very good hardware.
>
> Even if they are small, your index is likely to be many gigabytes in
> size.  If the documents are large, that might be measured in terabytes.
>  Large indexes require a lot of memory for good performance.  This will
> be discussed in more detail below.
>
>> As a new user to solr i can't able to find the exact reason for going
server
>> down.
>>
>> I am using cache's in the following way :
>>
>> [two cache definitions; the element names were lost in the archive. Both
>> have size="16384" and initialSize="4096"; the first has
>> autowarmCount="4096" and the second autowarmCount="1024".]
>>
>> and i am not using any documentCache, fieldValueCahe's
>
> As Erick said, these cache sizes are HUGE.  In particular, your
> autowarmCount values are extremely high.
>
>> Whether this can lead any performance issue means going server down.
>
> Another thing that Erick pointed out is that you haven't really told us
> what's happening.  When you say that the server goes down, what EXACTLY
> do you mean?
>
>> And i am seeing logging in the server it is showing exception in the
>> following way
>>
>>
>> Servlet.service() for servlet [default] in context with path [/solr]
threw
>> exception [java.lang.IllegalStateException: Cannot call sendError() after
>> the response has been committed] with root cause
>
> This message comes from your servlet container, not Solr.  You're
> probably using Tomcat, not the included Jetty.  There is some indirect
> evidence that this can be fixed by increasing the servlet container's
> setting for the maximum number of request parameters.
>
> http://forums.adobe.com/message/4590864
>
> Here's what I can say without further information:
>
> You're likely having performance issues.  One potential problem is your
> insanely high autowarmCount values.  Your cache configuration tells Solr
> that every time you have a soft commit or a hard commit with
> openSearcher=true, you're going to execute up to 1024 queries and up to
> 4096 filters from the old caches, in order to warm the new caches.  Even
> if you have an optimal setup, this takes a lot of time.  I suspect that
> you don't have an optimal setup.
>
> Another potential problem is that you don't have enough memory for the
> size of your index.  A number of potential performance problems are
> discussed on this wiki page:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> A lot more details are required.  Here's some things that will be
> helpful, and more is always better:
>
> * Exact symptoms.
> * Excerpts from the Solr logfile that include entire stacktraces.
> * Operating system and version.
> * Total server index size on disk.
> * Total machine memory.
> * Java heap size for your servlet container.
> * Which servlet container you are using to run Solr.
> * Solr version.
> * Server hardware details.
>
> Thanks,
> Shawn
>
>

Re: Solr Performance Issue

2013-12-05 Thread Furkan KAMACI
Hi;

Erick and Shawn have explained that we need more information about your
infrastructure. I should add that I had test data in my SolrCloud nearly as
large as yours and did not have any problems, except when indexing at a very
high rate, and that could be solved with tuning. You should optimize your
parameters according to your system, so please give us more information
about your system.

Thanks;
Furkan KAMACI

On Wednesday, December 4, 2013, Shawn Heisey  wrote:
> On 12/4/2013 6:31 AM, kumar wrote:
>> I am having almost 5 to 6 crores of indexed documents in solr. And when
i am
>> going to change anything in the configuration file solr server is going
>> down.
>
> If you mean crore and not core, then you are talking about 50 to 60
> million documents.  That's a lot.  Solr is perfectly capable of handling
> that many documents, but you do need to have very good hardware.
>
> Even if they are small, your index is likely to be many gigabytes in
> size.  If the documents are large, that might be measured in terabytes.
>  Large indexes require a lot of memory for good performance.  This will
> be discussed in more detail below.
>
>> As a new user to solr i can't able to find the exact reason for going
server
>> down.
>>
>> I am using cache's in the following way :
>>
>> <filterCache size="16384"
>>              initialSize="4096"
>>              autowarmCount="4096"/>
>> <queryResultCache size="16384"
>>                   initialSize="4096"
>>                   autowarmCount="1024"/>
>>
>> and i am not using any documentCache, fieldValueCahe's
>
> As Erick said, these cache sizes are HUGE.  In particular, your
> autowarmCount values are extremely high.
>
>> Whether this can lead any performance issue means going server down.
>
> Another thing that Erick pointed out is that you haven't really told us
> what's happening.  When you say that the server goes down, what EXACTLY
> do you mean?
>
>> And i am seeing logging in the server it is showing exception in the
>> following way
>>
>>
>> Servlet.service() for servlet [default] in context with path [/solr]
threw
>> exception [java.lang.IllegalStateException: Cannot call sendError() after
>> the response has been committed] with root cause
>
> This message comes from your servlet container, not Solr.  You're
> probably using Tomcat, not the included Jetty.  There is some indirect
> evidence that this can be fixed by increasing the servlet container's
> setting for the maximum number of request parameters.
>
> http://forums.adobe.com/message/4590864
>
> Here's what I can say without further information:
>
> You're likely having performance issues.  One potential problem is your
> insanely high autowarmCount values.  Your cache configuration tells Solr
> that every time you have a soft commit or a hard commit with
> openSearcher=true, you're going to execute up to 1024 queries and up to
> 4096 filters from the old caches, in order to warm the new caches.  Even
> if you have an optimal setup, this takes a lot of time.  I suspect that
> you don't have an optimal setup.
>
> Another potential problem is that you don't have enough memory for the
> size of your index.  A number of potential performance problems are
> discussed on this wiki page:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> A lot more details are required.  Here's some things that will be
> helpful, and more is always better:
>
> * Exact symptoms.
> * Excerpts from the Solr logfile that include entire stacktraces.
> * Operating system and version.
> * Total server index size on disk.
> * Total machine memory.
> * Java heap size for your servlet container.
> * Which servlet container you are using to run Solr.
> * Solr version.
> * Server hardware details.
>
> Thanks,
> Shawn
>
>


Re: Solr Performance Issue

2013-12-04 Thread Shawn Heisey
On 12/4/2013 6:31 AM, kumar wrote:
> I am having almost 5 to 6 crores of indexed documents in solr. And when i am
> going to change anything in the configuration file solr server is going
> down.

If you mean crore and not core, then you are talking about 50 to 60
million documents.  That's a lot.  Solr is perfectly capable of handling
that many documents, but you do need to have very good hardware.

Even if they are small, your index is likely to be many gigabytes in
size.  If the documents are large, that might be measured in terabytes.
 Large indexes require a lot of memory for good performance.  This will
be discussed in more detail below.

> As a new user to solr i can't able to find the exact reason for going server
> down.
> 
> I am using cache's in the following way :
> 
> <filterCache size="16384"
>              initialSize="4096"
>              autowarmCount="4096"/>
> <queryResultCache size="16384"
>                   initialSize="4096"
>                   autowarmCount="1024"/>
> 
> and i am not using any documentCache, fieldValueCahe's

As Erick said, these cache sizes are HUGE.  In particular, your
autowarmCount values are extremely high.

> Whether this can lead any performance issue means going server down.

Another thing that Erick pointed out is that you haven't really told us
what's happening.  When you say that the server goes down, what EXACTLY
do you mean?

> And i am seeing logging in the server it is showing exception in the
> following way
> 
> 
> Servlet.service() for servlet [default] in context with path [/solr] threw
> exception [java.lang.IllegalStateException: Cannot call sendError() after
> the response has been committed] with root cause

This message comes from your servlet container, not Solr.  You're
probably using Tomcat, not the included Jetty.  There is some indirect
evidence that this can be fixed by increasing the servlet container's
setting for the maximum number of request parameters.

http://forums.adobe.com/message/4590864
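
If you are on Tomcat, the setting for that is usually the maxParameterCount
attribute on the Connector in server.xml -- a sketch only, assuming a Tomcat
version recent enough to have it (check your version's documentation), with
the port and protocol values just placeholders:

  <Connector port="8080" protocol="HTTP/1.1"
             maxParameterCount="50000" />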

Here's what I can say without further information:

You're likely having performance issues.  One potential problem is your
insanely high autowarmCount values.  Your cache configuration tells Solr
that every time you have a soft commit or a hard commit with
openSearcher=true, you're going to execute up to 1024 queries and up to
4096 filters from the old caches, in order to warm the new caches.  Even
if you have an optimal setup, this takes a lot of time.  I suspect that
you don't have an optimal setup.

Another potential problem is that you don't have enough memory for the
size of your index.  A number of potential performance problems are
discussed on this wiki page:

http://wiki.apache.org/solr/SolrPerformanceProblems

A lot more details are required.  Here's some things that will be
helpful, and more is always better:

* Exact symptoms.
* Excerpts from the Solr logfile that include entire stacktraces.
* Operating system and version.
* Total server index size on disk.
* Total machine memory.
* Java heap size for your servlet container.
* Which servlet container you are using to run Solr.
* Solr version.
* Server hardware details.

Thanks,
Shawn



Re: Solr Performance Issue

2013-12-04 Thread Erick Erickson
You need to give us more of the exception trace,
the real cause is often buried down the stack with
some text like
"Caused by..."

But at a glance your cache sizes and autowarm counts
are far higher than they should be. Try reducing
particularly the autowarm count down to, say, 16 or so.
It's actually rare that you really need very many.

I'd actually go back to the defaults to start with to test
whether this is the problem.
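
As a rough illustration only -- assuming the two caches above are the
filterCache and queryResultCache, and with numbers that are merely
illustrative -- a much more conservative starting point in solrconfig.xml
would look like:

  <filterCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="16"/>
  <queryResultCache class="solr.LRUCache"
                    size="512"
                    initialSize="512"
                    autowarmCount="16"/>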

Further, we need to know exactly what you mean by
"change anything in the configuration file". Change
what? Details matter.

Of course the last thing you changed before you started
seeing this problem is the most likely culprit.

Best,
Erick


On Wed, Dec 4, 2013 at 8:31 AM, kumar  wrote:

> I am having almost 5 to 6 crores of indexed documents in solr. And when i
> am
> going to change anything in the configuration file solr server is going
> down.
>
> As a new user to solr i can't able to find the exact reason for going
> server
> down.
>
> I am using cache's in the following way :
>
> <filterCache size="16384"
>              initialSize="4096"
>              autowarmCount="4096"/>
> <queryResultCache size="16384"
>                   initialSize="4096"
>                   autowarmCount="1024"/>
>
> and i am not using any documentCache, fieldValueCahe's
>
> Whether this can lead any performance issue means going server down.
>
> And i am seeing logging in the server it is showing exception in the
> following way
>
>
> Servlet.service() for servlet [default] in context with path [/solr] threw
> exception [java.lang.IllegalStateException: Cannot call sendError() after
> the response has been committed] with root cause
>
>
>
> Can anybody help me how can i solve this problem.
>
> Kumar.
>
>
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Performance-Issue-tp4104907.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Solr Performance Issue

2013-12-04 Thread kumar
I have almost 5 to 6 crores of indexed documents in Solr, and whenever I
change anything in the configuration file the Solr server goes down.

As a new user to Solr I am not able to find the exact reason the server goes
down.

I am using caches in the following way:

<filterCache size="16384"
             initialSize="4096"
             autowarmCount="4096"/>
<queryResultCache size="16384"
                  initialSize="4096"
                  autowarmCount="1024"/>

and I am not using any documentCache or fieldValueCache.

Can this lead to a performance issue that makes the server go down?

And in the server logs I am seeing an exception of the following form:


Servlet.service() for servlet [default] in context with path [/solr] threw
exception [java.lang.IllegalStateException: Cannot call sendError() after
the response has been committed] with root cause



Can anybody help me solve this problem?

Kumar.









--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Performance-Issue-tp4104907.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr performance issue

2011-03-23 Thread Doğacan Güney
Hello,

The problem turned out to be some sort of sharding/searching weirdness. We
modified some code in sharding but I don't think it is related. In any case,
we just added a new server that just shards (but doesn't do any searching /
doesn't contain any index) and performance is very very good.
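
In other words, an otherwise empty Solr instance that only fans requests out
over the shards parameter, something along the lines of (a sketch, host
names made up):

  http://aggregator:8983/solr/select?q=foo&shards=solr1:8983/solr,solr2:8983/solr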

Thanks for all the help.

On Tue, Mar 22, 2011 at 14:30, Alexey Serba  wrote:

> > Btw, I am monitoring output via jconsole with 8gb of ram and it still
> goes
> > to 8gb every 20 seconds or so,
> > gc runs, falls down to 1gb.
>
> Hmm, jvm is eating 8Gb for 20 seconds - sounds a lot.
>
> Do you return all results (ids) for your queries? Any tricky
> faceting/sorting/function queries?
>



-- 
Doğacan Güney


Re: Solr performance issue

2011-03-22 Thread Alexey Serba
> Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
> to 8gb every 20 seconds or so,
> gc runs, falls down to 1gb.

Hmm, the JVM is eating 8GB every 20 seconds - that sounds like a lot.

Do you return all results (ids) for your queries? Any tricky
faceting/sorting/function queries?
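
For comparison, a request that only pages through ids instead of pulling
whole result sets back looks something like this (a sketch, query made up):

  http://localhost:8983/solr/select?q=*:*&fl=id&rows=20&start=0

Asking for very large rows values, or for every stored field, is an easy way
to burn a lot of heap per request.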


Re: Solr performance issue

2011-03-15 Thread Shawn Heisey
The host is dual quad-core, each Xen VM has been given two CPUs.  Not 
counting dom0, two of the hosts have 10/8 CPUs allocated, two of them 
have 8/8.  The dom0 VM is also allocated two CPUs.


I'm not really sure how that works out when it comes to Java running on 
the VM, but if at all possible, it is likely that Xen would try and keep 
both VM cpus on the same physical CPU and the VM's memory allocation on 
the same NUMA node.  If that's the case, it would meet what you've 
stated as the recommendation for incremental mode.


Shawn


On 3/15/2011 9:10 AM, Markus Jelsma wrote:

CMS is very good for multicore CPUs. Use incremental mode only when you have
a single CPU with only one or two cores.




Re: Solr performance issue

2011-03-15 Thread Markus Jelsma
CMS is very good for multicore CPUs. Use incremental mode only when you have
a single CPU with only one or two cores.
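
So on a multicore box you can simply drop the incremental flag and run plain
CMS with the parallel young-generation collector, e.g. (a sketch, not a
tuned recommendation):

  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC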

On Tuesday 15 March 2011 16:03:38 Shawn Heisey wrote:
> My solr+jetty+java6 install seems to work well with these GC options.
> It's a dual processor environment:
> 
> -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> 
> I've never had a real problem with memory, so I've not done any kind of
> auditing.  I probably should, but time is a limited resource.
> 
> Shawn
> 
> On 3/14/2011 2:29 PM, Markus Jelsma wrote:
> > That depends on your GC settings and generation sizes. And, instead of
> > UseParallelGC you'd better use UseParNewGC in combination with CMS.
> > 
> > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
> > 
> >> It's actually, as I understand it, expected JVM behavior to see the heap
> >> rise to close to it's limit before it gets GC'd, that's how Java GC
> >> works.  Whether that should happen every 20 seconds or what, I don't
> >> nkow.
> >> 
> >> Another option is setting better JVM garbage collection arguments, so GC
> >> doesn't "stop the world" so often. I have had good luck with my Solr
> >> using this:  -XX:+UseParallelGC

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: Solr performance issue

2011-03-15 Thread Shawn Heisey
My solr+jetty+java6 install seems to work well with these GC options.  
It's a dual processor environment:


-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

I've never had a real problem with memory, so I've not done any kind of 
auditing.  I probably should, but time is a limited resource.


Shawn


On 3/14/2011 2:29 PM, Markus Jelsma wrote:

That depends on your GC settings and generation sizes. And, instead of
UseParallelGC you'd better use UseParNewGC in combination with CMS.

See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html

It's actually, as I understand it, expected JVM behavior to see the heap
rise to close to its limit before it gets GC'd, that's how Java GC
works.  Whether that should happen every 20 seconds or what, I don't know.

Another option is setting better JVM garbage collection arguments, so GC
doesn't "stop the world" so often. I have had good luck with my Solr
using this:  -XX:+UseParallelGC




Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
2011/3/14 Markus Jelsma 

> Mmm. SearchHander.handleRequestBody takes care of sharding. Could your
> system
> suffer from
> http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock
> ?
>
>
We increased thread limit (which was 1 before) but it did not help.

Anyway, we will try to disable sharding tomorrow. Maybe this can give us a
better picture.

Thanks for the help, everyone.


> I'm not sure, i haven't seen a similar issue in a sharded environment,
> probably because it was a controlled environment.
>
>
> > Hello,
> >
> > 2011/3/14 Markus Jelsma 
> >
> > > That depends on your GC settings and generation sizes. And, instead of
> > > UseParallelGC you'd better use UseParNewGC in combination with CMS.
> >
> > JConsole now shows a different profile output but load is still high and
> > performance is still bad.
> >
> > Btw, here is the thread profile from newrelic:
> >
> > https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm
> >
> > Note that we do use a form of sharding so I maybe all the time spent
> > waiting for handleRequestBody
> > is results from sharding?
> >
> > > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
> > >
> > > > It's actually, as I understand it, expected JVM behavior to see the
> > > > heap rise to close to it's limit before it gets GC'd, that's how Java
> > > > GC works.  Whether that should happen every 20 seconds or what, I
> > > > don't
> > >
> > > nkow.
> > >
> > > > Another option is setting better JVM garbage collection arguments, so
> > > > GC doesn't "stop the world" so often. I have had good luck with my
> > > > Solr using this:  -XX:+UseParallelGC
> > > >
> > > > On 3/14/2011 4:15 PM, Doğacan Güney wrote:
> > > > > Hello again,
> > > > >
> > > > > 2011/3/14 Markus Jelsma
> > > > >
> > > > >>> Hello,
> > > > >>>
> > > > >>> 2011/3/14 Markus Jelsma
> > > > >>>
> > > >  Hi Doğacan,
> > > > 
> > > >  Are you, at some point, running out of heap space? In my
> > > >  experience, that's the common cause of increased load and
> > > >  excessivly high
> > >
> > > response
> > >
> > > >  times (or time
> > > >  outs).
> > > > >>>
> > > > >>> How much of a heap size would be enough? Our index size is
> growing
> > > > >>> slowly but we did not have this problem
> > > > >>> a couple weeks ago where index size was maybe 100mb smaller.
> > > > >>
> > > > >> Telling how much heap space is needed isn't easy to say. It
> usually
> > > > >> needs to
> > > > >> be increased when you run out of memory and get those nasty OOM
> > >
> > > errors,
> > >
> > > > >> are you getting them?
> > > > >> Replication eventes will increase heap usage due to cache warming
> > > > >> queries and
> > > > >> autowarming.
> > > > >
> > > > > Nope, no OOM errors.
> > > > >
> > > > >>> We left most of the caches in solrconfig as default and only
> > >
> > > increased
> > >
> > > > >>> filterCache to 1024. We only ask for "id"s (which
> > > > >>> are unique) and no other fields during queries (though we do
> > >
> > > faceting).
> > >
> > > > >>> Btw, 1.6gb of our index is stored fields (we store
> > > > >>> everything for now, even though we do not get them during
> queries),
> > >
> > > and
> > >
> > > > >>> about 1gb of index.
> > > > >>
> > > > >> Hmm, it seems 4000 would be enough indeed. What about the
> > > > >> fieldCache, are there
> > > > >> a lot of entries? Is there an insanity count? Do you use boost
> > > > >> functions?
> > > > >
> > > > > Insanity count is 0 and fieldCAche has 12 entries. We do use some
> > > > > boosting functions.
> > > > >
> > > > > Btw, I am monitoring output via jconsole with 8gb of ram and it
> still
> > > > > goes to 8gb every 20 seconds or so,
> > > > > gc runs, falls down to 1gb.
> > > > >
> > > > > Btw, our current revision was just a random choice but up until two
> > >
> > > weeks
> > >
> > > > > ago it has been rock-solid so we have been
> > > > > reluctant to update to another version. Would you recommend
> upgrading
> > >
> > > to
> > >
> > > > > latest trunk?
> > > > >
> > > > >> It might not have anything to do with memory at all but i'm just
> > >
> > > asking.
> > >
> > > > >> There
> > > > >> may be a bug in your revision causing this.
> > > > >>
> > > > >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did
> not
> > >
> > > get
> > >
> > > > >> any
> > > > >>
> > > > >>> improvement in load. I can try monitoring with Jconsole
> > > > >>> with 8gigs of heap to see if it helps.
> > > > >>>
> > > >  Cheers,
> > > > 
> > > > > Hello everyone,
> > > > >
> > > > > First of all here is our Solr setup:
> > > > >
> > > > > - Solr nightly build 986158
> > > > > - Running solr inside the default jetty comes with solr build
> > > > > - 1 write only Master , 4 read only Slaves (quad core 5640 with
> > >
> > > 24gb
> > >
> > > > >> of
> > > > >>
> > > > > RAM) - Index replicated (on optimize) to slaves via Solr
> > >
> > > Replication
> > >
> > > > > - 

Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
Mmm. SearchHandler.handleRequestBody takes care of sharding. Could your system
suffer from http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock 
?

I'm not sure, i haven't seen a similar issue in a sharded environment, 
probably because it was a controlled environment.
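
If it is that deadlock, the usual remedy from that page is to make sure the
servlet container can serve comfortably more concurrent requests than the
number of shards times concurrent queries. For the Jetty 6 era jetty.xml
that ships with the Solr example, that is the thread pool setting (a sketch;
class and element names can differ between Jetty versions):

  <Set name="ThreadPool">
    <New class="org.mortbay.thread.QueuedThreadPool">
      <Set name="maxThreads">200</Set>
    </New>
  </Set>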


> Hello,
> 
> 2011/3/14 Markus Jelsma 
> 
> > That depends on your GC settings and generation sizes. And, instead of
> > UseParallelGC you'd better use UseParNewGC in combination with CMS.
> 
> JConsole now shows a different profile output but load is still high and
> performance is still bad.
> 
> Btw, here is the thread profile from newrelic:
> 
> https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm
> 
> Note that we do use a form of sharding so I maybe all the time spent
> waiting for handleRequestBody
> is results from sharding?
> 
> > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
> > 
> > > It's actually, as I understand it, expected JVM behavior to see the
> > > heap rise to close to it's limit before it gets GC'd, that's how Java
> > > GC works.  Whether that should happen every 20 seconds or what, I
> > > don't
> > 
> > nkow.
> > 
> > > Another option is setting better JVM garbage collection arguments, so
> > > GC doesn't "stop the world" so often. I have had good luck with my
> > > Solr using this:  -XX:+UseParallelGC
> > > 
> > > On 3/14/2011 4:15 PM, Doğacan Güney wrote:
> > > > Hello again,
> > > > 
> > > > 2011/3/14 Markus Jelsma
> > > > 
> > > >>> Hello,
> > > >>> 
> > > >>> 2011/3/14 Markus Jelsma
> > > >>> 
> > >  Hi Doğacan,
> > >  
> > >  Are you, at some point, running out of heap space? In my
> > >  experience, that's the common cause of increased load and
> > >  excessivly high
> > 
> > response
> > 
> > >  times (or time
> > >  outs).
> > > >>> 
> > > >>> How much of a heap size would be enough? Our index size is growing
> > > >>> slowly but we did not have this problem
> > > >>> a couple weeks ago where index size was maybe 100mb smaller.
> > > >> 
> > > >> Telling how much heap space is needed isn't easy to say. It usually
> > > >> needs to
> > > >> be increased when you run out of memory and get those nasty OOM
> > 
> > errors,
> > 
> > > >> are you getting them?
> > > >> Replication eventes will increase heap usage due to cache warming
> > > >> queries and
> > > >> autowarming.
> > > > 
> > > > Nope, no OOM errors.
> > > > 
> > > >>> We left most of the caches in solrconfig as default and only
> > 
> > increased
> > 
> > > >>> filterCache to 1024. We only ask for "id"s (which
> > > >>> are unique) and no other fields during queries (though we do
> > 
> > faceting).
> > 
> > > >>> Btw, 1.6gb of our index is stored fields (we store
> > > >>> everything for now, even though we do not get them during queries),
> > 
> > and
> > 
> > > >>> about 1gb of index.
> > > >> 
> > > >> Hmm, it seems 4000 would be enough indeed. What about the
> > > >> fieldCache, are there
> > > >> a lot of entries? Is there an insanity count? Do you use boost
> > > >> functions?
> > > > 
> > > > Insanity count is 0 and fieldCAche has 12 entries. We do use some
> > > > boosting functions.
> > > > 
> > > > Btw, I am monitoring output via jconsole with 8gb of ram and it still
> > > > goes to 8gb every 20 seconds or so,
> > > > gc runs, falls down to 1gb.
> > > > 
> > > > Btw, our current revision was just a random choice but up until two
> > 
> > weeks
> > 
> > > > ago it has been rock-solid so we have been
> > > > reluctant to update to another version. Would you recommend upgrading
> > 
> > to
> > 
> > > > latest trunk?
> > > > 
> > > >> It might not have anything to do with memory at all but i'm just
> > 
> > asking.
> > 
> > > >> There
> > > >> may be a bug in your revision causing this.
> > > >> 
> > > >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not
> > 
> > get
> > 
> > > >> any
> > > >> 
> > > >>> improvement in load. I can try monitoring with Jconsole
> > > >>> with 8gigs of heap to see if it helps.
> > > >>> 
> > >  Cheers,
> > >  
> > > > Hello everyone,
> > > > 
> > > > First of all here is our Solr setup:
> > > > 
> > > > - Solr nightly build 986158
> > > > - Running solr inside the default jetty comes with solr build
> > > > - 1 write only Master , 4 read only Slaves (quad core 5640 with
> > 
> > 24gb
> > 
> > > >> of
> > > >> 
> > > > RAM) - Index replicated (on optimize) to slaves via Solr
> > 
> > Replication
> > 
> > > > - Size of index is around 2.5gb
> > > > - No incremental writes, index is created from scratch(delete old
> > >  
> > >  documents
> > >  
> > > > ->  commit new documents ->  optimize)  every 6 hours
> > > > - Avg # of request per second is around 60 (for a single slave)
> > > > - Avg time per request is around 25ms (before having problems)
> > > > - Load on each is slave is around 2
> > > > 
> > > > We are using this set-up for months wit

Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello,

2011/3/14 Markus Jelsma 

> That depends on your GC settings and generation sizes. And, instead of
> UseParallelGC you'd better use UseParNewGC in combination with CMS.
>
>
JConsole now shows a different profile output but load is still high and
performance is still bad.

Btw, here is the thread profile from newrelic:

https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm

Note that we do use a form of sharding, so maybe all the time spent waiting
in handleRequestBody is a result of sharding?


> See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html
>
> > It's actually, as I understand it, expected JVM behavior to see the heap
> > rise to close to it's limit before it gets GC'd, that's how Java GC
> > works.  Whether that should happen every 20 seconds or what, I don't
> nkow.
> >
> > Another option is setting better JVM garbage collection arguments, so GC
> > doesn't "stop the world" so often. I have had good luck with my Solr
> > using this:  -XX:+UseParallelGC
> >
> > On 3/14/2011 4:15 PM, Doğacan Güney wrote:
> > > Hello again,
> > >
> > > 2011/3/14 Markus Jelsma
> > >
> > >>> Hello,
> > >>>
> > >>> 2011/3/14 Markus Jelsma
> > >>>
> >  Hi Doğacan,
> > 
> >  Are you, at some point, running out of heap space? In my experience,
> >  that's the common cause of increased load and excessivly high
> response
> >  times (or time
> >  outs).
> > >>>
> > >>> How much of a heap size would be enough? Our index size is growing
> > >>> slowly but we did not have this problem
> > >>> a couple weeks ago where index size was maybe 100mb smaller.
> > >>
> > >> Telling how much heap space is needed isn't easy to say. It usually
> > >> needs to
> > >> be increased when you run out of memory and get those nasty OOM
> errors,
> > >> are you getting them?
> > >> Replication eventes will increase heap usage due to cache warming
> > >> queries and
> > >> autowarming.
> > >
> > > Nope, no OOM errors.
> > >
> > >>> We left most of the caches in solrconfig as default and only
> increased
> > >>> filterCache to 1024. We only ask for "id"s (which
> > >>> are unique) and no other fields during queries (though we do
> faceting).
> > >>> Btw, 1.6gb of our index is stored fields (we store
> > >>> everything for now, even though we do not get them during queries),
> and
> > >>> about 1gb of index.
> > >>
> > >> Hmm, it seems 4000 would be enough indeed. What about the fieldCache,
> > >> are there
> > >> a lot of entries? Is there an insanity count? Do you use boost
> > >> functions?
> > >
> > > Insanity count is 0 and fieldCAche has 12 entries. We do use some
> > > boosting functions.
> > >
> > > Btw, I am monitoring output via jconsole with 8gb of ram and it still
> > > goes to 8gb every 20 seconds or so,
> > > gc runs, falls down to 1gb.
> > >
> > > Btw, our current revision was just a random choice but up until two
> weeks
> > > ago it has been rock-solid so we have been
> > > reluctant to update to another version. Would you recommend upgrading
> to
> > > latest trunk?
> > >
> > >> It might not have anything to do with memory at all but i'm just
> asking.
> > >> There
> > >> may be a bug in your revision causing this.
> > >>
> > >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not
> get
> > >>
> > >> any
> > >>
> > >>> improvement in load. I can try monitoring with Jconsole
> > >>> with 8gigs of heap to see if it helps.
> > >>>
> >  Cheers,
> > 
> > > Hello everyone,
> > >
> > > First of all here is our Solr setup:
> > >
> > > - Solr nightly build 986158
> > > - Running solr inside the default jetty comes with solr build
> > > - 1 write only Master , 4 read only Slaves (quad core 5640 with
> 24gb
> > >>
> > >> of
> > >>
> > > RAM) - Index replicated (on optimize) to slaves via Solr
> Replication
> > > - Size of index is around 2.5gb
> > > - No incremental writes, index is created from scratch(delete old
> > 
> >  documents
> > 
> > > ->  commit new documents ->  optimize)  every 6 hours
> > > - Avg # of request per second is around 60 (for a single slave)
> > > - Avg time per request is around 25ms (before having problems)
> > > - Load on each is slave is around 2
> > >
> > > We are using this set-up for months without any problem. However
> last
> > 
> >  week
> > 
> > > we started to experience very weird performance problems like :
> > >
> > > - Avg time per request increased from 25ms to 200-300ms (even
> higher
> > >>
> > >> if
> > >>
> >  we
> > 
> > > don't restart the slaves)
> > > - Load on each slave increased from 2 to 15-20 (solr uses %400-%600
> > > cpu)
> > >
> > > When we profile solr we see two very strange things :
> > >
> > > 1 - This is the jconsole output:
> > >
> > > https://skitch.com/meralan/rwwcf/mail-886x691
> > >
> > > As you see gc runs for every 10-15 seconds and collects more t

Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
That depends on your GC settings and generation sizes. And, instead of 
UseParallelGC you'd better use UseParNewGC in combination with CMS.

See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html

> It's actually, as I understand it, expected JVM behavior to see the heap
> rise to close to it's limit before it gets GC'd, that's how Java GC
> works.  Whether that should happen every 20 seconds or what, I don't nkow.
> 
> Another option is setting better JVM garbage collection arguments, so GC
> doesn't "stop the world" so often. I have had good luck with my Solr
> using this:  -XX:+UseParallelGC
> 
> On 3/14/2011 4:15 PM, Doğacan Güney wrote:
> > Hello again,
> > 
> > 2011/3/14 Markus Jelsma
> > 
> >>> Hello,
> >>> 
> >>> 2011/3/14 Markus Jelsma
> >>> 
>  Hi Doğacan,
>  
>  Are you, at some point, running out of heap space? In my experience,
>  that's the common cause of increased load and excessivly high response
>  times (or time
>  outs).
> >>> 
> >>> How much of a heap size would be enough? Our index size is growing
> >>> slowly but we did not have this problem
> >>> a couple weeks ago where index size was maybe 100mb smaller.
> >> 
> >> Telling how much heap space is needed isn't easy to say. It usually
> >> needs to
> >> be increased when you run out of memory and get those nasty OOM errors,
> >> are you getting them?
> >> Replication eventes will increase heap usage due to cache warming
> >> queries and
> >> autowarming.
> > 
> > Nope, no OOM errors.
> > 
> >>> We left most of the caches in solrconfig as default and only increased
> >>> filterCache to 1024. We only ask for "id"s (which
> >>> are unique) and no other fields during queries (though we do faceting).
> >>> Btw, 1.6gb of our index is stored fields (we store
> >>> everything for now, even though we do not get them during queries), and
> >>> about 1gb of index.
> >> 
> >> Hmm, it seems 4000 would be enough indeed. What about the fieldCache,
> >> are there
> >> a lot of entries? Is there an insanity count? Do you use boost
> >> functions?
> > 
> > Insanity count is 0 and fieldCAche has 12 entries. We do use some
> > boosting functions.
> > 
> > Btw, I am monitoring output via jconsole with 8gb of ram and it still
> > goes to 8gb every 20 seconds or so,
> > gc runs, falls down to 1gb.
> > 
> > Btw, our current revision was just a random choice but up until two weeks
> > ago it has been rock-solid so we have been
> > reluctant to update to another version. Would you recommend upgrading to
> > latest trunk?
> > 
> >> It might not have anything to do with memory at all but i'm just asking.
> >> There
> >> may be a bug in your revision causing this.
> >> 
> >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get
> >> 
> >> any
> >> 
> >>> improvement in load. I can try monitoring with Jconsole
> >>> with 8gigs of heap to see if it helps.
> >>> 
>  Cheers,
>  
> > Hello everyone,
> > 
> > First of all here is our Solr setup:
> > 
> > - Solr nightly build 986158
> > - Running solr inside the default jetty comes with solr build
> > - 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb
> >> 
> >> of
> >> 
> > RAM) - Index replicated (on optimize) to slaves via Solr Replication
> > - Size of index is around 2.5gb
> > - No incremental writes, index is created from scratch(delete old
>  
>  documents
>  
> > ->  commit new documents ->  optimize)  every 6 hours
> > - Avg # of request per second is around 60 (for a single slave)
> > - Avg time per request is around 25ms (before having problems)
> > - Load on each is slave is around 2
> > 
> > We are using this set-up for months without any problem. However last
>  
>  week
>  
> > we started to experience very weird performance problems like :
> > 
> > - Avg time per request increased from 25ms to 200-300ms (even higher
> >> 
> >> if
> >> 
>  we
>  
> > don't restart the slaves)
> > - Load on each slave increased from 2 to 15-20 (solr uses %400-%600
> > cpu)
> > 
> > When we profile solr we see two very strange things :
> > 
> > 1 - This is the jconsole output:
> > 
> > https://skitch.com/meralan/rwwcf/mail-886x691
> > 
> > As you see gc runs for every 10-15 seconds and collects more than 1
> >> 
> >> gb
> >> 
> > of memory. (Actually if you wait more than 10 minutes you see spikes
> > up to
>  
>  4gb
>  
> > consistently)
> > 
> > 2 - This is the newrelic output :
> > 
> > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > 
> > As you see solr spent ridiculously long time in
> > SolrDispatchFilter.doFilter() method.
> > 
> > 
> > Apart form these, when we clean the index directory, re-replicate and
> > restart  each slave one by one we see a relief in the system but
> >> 
> >> after
> >> 
>  some

Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
You might also want to add the following switches for your GC log.

> JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps
> -XX:+PrintGCDetails - Xloggc:/var/log/tomcat6/gc.log"

-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintGCApplicationStoppedTime

> 
> Also, what JVM version are you using and what are your other JVM settings?
> Are Xms and Xmx at the same value? I see you're using the throughput
> collector. You might want to use CMS because it partially runs
> concurrently (the low- pause collector) and has less stop-the-world
> interruptions.
> 
> http://download.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html
> 
> Again, this may not be the issue ;)
> 
> > Btw, our current revision was just a random choice but up until two weeks
> > ago it has been rock-solid so we have been
> > reluctant to update to another version. Would you recommend upgrading to
> > latest trunk?
> 
> I don't know what changes have been made since your revision. Please
> consult the CHANGES.txt for that.
> 
> > > It might not have anything to do with memory at all but i'm just
> > > asking. There
> > > may be a bug in your revision causing this.
> > > 
> > > > Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not
> > > > get
> > > 
> > > any
> > > 
> > > > improvement in load. I can try monitoring with Jconsole
> > > > with 8gigs of heap to see if it helps.
> > > > 
> > > > > Cheers,
> > > > > 
> > > > > > Hello everyone,
> > > > > > 
> > > > > > First of all here is our Solr setup:
> > > > > > 
> > > > > > - Solr nightly build 986158
> > > > > > - Running solr inside the default jetty comes with solr build
> > > > > > - 1 write only Master , 4 read only Slaves (quad core 5640 with
> > > > > > 24gb
> > > 
> > > of
> > > 
> > > > > > RAM) - Index replicated (on optimize) to slaves via Solr
> > > > > > Replication - Size of index is around 2.5gb
> > > > > > - No incremental writes, index is created from scratch(delete old
> > > > > 
> > > > > documents
> > > > > 
> > > > > > -> commit new documents -> optimize)  every 6 hours
> > > > > > - Avg # of request per second is around 60 (for a single slave)
> > > > > > - Avg time per request is around 25ms (before having problems)
> > > > > > - Load on each is slave is around 2
> > > > > > 
> > > > > > We are using this set-up for months without any problem. However
> > > > > > last
> > > > > 
> > > > > week
> > > > > 
> > > > > > we started to experience very weird performance problems like :
> > > > > > 
> > > > > > - Avg time per request increased from 25ms to 200-300ms (even
> > > > > > higher
> > > 
> > > if
> > > 
> > > > > we
> > > > > 
> > > > > > don't restart the slaves)
> > > > > > - Load on each slave increased from 2 to 15-20 (solr uses
> > > > > > %400-%600 cpu)
> > > > > > 
> > > > > > When we profile solr we see two very strange things :
> > > > > > 
> > > > > > 1 - This is the jconsole output:
> > > > > > 
> > > > > > https://skitch.com/meralan/rwwcf/mail-886x691
> > > > > > 
> > > > > > As you see gc runs for every 10-15 seconds and collects more than
> > > > > > 1
> > > 
> > > gb
> > > 
> > > > > > of memory. (Actually if you wait more than 10 minutes you see
> > > > > > spikes up to
> > > > > 
> > > > > 4gb
> > > > > 
> > > > > > consistently)
> > > > > > 
> > > > > > 2 - This is the newrelic output :
> > > > > > 
> > > > > > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > > > > > 
> > > > > > As you see solr spent ridiculously long time in
> > > > > > SolrDispatchFilter.doFilter() method.
> > > > > > 
> > > > > > 
> > > > > > Apart form these, when we clean the index directory, re-replicate
> > > > > > and restart  each slave one by one we see a relief in the system
> > > > > > but
> > > 
> > > after
> > > 
> > > > > some
> > > > > 
> > > > > > time servers start to melt down again. Although deleting index
> > > > > > and replicating doesn't solve the problem, we think that these
> > > > > > problems
> > > 
> > > are
> > > 
> > > > > > somehow related to replication. Because symptoms started after
> > > > > 
> > > > > replication
> > > > > 
> > > > > > and once it heals itself after replication. I also see
> > > > > > lucene-write.lock files in slaves (we don't have write.lock files
> > > > > > in the master) which I think we shouldn't see.
> > > > > > 
> > > > > > 
> > > > > > If anyone can give any sort of ideas, we will appreciate it.
> > > > > > 
> > > > > > Regards,
> > > > > > Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Jonathan Rochkind
It's actually, as I understand it, expected JVM behavior to see the heap 
rise to close to its limit before it gets GC'd, that's how Java GC
works.  Whether that should happen every 20 seconds or what, I don't know.


Another option is setting better JVM garbage collection arguments, so GC 
doesn't "stop the world" so often. I have had good luck with my Solr 
using this:  -XX:+UseParallelGC
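
With the Jetty that ships with the Solr example, that kind of flag just goes
on the command line used to start it, e.g. (a sketch):

  java -Xmx4000m -XX:+UseParallelGC -jar start.jar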






On 3/14/2011 4:15 PM, Doğacan Güney wrote:

Hello again,

2011/3/14 Markus Jelsma


Hello,

2011/3/14 Markus Jelsma


Hi Doğacan,

Are you, at some point, running out of heap space? In my experience,
that's the common cause of increased load and excessively high response
times (or time
outs).

How much of a heap size would be enough? Our index size is growing slowly
but we did not have this problem
a couple weeks ago where index size was maybe 100mb smaller.

Telling how much heap space is needed isn't easy to say. It usually needs
to
be increased when you run out of memory and get those nasty OOM errors, are
you getting them?
Replication events will increase heap usage due to cache warming queries
and
autowarming.



Nope, no OOM errors.



We left most of the caches in solrconfig as default and only increased
filterCache to 1024. We only ask for "id"s (which
are unique) and no other fields during queries (though we do faceting).
Btw, 1.6gb of our index is stored fields (we store
everything for now, even though we do not get them during queries), and
about 1gb of index.

Hmm, it seems 4000 would be enough indeed. What about the fieldCache, are
there
a lot of entries? Is there an insanity count? Do you use boost functions?



Insanity count is 0 and fieldCache has 12 entries. We do use some boosting
functions.

Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
to 8gb every 20 seconds or so,
gc runs, falls down to 1gb.

Btw, our current revision was just a random choice but up until two weeks
ago it has been rock-solid so we have been
reluctant to update to another version. Would you recommend upgrading to
latest trunk?



It might not have anything to do with memory at all but i'm just asking.
There
may be a bug in your revision causing this.


Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get

any

improvement in load. I can try monitoring with Jconsole
with 8gigs of heap to see if it helps.


Cheers,


Hello everyone,

First of all here is our Solr setup:

- Solr nightly build 986158
- Running solr inside the default jetty comes with solr build
- 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb

of

RAM) - Index replicated (on optimize) to slaves via Solr Replication
- Size of index is around 2.5gb
- No incremental writes, index is created from scratch(delete old

documents


->  commit new documents ->  optimize)  every 6 hours
- Avg # of request per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each slave is around 2

We are using this set-up for months without any problem. However last

week


we started to experience very weird performance problems like :

- Avg time per request increased from 25ms to 200-300ms (even higher

if

we


don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (solr uses %400-%600
cpu)

When we profile solr we see two very strange things :

1 - This is the jconsole output:

https://skitch.com/meralan/rwwcf/mail-886x691

As you see gc runs for every 10-15 seconds and collects more than 1

gb

of memory. (Actually if you wait more than 10 minutes you see spikes
up to

4gb


consistently)

2 - This is the newrelic output :

https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm

As you see solr spent ridiculously long time in
SolrDispatchFilter.doFilter() method.


Apart from these, when we clean the index directory, re-replicate and
restart  each slave one by one we see a relief in the system but

after

some


time servers start to melt down again. Although deleting index and
replicating doesn't solve the problem, we think that these problems

are

somehow related to replication. Because symptoms started after

replication


and once it heals itself after replication. I also see
lucene-write.lock files in slaves (we don't have write.lock files in
the master) which I think we shouldn't see.


If anyone can give any sort of ideas, we will appreciate it.

Regards,
Dogacan Guney





Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
> Nope, no OOM errors.

That's a good start!

> Insanity count is 0 and fieldCAche has 12 entries. We do use some boosting
> functions.
> 
> Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
> to 8gb every 20 seconds or so,
> gc runs, falls down to 1gb.

Hmm, maybe the garbage collector takes up a lot of CPU time. Could you check 
your garbage collector log? It must be enabled via some JVM options:

JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -
Xloggc:/var/log/tomcat6/gc.log"

Also, what JVM version are you using and what are your other JVM settings? Are 
Xms and Xmx at the same value? I see you're using the throughput collector. 
You might want to use CMS because it partially runs concurrently (the low-
pause collector) and has less stop-the-world interruptions.

http://download.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html
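
Put together, that might look something like this (a sketch only; the heap
size is just an example, and the GC-log flags and path are the ones from
above):

  JAVA_OPTS="$JAVA_OPTS -Xms4000m -Xmx4000m \
    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
    -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails \
    -Xloggc:/var/log/tomcat6/gc.log"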

Again, this may not be the issue ;)

> 
> Btw, our current revision was just a random choice but up until two weeks
> ago it has been rock-solid so we have been
> reluctant to update to another version. Would you recommend upgrading to
> latest trunk?

I don't know what changes have been made since your revision. Please consult 
the CHANGES.txt for that.

> 
> > It might not have anything to do with memory at all but i'm just asking.
> > There
> > may be a bug in your revision causing this.
> > 
> > > Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get
> > 
> > any
> > 
> > > improvement in load. I can try monitoring with Jconsole
> > > with 8gigs of heap to see if it helps.
> > > 
> > > > Cheers,
> > > > 
> > > > > Hello everyone,
> > > > > 
> > > > > First of all here is our Solr setup:
> > > > > 
> > > > > - Solr nightly build 986158
> > > > > - Running solr inside the default jetty comes with solr build
> > > > > - 1 write only Master , 4 read only Slaves (quad core 5640 with
> > > > > 24gb
> > 
> > of
> > 
> > > > > RAM) - Index replicated (on optimize) to slaves via Solr
> > > > > Replication - Size of index is around 2.5gb
> > > > > - No incremental writes, index is created from scratch(delete old
> > > > 
> > > > documents
> > > > 
> > > > > -> commit new documents -> optimize)  every 6 hours
> > > > > - Avg # of request per second is around 60 (for a single slave)
> > > > > - Avg time per request is around 25ms (before having problems)
> > > > > - Load on each is slave is around 2
> > > > > 
> > > > > We are using this set-up for months without any problem. However
> > > > > last
> > > > 
> > > > week
> > > > 
> > > > > we started to experience very weird performance problems like :
> > > > > 
> > > > > - Avg time per request increased from 25ms to 200-300ms (even
> > > > > higher
> > 
> > if
> > 
> > > > we
> > > > 
> > > > > don't restart the slaves)
> > > > > - Load on each slave increased from 2 to 15-20 (solr uses %400-%600
> > > > > cpu)
> > > > > 
> > > > > When we profile solr we see two very strange things :
> > > > > 
> > > > > 1 - This is the jconsole output:
> > > > > 
> > > > > https://skitch.com/meralan/rwwcf/mail-886x691
> > > > > 
> > > > > As you see gc runs for every 10-15 seconds and collects more than 1
> > 
> > gb
> > 
> > > > > of memory. (Actually if you wait more than 10 minutes you see
> > > > > spikes up to
> > > > 
> > > > 4gb
> > > > 
> > > > > consistently)
> > > > > 
> > > > > 2 - This is the newrelic output :
> > > > > 
> > > > > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > > > > 
> > > > > As you see solr spent ridiculously long time in
> > > > > SolrDispatchFilter.doFilter() method.
> > > > > 
> > > > > 
> > > > > Apart form these, when we clean the index directory, re-replicate
> > > > > and restart  each slave one by one we see a relief in the system
> > > > > but
> > 
> > after
> > 
> > > > some
> > > > 
> > > > > time servers start to melt down again. Although deleting index and
> > > > > replicating doesn't solve the problem, we think that these problems
> > 
> > are
> > 
> > > > > somehow related to replication. Because symptoms started after
> > > > 
> > > > replication
> > > > 
> > > > > and once it heals itself after replication. I also see
> > > > > lucene-write.lock files in slaves (we don't have write.lock files
> > > > > in the master) which I think we shouldn't see.
> > > > > 
> > > > > 
> > > > > If anyone can give any sort of ideas, we will appreciate it.
> > > > > 
> > > > > Regards,
> > > > > Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello again,

2011/3/14 Markus Jelsma 

> > Hello,
> >
> > 2011/3/14 Markus Jelsma 
> >
> > > Hi Doğacan,
> > >
> > > Are you, at some point, running out of heap space? In my experience,
> > > that's the common cause of increased load and excessivly high response
> > > times (or time
> > > outs).
> >
> > How much of a heap size would be enough? Our index size is growing slowly
> > but we did not have this problem
> > a couple weeks ago where index size was maybe 100mb smaller.
>
> Telling how much heap space is needed isn't easy to say. It usually needs
> to
> be increased when you run out of memory and get those nasty OOM errors, are
> you getting them?
> Replication eventes will increase heap usage due to cache warming queries
> and
> autowarming.
>
>
Nope, no OOM errors.


> >
> > We left most of the caches in solrconfig as default and only increased
> > filterCache to 1024. We only ask for "id"s (which
> > are unique) and no other fields during queries (though we do faceting).
> > Btw, 1.6gb of our index is stored fields (we store
> > everything for now, even though we do not get them during queries), and
> > about 1gb of index.
>
> Hmm, it seems 4000 would be enough indeed. What about the fieldCache, are
> there
> a lot of entries? Is there an insanity count? Do you use boost functions?
>
>
Insanity count is 0 and fieldCache has 12 entries. We do use some boosting
functions.

Btw, I am monitoring output via jconsole with 8gb of ram and it still goes
to 8gb every 20 seconds or so,
gc runs, falls down to 1gb.

Btw, our current revision was just a random choice but up until two weeks
ago it has been rock-solid so we have been
reluctant to update to another version. Would you recommend upgrading to
latest trunk?


> It might not have anything to do with memory at all but i'm just asking.
> There
> may be a bug in your revision causing this.
>
> >
> > Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get
> any
> > improvement in load. I can try monitoring with Jconsole
> > with 8gigs of heap to see if it helps.
> >
> > > Cheers,
> > >
> > > > Hello everyone,
> > > >
> > > > First of all here is our Solr setup:
> > > >
> > > > - Solr nightly build 986158
> > > > - Running solr inside the default jetty comes with solr build
> > > > - 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb
> of
> > > > RAM) - Index replicated (on optimize) to slaves via Solr Replication
> > > > - Size of index is around 2.5gb
> > > > - No incremental writes, index is created from scratch(delete old
> > >
> > > documents
> > >
> > > > -> commit new documents -> optimize)  every 6 hours
> > > > - Avg # of request per second is around 60 (for a single slave)
> > > > - Avg time per request is around 25ms (before having problems)
> > > > - Load on each is slave is around 2
> > > >
> > > > We are using this set-up for months without any problem. However last
> > >
> > > week
> > >
> > > > we started to experience very weird performance problems like :
> > > >
> > > > - Avg time per request increased from 25ms to 200-300ms (even higher
> if
> > >
> > > we
> > >
> > > > don't restart the slaves)
> > > > - Load on each slave increased from 2 to 15-20 (solr uses %400-%600
> > > > cpu)
> > > >
> > > > When we profile solr we see two very strange things :
> > > >
> > > > 1 - This is the jconsole output:
> > > >
> > > > https://skitch.com/meralan/rwwcf/mail-886x691
> > > >
> > > > As you see gc runs for every 10-15 seconds and collects more than 1
> gb
> > > > of memory. (Actually if you wait more than 10 minutes you see spikes
> > > > up to
> > >
> > > 4gb
> > >
> > > > consistently)
> > > >
> > > > 2 - This is the newrelic output :
> > > >
> > > > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > > >
> > > > As you see solr spent ridiculously long time in
> > > > SolrDispatchFilter.doFilter() method.
> > > >
> > > >
> > > > Apart form these, when we clean the index directory, re-replicate and
> > > > restart  each slave one by one we see a relief in the system but
> after
> > >
> > > some
> > >
> > > > time servers start to melt down again. Although deleting index and
> > > > replicating doesn't solve the problem, we think that these problems
> are
> > > > somehow related to replication. Because symptoms started after
> > >
> > > replication
> > >
> > > > and once it heals itself after replication. I also see
> > > > lucene-write.lock files in slaves (we don't have write.lock files in
> > > > the master) which I think we shouldn't see.
> > > >
> > > >
> > > > If anyone can give any sort of ideas, we will appreciate it.
> > > >
> > > > Regards,
> > > > Dogacan Guney
>



-- 
Doğacan Güney


Re: Solr performance issue

2011-03-14 Thread Jonathan Rochkind
I've definitely had cases in 1.4.1 where even though I didn't have an 
OOM error, Solr was being weirdly slow, and increasing the JVM heap size 
fixed it.  I can't explain why it happened, or exactly how you'd know
this was going on; I didn't see anything odd in the logs to indicate it. I
just tried increasing the JVM heap to see what happened, and it worked
great.


The one case I remember specifically is when I was using the 
StatsComponent, with a stats.facet.  Pathologically slow, increasing 
heap magically made it go down to negligible again.


On 3/14/2011 3:38 PM, Markus Jelsma wrote:

Hello,

2011/3/14 Markus Jelsma


Hi Doğacan,

Are you, at some point, running out of heap space? In my experience,
that's the common cause of increased load and excessively high response
times (or time
outs).

How much of a heap size would be enough? Our index size is growing slowly
but we did not have this problem
a couple weeks ago where index size was maybe 100mb smaller.

Telling how much heap space is needed isn't easy to say. It usually needs to
be increased when you run out of memory and get those nasty OOM errors, are
you getting them?
Replication events will increase heap usage due to cache warming queries and
autowarming.


We left most of the caches in solrconfig as default and only increased
filterCache to 1024. We only ask for "id"s (which
are unique) and no other fields during queries (though we do faceting).
Btw, 1.6gb of our index is stored fields (we store
everything for now, even though we do not get them during queries), and
about 1gb of index.

Hmm, it seems 4000 would be enough indeed. What about the fieldCache, are there
a lot of entries? Is there an insanity count? Do you use boost functions?

It might not have anything to do with memory at all but i'm just asking. There
may be a bug in your revision causing this.


Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get any
improvement in load. I can try monitoring with Jconsole
with 8gigs of heap to see if it helps.


Cheers,


Hello everyone,

First of all here is our Solr setup:

- Solr nightly build 986158
- Running solr inside the default jetty comes with solr build
- 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb of
RAM) - Index replicated (on optimize) to slaves via Solr Replication
- Size of index is around 2.5gb
- No incremental writes, index is created from scratch(delete old

documents


->  commit new documents ->  optimize)  every 6 hours
- Avg # of request per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each slave is around 2

We have been using this set-up for months without any problem. However, last
week we started to experience very weird performance problems:

- Avg time per request increased from 25ms to 200-300ms (even higher if we
don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (Solr uses 400%-600%
CPU)

When we profile Solr we see two very strange things:

1 - This is the JConsole output:

https://skitch.com/meralan/rwwcf/mail-886x691

As you can see, GC runs every 10-15 seconds and collects more than 1GB
of memory. (Actually, if you watch for more than 10 minutes, you see spikes
up to 4GB consistently.)

2 - This is the New Relic output:

https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm

As you can see, Solr spends a ridiculously long time in the
SolrDispatchFilter.doFilter() method.


Apart from these, when we clean the index directory, re-replicate and
restart each slave one by one, we see some relief in the system, but after
some time the servers start to melt down again. Although deleting the index
and re-replicating doesn't solve the problem, we think that these problems
are somehow related to replication, because the symptoms started after
replication and the system heals itself for a while after each replication.
I also see lucene-write.lock files in slaves (we don't have write.lock
files in the master), which I think we shouldn't see.


If anyone can give any sort of ideas, we will appreciate it.

Regards,
Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
> Hello,
> 
> 2011/3/14 Markus Jelsma 
> 
> > Hi Doğacan,
> > 
> > Are you, at some point, running out of heap space? In my experience,
> > that's the common cause of increased load and excessively high response
> > times (or time
> > outs).
> 
> How much heap would be enough? Our index size is growing slowly,
> but we did not have this problem
> a couple of weeks ago, when the index was maybe 100MB smaller.

It isn't easy to say how much heap space is needed. It usually needs to
be increased when you run out of memory and get those nasty OOM errors; are
you getting them?
Replication events will increase heap usage due to cache warming queries and
autowarming.
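
To illustrate what those warming queries look like, here is a minimal sketch
of a newSearcher listener in solrconfig.xml; the query and sort values are
placeholders, not taken from any configuration in this thread. Every
replication opens a new searcher, so these queries run (together with cache
autowarming) right after each replication.

<!-- Hypothetical warming listener; the query and sort are placeholders. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">id asc</str>
    </lst>
  </arr>
</listener>
<!-- A cache's autowarmCount additionally replays that many old entries into
     the new searcher's cache, which costs CPU and heap after replication. -->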

> 
> We left most of the caches in solrconfig as default and only increased
> filterCache to 1024. We only ask for "id"s (which
> are unique) and no other fields during queries (though we do faceting).
> Btw, 1.6GB of our index is stored fields (we store
> everything for now, even though we do not fetch them during queries), and
> about 1GB is the index itself.

Hmm, it seems 4000m would indeed be enough. What about the fieldCache: are there
a lot of entries? Is there an insanity count? Do you use boost functions?

It might not have anything to do with memory at all, but I'm just asking. There
may be a bug in your revision causing this.

> 
> Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get any
> improvement in load. I can try monitoring with JConsole
> with 8GB of heap to see if it helps.
> 
> > Cheers,
> > 
> > > Hello everyone,
> > > 
> > > First of all here is our Solr setup:
> > > 
> > > - Solr nightly build 986158
> > > - Running Solr inside the default Jetty that comes with the Solr build
> > > - 1 write-only master, 4 read-only slaves (quad core 5640 with 24GB of
> > > RAM) - Index replicated (on optimize) to slaves via Solr Replication
> > > - Size of index is around 2.5GB
> > > - No incremental writes; the index is created from scratch (delete old
> > > documents -> commit new documents -> optimize) every 6 hours
> > > - Avg # of requests per second is around 60 (for a single slave)
> > > - Avg time per request is around 25ms (before having problems)
> > > - Load on each slave is around 2
> > > 
> > > We have been using this set-up for months without any problem. However,
> > > last week we started to experience very weird performance problems:
> > > 
> > > - Avg time per request increased from 25ms to 200-300ms (even higher if
> > > we don't restart the slaves)
> > > - Load on each slave increased from 2 to 15-20 (Solr uses 400%-600%
> > > CPU)
> > > 
> > > When we profile Solr we see two very strange things:
> > > 
> > > 1 - This is the JConsole output:
> > > 
> > > https://skitch.com/meralan/rwwcf/mail-886x691
> > > 
> > > As you can see, GC runs every 10-15 seconds and collects more than 1GB
> > > of memory. (Actually, if you watch for more than 10 minutes, you see
> > > spikes up to 4GB consistently.)
> > > 
> > > 2 - This is the New Relic output:
> > > 
> > > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> > > 
> > > As you can see, Solr spends a ridiculously long time in the
> > > SolrDispatchFilter.doFilter() method.
> > > 
> > > 
> > > Apart from these, when we clean the index directory, re-replicate and
> > > restart each slave one by one, we see some relief in the system, but
> > > after some time the servers start to melt down again. Although deleting
> > > the index and re-replicating doesn't solve the problem, we think that
> > > these problems are somehow related to replication, because the symptoms
> > > started after replication and the system heals itself for a while after
> > > each replication. I also see lucene-write.lock files in slaves (we
> > > don't have write.lock files in the master), which I think we shouldn't see.
> > > 
> > > 
> > > If anyone can give any sort of ideas, we will appreciate it.
> > > 
> > > Regards,
> > > Dogacan Guney


Re: Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello,

2011/3/14 Markus Jelsma 

> Hi Doğacan,
>
> Are you, at some point, running out of heap space? In my experience, that's
> the common cause of increased load and excessively high response times (or
> time
> outs).
>
>
How much heap would be enough? Our index size is growing slowly,
but we did not have this problem
a couple of weeks ago, when the index was maybe 100MB smaller.

We left most of the caches in solrconfig as default and only increased
filterCache to 1024. We only ask for "id"s (which
are unique) and no other fields during queries (though we do faceting). Btw,
1.6GB of our index is stored fields (we store
everything for now, even though we do not fetch them during queries), and
about 1GB is the index itself.
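
For reference, the filterCache change mentioned above corresponds to a
solrconfig.xml entry roughly like the sketch below. Only size=1024 comes from
our setup; the cache class and warm-up counts shown are assumed typical
values, not our exact configuration.

<!-- Illustrative filterCache entry; size=1024 is from this thread, the
     class, initialSize and autowarmCount are assumed typical values. -->
<filterCache class="solr.FastLRUCache"
             size="1024"
             initialSize="512"
             autowarmCount="128"/>
<!-- Note: a large autowarmCount means each new searcher (e.g. after
     replication) replays that many cached filters, adding CPU and heap load. -->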

Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get any
improvement in load. I can try monitoring with JConsole
with 8GB of heap to see if it helps.


> Cheers,
>
> > Hello everyone,
> >
> > First of all here is our Solr setup:
> >
> > - Solr nightly build 986158
> > - Running Solr inside the default Jetty that comes with the Solr build
> > - 1 write-only master, 4 read-only slaves (quad core 5640 with 24GB of
> > RAM) - Index replicated (on optimize) to slaves via Solr Replication
> > - Size of index is around 2.5GB
> > - No incremental writes; the index is created from scratch (delete old
> > documents -> commit new documents -> optimize) every 6 hours
> > - Avg # of requests per second is around 60 (for a single slave)
> > - Avg time per request is around 25ms (before having problems)
> > - Load on each slave is around 2
> >
> > We have been using this set-up for months without any problem. However,
> > last week we started to experience very weird performance problems:
> >
> > - Avg time per request increased from 25ms to 200-300ms (even higher if
> > we don't restart the slaves)
> > - Load on each slave increased from 2 to 15-20 (Solr uses 400%-600% CPU)
> >
> > When we profile Solr we see two very strange things:
> >
> > 1 - This is the JConsole output:
> >
> > https://skitch.com/meralan/rwwcf/mail-886x691
> >
> > As you can see, GC runs every 10-15 seconds and collects more than 1GB of
> > memory. (Actually, if you watch for more than 10 minutes, you see spikes
> > up to 4GB consistently.)
> >
> > 2 - This is the New Relic output:
> >
> > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> >
> > As you can see, Solr spends a ridiculously long time in the
> > SolrDispatchFilter.doFilter() method.
> >
> >
> > Apart from these, when we clean the index directory, re-replicate and
> > restart each slave one by one, we see some relief in the system, but after
> > some time the servers start to melt down again. Although deleting the
> > index and re-replicating doesn't solve the problem, we think that these
> > problems are somehow related to replication, because the symptoms started
> > after replication and the system heals itself for a while after each
> > replication. I also see lucene-write.lock files in slaves (we don't have
> > write.lock files in the master), which I think we shouldn't see.
> >
> >
> > If anyone can give any sort of ideas, we will appreciate it.
> >
> > Regards,
> > Dogacan Guney
>



-- 
Doğacan Güney


Re: Solr performance issue

2011-03-14 Thread Markus Jelsma
Hi Doğacan,

Are you, at some point, running out of heap space? In my experience, that's 
the common cause of increased load and excessively high response times (or time 
outs).

Cheers,

> Hello everyone,
> 
> First of all here is our Solr setup:
> 
> - Solr nightly build 986158
> - Running Solr inside the default Jetty that comes with the Solr build
> - 1 write-only master, 4 read-only slaves (quad core 5640 with 24GB of
> RAM) - Index replicated (on optimize) to slaves via Solr Replication
> - Size of index is around 2.5GB
> - No incremental writes; the index is created from scratch (delete old documents
> -> commit new documents -> optimize) every 6 hours
> - Avg # of requests per second is around 60 (for a single slave)
> - Avg time per request is around 25ms (before having problems)
> - Load on each slave is around 2
> 
> We have been using this set-up for months without any problem. However, last
> week we started to experience very weird performance problems:
> 
> - Avg time per request increased from 25ms to 200-300ms (even higher if we
> don't restart the slaves)
> - Load on each slave increased from 2 to 15-20 (Solr uses 400%-600% CPU)
> 
> When we profile Solr we see two very strange things:
> 
> 1 - This is the JConsole output:
> 
> https://skitch.com/meralan/rwwcf/mail-886x691
> 
> As you can see, GC runs every 10-15 seconds and collects more than 1GB of
> memory. (Actually, if you watch for more than 10 minutes, you see spikes up
> to 4GB consistently.)
> 
> 2 - This is the New Relic output:
> 
> https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm
> 
> As you can see, Solr spends a ridiculously long time in the
> SolrDispatchFilter.doFilter() method.
> 
> 
> Apart from these, when we clean the index directory, re-replicate and
> restart each slave one by one, we see some relief in the system, but after
> some time the servers start to melt down again. Although deleting the index
> and re-replicating doesn't solve the problem, we think that these problems
> are somehow related to replication, because the symptoms started after
> replication and the system heals itself for a while after each replication.
> I also see lucene-write.lock files in slaves (we don't have write.lock
> files in the master), which I think we shouldn't see.
> 
> 
> If anyone can give any sort of ideas, we will appreciate it.
> 
> Regards,
> Dogacan Guney


Solr performance issue

2011-03-14 Thread Doğacan Güney
Hello everyone,

First of all here is our Solr setup:

- Solr nightly build 986158
- Running Solr inside the default Jetty that comes with the Solr build
- 1 write-only master, 4 read-only slaves (quad core 5640 with 24GB of RAM)
- Index replicated (on optimize) to slaves via Solr Replication (a config
sketch is shown after this list)
- Size of index is around 2.5GB
- No incremental writes; the index is created from scratch (delete old documents
-> commit new documents -> optimize) every 6 hours
- Avg # of requests per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each slave is around 2
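
To make the replication setup above concrete, a legacy master/slave
ReplicationHandler configuration looks roughly like the sketch below. The
handler name and replicateAfter=optimize match the description above, but the
masterUrl and pollInterval values are placeholders rather than our actual
settings. With replicateAfter=optimize, the slaves pull a whole new index
after every 6-hour rebuild.

<!-- On the master (solrconfig.xml); replicateAfter=optimize matches the
     "replicated (on optimize)" setup described above. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">optimize</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave; masterUrl and pollInterval are hypothetical values. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:10:00</str>
  </lst>
</requestHandler>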

We have been using this set-up for months without any problem. However, last week
we started to experience very weird performance problems:

- Avg time per request increased from 25ms to 200-300ms (even higher if we
don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (Solr uses 400%-600% CPU)

When we profile Solr we see two very strange things:

1 - This is the JConsole output:

https://skitch.com/meralan/rwwcf/mail-886x691

As you can see, GC runs every 10-15 seconds and collects more than 1GB of
memory. (Actually, if you watch for more than 10 minutes, you see spikes up to 4GB
consistently.)

2 - This is the New Relic output:

https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm

As you can see, Solr spends a ridiculously long time in the
SolrDispatchFilter.doFilter() method.


Apart from these, when we clean the index directory, re-replicate and
restart each slave one by one, we see some relief in the system, but after some
time the servers start to melt down again. Although deleting the index and
re-replicating doesn't solve the problem, we think that these problems are
somehow related to replication, because the symptoms started after replication
and the system heals itself for a while after each replication. I also see
lucene-write.lock files in slaves (we don't have write.lock files in the
master), which I think we shouldn't see.


If anyone can give any sort of ideas, we will appreciate it.

Regards,
Dogacan Guney