Re: Solr performance issue
On 2/15/2018 2:00 AM, Srinivas Kashyap wrote:
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml. And I'm using the same for full-import
> only. And in the beginning of my implementation, I had written a delta-import
> query to index the modified changes. But my requirement grew and I now have 17
> child entities for a single parent entity. When doing delta-import for
> huge data, the number of requests being made to the datasource (database) grew,
> and CPU utilization was 100% when concurrent users started modifying the
> data. For this, instead of calling delta-import, which imports based on last
> index time, I did full-import ('SortedMapBackedCache') based on last index
> time.
>
> Though the parent entity query would return only records that are modified,
> the child entity queries pull all the data from the database, and the indexing
> happens 'in-memory', which is causing the JVM to go out of memory.

Can you provide your DIH config file (with passwords redacted) and the precise URL you are using to initiate dataimport? Also, I would like to know what field you have defined as your uniqueKey. I may have more questions about the data in your system, depending on what I see.

That cache implementation should only cache entries from the database that are actually requested. If your query is correctly defined, it should not pull all records from the DB table.

> Is there a way to specify in the child entity query to pull only the records
> related to the parent entity in full-import mode.

If I am understanding your question correctly, this is one of the fairly basic things that DIH does. Look at this config example in the reference guide:

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#configuring-the-dih-configuration-file

In the entity named "feature" in that example config, the query string uses ${item.ID} to reference the ID column from the parent entity, which is "item".

I should warn you that a cached entity does not always improve performance. This is particularly true if the lookup into the cache is the information that goes into your uniqueKey field. When the lookup is by uniqueKey, every single row requested from the database will be used exactly once, so there's not really any point to caching it.

Thanks,
Shawn
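Shawn's pointer can be made concrete with a data-config.xml sketch. This is a hedged example modeled on the item/feature config from the reference guide, not Srinivas's actual config; the table and column names are illustrative:

```xml
<dataConfig>
  <dataSource driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:./example-DIH/hsqldb/ex" user="sa"/>
  <document>
    <!-- One Solr document per parent row -->
    <entity name="item" query="SELECT * FROM item">
      <!-- Uncached child: ${item.ID} limits each query to the current parent -->
      <entity name="feature"
              query="SELECT description FROM feature WHERE item_id='${item.ID}'"/>
      <!-- Cached child: the query runs once, and cacheKey/cacheLookup join
           rows to the parent in memory. Note this loads the whole feature
           table into the heap, which multiplied across 17 child entities
           matches the out-of-memory symptom described above. -->
      <entity name="feature_cached"
              query="SELECT item_id, description FROM feature"
              cacheImpl="SortedMapBackedCache"
              cacheKey="item_id" cacheLookup="item.ID"/>
    </entity>
  </document>
</dataConfig>
```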
Re: Solr performance issue
Srinivas:

Not an answer to your question, but when DIH starts getting this complicated, I start to seriously think about SolrJ, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/

In particular, it moves the heavy lifting of acquiring the data from a Solr node (which I'm assuming also has to index docs) to "some client". It also lets you play some tricks with the code to make things faster.

Best,
Erick

On Thu, Feb 15, 2018 at 1:00 AM, Srinivas Kashyap wrote:
> Hi,
>
> I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the
> child entities in data-config.xml. And I'm using the same for full-import
> only. And in the beginning of my implementation, I had written a delta-import
> query to index the modified changes. But my requirement grew and I now have 17
> child entities for a single parent entity. When doing delta-import for
> huge data, the number of requests being made to the datasource (database) grew,
> and CPU utilization was 100% when concurrent users started modifying the
> data. For this, instead of calling delta-import, which imports based on last
> index time, I did full-import ('SortedMapBackedCache') based on last index
> time.
>
> Though the parent entity query would return only records that are modified,
> the child entity queries pull all the data from the database, and the indexing
> happens 'in-memory', which is causing the JVM to go out of memory.
>
> Is there a way to specify in the child entity query to pull only the records
> related to the parent entity in full-import mode.
>
> Thanks and Regards,
> Srinivas Kashyap
Solr performance issue
Hi,

I have implemented 'SortedMapBackedCache' in my SqlEntityProcessor for the child entities in data-config.xml, and I'm using the same for full-import only. In the beginning of my implementation, I had written a delta-import query to index the modified changes. But my requirement grew and I now have 17 child entities for a single parent entity. When doing delta-import for huge data, the number of requests being made to the datasource (database) grew, and CPU utilization was 100% when concurrent users started modifying the data. For this, instead of calling delta-import, which imports based on last index time, I did full-import ('SortedMapBackedCache') based on last index time.

Though the parent entity query would return only records that are modified, the child entity queries pull all the data from the database, and the indexing happens 'in-memory', which is causing the JVM to go out of memory.

Is there a way to specify in the child entity query to pull only the records related to the parent entity in full-import mode?

Thanks and Regards,
Srinivas Kashyap

DISCLAIMER:
E-mails and attachments from TradeStone Software, Inc. are confidential. If you are not the intended recipient, please notify the sender immediately by replying to the e-mail, and then delete it without making copies or using it in any way. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient.
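For context, the delta-import approach mentioned above is normally expressed with the deltaQuery/deltaImportQuery attributes in data-config.xml. This is a hedged sketch only - the table names (parent, child) and the last_modified column are hypothetical, not taken from the original config:

```xml
<entity name="parent" pk="ID"
        query="SELECT * FROM parent"
        deltaQuery="SELECT ID FROM parent
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM parent
                          WHERE ID='${dataimporter.delta.ID}'">
  <!-- Each child is fetched per modified parent row; with 17 child
       entities this means one query per child entity per parent,
       which matches the request volume described above -->
  <entity name="child"
          query="SELECT * FROM child WHERE parent_id='${parent.ID}'"/>
</entity>
```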
Re: Solr performance issue on querying --> Solr 6.5.1
Hi Erick,

As suggested, I did try a non-HDFS Solr cloud instance, and its response looks to be really better. On the configuration side too, I am mostly using default configurations, with block.cache.direct.memory.allocation as false. On analysis of the HDFS cache, evictions seem to be on the higher side.

Thanks,
Arun

--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Solr performance issue on querying --> Solr 6.5.1
Hi Arun,

It is hard to measure something without affecting it, but we can combine the debug results with the QTime seen without debug: if we ignore merging results, it seems that the majority of time is spent retrieving docs (~500ms). You should consider reducing the number of rows if you want better response time (you can ask for rows=0 to see the max possible time). Also, as Erick suggested, reducing the number of shards (1, if you don't plan on many more docs) will trim some overhead of merging results.

Thanks,
Emir

I noticed that you removed bq - is the time with bq acceptable as well?

> On 27 Sep 2017, at 12:34, sasarun wrote:
>
> Hi Emir,
>
> Please find the response without the bq parameter and debugQuery set to
> true. Also it was noted that QTime comes down drastically without the
> debug parameter, to about 700-800.
>
> [quoted debug response, garbled in the archive; the recoverable figures:
> QTime 3446; per-shard GET_TOP_IDS entries with times in the 35-466 range;
> GET_FIELDS,GET_DEBUG entries in the 1518-2970 range (QTime 110-130 each);
> timing breakdown: 10302.0 total, prepare 2.0, process 10288.0, of which
> query 661.0 and debug 9627.0; the parsed edismax query expands each quoted
> phrase into a DisjunctionMaxQuery over the host, title, url, customContent
> and contentSpecificSearch fields]
Re: Solr performance issue on querying --> Solr 6.5.1
Hi Emir,

Please find the response without the bq parameter and debugQuery set to true. Also it was noted that QTime comes down drastically without the debug parameter, to about 700-800.

[debug response, garbled in the archive; the recoverable figures: QTime 3446; per-shard GET_TOP_IDS entries with times in the 35-466 range; GET_FIELDS,GET_DEBUG entries in the 1518-2970 range (QTime 110-130 each); timing breakdown: 10302.0 total, prepare 2.0, process 10288.0, of which query 661.0 and debug 9627.0; the parsed edismax query expands each quoted phrase - "hybrid electric powerplant", "Electric", "fuel economy", etc. - into a DisjunctionMaxQuery over the host, title, url, customContent and contentSpecificSearch fields]
Re: Solr performance issue on querying --> Solr 6.5.1
Hi Erick,

QTime comes down with rows set to 1. Also it was noted that QTime comes down when the debug parameter is not added to the query. It comes to about 900.

Thanks,
Arun
Re: Solr performance issue on querying --> Solr 6.5.1
On Tue, 2017-09-26 at 07:43 -0700, sasarun wrote:
> Allocated heap size for young generation is about 8 gb and old
> generation is about 24 gb. And gc analysis showed peak
> size utilisation is really low compared to these values.

That does not come as a surprise. Your collections would normally be considered small, if not tiny, looking only at their size measured in bytes.

Again, if you expect them to grow significantly (more than 10x), your allocation might make sense. If you do not expect such growth in the near future, you will be better off with a much smaller heap: the peak heap utilization that you have logged (or twice that, to err on the cautious side) seems a good starting point.

And whatever you do, don't set Xmx to 32GB. Use <31GB or significantly more than 32GB:
https://blog.codecentric.de/en/2014/02/35gb-heap-less-32gb-java-jvm-memory-oddities/

Are you indexing while you search? If so, you need to set auto-warm or state a few explicit warmup queries. If not, your measuring will not be representative, as it will be on first searches, which are always slower than warmed searches.

- Toke Eskildsen, Royal Danish Library
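Toke's warmup suggestion maps onto two solrconfig.xml mechanisms: explicit warmup queries and cache auto-warming. A hedged sketch - the query terms, cache classes, and sizes below are illustrative, not taken from Arun's actual configuration:

```xml
<!-- Explicit warmup queries, run whenever a new searcher opens -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">"hybrid electric powerplant"</str>
      <str name="defType">edismax</str>
      <str name="qf">host title url customContent contentSpecificSearch</str>
    </lst>
  </arr>
</listener>

<!-- Auto-warming: carry the hottest entries over to the new searcher -->
<filterCache class="solr.FastLRUCache" size="512"
             initialSize="512" autowarmCount="64"/>
<queryResultCache class="solr.LRUCache" size="512"
                  initialSize="512" autowarmCount="64"/>
```

With continuous indexing, every commit opens a new searcher, so without one of these the first queries after each commit pay the full cold-cache cost.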
Re: Solr performance issue on querying --> Solr 6.5.1
Hi Arun,

This is not the simplest query either - a dozen phrase queries on several fields, plus the same query as bq. Can you provide debugQuery info? I did not look much into the debug times and what includes what, but one thing that is strange to me is that QTime is 4s while the query in debug is 1.3s. Can you try running without bq? Can you include the boost factors in the main query?

Thanks,
Emir

> On 26 Sep 2017, at 16:43, sasarun wrote:
>
> Hi All,
>
> I have been using Solr for some time now, but mostly in standalone mode.
> My current project is using Solr 6.5.1 hosted on hadoop. My solrconfig.xml
> has the following configuration. In the prod environment the performance
> on querying seems to be really slow. Can anyone help me with a few
> pointers on how to improve the same?
>
> [quoted solrconfig.xml HdfsDirectoryFactory settings, garbled in the
> archive; the recoverable values: blockcache.enabled true, slab.count 1,
> direct.memory.allocation false, blocksperbank 16384, read.enabled true,
> write.enabled false, nrtcachingdirectory.enable true, maxmergesizemb 16,
> maxcachedmb 192, lockType hdfs]
>
> It has 6 collections of the following sizes:
> Collection 1 --> 6.41 MB
> Collection 2 --> 634.51 KB
> Collection 3 --> 4.59 MB
> Collection 4 --> 1,020.56 MB
> Collection 5 --> 607.26 MB
> Collection 6 --> 102.4 KB
>
> Each collection has 5 shards. Allocated heap size for young generation is
> about 8 gb and old generation is about 24 gb, and gc analysis showed peak
> utilisation is really low compared to these values. But querying
> Collection 4 and Collection 5 gives really slow responses even though we
> are not using any complex queries. Output of debug queries run with
> debug=timing is given below for reference. Can anyone suggest a way to
> improve the performance?
>
> [quoted debug response, garbled in the archive; the recoverable figures:
> QTime 3962; an edismax query over host, title, url, customContent and
> contentSpecificSearch with rows=600 and a bq boosting the same phrases
> ("hybrid electric powerplant"^100.0, "Electric"^50.0, "hybrid"^15.0, etc.);
> timing breakdown: 15374.0 total, prepare 2.0, process 15363.0, of which
> query 1313.0 and debug 14048.0]
>
> Thanks,
> Arun
Re: Solr performance issue on querying --> Solr 6.5.1
Well, 15 second responses are not what I'd expect either. But two things (just looked again):

1> note that the time to assemble the debug information is a large majority of your total time (14 of 15.3 seconds).

2> you're specifying 600 rows, which is quite a lot, as each one requires that a 16K block of data be read from disk and decompressed to assemble the "fl" list. So one quick test would be to set rows=1 or something.

All that said, the QTime value returned does _not_ include <1> or <2> above, and even 4 seconds seems excessive.

Best,
Erick

On Tue, Sep 26, 2017 at 10:54 AM, sasarun wrote:
> Hi Erick,
>
> Thank you for the quick response. Query time was relatively faster once it
> is read from memory. But personally I always felt the response time could
> be far better. As suggested, we will try to set up a non-HDFS environment
> and update on the results.
>
> Thanks,
> Arun
Re: Solr performance issue on querying --> Solr 6.5.1
Hi Erick,

Thank you for the quick response. Query time was relatively faster once it is read from memory. But personally I always felt the response time could be far better. As suggested, we will try to set up a non-HDFS environment and update on the results.

Thanks,
Arun
Re: Solr performance issue on querying --> Solr 6.5.1
Does the query time _stay_ low? Once the data is read from HDFS it should pretty much stay in memory. So my question is whether, once Solr warms up, you see this kind of query response time.

Have you tried this on a non-HDFS system? That would be useful to help figure out where to look.

And given the sizes of your collections, unless you expect them to get much larger, there's no reason to shard any of them. Sharding should only really be used when the collections are too big for a single shard, as distributed searches inevitably have increased overhead. I expect _at least_ 20M documents/shard, and have seen 200M docs/shard. YMMV of course.

Best,
Erick

On Tue, Sep 26, 2017 at 7:43 AM, sasarun wrote:
> Hi All,
>
> I have been using Solr for some time now, but mostly in standalone mode.
> My current project is using Solr 6.5.1 hosted on hadoop. My solrconfig.xml
> has the following configuration. In the prod environment the performance
> on querying seems to be really slow. Can anyone help me with a few
> pointers on how to improve the same?
>
> [quoted solrconfig.xml HdfsDirectoryFactory settings, garbled in the
> archive; the recoverable values: blockcache.enabled true, slab.count 1,
> direct.memory.allocation false, blocksperbank 16384, read.enabled true,
> write.enabled false, nrtcachingdirectory.enable true, maxmergesizemb 16,
> maxcachedmb 192, lockType hdfs]
>
> It has 6 collections of the following sizes:
> Collection 1 --> 6.41 MB
> Collection 2 --> 634.51 KB
> Collection 3 --> 4.59 MB
> Collection 4 --> 1,020.56 MB
> Collection 5 --> 607.26 MB
> Collection 6 --> 102.4 KB
>
> Each collection has 5 shards. Allocated heap size for young generation is
> about 8 gb and old generation is about 24 gb, and gc analysis showed peak
> utilisation is really low compared to these values. But querying
> Collection 4 and Collection 5 gives really slow responses even though we
> are not using any complex queries. Output of debug queries run with
> debug=timing is given below for reference. Can anyone suggest a way to
> improve the performance?
>
> [quoted debug response, garbled in the archive; the recoverable figures:
> QTime 3962; timing breakdown: 15374.0 total, prepare 2.0, process 15363.0,
> of which query 1313.0 and debug 14048.0]
>
> Thanks,
> Arun
Solr performance issue on querying --> Solr 6.5.1
Hi All,

I have been using Solr for some time now, but mostly in standalone mode. My current project is using Solr 6.5.1 hosted on hadoop. My solrconfig.xml has the following configuration. In the prod environment the performance on querying seems to be really slow. Can anyone help me with a few pointers on how to improve the same?

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
  <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
  <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
  <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
</directoryFactory>
<lockType>hdfs</lockType>

It has 6 collections of the following sizes:
Collection 1 --> 6.41 MB
Collection 2 --> 634.51 KB
Collection 3 --> 4.59 MB
Collection 4 --> 1,020.56 MB
Collection 5 --> 607.26 MB
Collection 6 --> 102.4 KB

Each collection has 5 shards. Allocated heap size for young generation is about 8 gb and old generation is about 24 gb, and gc analysis showed peak utilisation is really low compared to these values. But querying Collection 4 and Collection 5 gives really slow responses even though we are not using any complex queries. Output of debug queries run with debug=timing is given below for reference. Can anyone suggest a way to improve the performance?

Response to query:

[debug response, garbled in the archive; the recoverable figures: QTime 3962; an edismax query over host, title, url, customContent and contentSpecificSearch with rows=600 and a bq boosting the same phrases ("hybrid electric powerplant"^100.0, "Electric"^50.0, "hybrid"^15.0, etc.); timing breakdown: 15374.0 total, prepare 2.0, process 15363.0, of which query 1313.0 and debug 14048.0]

Thanks,
Arun
RE: Solr performance issue on indexing
> Also we will try to decouple Tika from Solr.

+1

-----Original Message-----
From: tstusr [mailto:ulfrhe...@gmail.com]
Sent: Friday, March 31, 2017 4:31 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr performance issue on indexing

Hi, thanks for the feedback.

Yes, it is about OOM; indeed, even the solr instance becomes unavailable. As I was saying, I can't find more relevant information in the logs.

We are able to increase the JVM amount, so the first thing we'll do will be that.

As far as I know, all documents are bounded to that amount (14K), just the processing could change. We are making some tests on indexing and it seems it works without concurrent threads. Also we will try to decouple Tika from Solr.

By the way, will making it available with SolrCloud improve performance? Or will there be no perceptible improvement?

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886p4327914.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr performance issue on indexing
If, by chance, the docs you're sending get routed to different Solr nodes, then all the processing is in parallel. I don't know if there's a good way to ensure that the docs get sent to different replicas on different Solr instances. You could try addressing specific Solr replicas, something like "blah blah/solr/collection1_shard1_replica1/export", but I'm not totally sure that'll do what you want either.

But that still doesn't decouple Tika from the Solr instances running those replicas. So if Tika has a problem, it has the potential to bring the Solr node down.

Best,
Erick

On Fri, Mar 31, 2017 at 1:31 PM, tstusr wrote:
> Hi, thanks for the feedback.
>
> Yes, it is about OOM; indeed, even the solr instance becomes unavailable.
> As I was saying, I can't find more relevant information in the logs.
>
> We are able to increase the JVM amount, so the first thing we'll do will
> be that.
>
> As far as I know, all documents are bounded to that amount (14K), just the
> processing could change. We are making some tests on indexing and it seems
> it works without concurrent threads. Also we will try to decouple Tika
> from Solr.
>
> By the way, will making it available with SolrCloud improve performance?
> Or will there be no perceptible improvement?
Re: Solr performance issue on indexing
Hi, thanks for the feedback.

Yes, it is about OOM; indeed, even the solr instance becomes unavailable. As I was saying, I can't find more relevant information in the logs.

We are able to increase the JVM amount, so the first thing we'll do will be that.

As far as I know, all documents are bounded to that amount (14K), just the processing could change. We are making some tests on indexing and it seems it works without concurrent threads. Also we will try to decouple Tika from Solr.

By the way, will making it available with SolrCloud improve performance? Or will there be no perceptible improvement?

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886p4327914.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr performance issue on indexing
First, running multiple threads with PDF files to a Solr running 4G of JVM is...ambitious. You say it crashes; how? OOMs? Second, while the extracting request handler is a fine way to get up and running, any problems with Tika will affect Solr. Tika does a great job of extraction, but there are so many variants of so many file formats that this scenario isn't recommended for production. Consider extracting the PDF on a client and sending the docs to Solr. Tika can run as a server also so you aren't coupling Solr and Tika. For a sample SolrJ program, see: https://lucidworks.com/2012/02/14/indexing-with-solrj/ Best, Erick On Fri, Mar 31, 2017 at 10:44 AM, tstusr wrote: > Hi there. > > We are currently indexing some PDF files, the main handler to index is > /extract where we perform simple processing (extract relevant fields and > store on some fields). > > The PDF files are about 10M~100M size and we have to have available the text > extracted. So, everything works correct on test stages, but when we try to > index all the 14K files (around 120Gb) on a client application that only > sends http curls through 3-4 concurrent threads to /extract handler it > crashes. I can't find some relevant information about on solr logs (We > checked in server/logs & in core_dir/tlog). > > My question is about performance. I think it is a small amount of info we > are processing, the deploy scenario is in a docker container with 4gb of JVM > Memory and ~50gb of physical memory (reported through dashboard) we are > using a single instance. > > I don't think is a normal behaviour that handler crashes. So, what are some > general tips about improving performance for this scenario? > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886.html > Sent from the Solr - User mailing list archive at Nabble.com.
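Erick's suggestion to decouple extraction from Solr can be sketched roughly as follows: parse the PDF on the client against a standalone Tika server and send only the extracted text to Solr's update handler. This is a minimal sketch, not code from the thread: the URLs, core name, and field names ("id", "content_txt") are illustrative assumptions you would replace with your own.

```python
import json
import urllib.request

# Assumed endpoints: a standalone Tika server (default port 9998) and a
# Solr core named "collection1". Both are illustrative, not from the thread.
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"

def build_solr_doc(doc_id, text):
    # Field names here are assumptions; use whatever your schema defines.
    return {"id": doc_id, "content_txt": text}

def extract_text(pdf_bytes):
    # PUT the raw PDF to the Tika server and ask for plain text back,
    # so a Tika crash cannot take the Solr JVM down with it.
    req = urllib.request.Request(TIKA_URL, data=pdf_bytes, method="PUT",
                                 headers={"Accept": "text/plain"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

def index_doc(doc):
    # POST one JSON document to Solr's update handler.
    body = json.dumps([doc]).encode("utf-8")
    req = urllib.request.Request(SOLR_UPDATE_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

# Example usage (requires both servers to be running):
#   with open("sample.pdf", "rb") as f:
#       index_doc(build_solr_doc("doc-1", extract_text(f.read())))
```

Running three or four client processes like this keeps the extraction load (and any Tika failure) entirely outside the Solr JVM, which is the point Erick is making.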
Solr performance issue on indexing
Hi there. We are currently indexing some PDF files; the main handler we index through is /extract, where we perform simple processing (extract relevant fields and store them in some fields). The PDF files are about 10M~100M in size and we need the extracted text to be available. Everything works correctly in test stages, but when we try to index all 14K files (around 120Gb) from a client application that only sends HTTP curls through 3-4 concurrent threads to the /extract handler, it crashes. I can't find any relevant information in the Solr logs (we checked server/logs and core_dir/tlog). My question is about performance. I think it is a small amount of data we are processing; the deployment scenario is a Docker container with 4gb of JVM memory and ~50gb of physical memory (reported through the dashboard), and we are using a single instance. I don't think it is normal behaviour for the handler to crash. So, what are some general tips for improving performance in this scenario? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-performance-issue-on-indexing-tp4327886.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr performance issue
1 million documents aren't considered big for Solr. How much RAM does your machine have? Regards, Edwin On 8 February 2016 at 23:45, Susheel Kumar wrote: > 1 million document shouldn't have any issues at all. Something else is > wrong with your hw/system configuration. > > Thanks, > Susheel > > On Mon, Feb 8, 2016 at 6:45 AM, sara hajili wrote: > > > On Mon, Feb 8, 2016 at 3:04 AM, sara hajili > wrote: > > > > > sorry i made a mistake i have a bout 1000 K doc. > > > i mean about 100 doc. > > > > > > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic < > > > emir.arnauto...@sematext.com> wrote: > > > > > >> Hi Sara, > > >> Not sure if I am reading this right, but I read it as you have 1000 > doc > > >> index and issues? Can you tell us bit more about your setup: number of > > >> servers, hw, index size, number of shards, queries that you run, do > you > > >> index at the same time... > > >> > > >> It seems to me that you are running Solr on server with limited RAM > and > > >> probably small heap. Swapping for sure will slow things down and GC is > > most > > >> likely reason for high CPU. > > >> > > >> You can use http://sematext.com/spm to collect Solr and host metrics > > and > > >> see where the issue is. > > >> > > >> Thanks, > > >> Emir > > >> > > >> -- > > >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management > > >> Solr & Elasticsearch Support * http://sematext.com/ > > >> > > >> > > >> > > >> On 08.02.2016 10:27, sara hajili wrote: > > >> > > >>> hi all. > > >>> i have a problem with my solr performance and usage hardware like a > > >>> ram,cup... > > >>> i have a lot of document and so indexed file about 1000 doc in solr > > that > > >>> every doc has about 8 field in average. > > >>> and each field has about 60 char. > > >>> i set my field as a storedfield = "false" except of 1 field. // i > read > > >>> that this help performance. > > >>> i used copy field and dynamic field if it was necessary . 
// i read > > that > > >>> this help performance. > > >>> and now my question is that when i run a lot of query on solr i faced > > >>> with > > >>> a problem solr use more cpu and ram and after that filled ,it use a > lot > > >>> swapped storage and then use hard,but doesn't create a system file! > > >>> solr > > >>> fill hard until i forced to restart server to release hard disk. > > >>> and now my question is why solr treat in this way? and how i can > avoid > > >>> solr > > >>> to use huge cpu space? > > >>> any config need?! > > >>> > > >>> > > >> > > > > > >
Re: solr performance issue
1 million document shouldn't have any issues at all. Something else is wrong with your hw/system configuration. Thanks, Susheel On Mon, Feb 8, 2016 at 6:45 AM, sara hajili wrote: > On Mon, Feb 8, 2016 at 3:04 AM, sara hajili wrote: > > > sorry i made a mistake i have a bout 1000 K doc. > > i mean about 100 doc. > > > > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic < > > emir.arnauto...@sematext.com> wrote: > > > >> Hi Sara, > >> Not sure if I am reading this right, but I read it as you have 1000 doc > >> index and issues? Can you tell us bit more about your setup: number of > >> servers, hw, index size, number of shards, queries that you run, do you > >> index at the same time... > >> > >> It seems to me that you are running Solr on server with limited RAM and > >> probably small heap. Swapping for sure will slow things down and GC is > most > >> likely reason for high CPU. > >> > >> You can use http://sematext.com/spm to collect Solr and host metrics > and > >> see where the issue is. > >> > >> Thanks, > >> Emir > >> > >> -- > >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management > >> Solr & Elasticsearch Support * http://sematext.com/ > >> > >> > >> > >> On 08.02.2016 10:27, sara hajili wrote: > >> > >>> hi all. > >>> i have a problem with my solr performance and usage hardware like a > >>> ram,cup... > >>> i have a lot of document and so indexed file about 1000 doc in solr > that > >>> every doc has about 8 field in average. > >>> and each field has about 60 char. > >>> i set my field as a storedfield = "false" except of 1 field. // i read > >>> that this help performance. > >>> i used copy field and dynamic field if it was necessary . // i read > that > >>> this help performance. > >>> and now my question is that when i run a lot of query on solr i faced > >>> with > >>> a problem solr use more cpu and ram and after that filled ,it use a lot > >>> swapped storage and then use hard,but doesn't create a system file! 
> >>> solr > >>> fill hard until i forced to restart server to release hard disk. > >>> and now my question is why solr treat in this way? and how i can avoid > >>> solr > >>> to use huge cpu space? > >>> any config need?! > >>> > >>> > >> > > >
Re: solr performance issue
On Mon, Feb 8, 2016 at 3:04 AM, sara hajili wrote: > sorry i made a mistake i have a bout 1000 K doc. > i mean about 100 doc. > > On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic < > emir.arnauto...@sematext.com> wrote: > >> Hi Sara, >> Not sure if I am reading this right, but I read it as you have 1000 doc >> index and issues? Can you tell us bit more about your setup: number of >> servers, hw, index size, number of shards, queries that you run, do you >> index at the same time... >> >> It seems to me that you are running Solr on server with limited RAM and >> probably small heap. Swapping for sure will slow things down and GC is most >> likely reason for high CPU. >> >> You can use http://sematext.com/spm to collect Solr and host metrics and >> see where the issue is. >> >> Thanks, >> Emir >> >> -- >> Monitoring * Alerting * Anomaly Detection * Centralized Log Management >> Solr & Elasticsearch Support * http://sematext.com/ >> >> >> >> On 08.02.2016 10:27, sara hajili wrote: >> >>> hi all. >>> i have a problem with my solr performance and usage hardware like a >>> ram,cup... >>> i have a lot of document and so indexed file about 1000 doc in solr that >>> every doc has about 8 field in average. >>> and each field has about 60 char. >>> i set my field as a storedfield = "false" except of 1 field. // i read >>> that this help performance. >>> i used copy field and dynamic field if it was necessary . // i read that >>> this help performance. >>> and now my question is that when i run a lot of query on solr i faced >>> with >>> a problem solr use more cpu and ram and after that filled ,it use a lot >>> swapped storage and then use hard,but doesn't create a system file! >>> solr >>> fill hard until i forced to restart server to release hard disk. >>> and now my question is why solr treat in this way? and how i can avoid >>> solr >>> to use huge cpu space? >>> any config need?! >>> >>> >> >
Re: solr performance issue
Hi Sara, That is still considered a small index. Can you give us a bit more detail about your setup? Thanks, Emir On 08.02.2016 12:04, sara hajili wrote: sorry i made a mistake i have a bout 1000 K doc. i mean about 100 doc. On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: Hi Sara, Not sure if I am reading this right, but I read it as you have 1000 doc index and issues? Can you tell us bit more about your setup: number of servers, hw, index size, number of shards, queries that you run, do you index at the same time... It seems to me that you are running Solr on server with limited RAM and probably small heap. Swapping for sure will slow things down and GC is most likely reason for high CPU. You can use http://sematext.com/spm to collect Solr and host metrics and see where the issue is. Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 08.02.2016 10:27, sara hajili wrote: hi all. i have a problem with my solr performance and usage hardware like a ram,cup... i have a lot of document and so indexed file about 1000 doc in solr that every doc has about 8 field in average. and each field has about 60 char. i set my field as a storedfield = "false" except of 1 field. // i read that this help performance. i used copy field and dynamic field if it was necessary . // i read that this help performance. and now my question is that when i run a lot of query on solr i faced with a problem solr use more cpu and ram and after that filled ,it use a lot swapped storage and then use hard,but doesn't create a system file! solr fill hard until i forced to restart server to release hard disk. and now my question is why solr treat in this way? and how i can avoid solr to use huge cpu space? any config need?! -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/
Re: solr performance issue
sorry i made a mistake i have a bout 1000 K doc. i mean about 100 doc. On Mon, Feb 8, 2016 at 1:35 AM, Emir Arnautovic < emir.arnauto...@sematext.com> wrote: > Hi Sara, > Not sure if I am reading this right, but I read it as you have 1000 doc > index and issues? Can you tell us bit more about your setup: number of > servers, hw, index size, number of shards, queries that you run, do you > index at the same time... > > It seems to me that you are running Solr on server with limited RAM and > probably small heap. Swapping for sure will slow things down and GC is most > likely reason for high CPU. > > You can use http://sematext.com/spm to collect Solr and host metrics and > see where the issue is. > > Thanks, > Emir > > -- > Monitoring * Alerting * Anomaly Detection * Centralized Log Management > Solr & Elasticsearch Support * http://sematext.com/ > > > > On 08.02.2016 10:27, sara hajili wrote: > >> hi all. >> i have a problem with my solr performance and usage hardware like a >> ram,cup... >> i have a lot of document and so indexed file about 1000 doc in solr that >> every doc has about 8 field in average. >> and each field has about 60 char. >> i set my field as a storedfield = "false" except of 1 field. // i read >> that this help performance. >> i used copy field and dynamic field if it was necessary . // i read that >> this help performance. >> and now my question is that when i run a lot of query on solr i faced with >> a problem solr use more cpu and ram and after that filled ,it use a lot >> swapped storage and then use hard,but doesn't create a system file! solr >> fill hard until i forced to restart server to release hard disk. >> and now my question is why solr treat in this way? and how i can avoid >> solr >> to use huge cpu space? >> any config need?! >> >> >
Re: solr performance issue
Hi Sara, I'm not sure if I am reading this right, but I read it as: you have a 1000-doc index and are seeing issues? Can you tell us a bit more about your setup: number of servers, hardware, index size, number of shards, queries that you run, whether you index at the same time... It seems to me that you are running Solr on a server with limited RAM and probably a small heap. Swapping will certainly slow things down, and GC is the most likely reason for high CPU. You can use http://sematext.com/spm to collect Solr and host metrics and see where the issue is. Thanks, Emir -- Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & Elasticsearch Support * http://sematext.com/ On 08.02.2016 10:27, sara hajili wrote: hi all. i have a problem with my solr performance and usage hardware like a ram,cup... i have a lot of document and so indexed file about 1000 doc in solr that every doc has about 8 field in average. and each field has about 60 char. i set my field as a storedfield = "false" except of 1 field. // i read that this help performance. i used copy field and dynamic field if it was necessary . // i read that this help performance. and now my question is that when i run a lot of query on solr i faced with a problem solr use more cpu and ram and after that filled ,it use a lot swapped storage and then use hard,but doesn't create a system file! solr fill hard until i forced to restart server to release hard disk. and now my question is why solr treat in this way? and how i can avoid solr to use huge cpu space? any config need?!
solr performance issue
Hi all. I have a problem with my Solr performance and hardware usage (RAM, CPU, ...). I have indexed about 1000 docs in Solr; every doc has about 8 fields on average, and each field has about 60 chars. I set my fields as stored="false" except for one field (I read that this helps performance). I used copyField and dynamicField where necessary (I read that this helps performance too). Now my question: when I run a lot of queries against Solr, it uses more CPU and RAM, and once those fill up it uses a lot of swap space and then the hard disk, but doesn't create a system file! Solr fills the disk until I am forced to restart the server to release disk space. Why does Solr behave this way, and how can I keep it from using so much CPU and disk space? Is any config needed?
Re: Solr Performance Issue
Thanks Furkan. Looking forward to seeing your test results. Sent from Yahoo Mail on Android
Re: Solr Performance Issue
Hi Hien; Actually, a high index rate is a relative concept. I could index that kind of data within a few hours, and I aim to index much more data within the same time soon. I can share my test results when I do. Thanks; Furkan KAMACI On Friday, 6 December 2013, Hien Luu wrote: > Hi Furkan, > > Just curious what was the index rate that you were able to achieve? > > Regards, > > Hien > > > > On Thursday, December 5, 2013 3:06 PM, Furkan KAMACI < furkankam...@gmail.com> wrote: > > Hi; > > Erick and Shawn have explained that we need more information about your > infrastructure. I should add that: I had test data at my SolrCloud nearly > as much as yours and I did not have any problems except for when indexing > at a huge index rate and it can be solved with tuning. You should optimize > your parameters according to your system. So you should give us more > information about your system. > > Thanks; > Furkan KAMACI > > On Wednesday, 4 December 2013, Shawn Heisey > wrote: > >> On 12/4/2013 6:31 AM, kumar wrote: >>> I am having almost 5 to 6 crores of indexed documents in solr. And when > i am >>> going to change anything in the configuration file solr server is going >>> down. >> >> If you mean crore and not core, then you are talking about 50 to 60 >> million documents. That's a lot. Solr is perfectly capable of handling >> that many documents, but you do need to have very good hardware. >> >> Even if they are small, your index is likely to be many gigabytes in >> size. If the documents are large, that might be measured in terabytes. >> Large indexes require a lot of memory for good performance. This will >> be discussed in more detail below. >> >>> As a new user to solr i can't able to find the exact reason for going > server >>> down. 
>>> >>> I am using cache's in the following way : >>> >>> >> size="16384" >>> initialSize="4096" >>> autowarmCount="4096"/> >>> >> size="16384" >>> initialSize="4096" >>> autowarmCount="1024"/> >>> >>> and i am not using any documentCache, fieldValueCahe's >> >> As Erick said, these cache sizes are HUGE. In particular, your >> autowarmCount values are extremely high. >> >>> Whether this can lead any performance issue means going server down. >> >> Another thing that Erick pointed out is that you haven't really told us >> what's happening. When you say that the server goes down, what EXACTLY >> do you mean? >> >>> And i am seeing logging in the server it is showing exception in the >>> following way >>> >>> >>> Servlet.service() for servlet [default] in context with path [/solr] > threw >>> exception [java.lang.IllegalStateException: Cannot call sendError() after >>> the response has been committed] with root cause >> >> This message comes from your servlet container, not Solr. You're >> probably using Tomcat, not the included Jetty. There is some indirect >> evidence that this can be fixed by increasing the servlet container's >> setting for the maximum number of request parameters. >> >> http://forums.adobe.com/message/4590864 >> >> Here's what I can say without further information: >> >> You're likely having performance issues. One potential problem is your >> insanely high autowarmCount values. Your cache configuration tells Solr >> that every time you have a soft commit or a hard commit with >> openSearcher=true, you're going to execute up to 1024 queries and up to >> 4096 filters from the old caches, in order to warm the new caches. Even >> if you have an optimal setup, this takes a lot of time. I suspect that >> you don't have an optimal setup. >> >> Another potential problem is that you don't have enough memory for the >> size of your index. A number of potential performance problems are >> discussed on this wiki page: >> >>
Re: Solr Performance Issue
On 12/5/2013 4:08 PM, Hien Luu wrote: Just curious what was the index rate that you were able to achieve? What I've usually seen based on my experience and what people have said here and on IRC is that the data source is usually the bottleneck - Solr typically indexes VERY fast, as long as you have sized your hardware and configuration appropriately. I import from MySQL. By running dataimport handlers on all my shards at once and using two servers for the entire index, I can do a full re-index of 87 million documents on my production hardware in under 5 hours. On my single dev server, it takes about 8.5 hours. I'm not using SolrCloud. I'm very confident that MySQL is the bottleneck here, not Solr. Thanks, Shawn
Re: Solr Performance Issue
Hi Furkan, Just curious what was the index rate that you were able to achieve? Regards, Hien On Thursday, December 5, 2013 3:06 PM, Furkan KAMACI wrote: Hi; Erick and Shawn have explained that we need more information about your infrastructure. I should add that: I had test data at my SolrCloud nearly as much as yours and I did not have any problems except for when indexing at a huge index rate and it can be solved with turning. You should optimize your parameters according to your system. So you should give use more information about your system. Thanks; Furkan KAMACI 4 Aralık 2013 Çarşamba tarihinde Shawn Heisey adlı kullanıcı şöyle yazdı: > On 12/4/2013 6:31 AM, kumar wrote: >> I am having almost 5 to 6 crores of indexed documents in solr. And when i am >> going to change anything in the configuration file solr server is going >> down. > > If you mean crore and not core, then you are talking about 50 to 60 > million documents. That's a lot. Solr is perfectly capable of handling > that many documents, but you do need to have very good hardware. > > Even if they are small, your index is likely to be many gigabytes in > size. If the documents are large, that might be measured in terabytes. > Large indexes require a lot of memory for good performance. This will > be discussed in more detail below. > >> As a new user to solr i can't able to find the exact reason for going server >> down. >> >> I am using cache's in the following way : >> >> > size="16384" >> initialSize="4096" >> autowarmCount="4096"/> >> > size="16384" >> initialSize="4096" >> autowarmCount="1024"/> >> >> and i am not using any documentCache, fieldValueCahe's > > As Erick said, these cache sizes are HUGE. In particular, your > autowarmCount values are extremely high. > >> Whether this can lead any performance issue means going server down. > > Another thing that Erick pointed out is that you haven't really told us > what's happening. 
When you say that the server goes down, what EXACTLY > do you mean? > >> And i am seeing logging in the server it is showing exception in the >> following way >> >> >> Servlet.service() for servlet [default] in context with path [/solr] threw >> exception [java.lang.IllegalStateException: Cannot call sendError() after >> the response has been committed] with root cause > > This message comes from your servlet container, not Solr. You're > probably using Tomcat, not the included Jetty. There is some indirect > evidence that this can be fixed by increasing the servlet container's > setting for the maximum number of request parameters. > > http://forums.adobe.com/message/4590864 > > Here's what I can say without further information: > > You're likely having performance issues. One potential problem is your > insanely high autowarmCount values. Your cache configuration tells Solr > that every time you have a soft commit or a hard commit with > openSearcher=true, you're going to execute up to 1024 queries and up to > 4096 filters from the old caches, in order to warm the new caches. Even > if you have an optimal setup, this takes a lot of time. I suspect that > you don't have an optimal setup. > > Another potential problem is that you don't have enough memory for the > size of your index. A number of potential performance problems are > discussed on this wiki page: > > http://wiki.apache.org/solr/SolrPerformanceProblems > > A lot more details are required. Here's some things that will be > helpful, and more is always better: > > * Exact symptoms. > * Excerpts from the Solr logfile that include entire stacktraces. > * Operating system and version. > * Total server index size on disk. > * Total machine memory. > * Java heap size for your servlet container. > * Which servlet container you are using to run Solr. > * Solr version. > * Server hardware details. > > Thanks, > Shawn > >
Re: Solr Performance Issue
Hi; Erick and Shawn have explained that we need more information about your infrastructure. I should add that I had test data in my SolrCloud nearly as much as yours and I did not have any problems, except when indexing at a huge index rate, and that can be solved with tuning. You should optimize your parameters according to your system, so you should give us more information about your system. Thanks; Furkan KAMACI On Wednesday, 4 December 2013, Shawn Heisey wrote: > On 12/4/2013 6:31 AM, kumar wrote: >> I am having almost 5 to 6 crores of indexed documents in solr. And when i am >> going to change anything in the configuration file solr server is going >> down. > > If you mean crore and not core, then you are talking about 50 to 60 > million documents. That's a lot. Solr is perfectly capable of handling > that many documents, but you do need to have very good hardware. > > Even if they are small, your index is likely to be many gigabytes in > size. If the documents are large, that might be measured in terabytes. > Large indexes require a lot of memory for good performance. This will > be discussed in more detail below. > >> As a new user to solr i can't able to find the exact reason for going server >> down. >> >> I am using cache's in the following way : >> >> <filterCache size="16384" >> initialSize="4096" >> autowarmCount="4096"/> >> <queryResultCache size="16384" >> initialSize="4096" >> autowarmCount="1024"/> >> >> and i am not using any documentCache, fieldValueCahe's > > As Erick said, these cache sizes are HUGE. In particular, your > autowarmCount values are extremely high. > >> Whether this can lead any performance issue means going server down. > > Another thing that Erick pointed out is that you haven't really told us > what's happening. When you say that the server goes down, what EXACTLY > do you mean? 
> >> And i am seeing logging in the server it is showing exception in the >> following way >> >> >> Servlet.service() for servlet [default] in context with path [/solr] threw >> exception [java.lang.IllegalStateException: Cannot call sendError() after >> the response has been committed] with root cause > > This message comes from your servlet container, not Solr. You're > probably using Tomcat, not the included Jetty. There is some indirect > evidence that this can be fixed by increasing the servlet container's > setting for the maximum number of request parameters. > > http://forums.adobe.com/message/4590864 > > Here's what I can say without further information: > > You're likely having performance issues. One potential problem is your > insanely high autowarmCount values. Your cache configuration tells Solr > that every time you have a soft commit or a hard commit with > openSearcher=true, you're going to execute up to 1024 queries and up to > 4096 filters from the old caches, in order to warm the new caches. Even > if you have an optimal setup, this takes a lot of time. I suspect that > you don't have an optimal setup. > > Another potential problem is that you don't have enough memory for the > size of your index. A number of potential performance problems are > discussed on this wiki page: > > http://wiki.apache.org/solr/SolrPerformanceProblems > > A lot more details are required. Here's some things that will be > helpful, and more is always better: > > * Exact symptoms. > * Excerpts from the Solr logfile that include entire stacktraces. > * Operating system and version. > * Total server index size on disk. > * Total machine memory. > * Java heap size for your servlet container. > * Which servlet container you are using to run Solr. > * Solr version. > * Server hardware details. > > Thanks, > Shawn > >
Re: Solr Performance Issue
On 12/4/2013 6:31 AM, kumar wrote: > I am having almost 5 to 6 crores of indexed documents in solr. And when i am > going to change anything in the configuration file solr server is going > down. If you mean crore and not core, then you are talking about 50 to 60 million documents. That's a lot. Solr is perfectly capable of handling that many documents, but you do need to have very good hardware. Even if they are small, your index is likely to be many gigabytes in size. If the documents are large, that might be measured in terabytes. Large indexes require a lot of memory for good performance. This will be discussed in more detail below. > As a new user to solr i can't able to find the exact reason for going server > down. > > I am using cache's in the following way : > > <filterCache size="16384" > initialSize="4096" > autowarmCount="4096"/> > <queryResultCache size="16384" > initialSize="4096" > autowarmCount="1024"/> > > and i am not using any documentCache, fieldValueCahe's As Erick said, these cache sizes are HUGE. In particular, your autowarmCount values are extremely high. > Whether this can lead any performance issue means going server down. Another thing that Erick pointed out is that you haven't really told us what's happening. When you say that the server goes down, what EXACTLY do you mean? > And i am seeing logging in the server it is showing exception in the > following way > > > Servlet.service() for servlet [default] in context with path [/solr] threw > exception [java.lang.IllegalStateException: Cannot call sendError() after > the response has been committed] with root cause This message comes from your servlet container, not Solr. You're probably using Tomcat, not the included Jetty. There is some indirect evidence that this can be fixed by increasing the servlet container's setting for the maximum number of request parameters. http://forums.adobe.com/message/4590864 Here's what I can say without further information: You're likely having performance issues. 
One potential problem is your insanely high autowarmCount values. Your cache configuration tells Solr that every time you have a soft commit or a hard commit with openSearcher=true, you're going to execute up to 1024 queries and up to 4096 filters from the old caches, in order to warm the new caches. Even if you have an optimal setup, this takes a lot of time. I suspect that you don't have an optimal setup. Another potential problem is that you don't have enough memory for the size of your index. A number of potential performance problems are discussed on this wiki page: http://wiki.apache.org/solr/SolrPerformanceProblems A lot more details are required. Here's some things that will be helpful, and more is always better: * Exact symptoms. * Excerpts from the Solr logfile that include entire stacktraces. * Operating system and version. * Total server index size on disk. * Total machine memory. * Java heap size for your servlet container. * Which servlet container you are using to run Solr. * Solr version. * Server hardware details. Thanks, Shawn
Re: Solr Performance Issue
You need to give us more of the exception trace, the real cause is often buried down the stack with some text like "Caused by..." But at a glance your cache sizes and autowarm counts are far higher than they should be. Try reducing particularly the autowarm count down to, say, 16 or so. It's actually rare that you really need very many. I'd actually go back to the defaults to start with to test whether this is the problem. Further, we need to know exactly what you mean by "change anything in the configuration file". Change what? Details matter. Of course the last thing you changed before you started seeing this problem is the most likely culprit. Best, Erick On Wed, Dec 4, 2013 at 8:31 AM, kumar wrote: > I am having almost 5 to 6 crores of indexed documents in solr. And when i > am > going to change anything in the configuration file solr server is going > down. > > As a new user to solr i can't able to find the exact reason for going > server > down. > > I am using cache's in the following way : > > <filterCache size="16384" > initialSize="4096" > autowarmCount="4096"/> > <queryResultCache size="16384" > initialSize="4096" > autowarmCount="1024"/> > > and i am not using any documentCache, fieldValueCahe's > > Whether this can lead any performance issue means going server down. > > And i am seeing logging in the server it is showing exception in the > following way > > > Servlet.service() for servlet [default] in context with path [/solr] threw > exception [java.lang.IllegalStateException: Cannot call sendError() after > the response has been committed] with root cause > > > > Can anybody help me how can i solve this problem. > > Kumar. > > > > > > > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Solr-Performance-Issue-tp4104907.html > Sent from the Solr - User mailing list archive at Nabble.com. >
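As a rough illustration of Erick's suggestion (the cache classes and sizes below are placeholders to tune, not recommendations from this thread), cutting the quoted configuration back toward the shipped defaults in solrconfig.xml might look something like:

```xml
<!-- Illustrative only: modest cache sizes, with autowarmCount
     reduced from 4096/1024 to the ~16 Erick suggests trying. -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="16"/>
<queryResultCache class="solr.LRUCache"
                  size="512"
                  initialSize="512"
                  autowarmCount="16"/>
```

With settings like these, each commit that opens a new searcher replays at most 16 cached entries per cache instead of thousands, which is the warm-up cost Shawn describes elsewhere in this thread.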
Solr Performance Issue
I have almost 5 to 6 crores of indexed documents in Solr, and whenever I change anything in the configuration file the Solr server goes down. As a new user to Solr, I am not able to find the exact reason for the server going down.

I am using caches in the following way:

<filterCache size="16384" initialSize="4096" autowarmCount="4096"/>
<queryResultCache size="16384" initialSize="4096" autowarmCount="1024"/>

and I am not using any documentCache or fieldValueCache. Can this lead to a performance issue that takes the server down? In the server log I am seeing the following exception:

Servlet.service() for servlet [default] in context with path [/solr] threw exception [java.lang.IllegalStateException: Cannot call sendError() after the response has been committed] with root cause

Can anybody help me solve this problem?

Kumar.

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Performance-Issue-tp4104907.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr performance issue
Hello, The problem turned out to be some sort of sharding/searching weirdness. We modified some code in sharding but I don't think it is related. In any case, we just added a new server that just shards (but doesn't do any searching / doesn't contain any index) and performance is very very good. Thanks for all the help. On Tue, Mar 22, 2011 at 14:30, Alexey Serba wrote: > > Btw, I am monitoring output via jconsole with 8gb of ram and it still > goes > > to 8gb every 20 seconds or so, > > gc runs, falls down to 1gb. > > Hmm, jvm is eating 8Gb for 20 seconds - sounds a lot. > > Do you return all results (ids) for your queries? Any tricky > faceting/sorting/function queries? > -- Doğacan Güney
Re: Solr performance issue
> Btw, I am monitoring output via jconsole with 8gb of ram and it still goes > to 8gb every 20 seconds or so, > gc runs, falls down to 1gb. Hmm, jvm is eating 8Gb for 20 seconds - sounds a lot. Do you return all results (ids) for your queries? Any tricky faceting/sorting/function queries?
Re: Solr performance issue
The host is dual quad-core; each Xen VM has been given two CPUs. Not counting dom0, two of the hosts have 10/8 CPUs allocated, two of them have 8/8. The dom0 VM is also allocated two CPUs. I'm not really sure how that works out for Java running on the VM, but Xen would likely, if at all possible, try to keep both VM CPUs on the same physical CPU and the VM's memory allocation on the same NUMA node. If that's the case, it would meet what you've stated as the recommendation for incremental mode. Shawn On 3/15/2011 9:10 AM, Markus Jelsma wrote: CMS is very good for multicore CPU's. Use incremental mode only when you have a single CPU with only one or two cores.
Re: Solr performance issue
CMS is very good for multicore CPU's. Use incremental mode only when you have a single CPU with only one or two cores. On Tuesday 15 March 2011 16:03:38 Shawn Heisey wrote: > My solr+jetty+java6 install seems to work well with these GC options. > It's a dual processor environment: > > -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode > > I've never had a real problem with memory, so I've not done any kind of > auditing. I probably should, but time is a limited resource. > > Shawn > > On 3/14/2011 2:29 PM, Markus Jelsma wrote: > > That depends on your GC settings and generation sizes. And, instead of > > UseParallelGC you'd better use UseParNewGC in combination with CMS. > > > > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html > > > >> It's actually, as I understand it, expected JVM behavior to see the heap > >> rise to close to it's limit before it gets GC'd, that's how Java GC > >> works. Whether that should happen every 20 seconds or what, I don't > >> nkow. > >> > >> Another option is setting better JVM garbage collection arguments, so GC > >> doesn't "stop the world" so often. I have had good luck with my Solr > >> using this: -XX:+UseParallelGC -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Solr performance issue
My solr+jetty+java6 install seems to work well with these GC options. It's a dual processor environment: -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode I've never had a real problem with memory, so I've not done any kind of auditing. I probably should, but time is a limited resource. Shawn On 3/14/2011 2:29 PM, Markus Jelsma wrote: That depends on your GC settings and generation sizes. And, instead of UseParallelGC you'd better use UseParNewGC in combination with CMS. See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html It's actually, as I understand it, expected JVM behavior to see the heap rise to close to it's limit before it gets GC'd, that's how Java GC works. Whether that should happen every 20 seconds or what, I don't nkow. Another option is setting better JVM garbage collection arguments, so GC doesn't "stop the world" so often. I have had good luck with my Solr using this: -XX:+UseParallelGC
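As a sketch, those options would be passed to the Jetty bundled with Solr roughly like this. Only the two GC flags come from the message above; the heap settings and the start.jar invocation are illustrative assumptions.

```shell
# Sketch only: the two GC flags are from the message above; the heap
# values and start.jar path are illustrative assumptions.
GC_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
HEAP_OPTS="-Xms2g -Xmx2g"   # example values; size to your own index
# Echoed here rather than executed; drop the echo to actually start Solr.
echo java $HEAP_OPTS $GC_OPTS -jar start.jar
```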
Re: Solr performance issue
2011/3/14 Markus Jelsma > Mmm. SearchHander.handleRequestBody takes care of sharding. Could your > system > suffer from > http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock > ? > > We increased thread limit (which was 1 before) but it did not help. Anyway, we will try to disable sharding tomorrow. Maybe this can give us a better picture. Thanks for the help, everyone. > I'm not sure, i haven't seen a similar issue in a sharded environment, > probably because it was a controlled environment. > > > > Hello, > > > > 2011/3/14 Markus Jelsma > > > > > That depends on your GC settings and generation sizes. And, instead of > > > UseParallelGC you'd better use UseParNewGC in combination with CMS. > > > > JConsole now shows a different profile output but load is still high and > > performance is still bad. > > > > Btw, here is the thread profile from newrelic: > > > > https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm > > > > Note that we do use a form of sharding so I maybe all the time spent > > waiting for handleRequestBody > > is results from sharding? > > > > > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html > > > > > > > It's actually, as I understand it, expected JVM behavior to see the > > > > heap rise to close to it's limit before it gets GC'd, that's how Java > > > > GC works. Whether that should happen every 20 seconds or what, I > > > > don't > > > > > > nkow. > > > > > > > Another option is setting better JVM garbage collection arguments, so > > > > GC doesn't "stop the world" so often. I have had good luck with my > > > > Solr using this: -XX:+UseParallelGC > > > > > > > > On 3/14/2011 4:15 PM, Doğacan Güney wrote: > > > > > Hello again, > > > > > > > > > > 2011/3/14 Markus Jelsma > > > > > > > > > >>> Hello, > > > > >>> > > > > >>> 2011/3/14 Markus Jelsma > > > > >>> > > > > Hi Doğacan, > > > > > > > > Are you, at some point, running out of heap space? 
In my > > > > experience, that's the common cause of increased load and > > > > excessivly high > > > > > > response > > > > > > > times (or time > > > > outs). > > > > >>> > > > > >>> How much of a heap size would be enough? Our index size is > growing > > > > >>> slowly but we did not have this problem > > > > >>> a couple weeks ago where index size was maybe 100mb smaller. > > > > >> > > > > >> Telling how much heap space is needed isn't easy to say. It > usually > > > > >> needs to > > > > >> be increased when you run out of memory and get those nasty OOM > > > > > > errors, > > > > > > > >> are you getting them? > > > > >> Replication eventes will increase heap usage due to cache warming > > > > >> queries and > > > > >> autowarming. > > > > > > > > > > Nope, no OOM errors. > > > > > > > > > >>> We left most of the caches in solrconfig as default and only > > > > > > increased > > > > > > > >>> filterCache to 1024. We only ask for "id"s (which > > > > >>> are unique) and no other fields during queries (though we do > > > > > > faceting). > > > > > > > >>> Btw, 1.6gb of our index is stored fields (we store > > > > >>> everything for now, even though we do not get them during > queries), > > > > > > and > > > > > > > >>> about 1gb of index. > > > > >> > > > > >> Hmm, it seems 4000 would be enough indeed. What about the > > > > >> fieldCache, are there > > > > >> a lot of entries? Is there an insanity count? Do you use boost > > > > >> functions? > > > > > > > > > > Insanity count is 0 and fieldCAche has 12 entries. We do use some > > > > > boosting functions. > > > > > > > > > > Btw, I am monitoring output via jconsole with 8gb of ram and it > still > > > > > goes to 8gb every 20 seconds or so, > > > > > gc runs, falls down to 1gb. > > > > > > > > > > Btw, our current revision was just a random choice but up until two > > > > > > weeks > > > > > > > > ago it has been rock-solid so we have been > > > > > reluctant to update to another version. 
Would you recommend > upgrading > > > > > > to > > > > > > > > latest trunk? > > > > > > > > > >> It might not have anything to do with memory at all but i'm just > > > > > > asking. > > > > > > > >> There > > > > >> may be a bug in your revision causing this. > > > > >> > > > > >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did > not > > > > > > get > > > > > > > >> any > > > > >> > > > > >>> improvement in load. I can try monitoring with Jconsole > > > > >>> with 8gigs of heap to see if it helps. > > > > >>> > > > > Cheers, > > > > > > > > > Hello everyone, > > > > > > > > > > First of all here is our Solr setup: > > > > > > > > > > - Solr nightly build 986158 > > > > > - Running solr inside the default jetty comes with solr build > > > > > - 1 write only Master , 4 read only Slaves (quad core 5640 with > > > > > > 24gb > > > > > > > >> of > > > > >> > > > > > RAM) - Index replicated (on optimize) to slaves via Solr > > > > > > Replication > > > > > > > > -
Re: Solr performance issue
Mmm. SearchHander.handleRequestBody takes care of sharding. Could your system suffer from http://wiki.apache.org/solr/DistributedSearch#Distributed_Deadlock ? I'm not sure, i haven't seen a similar issue in a sharded environment, probably because it was a controlled environment. > Hello, > > 2011/3/14 Markus Jelsma > > > That depends on your GC settings and generation sizes. And, instead of > > UseParallelGC you'd better use UseParNewGC in combination with CMS. > > JConsole now shows a different profile output but load is still high and > performance is still bad. > > Btw, here is the thread profile from newrelic: > > https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm > > Note that we do use a form of sharding so I maybe all the time spent > waiting for handleRequestBody > is results from sharding? > > > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html > > > > > It's actually, as I understand it, expected JVM behavior to see the > > > heap rise to close to it's limit before it gets GC'd, that's how Java > > > GC works. Whether that should happen every 20 seconds or what, I > > > don't > > > > nkow. > > > > > Another option is setting better JVM garbage collection arguments, so > > > GC doesn't "stop the world" so often. I have had good luck with my > > > Solr using this: -XX:+UseParallelGC > > > > > > On 3/14/2011 4:15 PM, Doğacan Güney wrote: > > > > Hello again, > > > > > > > > 2011/3/14 Markus Jelsma > > > > > > > >>> Hello, > > > >>> > > > >>> 2011/3/14 Markus Jelsma > > > >>> > > > Hi Doğacan, > > > > > > Are you, at some point, running out of heap space? In my > > > experience, that's the common cause of increased load and > > > excessivly high > > > > response > > > > > times (or time > > > outs). > > > >>> > > > >>> How much of a heap size would be enough? Our index size is growing > > > >>> slowly but we did not have this problem > > > >>> a couple weeks ago where index size was maybe 100mb smaller. 
> > > >> > > > >> Telling how much heap space is needed isn't easy to say. It usually > > > >> needs to > > > >> be increased when you run out of memory and get those nasty OOM > > > > errors, > > > > > >> are you getting them? > > > >> Replication eventes will increase heap usage due to cache warming > > > >> queries and > > > >> autowarming. > > > > > > > > Nope, no OOM errors. > > > > > > > >>> We left most of the caches in solrconfig as default and only > > > > increased > > > > > >>> filterCache to 1024. We only ask for "id"s (which > > > >>> are unique) and no other fields during queries (though we do > > > > faceting). > > > > > >>> Btw, 1.6gb of our index is stored fields (we store > > > >>> everything for now, even though we do not get them during queries), > > > > and > > > > > >>> about 1gb of index. > > > >> > > > >> Hmm, it seems 4000 would be enough indeed. What about the > > > >> fieldCache, are there > > > >> a lot of entries? Is there an insanity count? Do you use boost > > > >> functions? > > > > > > > > Insanity count is 0 and fieldCAche has 12 entries. We do use some > > > > boosting functions. > > > > > > > > Btw, I am monitoring output via jconsole with 8gb of ram and it still > > > > goes to 8gb every 20 seconds or so, > > > > gc runs, falls down to 1gb. > > > > > > > > Btw, our current revision was just a random choice but up until two > > > > weeks > > > > > > ago it has been rock-solid so we have been > > > > reluctant to update to another version. Would you recommend upgrading > > > > to > > > > > > latest trunk? > > > > > > > >> It might not have anything to do with memory at all but i'm just > > > > asking. > > > > > >> There > > > >> may be a bug in your revision causing this. > > > >> > > > >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not > > > > get > > > > > >> any > > > >> > > > >>> improvement in load. I can try monitoring with Jconsole > > > >>> with 8gigs of heap to see if it helps. 
> > > >>> > > > Cheers, > > > > > > > Hello everyone, > > > > > > > > First of all here is our Solr setup: > > > > > > > > - Solr nightly build 986158 > > > > - Running solr inside the default jetty comes with solr build > > > > - 1 write only Master , 4 read only Slaves (quad core 5640 with > > > > 24gb > > > > > >> of > > > >> > > > > RAM) - Index replicated (on optimize) to slaves via Solr > > > > Replication > > > > > > - Size of index is around 2.5gb > > > > - No incremental writes, index is created from scratch(delete old > > > > > > documents > > > > > > > -> commit new documents -> optimize) every 6 hours > > > > - Avg # of request per second is around 60 (for a single slave) > > > > - Avg time per request is around 25ms (before having problems) > > > > - Load on each is slave is around 2 > > > > > > > > We are using this set-up for months wit
Re: Solr performance issue
Hello, 2011/3/14 Markus Jelsma > That depends on your GC settings and generation sizes. And, instead of > UseParallelGC you'd better use UseParNewGC in combination with CMS. > > JConsole now shows a different profile output but load is still high and performance is still bad. Btw, here is the thread profile from newrelic: https://skitch.com/meralan/rwscm/thread-profiler-solr-new-relic-rpm Note that we do use a form of sharding so I maybe all the time spent waiting for handleRequestBody is results from sharding? > See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html > > > It's actually, as I understand it, expected JVM behavior to see the heap > > rise to close to it's limit before it gets GC'd, that's how Java GC > > works. Whether that should happen every 20 seconds or what, I don't > nkow. > > > > Another option is setting better JVM garbage collection arguments, so GC > > doesn't "stop the world" so often. I have had good luck with my Solr > > using this: -XX:+UseParallelGC > > > > On 3/14/2011 4:15 PM, Doğacan Güney wrote: > > > Hello again, > > > > > > 2011/3/14 Markus Jelsma > > > > > >>> Hello, > > >>> > > >>> 2011/3/14 Markus Jelsma > > >>> > > Hi Doğacan, > > > > Are you, at some point, running out of heap space? In my experience, > > that's the common cause of increased load and excessivly high > response > > times (or time > > outs). > > >>> > > >>> How much of a heap size would be enough? Our index size is growing > > >>> slowly but we did not have this problem > > >>> a couple weeks ago where index size was maybe 100mb smaller. > > >> > > >> Telling how much heap space is needed isn't easy to say. It usually > > >> needs to > > >> be increased when you run out of memory and get those nasty OOM > errors, > > >> are you getting them? > > >> Replication eventes will increase heap usage due to cache warming > > >> queries and > > >> autowarming. > > > > > > Nope, no OOM errors. 
> > > > > >>> We left most of the caches in solrconfig as default and only > increased > > >>> filterCache to 1024. We only ask for "id"s (which > > >>> are unique) and no other fields during queries (though we do > faceting). > > >>> Btw, 1.6gb of our index is stored fields (we store > > >>> everything for now, even though we do not get them during queries), > and > > >>> about 1gb of index. > > >> > > >> Hmm, it seems 4000 would be enough indeed. What about the fieldCache, > > >> are there > > >> a lot of entries? Is there an insanity count? Do you use boost > > >> functions? > > > > > > Insanity count is 0 and fieldCAche has 12 entries. We do use some > > > boosting functions. > > > > > > Btw, I am monitoring output via jconsole with 8gb of ram and it still > > > goes to 8gb every 20 seconds or so, > > > gc runs, falls down to 1gb. > > > > > > Btw, our current revision was just a random choice but up until two > weeks > > > ago it has been rock-solid so we have been > > > reluctant to update to another version. Would you recommend upgrading > to > > > latest trunk? > > > > > >> It might not have anything to do with memory at all but i'm just > asking. > > >> There > > >> may be a bug in your revision causing this. > > >> > > >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not > get > > >> > > >> any > > >> > > >>> improvement in load. I can try monitoring with Jconsole > > >>> with 8gigs of heap to see if it helps. 
> > >>> > > Cheers, > > > > > Hello everyone, > > > > > > First of all here is our Solr setup: > > > > > > - Solr nightly build 986158 > > > - Running solr inside the default jetty comes with solr build > > > - 1 write only Master , 4 read only Slaves (quad core 5640 with > 24gb > > >> > > >> of > > >> > > > RAM) - Index replicated (on optimize) to slaves via Solr > Replication > > > - Size of index is around 2.5gb > > > - No incremental writes, index is created from scratch(delete old > > > > documents > > > > > -> commit new documents -> optimize) every 6 hours > > > - Avg # of request per second is around 60 (for a single slave) > > > - Avg time per request is around 25ms (before having problems) > > > - Load on each is slave is around 2 > > > > > > We are using this set-up for months without any problem. However > last > > > > week > > > > > we started to experience very weird performance problems like : > > > > > > - Avg time per request increased from 25ms to 200-300ms (even > higher > > >> > > >> if > > >> > > we > > > > > don't restart the slaves) > > > - Load on each slave increased from 2 to 15-20 (solr uses %400-%600 > > > cpu) > > > > > > When we profile solr we see two very strange things : > > > > > > 1 - This is the jconsole output: > > > > > > https://skitch.com/meralan/rwwcf/mail-886x691 > > > > > > As you see gc runs for every 10-15 seconds and collects more t
Re: Solr performance issue
That depends on your GC settings and generation sizes. And, instead of UseParallelGC you'd better use UseParNewGC in combination with CMS. See 22: http://java.sun.com/docs/hotspot/gc1.4.2/faq.html > It's actually, as I understand it, expected JVM behavior to see the heap > rise to close to it's limit before it gets GC'd, that's how Java GC > works. Whether that should happen every 20 seconds or what, I don't nkow. > > Another option is setting better JVM garbage collection arguments, so GC > doesn't "stop the world" so often. I have had good luck with my Solr > using this: -XX:+UseParallelGC > > On 3/14/2011 4:15 PM, Doğacan Güney wrote: > > Hello again, > > > > 2011/3/14 Markus Jelsma > > > >>> Hello, > >>> > >>> 2011/3/14 Markus Jelsma > >>> > Hi Doğacan, > > Are you, at some point, running out of heap space? In my experience, > that's the common cause of increased load and excessivly high response > times (or time > outs). > >>> > >>> How much of a heap size would be enough? Our index size is growing > >>> slowly but we did not have this problem > >>> a couple weeks ago where index size was maybe 100mb smaller. > >> > >> Telling how much heap space is needed isn't easy to say. It usually > >> needs to > >> be increased when you run out of memory and get those nasty OOM errors, > >> are you getting them? > >> Replication eventes will increase heap usage due to cache warming > >> queries and > >> autowarming. > > > > Nope, no OOM errors. > > > >>> We left most of the caches in solrconfig as default and only increased > >>> filterCache to 1024. We only ask for "id"s (which > >>> are unique) and no other fields during queries (though we do faceting). > >>> Btw, 1.6gb of our index is stored fields (we store > >>> everything for now, even though we do not get them during queries), and > >>> about 1gb of index. > >> > >> Hmm, it seems 4000 would be enough indeed. What about the fieldCache, > >> are there > >> a lot of entries? Is there an insanity count? 
Do you use boost > >> functions? > > > > Insanity count is 0 and fieldCAche has 12 entries. We do use some > > boosting functions. > > > > Btw, I am monitoring output via jconsole with 8gb of ram and it still > > goes to 8gb every 20 seconds or so, > > gc runs, falls down to 1gb. > > > > Btw, our current revision was just a random choice but up until two weeks > > ago it has been rock-solid so we have been > > reluctant to update to another version. Would you recommend upgrading to > > latest trunk? > > > >> It might not have anything to do with memory at all but i'm just asking. > >> There > >> may be a bug in your revision causing this. > >> > >>> Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get > >> > >> any > >> > >>> improvement in load. I can try monitoring with Jconsole > >>> with 8gigs of heap to see if it helps. > >>> > Cheers, > > > Hello everyone, > > > > First of all here is our Solr setup: > > > > - Solr nightly build 986158 > > - Running solr inside the default jetty comes with solr build > > - 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb > >> > >> of > >> > > RAM) - Index replicated (on optimize) to slaves via Solr Replication > > - Size of index is around 2.5gb > > - No incremental writes, index is created from scratch(delete old > > documents > > > -> commit new documents -> optimize) every 6 hours > > - Avg # of request per second is around 60 (for a single slave) > > - Avg time per request is around 25ms (before having problems) > > - Load on each is slave is around 2 > > > > We are using this set-up for months without any problem. 
However last > > week > > > we started to experience very weird performance problems like : > > > > - Avg time per request increased from 25ms to 200-300ms (even higher > >> > >> if > >> > we > > > don't restart the slaves) > > - Load on each slave increased from 2 to 15-20 (solr uses %400-%600 > > cpu) > > > > When we profile solr we see two very strange things : > > > > 1 - This is the jconsole output: > > > > https://skitch.com/meralan/rwwcf/mail-886x691 > > > > As you see gc runs for every 10-15 seconds and collects more than 1 > >> > >> gb > >> > > of memory. (Actually if you wait more than 10 minutes you see spikes > > up to > > 4gb > > > consistently) > > > > 2 - This is the newrelic output : > > > > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm > > > > As you see solr spent ridiculously long time in > > SolrDispatchFilter.doFilter() method. > > > > > > Apart form these, when we clean the index directory, re-replicate and > > restart each slave one by one we see a relief in the system but > >> > >> after > >> > some
Re: Solr performance issue
You might also want to add the following switches for your GC log. > JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps > -XX:+PrintGCDetails - Xloggc:/var/log/tomcat6/gc.log" -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime > > Also, what JVM version are you using and what are your other JVM settings? > Are Xms and Xmx at the same value? I see you're using the throughput > collector. You might want to use CMS because it partially runs > concurrently (the low- pause collector) and has less stop-the-world > interruptions. > > http://download.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html > > Again, this may not be the issue ;) > > > Btw, our current revision was just a random choice but up until two weeks > > ago it has been rock-solid so we have been > > reluctant to update to another version. Would you recommend upgrading to > > latest trunk? > > I don't know what changes have been made since your revision. Please > consult the CHANGES.txt for that. > > > > It might not have anything to do with memory at all but i'm just > > > asking. There > > > may be a bug in your revision causing this. > > > > > > > Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not > > > > get > > > > > > any > > > > > > > improvement in load. I can try monitoring with Jconsole > > > > with 8gigs of heap to see if it helps. 
> > > > > > > > > Cheers, > > > > > > > > > > > Hello everyone, > > > > > > > > > > > > First of all here is our Solr setup: > > > > > > > > > > > > - Solr nightly build 986158 > > > > > > - Running solr inside the default jetty comes with solr build > > > > > > - 1 write only Master , 4 read only Slaves (quad core 5640 with > > > > > > 24gb > > > > > > of > > > > > > > > > RAM) - Index replicated (on optimize) to slaves via Solr > > > > > > Replication - Size of index is around 2.5gb > > > > > > - No incremental writes, index is created from scratch(delete old > > > > > > > > > > documents > > > > > > > > > > > -> commit new documents -> optimize) every 6 hours > > > > > > - Avg # of request per second is around 60 (for a single slave) > > > > > > - Avg time per request is around 25ms (before having problems) > > > > > > - Load on each is slave is around 2 > > > > > > > > > > > > We are using this set-up for months without any problem. However > > > > > > last > > > > > > > > > > week > > > > > > > > > > > we started to experience very weird performance problems like : > > > > > > > > > > > > - Avg time per request increased from 25ms to 200-300ms (even > > > > > > higher > > > > > > if > > > > > > > > we > > > > > > > > > > > don't restart the slaves) > > > > > > - Load on each slave increased from 2 to 15-20 (solr uses > > > > > > %400-%600 cpu) > > > > > > > > > > > > When we profile solr we see two very strange things : > > > > > > > > > > > > 1 - This is the jconsole output: > > > > > > > > > > > > https://skitch.com/meralan/rwwcf/mail-886x691 > > > > > > > > > > > > As you see gc runs for every 10-15 seconds and collects more than > > > > > > 1 > > > > > > gb > > > > > > > > > of memory. 
(Actually if you wait more than 10 minutes you see > > > > > > spikes up to > > > > > > > > > > 4gb > > > > > > > > > > > consistently) > > > > > > > > > > > > 2 - This is the newrelic output : > > > > > > > > > > > > https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm > > > > > > > > > > > > As you see solr spent ridiculously long time in > > > > > > SolrDispatchFilter.doFilter() method. > > > > > > > > > > > > > > > > > > Apart form these, when we clean the index directory, re-replicate > > > > > > and restart each slave one by one we see a relief in the system > > > > > > but > > > > > > after > > > > > > > > some > > > > > > > > > > > time servers start to melt down again. Although deleting index > > > > > > and replicating doesn't solve the problem, we think that these > > > > > > problems > > > > > > are > > > > > > > > > somehow related to replication. Because symptoms started after > > > > > > > > > > replication > > > > > > > > > > > and once it heals itself after replication. I also see > > > > > > lucene-write.lock files in slaves (we don't have write.lock files > > > > > > in the master) which I think we shouldn't see. > > > > > > > > > > > > > > > > > > If anyone can give any sort of ideas, we will appreciate it. > > > > > > > > > > > > Regards, > > > > > > Dogacan Guney
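Assembled in one place, the GC-log switches suggested in this message would look roughly like this in a Tomcat environment. The flags and the log path follow the examples quoted above; building the string up in stages is just a sketch to keep the long -Xloggc flag from being split by line wrapping.

```shell
# Sketch: GC-log switches from this thread, joined into one JAVA_OPTS
# string (log path follows the /var/log/tomcat6 example above).
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"
GC_LOG_OPTS="$GC_LOG_OPTS -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime"
GC_LOG_OPTS="$GC_LOG_OPTS -Xloggc:/var/log/tomcat6/gc.log"
JAVA_OPTS="$JAVA_OPTS $GC_LOG_OPTS"
echo "$JAVA_OPTS"
```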
Re: Solr performance issue
It's actually, as I understand it, expected JVM behavior to see the heap rise to close to its limit before it gets GC'd; that's how Java GC works. Whether that should happen every 20 seconds or what, I don't know. Another option is setting better JVM garbage collection arguments, so GC doesn't "stop the world" so often. I have had good luck with my Solr using this: -XX:+UseParallelGC On 3/14/2011 4:15 PM, Doğacan Güney wrote: Hello again, 2011/3/14 Markus Jelsma Hello, 2011/3/14 Markus Jelsma Hi Doğacan, Are you, at some point, running out of heap space? In my experience, that's the common cause of increased load and excessively high response times (or time outs). How much of a heap size would be enough? Our index size is growing slowly but we did not have this problem a couple weeks ago where index size was maybe 100mb smaller. Telling how much heap space is needed isn't easy to say. It usually needs to be increased when you run out of memory and get those nasty OOM errors, are you getting them? Replication events will increase heap usage due to cache warming queries and autowarming. Nope, no OOM errors. We left most of the caches in solrconfig as default and only increased filterCache to 1024. We only ask for "id"s (which are unique) and no other fields during queries (though we do faceting). Btw, 1.6gb of our index is stored fields (we store everything for now, even though we do not get them during queries), and about 1gb of index. Hmm, it seems 4000 would be enough indeed. What about the fieldCache, are there a lot of entries? Is there an insanity count? Do you use boost functions? Insanity count is 0 and fieldCache has 12 entries. We do use some boosting functions. Btw, I am monitoring output via jconsole with 8gb of ram and it still goes to 8gb every 20 seconds or so, gc runs, falls down to 1gb. Btw, our current revision was just a random choice but up until two weeks ago it has been rock-solid so we have been reluctant to update to another version. 
Would you recommend upgrading to latest trunk? It might not have anything to do with memory at all but i'm just asking. There may be a bug in your revision causing this. Anyway, Xmx was 4000m, we tried increasing it to 8000m but did not get any improvement in load. I can try monitoring with Jconsole with 8gigs of heap to see if it helps. Cheers, Hello everyone, First of all here is our Solr setup: - Solr nightly build 986158 - Running solr inside the default jetty comes with solr build - 1 write only Master , 4 read only Slaves (quad core 5640 with 24gb of RAM) - Index replicated (on optimize) to slaves via Solr Replication - Size of index is around 2.5gb - No incremental writes, index is created from scratch(delete old documents -> commit new documents -> optimize) every 6 hours - Avg # of request per second is around 60 (for a single slave) - Avg time per request is around 25ms (before having problems) - Load on each is slave is around 2 We are using this set-up for months without any problem. However last week we started to experience very weird performance problems like : - Avg time per request increased from 25ms to 200-300ms (even higher if we don't restart the slaves) - Load on each slave increased from 2 to 15-20 (solr uses %400-%600 cpu) When we profile solr we see two very strange things : 1 - This is the jconsole output: https://skitch.com/meralan/rwwcf/mail-886x691 As you see gc runs for every 10-15 seconds and collects more than 1 gb of memory. (Actually if you wait more than 10 minutes you see spikes up to 4gb consistently) 2 - This is the newrelic output : https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm As you see solr spent ridiculously long time in SolrDispatchFilter.doFilter() method. Apart form these, when we clean the index directory, re-replicate and restart each slave one by one we see a relief in the system but after some time servers start to melt down again. 
Although deleting index and replicating doesn't solve the problem, we think that these problems are somehow related to replication. Because symptoms started after replication and once it heals itself after replication. I also see lucene-write.lock files in slaves (we don't have write.lock files in the master) which I think we shouldn't see. If anyone can give any sort of ideas, we will appreciate it. Regards, Dogacan Guney
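A collector flag like the one above is only one piece of the JVM command line; a minimal sketch of how the pieces might fit together for a Solr slave started under Jetty (the heap size and start.jar path are assumptions, not values from this thread):

```shell
# Sketch only: assemble JVM options for a Solr slave started with Jetty.
# Pick ONE collector line; heap size and start.jar path are assumptions.
HEAP="-Xms4g -Xmx4g"                    # fixed-size heap avoids resizing under load
THROUGHPUT_GC="-XX:+UseParallelGC"      # parallel throughput collector
LOW_PAUSE_GC="-XX:+UseConcMarkSweepGC"  # CMS, fewer stop-the-world pauses
CMD="java $HEAP $THROUGHPUT_GC -jar start.jar"
echo "$CMD"   # inspect before actually launching
```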
Re: Solr performance issue
> Nope, no OOM errors.

That's a good start!

> Insanity count is 0 and fieldCache has 12 entries. We do use some
> boosting functions.
>
> Btw, I am monitoring output via jconsole with 8gb of ram and it still
> goes to 8gb every 20 seconds or so; gc runs, falls down to 1gb.

Hmm, maybe the garbage collector takes up a lot of CPU time. Could you check your garbage collector log? It must be enabled via some JVM options:

JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:/var/log/tomcat6/gc.log"

Also, what JVM version are you using, and what are your other JVM settings? Are Xms and Xmx set to the same value? I see you're using the throughput collector. You might want to use CMS (the low-pause collector), because it partially runs concurrently and has fewer stop-the-world interruptions.

http://download.oracle.com/javase/6/docs/technotes/guides/vm/cms-6.html

Again, this may not be the issue ;)

> Btw, our current revision was just a random choice, but up until two
> weeks ago it has been rock-solid, so we have been reluctant to update
> to another version. Would you recommend upgrading to latest trunk?

I don't know what changes have been made since your revision. Please consult CHANGES.txt for that.
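Spelled out as a runnable snippet, the GC-log options Markus suggests look like this (the /var/log/tomcat6 path comes from his mail and assumes a Tomcat install; adjust it for Jetty):

```shell
# GC-logging JVM options as suggested in the mail above.
# The log path is the one from the mail; change it for your servlet container.
JAVA_OPTS="${JAVA_OPTS:-}"   # start from any existing opts
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"
JAVA_OPTS="$JAVA_OPTS -Xloggc:/var/log/tomcat6/gc.log"
echo "$JAVA_OPTS"
```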
Re: Solr performance issue
Hello again,

2011/3/14 Markus Jelsma:
> > How much of a heap size would be enough? Our index size is growing
> > slowly, but we did not have this problem a couple weeks ago, when the
> > index was maybe 100mb smaller.
>
> Telling how much heap space is needed isn't easy to say. It usually
> needs to be increased when you run out of memory and get those nasty
> OOM errors; are you getting them? Replication events will increase heap
> usage due to cache-warming queries and autowarming.

Nope, no OOM errors.

> Hmm, it seems 4000 would be enough indeed. What about the fieldCache:
> are there a lot of entries? Is there an insanity count? Do you use
> boost functions?

Insanity count is 0 and fieldCache has 12 entries. We do use some boosting functions.

Btw, I am monitoring output via jconsole with 8gb of ram and it still goes to 8gb every 20 seconds or so; gc runs, falls down to 1gb.

Btw, our current revision was just a random choice, but up until two weeks ago it has been rock-solid, so we have been reluctant to update to another version. Would you recommend upgrading to latest trunk?

> It might not have anything to do with memory at all, but I'm just
> asking. There may be a bug in your revision causing this.

--
Doğacan Güney
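jconsole is one way to watch this; a command-line alternative is `jstat`. A sketch — it assumes the Solr JVM was started as `java -jar start.jar`, so `pgrep` can find it; adjust the pattern otherwise:

```shell
# Sketch: watch GC activity without jconsole.
# Assumes Solr was started as "java -jar start.jar" so pgrep -f finds it.
SOLR_PID=$(pgrep -f start.jar | head -n 1)
# -gcutil prints heap-region occupancy (%) and cumulative GC time;
# here: one sample every 5 seconds, 10 samples total.
CMD="jstat -gcutil ${SOLR_PID:-<pid>} 5000 10"
echo "$CMD"   # run it directly once the PID placeholder is real
```

Frequent full collections that reclaim gigabytes each time, as described above, would show up here as the O (old generation) column repeatedly filling and emptying.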
Re: Solr performance issue
I've definitely had cases in 1.4.1 where, even though I didn't have an OOM error, Solr was being weirdly slow, and increasing the JVM heap size fixed it. I can't explain why it happened, or exactly how you'd know this was going on; I didn't see anything odd in the logs to indicate it. I just tried increasing the JVM heap to see what happened, and it worked great. The one case I remember specifically is when I was using the StatsComponent with a stats.facet: pathologically slow, and increasing the heap magically made the response time negligible again.

On 3/14/2011 3:38 PM, Markus Jelsma wrote:
> Telling how much heap space is needed isn't easy to say. It usually
> needs to be increased when you run out of memory and get those nasty
> OOM errors; are you getting them? Replication events will increase heap
> usage due to cache-warming queries and autowarming.
Re: Solr performance issue
> How much of a heap size would be enough? Our index size is growing
> slowly, but we did not have this problem a couple weeks ago, when the
> index was maybe 100mb smaller.

Telling how much heap space is needed isn't easy to say. It usually needs to be increased when you run out of memory and get those nasty OOM errors; are you getting them? Replication events will increase heap usage due to cache-warming queries and autowarming.

> We left most of the caches in solrconfig as default and only increased
> filterCache to 1024. We only ask for "id"s (which are unique) and no
> other fields during queries (though we do faceting). Btw, 1.6gb of our
> index is stored fields (we store everything for now, even though we do
> not get them during queries), and about 1gb of index.

Hmm, it seems 4000 would be enough indeed. What about the fieldCache: are there a lot of entries? Is there an insanity count? Do you use boost functions?

It might not have anything to do with memory at all, but I'm just asking. There may be a bug in your revision causing this.

> Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get
> any improvement in load. I can try monitoring with jconsole with 8gigs
> of heap to see if it helps.
Re: Solr performance issue
Hello,

2011/3/14 Markus Jelsma:
> Are you, at some point, running out of heap space? In my experience,
> that's the common cause of increased load and excessively high
> response times (or timeouts).

How much of a heap size would be enough? Our index size is growing slowly, but we did not have this problem a couple weeks ago, when the index was maybe 100mb smaller.

We left most of the caches in solrconfig as default and only increased filterCache to 1024. We only ask for "id"s (which are unique) and no other fields during queries (though we do faceting). Btw, 1.6gb of our index is stored fields (we store everything for now, even though we do not get them during queries), and about 1gb of index.

Anyway, Xmx was 4000m; we tried increasing it to 8000m but did not get any improvement in load. I can try monitoring with jconsole with 8gigs of heap to see if it helps.

--
Doğacan Güney
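One knob worth pinning while experimenting with 4000m vs 8000m: give -Xms and -Xmx the same value, so the JVM never resizes the heap under load. A sketch using the 8000m figure from this mail (the start.jar path is an assumption):

```shell
# Sketch: pin initial and maximum heap to the same size while testing.
HEAP_MB=8000                                  # the 8000m tried in this thread
JVM_HEAP="-Xms${HEAP_MB}m -Xmx${HEAP_MB}m"
echo "java $JVM_HEAP -jar start.jar"          # inspect before running
```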
Re: Solr performance issue
Hi Doğacan,

Are you, at some point, running out of heap space? In my experience, that's the common cause of increased load and excessively high response times (or timeouts).

Cheers,

> - Avg time per request increased from 25ms to 200-300ms (even higher
>   if we don't restart the slaves)
> - Load on each slave increased from 2 to 15-20 (solr uses 400%-600%
>   cpu)
Solr performance issue
Hello everyone,

First of all, here is our Solr setup:

- Solr nightly build 986158
- Running solr inside the default jetty that comes with the solr build
- 1 write-only master, 4 read-only slaves (quad core 5640 with 24gb of RAM)
- Index replicated (on optimize) to slaves via Solr Replication
- Size of index is around 2.5gb
- No incremental writes; the index is created from scratch (delete old documents -> commit new documents -> optimize) every 6 hours
- Avg # of requests per second is around 60 (for a single slave)
- Avg time per request is around 25ms (before having problems)
- Load on each slave is around 2

We have been using this set-up for months without any problem. However, last week we started to experience very weird performance problems like:

- Avg time per request increased from 25ms to 200-300ms (even higher if we don't restart the slaves)
- Load on each slave increased from 2 to 15-20 (solr uses 400%-600% cpu)

When we profile solr, we see two very strange things:

1 - This is the jconsole output:

https://skitch.com/meralan/rwwcf/mail-886x691

As you see, gc runs every 10-15 seconds and collects more than 1gb of memory. (Actually, if you wait more than 10 minutes, you see spikes up to 4gb consistently.)

2 - This is the newrelic output:

https://skitch.com/meralan/rwwci/solr-requests-solr-new-relic-rpm

As you see, solr spent a ridiculously long time in the SolrDispatchFilter.doFilter() method.

Apart from these, when we clean the index directory, re-replicate, and restart each slave one by one, we see a relief in the system, but after some time the servers start to melt down again. Although deleting the index and replicating doesn't solve the problem, we think these problems are somehow related to replication, because the symptoms started after a replication and the system heals itself for a while after a replication. I also see lucene-write.lock files on the slaves (we don't have write.lock files on the master), which I think we shouldn't see.

If anyone can give any sort of ideas, we will appreciate it.

Regards,
Dogacan Guney
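The stray lucene-write.lock files on the slaves can be checked for directly. A sketch; the index path below is an assumption, so point it at your actual solr data directory:

```shell
# Sketch: list leftover Lucene lock files in a slave's index directory.
# The default path is an assumption; override INDEX_DIR for your install.
INDEX_DIR="${INDEX_DIR:-/var/solr/data/index}"
if [ -d "$INDEX_DIR" ]; then
    find "$INDEX_DIR" -maxdepth 1 -name '*write.lock'
fi
```

A read-only slave should normally show nothing here; a lock file that survives restarts suggests something wrote to (or aborted a write on) the slave's index, e.g. an interrupted replication.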