[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987488#comment-15987488 ] Shawn Heisey commented on SOLR-10130: - Metrics was just a theory, sounds like that's not it. Thanks [~ab] for the assist. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4, 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: 6.4.2, master (7.0) > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987240#comment-15987240 ] Matthew Sporleder commented on SOLR-10130: -- Both are running java version "1.8.0_45" sun jdk > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4, 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: 6.4.2, master (7.0) > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987160#comment-15987160 ] Andrzej Bialecki commented on SOLR-10130: -- Did you change JDK version between these two installs? I found an old issue (but still open!) that may indicate it's a JDK bug: https://github.com/netty/netty/issues/327 . There are other similar reports for Jetty, but for older versions ... can't say whether that's relevant here. However, what these stacktraces do NOT show is anything related to metrics. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4, 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: 6.4.2, master (7.0) > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15987079#comment-15987079 ] Matthew Sporleder commented on SOLR-10130: -- Not sure I have the tooling right now for a full drill down, but here are some examples of a thread dump: {code} "qtp968514068-37953" - Thread t@37953 java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86) - locked <2e101d2c> (a sun.nio.ch.Util$2) - locked <59c1f901> (a java.util.Collections$UnmodifiableSet) - locked <54c6a926> (a sun.nio.ch.EPollSelectorImpl) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97) at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101) at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:243) at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:191) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:249) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Thread.java:745) Locked ownable synchronizers: - None "qtp968514068-37952" - Thread t@37952 java.lang.Thread.State: TIMED_WAITING at sun.misc.Unsafe.park(Native Method) - parking to wait for <2d78e562> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392) at org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:563) at org.eclipse.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626) at java.lang.Thread.run(Thread.java:745) {code} Most are in that TIMED_WAITING and most CPU time is spend on org.eclipse.jetty.util.BlockingArrayQueue.poll according to visualvm > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4, 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: 6.4.2, master (7.0) > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986913#comment-15986913 ] Andrzej Bialecki commented on SOLR-10130: -- I don't think this should make much of a difference - {{InstrumentedQueuedThreadPool}} only exposes gauges, which basically don't add CPU overhead unless accessed, and {{InstrumentedHandler}} collects only a few specific metrics, so the overhead should also be minimal, in the order of microseconds / request. A drill-down into these threads to find their hot-spots would be useful. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4, 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: 6.4.2, master (7.0) > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986859#comment-15986859 ] Shawn Heisey commented on SOLR-10130: - Have a question related to this issue. Somebody on the IRC channel running 6.4.2 is seeing continued performance degradation compared to 4.x. They were running an earlier 6.4.x release, until they were advised about this issue. Looking at the utilization for threads, the top threads on 6.4.2 are all named starting with qtp, which I believe means they are Jetty threads. https://gist.github.com/msporleder-work/7313ebedbdab2e178ca0aa2e889d006b If I'm not mistaken, we enabled container-level metrics with the changes that went into 6.4.0. If that's true, do we perhaps have those metrics dialed up to 11? > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4, 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: 6.4.2, master (7.0) > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926179#comment-15926179 ] Shawn Heisey commented on SOLR-10130: - The reassignment was accidental, fixed. I hate the fact that Jira responds with real actions to just typing on the keyboard. I sometimes forget which window has the focus, assume that an SSH session I can clearly see is active, and find that I'm giving unknown commands to something that accepts keypresses as commands, like Thunderbird or Jira. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1, 6.4.0 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15881006#comment-15881006 ] Ishan Chattopadhyaya commented on SOLR-10130: - Adding a link to https://issues.apache.org/jira/browse/SOLR-10182 for backing out the changes that caused these perf degradations. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871772#comment-15871772 ] Ere Maijala commented on SOLR-10130: I still don't have proper benchmarks, but I've tested enough to say with fair confidence that this is fixed for us. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870760#comment-15870760 ] Erick Erickson commented on SOLR-10130: --- How on earth did you get a 4.7G tlog? It looks like you somehow didn't commit, shut the node down are replaying a ton of docs (well, how much does 4.7G weigh anyway?) from the tlog. So, simple test: 1> wait for the node to come up. 2> insure you've issued a hard commit 3> try restarting. My claim is that the restart will be reasonable and the slowness you're seeing is a result of somehow shutting down without doing a commit. Of course depending on your autocommit interval you may not need to do the hard commit before restarting... Best, Erick > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870751#comment-15870751 ] Walter Underwood commented on SOLR-10130: - This might be part of it: [wunder@new-solr-c01.test3]# ls -lh /solr/data/questions_shard2_replica1/data/tlog/ total 4.7G -rw-r--r-- 1 bin bin 4.7G Feb 13 11:04 tlog.000 [wunder@new-solr-c01.test3]# du -sh /solr/data/questions_shard2_replica1/data/* 8.4G/solr/data/questions_shard2_replica1/data/index 4.0K/solr/data/questions_shard2_replica1/data/snapshot_metadata 4.7G/solr/data/questions_shard2_replica1/data/tlog Last Modified: 3 days ago Num Docs: 3683075 Max Doc: 3683075 Heap Memory Usage: -1 Deleted Docs: 0 Version: 2737 Segment Count: 26 Optimized: yes Current: yes > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870734#comment-15870734 ] Andrzej Bialecki commented on SOLR-10130: -- bq.Is there any way you could pull/build a new version of Solr 6.4 (or apply the patch on this JIRA locally) and try? I'd hate to have the 6.4.2 release get out (coming soon, due to this) and not have fixed a different issue. I concur. [~wunder] - we are not sure if your situation is caused by the issue fixed here or by some other bug, it would be very helpful if you could try a build that contains this patch to see if it solves the problem in your environment. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870444#comment-15870444 ] Erick Erickson commented on SOLR-10130: --- Ah, ok. I take it this is a replica with a bunch of data in it? Although this doesn't make sense, there shouldn't be that much work do fire up a core absent tlog replay and the like but it sounds like you're far beyond that so I'm missing something. Is there any way you could pull/build a new version of Solr 6.4 (or apply the patch on this JIRA locally) and try? I'd hate to have the 6.4.2 release get out (coming soon, due to this) and not have fixed a different issue. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870429#comment-15870429 ] Walter Underwood commented on SOLR-10130: - I'm looking at how long the core is marked "recovering" in the cloud view of the admin UI. There shouldn't be any recovery. The server process is restarted hours after the most recent update. I think this is how long it takes to get the core loaded and ready for search. Startup time, really. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870379#comment-15870379 ] Erick Erickson commented on SOLR-10130: --- [~wunder]: If it's easy, could you try a manual fetchindex? Which you can even do in cloud mode. See: https://cwiki.apache.org/confluence/display/solr/Index+Replication#IndexReplication-HTTPAPICommandsfortheReplicationHandler Or maybe just see if the logs show that this very long recovery happens when you have a full recovery, i.e. you're copying the full index down from the leader/master... > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870198#comment-15870198 ] Walter Underwood commented on SOLR-10130: - Also, recovery is much, much slower in 6.4. Each core is about 8 GB. After a server process restart, the core is recovering for a few minutes in 6.3, but for about a half hour in 6.4. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870165#comment-15870165 ] Walter Underwood commented on SOLR-10130: - The slowdown is impressive under heavy query load. Here are two load benchmarks with a 16 node cluster, c4.8xlarge instances (36 CPUs, 60 GB RAM), 15.7 million docs, 4 shards, replication factor 4 using production query logs. These are very long text queries, up to 40 words. Benchmark runs for two or three hours, depending on my patience. Java 8u121, G1 collector. 6.4.0 with 1000 requests/minute is running out of CPU. Median and 95th percentile response times for an ngram/prefix match are 7.5 and 9.8 seconds. For a word match, they are 11 and 25.4 seconds. 6.3.0 with 6000 rpm, the times are 0.4 and 2.7 seconds, and 0.7 and 4.3 seconds, respectively. CPU usage is under 50%. Short version, 6.4 is 10X slower than 6.3 handling 1/6 the load. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869753#comment-15869753 ] Alessandro Benedetti commented on SOLR-10130: - Thanks, for the clear explanation! Good spot ! > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869702#comment-15869702 ] Andrzej Bialecki commented on SOLR-10130: -- Prefix query that matches many terms causes many seek & read ops, which meant that the instrumentation in {{org.apache.solr.core.MetricsDirectoryFactory$MetricsInput.readByte}} was called for every small read. This normally wouldn't matter for regular Directory implementations because they use caching extensively, precisely to avoid the overhead of reading single bytes, but {{MetricsDirectory}} being a wrapper on top of any Directory implementation couldn't benefit from this caching and still maintain the read/write counters. The overhead of individual {{Meter.mark}} call is in the order of microseconds, but invoking it a few million times resulted in significant slowdown. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869625#comment-15869625 ] Alessandro Benedetti commented on SOLR-10130: - What was causing the Query time slowdown for prefix queries ? Has this been discovered ? > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868973#comment-15868973 ] Walter Underwood commented on SOLR-10130: - I don't have hard numbers, but core recovery after a restart with 6.4.0 was taking a really long time. Maybe 30 minutes. Back-reved to 6.3.0, it is maybe five minutes. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868222#comment-15868222 ] Henrik commented on SOLR-10130: --- We just deployed the latest from branch_6_4 (a9eb001f44) and our systems are performing normally again. Thanks for your work on this [~ab]! > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867912#comment-15867912 ] Andrzej Bialecki commented on SOLR-10130: -- You can of course build Solr yourself from {{branch_6_4}} that contains the patch. I don't know of any specific timeline for 6.4.2, but this is a pretty serious issue so I think we should do it soon - let's discuss this on mailing lists. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867900#comment-15867900 ] bidorbuy commented on SOLR-10130: - [~ab] would I have to do a Solr build myself to get the patch in or should I rather wait for 6.4.2 (if so, any indication of when it would be released)? > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867867#comment-15867867 ] ASF subversion and git services commented on SOLR-10130: Commit b6f49dc1fb4ad6ef890ae1d09f6d4c0584bb6f64 in lucene-solr's branch refs/heads/master from [~ab] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b6f49dc ] SOLR-10130 Serious performance degradation in Solr 6.4.1 due to the new metrics collection. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867817#comment-15867817 ] ASF subversion and git services commented on SOLR-10130: Commit 835c96ba97a01c61978535c0e8fe34708755dc28 in lucene-solr's branch refs/heads/branch_6x from [~ab] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=835c96b ] SOLR-10130 Serious performance degradation in Solr 6.4.1 due to the new metrics collection. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867744#comment-15867744 ] ASF subversion and git services commented on SOLR-10130: Commit a9eb001f44ca846b64d4ed6e46af316fe12ce3d0 in lucene-solr's branch refs/heads/branch_6_4 from [~ab] [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a9eb001 ] SOLR-10130 Serious performance degradation in Solr 6.4.1 due to the new metrics collection. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, SOLR-10130.patch, > solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867647#comment-15867647 ] bidorbuy commented on SOLR-10130: - Same issue here. Worked perfectly fine on Solr 6.2.0 and CPU is trashing on Solr 6.4.1. I didn't see this bug report and logged a duplicate - https://issues.apache.org/jira/browse/SOLR-10140 showing slowdown in comparison. In our case, Solr 6.4.1 works perfectly fine under production load for about 1 hour and then CPU starts trashing. From the New Relic reports you will see that Solr 6.4.1 is flaring CPU substantially more than prior versions. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15867402#comment-15867402 ] Ere Maijala commented on SOLR-10130: I don't have proper benchmarks at hand, but I can support others' findings about the serious query performance degradation. I suppose it's easily overlooked when testing with light concurrency, but with enough concurrent queries being served it gets CPU-heavy. We use queries with a lot of filters so that may play a role too. I'll see if I came come up with a reproducible-enough test results from our actual queries. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866888#comment-15866888 ] Ishan Chattopadhyaya commented on SOLR-10130: - {code} Prefix query times for 6.4.1 with SOLR-10130 patch - java -cp target/org.apache.solr.tests.upgradetests-0.0.1-SNAPSHOT-jar-with-dependencies.jar:. org.apache.solr.tests.upgradetests.SimpleBenchmarks -v 72f75b2503fa0aa4f0aff76d439874feb923bb0e -patchUrl https://issues.apache.org/jira/secure/attachment/12852444/SOLR-10130.patch -Nodes 1 -Shards 1 -Replicas 1 -numDocs 1 -threads 4 -benchmarkType generalQuerying Got results for prefix queries: 1 Max time (prefix queries): 1716 Total time (prefix queries): 852266 {code} > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866806#comment-15866806 ] Ishan Chattopadhyaya commented on SOLR-10130: - I could reproduce a 1.6x slowdown for prefix queries. Benchmarking suite: https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md Environment: packet.net, Type 0 server (https://www.packet.net/bare-metal/servers/type-0/) {code} Prefix query times for 6.4.1 java -cp target/org.apache.solr.tests.upgradetests-0.0.1-SNAPSHOT-jar-with-dependencies.jar:. org.apache.solr.tests.upgradetests.SimpleBenchmarks -v 72f75b2503fa0aa4f0aff76d439874feb923bb0e -Nodes 1 -Shards 1 -Replicas 1 -numDocs 1 -threads 4 -benchmarkType generalQuerying Got results for prefix queries: 1 Max time (prefix queries): 2156ms Total time (prefix queries): 1324856ms Prefix query times for 6.3.0 java -cp target/org.apache.solr.tests.upgradetests-0.0.1-SNAPSHOT-jar-with-dependencies.jar:. org.apache.solr.tests.upgradetests.SimpleBenchmarks -v 6fa26fe8553b7b65dee96da741f2c1adf4cb6216 -patchUrl http://147.75.108.131/LUCENE-7651.patch -Nodes 1 -Shards 1 -Replicas 1 -numDocs 1 -threads 4 -benchmarkType generalQuerying Got results for prefix queries: 1 Max time (prefix queries): 1358ms Total time (prefix queries): 839534ms {code} Notes: 1. The -threads parameter here is for no. of indexing threads, and number of querying threads is 4 times that, i.e. 16 in this case. 2. Total time is the sum of all times, as reported in the response header's "QTime". Max time is the QTime for the worst performing query. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866092#comment-15866092 ] Walter Underwood commented on SOLR-10130: - I have a JMeter-based load script I can share. It replays access logs. I reload the collection to clear caches, run warming queries, then run queries at a controlled rate. After, it calculates percentiles. This was a test of 6.4.1. Really slow. The errors are usually log lines with queries so long that they are truncated and end up with bad syntax. There is one column per request handler, so these results are for /auto, /mobile, /select, and /srp. Mon Feb 13 12:01:29 PST 2017 ; INFO testing solr-cloud.test.cheggnet.com:8983 Mon Feb 13 12:01:29 PST 2017 ; INFO testing with 2000 requests/min Mon Feb 13 12:01:29 PST 2017 ; INFO testing with 24 requests Mon Feb 13 12:01:29 PST 2017 : splitting log into cache warming (first 2000 lines) and benchmark for /home/wunder/2016-12-12-peak-questions-traffic-clean.log Mon Feb 13 12:01:36 PST 2017 : starting cache warming to solr-cloud.test.cheggnet.com:8983 Mon Feb 13 12:24:29 PST 2017 : starting benchmarking to solr-cloud.test.cheggnet.com:8983 Mon Feb 13 12:24:29 PST 2017 : benchmark should run for 120 minutes Mon Feb 13 12:24:29 PST 2017 : to get a count of requests sent so far, use "wc -l out-32688.jtl" Mon Feb 13 14:55:01 PST 2017 : WARNING 207 error responses from solr-cloud.test.cheggnet.com Mon Feb 13 14:55:01 PST 2017 : INFO Removing 207 error responses from JMeter output file before analysis Mon Feb 13 14:55:01 PST 2017 : analyzing results /home/wunder/search-test/load-test Mon Feb 13 14:55:04 PST 2017 : 25th percentiles are 3151.0,3389.0,9329.0,5647.0 Mon Feb 13 14:55:04 PST 2017 : medians are 6101.0,10579.0,18692.0,8780.0 Mon Feb 13 14:55:04 PST 2017 : 75th percentiles are 6871.0,12499.0,25000.0,12580.0 Mon Feb 13 14:55:04 PST 2017 : 90th percentiles are 7593.0,13481.0,27623.0,14068.0 Mon Feb 13 14:55:04 PST 2017 : 95th percentiles are 8079.0,14039.0,28566.0,16606.0 Mon Feb 13 14:55:04 PST 2017 : full results are in test.csv > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866077#comment-15866077 ] Ishan Chattopadhyaya commented on SOLR-10130: - Thanks Yonik. I'm working on query performance benchmarks for this. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866067#comment-15866067 ] Yonik Seeley commented on SOLR-10130: - Just a matter of how many little IOs are involved in your request. I was easily able to reproduce a 5x slowdown with a prefix query that matches many terms. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865980#comment-15865980 ] Walter Underwood commented on SOLR-10130: - Sorry, the 6000 rpm was with 6.2.1, not 6.4.0. I've backrev'ed the cluster to 6.3.0 and I'll be running load benchmarks today. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865945#comment-15865945 ] Henrik commented on SOLR-10130: --- We've also seen performance degradation with SolrCloud on 6.4.1, as I've posted on solr-user ( http://lucene.472066.n3.nabble.com/Performance-degradation-after-upgrading-from-6-2-1-to-6-4-1-td4320226.html ): Here are a couple of graphs. As you can see, 6.4.1 was introduced 2/10 12:00: https://www.dropbox.com/s/qrc0wodain50azz/solr1.png?dl=0 https://www.dropbox.com/s/sdk30imm8jlomz2/solr2.png?dl=0 https://www.dropbox.com/s/rgd8bq86i3c5mga/solr2b.png?dl=0 These are two very different usage scenarios: * Solr1 has constant updates and very volatile data (30 minutes TTL, 20 shards with no replicas, across 8 servers). Requests in the 99 percentile went from ~400ms to 1000-1500ms. (Hystrix cutoff at 1.5s) * Solr2 is a more traditional instance with long-lived data (updated once a day, 24 shards with 2 replicas, across 8 servers). Requests in the 99 percentile went from ~400ms to at least 1s. (Hystrix cutoff at 1s) > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Fix For: master (7.0), 6.4.2 > > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865912#comment-15865912 ] Andrzej Bialecki commented on SOLR-10130: -- I haven't been able to reproduce such drastic slowdown using simple benchmarks - example results from indexing using {{post}} tool, fairly representative from several runs on each branch: {code} * branch_6_3 * branch_6_4 * jira/solr-10130 {code} Profiler indeed shows that one of the hotspots on branch_6_4 is the {{Meter.mark}} code that is called in {{org.apache.solr.core.MetricsDirectoryFactory$MetricsInput.readByte}}. In my test the profiler showed that this consumes ~ 3% CPU, which is indeed something that we should avoid and turn off by default. However, this still doesn't explain the order of magnitude slowdown reported above. [~emaijala] and [~wunder] - please apply the above patch in your environment and see what is the impact. It makes sense to make this change anyway, so I'm going to apply this or similar version to all affected branches, but maybe there's more we can do here. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865584#comment-15865584 ] Ishan Chattopadhyaya commented on SOLR-10130: - 6.3.0 was faster (same as 6.4.1 with patch). {code} java -cp target/org.apache.solr.tests.upgradetests-0.0.1-SNAPSHOT-jar-with-dependencies.jar:. org.apache.solr.tests.upgradetests.SimpleBenchmarks -v 6fa26fe8553b7b65dee96da741f2c1adf4cb6216 -patchUrl http://147.75.108.131/LUCENE-7651.patch -Nodes 1 -Shards 1 -Replicas 1 -numDocs 10 -threads 6 -benchmarkType generalIndexing Indexing times: 168,167 {code} > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865510#comment-15865510 ] Ishan Chattopadhyaya commented on SOLR-10130: - 6.4.0 shows very similar numbers as compared to 6.4.1 {code} 6.4.0 Without patch java -cp target/org.apache.solr.tests.upgradetests-0.0.1-SNAPSHOT-jar-with-dependencies.jar:. org.apache.solr.tests.upgradetests.SimpleBenchmarks -v 680153de29c5b01d4a8afad88d4a7b84ab01e145 -Nodes 1 -Shards 1 -Replicas 1 -numDocs 10 -threads 6 -benchmarkType generalIndexing Indexing times: 191,184 {code} > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865492#comment-15865492 ] Andrzej Bialecki commented on SOLR-10130: -- I would've expected a much larger difference with this patch, if this indeed was the cause of the slowdown - the patch completely turns off metrics collection at directory and index level. bq. With 6.4.0, we were handling 6000 requests/minute. With 6.4.1 it is 1000 rpm [~wunder] This is odd, too - the same metrics code is present in both 6.4.1 and 6.4.0, with the same defaults, so I would expect that both versions should show similar performance. Could you please collect some stacktraces (or sample / profile) to verify that you see the same hotspots as [~emaijala] ? > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15865470#comment-15865470 ] Ishan Chattopadhyaya commented on SOLR-10130: - I ran some benchmarks, with and without this patch. {code} Benchmarking suite: https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md Environment: packet.net, Type 0 server (https://www.packet.net/bare-metal/servers/type-0/) 6.4.1 Without patch java -cp target/org.apache.solr.tests.upgradetests-0.0.1-SNAPSHOT-jar-with-dependencies.jar:. org.apache.solr.tests.upgradetests.SimpleBenchmarks -v 72f75b2503fa0aa4f0aff76d439874feb923bb0e -Nodes 1 -Shards 1 -Replicas 1 -numDocs 10 -threads 6 -benchmarkType generalIndexing Indexing times: 188,190 6.4.1 With patch java -cp target/org.apache.solr.tests.upgradetests-0.0.1-SNAPSHOT-jar-with-dependencies.jar:. org.apache.solr.tests.upgradetests.SimpleBenchmarks -v 72f75b2503fa0aa4f0aff76d439874feb923bb0e -patchUrl https://issues.apache.org/jira/secure/attachment/12852444/SOLR-10130.patch -Nodes 1 -Shards 1 -Replicas 1 -numDocs 10 -threads 6 -benchmarkType generalIndexing Indexing times: 171,165 {code} > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Attachments: SOLR-10130.patch, solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15864585#comment-15864585 ] Andrzej Bialecki commented on SOLR-10130: -- bq. Does disabling metrics fix it or we we need to go back to 6.4.0? Unfortunately no, these metrics are always turned on both in 6.4.0 and in 6.4.1. I'll upload a patch that disables this by default and allows turning it on via a solrconfig.xml knob. > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Assignee: Andrzej Bialecki >Priority: Blocker > Labels: perfomance > Attachments: solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10130) Serious performance degradation in Solr 6.4.1 due to the new metrics collection
[ https://issues.apache.org/jira/browse/SOLR-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863948#comment-15863948 ] Walter Underwood commented on SOLR-10130: - I’m seeing similar problems here. With 6.4.0, we were handling 6000 requests/minute. With 6.4.1 it is 1000 rpm with median response times around 2.5 seconds. I also switched to the G1 collector. I’m going to back that out and retest today to see if the performance comes back. Does disabling metrics fix it or we we need to go back to 6.4.0? wunder > Serious performance degradation in Solr 6.4.1 due to the new metrics > collection > --- > > Key: SOLR-10130 > URL: https://issues.apache.org/jira/browse/SOLR-10130 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: metrics >Affects Versions: 6.4.1 > Environment: Centos 7, OpenJDK 1.8.0 update 111 >Reporter: Ere Maijala >Priority: Blocker > Labels: perfomance > Attachments: solr-8983-console-f1.log > > > We've stumbled on serious performance issues after upgrading to Solr 6.4.1. > Looks like the new metrics collection system in MetricsDirectoryFactory is > causing a major slowdown. This happens with an index configuration that, as > far as I can see, has no metrics specific configuration and uses > luceneMatchVersion 5.5.0. In practice a moderate load will completely bog > down the server with Solr threads constantly using up all CPU (600% on 6 core > machine) capacity with a load that normally where we normally see an average > load of < 50%. > I took stack traces (I'll attach them) and noticed that the threads are > spending time in com.codahale.metrics.Meter.mark. I tested building Solr > 6.4.1 with the metrics collection disabled in MetricsDirectoryFactory getByte > and getBytes methods and was unable to reproduce the issue. > As far as I can see there are several issues: > 1. Collecting metrics on every single byte read is slow. > 2. Having it enabled by default is not a good idea. > 3. The comment "enable coarse-grained metrics by default" at > https://github.com/apache/lucene-solr/blob/branch_6x/solr/core/src/java/org/apache/solr/update/SolrIndexConfig.java#L104 > implies that only coarse-grained metrics should be enabled by default, and > this contradicts with collecting metrics on every single byte read. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org