[jira] [Commented] (SOLR-9935) When hl.method=unified add support for hl.fragsize param

2018-03-14 Thread Mohsen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16398647#comment-16398647
 ] 

Mohsen commented on SOLR-9935:
--

Is this really resolved as of Solr 6.4? I tested with both Solr 6.4.1 and Solr 
7.1 installations, and neither of them recognizes hl.fragsize when the unified 
method is used.

> When hl.method=unified add support for hl.fragsize param
> 
>
> Key: SOLR-9935
> URL: https://issues.apache.org/jira/browse/SOLR-9935
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
> Fix For: 6.4
>
> Attachments: SOLR_9935_UH_fragsize.patch, SOLR_9935_UH_fragsize.patch
>
>
> In LUCENE-7620 the UnifiedHighlighter is getting a BreakIterator that allows 
> it to support the equivalent of Solr's {{hl.fragsize}}. So let's support this 
> on the Solr side.
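One way to check whether hl.fragsize is honored is to send the same highlighting request with and without hl.method=unified and compare fragment lengths. Below is a minimal JDK-only sketch that builds the URL-encoded parameter string for such a /select request. The query text:solr and the highlight field content are hypothetical, and the host, core name, and response handling are omitted.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class UnifiedHighlightQuery {
    // Builds the query string for a /select request that asks the unified
    // highlighter for fragments of roughly `fragsize` characters.
    // The query and field names passed in are illustrative only.
    static String build(String q, String highlightField, int fragsize) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", q);
        params.put("hl", "true");
        params.put("hl.method", "unified");
        params.put("hl.fl", highlightField);
        params.put("hl.fragsize", Integer.toString(fragsize));

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Prints: q=text%3Asolr&hl=true&hl.method=unified&hl.fl=content&hl.fragsize=100
        System.out.println(build("text:solr", "content", 100));
    }
}
```

The URLEncoder.encode(String, Charset) overload requires Java 10 or later. Dropping the hl.method entry gives the comparison request for the default highlighter.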



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: TestReplicationHandler is failing 100% (master and 7.x / 7.3)

2018-03-14 Thread Alan Woodward
I’m happy either way, but if it’s a bug can we get it fixed quickly?  Can you 
take ownership of this one Andrzej?

> On 14 Mar 2018, at 11:24, Andrzej Białecki wrote:
> 
> Hi,
> 
> This test has always been fragile, but recently it’s been failing 100%, most 
> often in ‘doTestIndexFetchOnMasterRestart’.
> 
> I don’t know the replication handler well enough to find the real reason 
> behind these failures, but I see two possibilities:
> 
> * the test has a bug and needs to be fixed; if we can’t fix it soon, then with 
> the 7.3 release imminent we could BadApple it until it’s properly fixed
> 
> * or the replication handler actually has a bug that needs to be fixed, in 
> which case I propose bumping SOLR-12078 up to Blocker.
> 
> I’m open to suggestions.
> 
> —
> 
> Andrzej Białecki
> 



[JENKINS] Lucene-Solr-Tests-7.3 - Build # 1 - Unstable

2018-03-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-7.3/1/

1 tests failed.
FAILED:  
org.apache.solr.handler.admin.SegmentsInfoRequestHandlerTest.testSegmentInfosVersion

Error Message:
Exception during query

Stack Trace:
java.lang.RuntimeException: Exception during query
at 
__randomizedtesting.SeedInfo.seed([48A65369438C676E:B078C6813BE0B63D]:0)
at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:908)
at 
org.apache.solr.handler.admin.SegmentsInfoRequestHandlerTest.testSegmentInfosVersion(SegmentsInfoRequestHandlerTest.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: REQUEST FAILED: 
xpath=2=count(//lst[@name='segments']/lst/str[@name='version'][.='7.3.0'])
xml response was: 

00
_0 121524 2018-03-14T12:57:27.215Z flush 7.3.0
_1 019041 2018-03-14T12:57:27.218Z flush 7.3.0
_2 020854 2018-03-14T12:57:27.229Z flush 7.3.0
_3 019041 2018-03-14T12:57:27.230Z flush 7.3.0


request 
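The failing assertion counts segment entries whose version is 7.3.0 and expects exactly 2. The archived response has lost its XML markup, but it appears to list four segments (_0 through _3), all at 7.3.0, which would make the count 4. Below is a minimal JDK-only sketch that evaluates the same XPath against a hypothetical, simplified response. It does not reproduce the test's index setup, and the helper names are illustrative.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SegmentVersionCount {
    // The counting part of the failing assertion's XPath.
    static final String VERSION_COUNT =
            "count(//lst[@name='segments']/lst/str[@name='version'][.='7.3.0'])";

    // Builds a hypothetical, stripped-down response with `segments` entries,
    // each reporting version 7.3.0 (the real response carries more fields).
    static String sampleResponse(int segments) {
        StringBuilder xml = new StringBuilder("<response><lst name=\"segments\">");
        for (int i = 0; i < segments; i++) {
            xml.append("<lst name=\"_").append(i).append("\">")
               .append("<str name=\"version\">7.3.0</str></lst>");
        }
        return xml.append("</lst></response>").toString();
    }

    // Parses the XML and evaluates an XPath expression, returning a Double
    // for XPathConstants.NUMBER or a Boolean for XPathConstants.BOOLEAN.
    private static Object eval(String xml, String expr, QName type) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expr, doc, type);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    static int countVersions(String xml) {
        return ((Double) eval(xml, VERSION_COUNT, XPathConstants.NUMBER)).intValue();
    }

    // Mirrors the test's "N=count(...)" boolean check.
    static boolean assertionHolds(String xml, int expected) {
        return (Boolean) eval(xml, expected + "=" + VERSION_COUNT, XPathConstants.BOOLEAN);
    }
}
```

With four sample segments, countVersions returns 4 and assertionHolds(xml, 2) is false. That matches the shape of the failure above, but it does not explain why the index ended up with four segments.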

[jira] [Commented] (SOLR-11882) SolrMetric registries retain references to SolrCores when closed

2018-03-14 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399440#comment-16399440
 ] 

Erick Erickson commented on SOLR-11882:
---

[~ab] OK, I think the light finally dawned. We're talking about two different 
cases and they both have to be handled.

1> the transient core case, the one I started with. Here the core is closed 
out and _may_, at some time in the near or far future, be opened again. In this 
case the patch from 28-Jan is probably almost fine, although there's still a 
(probably small, but unacceptable) chance that a new version of the core would 
be opened before the closer thread got 'round to closing the old one.

2> reopening a core, which is the case you're talking about in your comment of 
1-Feb.

In <2> there's no problem with cores accumulating due to the reference in the 
metrics code since they've been released by the new assignment already.

Does that make sense?

And is there a good way other than inspection to test any fixes I make?

Thanks!

> SolrMetric registries retain references to SolrCores when closed
> 
>
> Key: SOLR-11882
> URL: https://issues.apache.org/jira/browse/SOLR-11882
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: metrics, Server
>Affects Versions: 7.1
>Reporter: Eros Taborelli
>Assignee: Erick Erickson
>Priority: Major
> Attachments: SOLR-11882.patch, SOLR-11882.patch, SOLR-11882.patch, 
> SOLR-11882.patch, create-cores.zip, solr-dump-full_Leak_Suspects.zip, 
> solr.config.zip
>
>
> *Description:*
> Our setup involves using a lot of small cores (possibly a hundred thousand), 
> but working only on a few of them at any given time.
> We already followed all recommendations in this guide: 
> [https://wiki.apache.org/solr/LotsOfCores]
> We noticed that after creating/loading around 1000-2000 empty cores, with no 
> documents inside, the heap consumption went through the roof despite having 
> set transientCacheSize to only 64 (heap size set to 12G).
> All cores are correctly set to loadOnStartup=false and transient=true, and we 
> have verified via logs that the cores in excess are actually being closed.
> However, a reference remains in the 
> org.apache.solr.metrics.SolrMetricManager#registries that is never removed 
> until a core is fully unloaded.
> Restarting the JVM loads all cores in the admin UI, but doesn't populate the 
> ConcurrentHashMap until a core is actually fully loaded.
> I reproduced the issue on a smaller scale (transientCacheSize = 5, heap size 
> = 512m) and made a report (attached) using Eclipse MAT.
> *Desired outcome:*
> When a transient core is closed, the references in the SolrMetricManager 
> should be removed, in the same fashion the reporters for the core are also 
> closed and removed.
> Alternatively, an unloadOnClose=true|false flag could be implemented to fully 
> unload a transient core when it is closed due to the cache size.
> *Note:*
> The documentation states throughout that unused cores will be unloaded, but 
> this is misleading, as the cores are never fully unloaded.
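Two things can be illustrated with a conditional remove: the desired outcome (release the registry reference when a transient core is closed, not only when it is fully unloaded) and the race Erick describes (a new instance of the core is opened before the closer thread finishes with the old one). Below is a minimal sketch that assumes a hypothetical CoreRegistries class keyed by core name. It is not SolrMetricManager's API and not the attached patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of a per-core registry map; NOT SolrMetricManager's API.
class CoreRegistries {
    // Placeholder for whatever per-core state a real registry holds.
    static final class Registry {
        final String coreName;
        Registry(String coreName) { this.coreName = coreName; }
    }

    private final Map<String, Registry> registries = new ConcurrentHashMap<>();

    // Each opened core instance gets a fresh registry, replacing any old one.
    Registry onCoreOpened(String coreName) {
        Registry registry = new Registry(coreName);
        registries.put(coreName, registry);
        return registry;
    }

    // Called when a transient core is closed (e.g. evicted from the transient
    // cache). The two-argument remove drops the entry only if it still maps to
    // the closing instance's registry, so a newer instance opened in the
    // meantime keeps its own registry.
    void onCoreClosed(String coreName, Registry closing) {
        registries.remove(coreName, closing);
    }

    int size() { return registries.size(); }

    boolean hasRegistry(String coreName) { return registries.containsKey(coreName); }
}
```

ConcurrentHashMap performs the two-argument remove atomically, so the comparison and the removal cannot interleave with a concurrent put for the same core name.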



new candidate list for BadApple-ing on Saturday, 17-Mar

2018-03-14 Thread Erick Erickson
We had a drop-off in the number of failing tests over the last couple of days,
so I'm going to ignore the failures from 11-12 Mar. Or it was a temporary
increase for those days; take your pick ;)


I'll check against Hoss' and Mark's lists and only BadApple tests that've
failed Friday or later.

In particular what do the Lucene folks think about the two Lucene tests?


junit.framework.TestSuite.org.apache.solr.cloud.TestLeaderElectionZkExpiry

org.apache.lucene.index.TestIndexSorting.testRandom3
org.apache.lucene.index.TestIndexWriterWithThreads.testCloseWithThreads

org.apache.solr.cloud.api.collections.CollectionsAPIAsyncDistributedZkTest.testAsyncIdBackCompat
org.apache.solr.cloud.autoscaling.ComputePlanActionTest.testNodeWithMultipleReplicasLost
org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup
org.apache.solr.cloud.ConcurrentCreateRoutedAliasTest.testConcurrentCreateRoutedAliasComplex
org.apache.solr.cloud.DocValuesNotIndexedTest.testGroupingDVOnly
org.apache.solr.cloud.LeaderVoteWaitTimeoutTest.basicTest
org.apache.solr.cloud.LeaderVoteWaitTimeoutTest.testMostInSyncReplicasCanWinElection
org.apache.solr.cloud.MoveReplicaHDFSTest.testFailedMove
org.apache.solr.cloud.SSLMigrationTest.test
org.apache.solr.cloud.TestTlogReplica.testCreateDelete
org.apache.solr.handler.admin.SegmentsInfoRequestHandlerTest.testSegmentInfosVersion
org.apache.solr.handler.TestSolrConfigHandlerCloud.test
org.apache.solr.logging.TestLogWatcher.testLog4jWatcher
org.apache.solr.spelling.SpellCheckCollatorTest.testEstimatedHitCounts


Fails by day are below for reference. If they're _not_ listed above, they'll be
left to run for another week:

11-Mar fails:

junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest
junit.framework.TestSuite.org.apache.solr.cloud.TestCloudPivotFacet
junit.framework.TestSuite.org.apache.solr.cloud.TriLevelCompositeIdRoutingTest
junit.framework.TestSuite.org.apache.solr.cloud.ZkControllerTest
junit.framework.TestSuite.org.apache.solr.search.join.BlockJoinFacetDistribTest
org.apache.solr.client.solrj.io.stream.StreamExpressionTest.testGammaDistribution
org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup
org.apache.solr.cloud.BasicDistributedZkTest.test
org.apache.solr.cloud.FullSolrCloudDistribCmdsTest
org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test
org.apache.solr.cloud.hdfs.HdfsUnloadDistributedZkTest
org.apache.solr.cloud.hdfs.HdfsUnloadDistributedZkTest.test
org.apache.solr.cloud.hdfs.StressHdfsTest.test
org.apache.solr.cloud.MoveReplicaHDFSTest.testFailedMove
org.apache.solr.cloud.TestCloudPivotFacet.test
org.apache.solr.cloud.TriLevelCompositeIdRoutingTest.test
org.apache.solr.logging.TestLogWatcher.testLog4jWatcher

12-Mar fails:
junit.framework.TestSuite.org.apache.lucene.search.spans.TestSpanSearchEquivalence
junit.framework.TestSuite.org.apache.solr.cloud.autoscaling.TriggerIntegrationTest
junit.framework.TestSuite.org.apache.solr.cloud.BasicDistributedZkTest
junit.framework.TestSuite.org.apache.solr.cloud.hdfs.HdfsUnloadDistributedZkTest
junit.framework.TestSuite.org.apache.solr.cloud.TestLeaderElectionZkExpiry
org.apache.lucene.index.TestDuelingCodecsAtNight.testBigEquals
org.apache.lucene.search.spans.TestSpanSearchEquivalence.testSpanNearIncreasingSloppiness
org.apache.solr.cloud.autoscaling.AutoAddReplicasIntegrationTest
org.apache.solr.cloud.autoscaling.HdfsAutoAddReplicasIntegrationTest
org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest
org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup
org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testEventQueue
org.apache.solr.cloud.autoscaling.TriggerIntegrationTest.testMetricTrigger
org.apache.solr.cloud.BasicDistributedZkTest.test
org.apache.solr.cloud.CollectionsAPISolrJTest.testSplitShard
org.apache.solr.cloud.LeaderVoteWaitTimeoutTest.testMostInSyncReplicasCanWinElection
org.apache.solr.cloud.MoveReplicaHDFSTest.testFailedMove
org.apache.solr.cloud.SSLMigrationTest.test
org.apache.solr.cloud.TestRandomFlRTGCloud.testRandomizedUpdatesAndRTGs
org.apache.solr.core.TestJmxIntegration
org.apache.solr.handler.admin.AutoscalingHistoryHandlerTest
org.apache.solr.handler.TestReplicationHandler
org.apache.solr.handler.TestReplicationHandler.doTestReplicateAfterCoreReload
org.apache.solr.logging.TestLogWatcher.testLog4jWatcher
org.apache.solr.ltr.TestLTRReRankingPipeline
org.apache.solr.TestDistributedSearch.test

13-Mar fails:
junit.framework.TestSuite.org.apache.solr.cloud.TestLeaderElectionZkExpiry
org.apache.lucene.index.TestIndexSorting.testRandom3
org.apache.lucene.index.TestIndexWriterWithThreads.testCloseWithThreads
org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup
org.apache.solr.cloud.ConcurrentCreateRoutedAliasTest.testConcurrentCreateRoutedAliasComplex

[JENKINS] Lucene-Solr-master-Solaris (64bit/jdk1.8.0) - Build # 1732 - Failure!

2018-03-14 Thread Policeman Jenkins Server
Error processing tokens: Error while parsing action 
'Text/ZeroOrMore/FirstOf/Token/DelimitedToken/DelimitedToken_Action3' at input 
position (line 79, pos 4):
)"}
   ^

java.lang.OutOfMemoryError: Java heap space

[jira] [Commented] (SOLR-12088) Shards with dead replicas cause increased write latency

2018-03-14 Thread Jerry Bao (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399167#comment-16399167
 ] 

Jerry Bao commented on SOLR-12088:
--

Your scenario is what I experienced, so yes :)

1. 30 nodes in the cluster

2. There are no nodes in the cluster that aren't hosting any replicas.

3. Indexing via Lucidworks Fusion (which I assume uses a SolrJ-based 
client)

4. Latency is measured through our own service's instrumentation of the 
round-trip time to index.

> Shards with dead replicas cause increased write latency
> ---
>
> Key: SOLR-12088
> URL: https://issues.apache.org/jira/browse/SOLR-12088
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 7.2
>Reporter: Jerry Bao
>Priority: Major
>
> If a collection's shard contains dead replicas, write latency to the 
> collection is increased. For example, if a collection has 10 shards with a 
> replication factor of 3, and one of those shards contains 3 replicas and 3 
> downed replicas, write latency is increased in comparison to a shard that 
> contains only 3 replicas.
> My feeling here is that downed replicas should be completely ignored and should 
> not affect the write latency of the remaining live replicas.



[JENKINS] Lucene-Solr-Tests-7.x - Build # 502 - Unstable

2018-03-14 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-7.x/502/

1 tests failed.
FAILED:  org.apache.solr.cloud.MoveReplicaHDFSTest.testFailedMove

Error Message:
No live SolrServers available to handle this 
request:[http://127.0.0.1:48662/solr/MoveReplicaHDFSTest_failed_coll_true, 
http://127.0.0.1:45021/solr/MoveReplicaHDFSTest_failed_coll_true]

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: No live SolrServers available 
to handle this 
request:[http://127.0.0.1:48662/solr/MoveReplicaHDFSTest_failed_coll_true, 
http://127.0.0.1:45021/solr/MoveReplicaHDFSTest_failed_coll_true]
at 
__randomizedtesting.SeedInfo.seed([5D01BD12E1A4D902:F7CC6EE056770CD2]:0)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:462)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1105)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:885)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:992)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:818)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
at 
org.apache.solr.cloud.MoveReplicaTest.testFailedMove(MoveReplicaTest.java:307)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 

[jira] [Commented] (SOLR-12083) RealTimeGetComponent fails for INPLACE_UPDATE when Cdcr enabled

2018-03-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399218#comment-16399218
 ] 

ASF subversion and git services commented on SOLR-12083:


Commit f8bbfcdc75af2fe9cfbd6e507fba81d720406402 in lucene-solr's branch 
refs/heads/branch_7x from [~varun_saxena]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f8bbfcd ]

SOLR-12083: Fix RealTime GET to work on a cluster running CDCR when using 
Solr's in-place updates

(cherry picked from commit 57524f1)


> RealTimeGetComponent fails for INPLACE_UPDATE when Cdcr enabled 
> 
>
> Key: SOLR-12083
> URL: https://issues.apache.org/jira/browse/SOLR-12083
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2, 7.2.1, 7.3
>Reporter: Amrit Sarkar
>Assignee: Varun Thacker
>Priority: Major
> Attachments: SOLR-12083-A-within-test-framework.patch, 
> SOLR-12083-B-wo-test-framework.patch, SOLR-12083.patch, SOLR-12083.patch, 
> SOLR-12083.patch, SOLR-12083.patch, SOLR-12083.patch, 
> add_support_for_random_ulog_in_tests.patch
>
>
> When we added bi-directional sync support to CDCR (SOLR-11003), we changed 
> the CDCR Update Log codec to write extra bits. 
> When we use the RealTimeGet component on a cluster running CDCR and have 
> in-place updates in the update log, we falsely trip an assert, causing the 
> request to fail.
> Here's the proposed change:
> {code:java}
> - assert entry.size() == 5;
> + if (ulog instanceof CdcrUpdateLog) {
> +   assert entry.size() == 6;
> + }
> + else {
> +   assert entry.size() == 5;
> + }{code}
>  
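The proposed if/else can also be written as a single expression that picks the expected entry size from the update log type: assert entry.size() == (ulog instanceof CdcrUpdateLog ? 6 : 5);. Below is a minimal sketch that uses hypothetical stub classes in place of Solr's UpdateLog and CdcrUpdateLog. It uses an explicit check instead of assert, because Java assertions only run when the JVM is started with -ea.

```java
import java.util.List;

// Hypothetical stubs standing in for Solr's update log classes.
class UpdateLog {}
class CdcrUpdateLog extends UpdateLog {}

class InPlaceEntryCheck {
    // Per the proposed change, an in-place update entry has 6 elements when
    // the CDCR update log is in use and 5 otherwise.
    static int expectedEntrySize(UpdateLog ulog) {
        return (ulog instanceof CdcrUpdateLog) ? 6 : 5;
    }

    // Equivalent of: assert entry.size() == (ulog instanceof CdcrUpdateLog ? 6 : 5);
    static boolean hasExpectedSize(UpdateLog ulog, List<?> entry) {
        return entry.size() == expectedEntrySize(ulog);
    }
}
```

Because CdcrUpdateLog is a subclass of UpdateLog, the instanceof test is what distinguishes the two cases at runtime.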



[jira] [Commented] (SOLR-12088) Shards with dead replicas cause increased write latency

2018-03-14 Thread Jerry Bao (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399184#comment-16399184
 ] 

Jerry Bao commented on SOLR-12088:
--

[~erickerickson] I don't have an answer to your question; this issue arose from 
moving replicas, where the move did not completely clean up the replicas' state, 
leaving zombie replicas (the data is gone, but the state still exists after the 
move).

Your thinking could definitely explain the higher indexing latency. That makes 
the most sense to me. How long is this timeout?




[jira] [Commented] (SOLR-12083) RealTimeGetComponent fails for INPLACE_UPDATE when Cdcr enabled

2018-03-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399216#comment-16399216
 ] 

ASF subversion and git services commented on SOLR-12083:


Commit 57524f1d4179f3ab57ffa53ba8f5e4dd1e198a11 in lucene-solr's branch 
refs/heads/master from [~varun_saxena]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=57524f1 ]

SOLR-12083: Fix RealTime GET to work on a cluster running CDCR when using 
Solr's in-place updates





[jira] [Commented] (SOLR-12083) RealTimeGetComponent fails for INPLACE_UPDATE when Cdcr enabled

2018-03-14 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399224#comment-16399224
 ] 

Varun Thacker commented on SOLR-12083:
--

Until INFRA-15850 is resolved, the user tagged with the commit will not be me.




[jira] [Updated] (SOLR-11960) Add collection level properties

2018-03-14 Thread Peter Rusko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Rusko updated SOLR-11960:
---
Attachment: SOLR-11960_2.patch

> Add collection level properties
> ---
>
> Key: SOLR-11960
> URL: https://issues.apache.org/jira/browse/SOLR-11960
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Peter Rusko
>Assignee: Tomás Fernández Löbbe
>Priority: Blocker
> Fix For: 7.3
>
> Attachments: SOLR-11960.patch, SOLR-11960.patch, SOLR-11960.patch, 
> SOLR-11960.patch, SOLR-11960.patch, SOLR-11960_2.patch
>
>
> Solr has cluster properties, but no easy and extensible way of defining 
> properties that affect a single collection. Collection properties could be 
> stored in a single ZooKeeper node per collection, making it possible to 
> trigger ZooKeeper watchers for only those Solr nodes that have cores of that 
> collection.
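The key property in the description is that a change to one collection's properties should notify only the Solr nodes that host cores of that collection. Below is a minimal in-memory sketch of that fan-out. All names are hypothetical, and the sketch models no ZooKeeper API and does not reflect the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory model only (no ZooKeeper): one property map per collection and
// one set of "watching" nodes per collection. Names are hypothetical.
class CollectionPropsModel {
    private final Map<String, Map<String, String>> propsByCollection = new HashMap<>();
    private final Map<String, Set<String>> watchersByCollection = new HashMap<>();

    // A node that hosts a core of `collection` watches that collection's props.
    void hostCore(String node, String collection) {
        watchersByCollection.computeIfAbsent(collection, c -> new TreeSet<>()).add(node);
    }

    // Sets a property and returns the nodes that would be notified: only
    // those hosting cores of this collection, in sorted order.
    List<String> setProperty(String collection, String key, String value) {
        propsByCollection.computeIfAbsent(collection, c -> new HashMap<>()).put(key, value);
        Set<String> watchers = watchersByCollection.getOrDefault(collection, Collections.emptySet());
        return new ArrayList<>(watchers);
    }

    String getProperty(String collection, String key) {
        return propsByCollection.getOrDefault(collection, Collections.emptyMap()).get(key);
    }
}
```

For example, if node1 and node2 host cores of c1 and node3 hosts only c2, setting a c1 property returns [node1, node2] and leaves node3 untouched.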



[jira] [Commented] (SOLR-11960) Add collection level properties

2018-03-14 Thread Peter Rusko (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399125#comment-16399125
 ] 

Peter Rusko commented on SOLR-11960:


I've updated the patch. I removed the watcher creation/deletion when registering 
and unregistering the core, and I'm now also re-registering watchers in 
createClusterStateWatchersAndUpdate.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12088) Shards with dead replicas cause increased write latency

2018-03-14 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399138#comment-16399138
 ] 

Erick Erickson commented on SOLR-12088:
---

Another question:

Does the latency persist longer than any system timeouts? Put another way: 
if you start all the Solr instances fresh while some nodes are down, is there 
still increased latency? 

What I'm thinking of here is that it may take up until some timeout for each 
Solr instance to "see" that the node is down.

For instance, if I kill a Solr node with kill -9, it has no chance to tell ZK 
(and ZK, in turn, to inform the rest of the collection) that it's going away. 
The rest of the collection finds out about this by one of several methods, all 
involving some timeout (ZK's occasional pings, the leader sending an update 
request, etc.).

So if this is transient, it may be functioning as expected; but if it lasts 
well past all the possible timeouts, it's another story.
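The point about timeouts can be illustrated with a generic, timeout-based failure detector. This is a minimal sketch, not Solr's or ZooKeeper's implementation: `FailureDetector` is a hypothetical class, and timestamps are passed in explicitly instead of reading a real clock so the behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Generic timeout-based failure detector: a node counts as live until no
// heartbeat has been seen for longer than timeoutMillis.
class FailureDetector {
    private final long timeoutMillis;
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    FailureDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void heartbeat(String node, long nowMillis) {
        lastHeartbeat.put(node, nowMillis);
    }

    // A node that died abruptly (e.g. kill -9) still looks live until the timeout elapses.
    boolean isLive(String node, long nowMillis) {
        Long last = lastHeartbeat.get(node);
        return last != null && nowMillis - last <= timeoutMillis;
    }
}
```

With a hypothetical 15-second timeout, a node killed right after its last heartbeat is still treated as live for up to 15 seconds, which is the window in which extra latency would be expected.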


> Shards with dead replicas cause increased write latency
> ---
>
> Key: SOLR-12088
> URL: https://issues.apache.org/jira/browse/SOLR-12088
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Affects Versions: 7.2
>Reporter: Jerry Bao
>Priority: Major
>
> If a collection's shard contains dead replicas, write latency to the 
> collection is increased. For example, if a collection has 10 shards with a 
> replication factor of 3, and one of those shards contains 3 replicas and 3 
> downed replicas, write latency is increased in comparison to a shard that 
> contains only 3 replicas.
> My feeling here is that downed replicas should be completely ignored and 
> should not affect the write latency of the remaining live replicas.




[jira] [Commented] (SOLR-12083) RealTimeGetComponent fails for INPLACE_UPDATE when Cdcr enabled

2018-03-14 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399205#comment-16399205
 ] 

Varun Thacker commented on SOLR-12083:
--

Precommit, all tests, and 10 beasting rounds of TestInPlaceUpdatesDistrib and 
TestRealTimeGet all passed.

Committing shortly

> RealTimeGetComponent fails for INPLACE_UPDATE when Cdcr enabled 
> 
>
> Key: SOLR-12083
> URL: https://issues.apache.org/jira/browse/SOLR-12083
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2, 7.2.1, 7.3
>Reporter: Amrit Sarkar
>Assignee: Varun Thacker
>Priority: Major
> Attachments: SOLR-12083-A-within-test-framework.patch, 
> SOLR-12083-B-wo-test-framework.patch, SOLR-12083.patch, SOLR-12083.patch, 
> SOLR-12083.patch, SOLR-12083.patch, SOLR-12083.patch, 
> add_support_for_random_ulog_in_tests.patch
>
>
> When we added bi-directional sync support to CDCR (SOLR-11003), we 
> changed the CDCR update log codec to write an extra element per entry. 
> When we use the RealTimeGet component on a cluster running CDCR that has 
> in-place updates in its update log, we falsely trip an assert, which 
> causes the request to fail.
> Here's the proposed change:
> {code:java}
> - assert entry.size() == 5;
> + if (ulog instanceof CdcrUpdateLog) {
> +   assert entry.size() == 6;
> + } else {
> +   assert entry.size() == 5;
> + }{code}
>  
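The proposed assert change amounts to "the expected size of an in-place update entry depends on which update log wrote it". A self-contained sketch of that rule follows; `UpdateLogStub` and `CdcrUpdateLogStub` are hypothetical stand-ins for Solr's `UpdateLog` and `CdcrUpdateLog` (only the subclass relationship matters here), and `expectedInPlaceEntrySize` is a hypothetical helper, not part of the patch.

```java
import java.util.List;

// Stubs standing in for Solr's update log classes; only the subtype relationship matters.
class UpdateLogStub {}
class CdcrUpdateLogStub extends UpdateLogStub {}

class EntrySizeCheck {
    // The CDCR update log codec writes one extra element per entry (6 instead of 5).
    static int expectedInPlaceEntrySize(UpdateLogStub ulog) {
        return (ulog instanceof CdcrUpdateLogStub) ? 6 : 5;
    }

    static boolean hasExpectedSize(UpdateLogStub ulog, List<?> entry) {
        return entry.size() == expectedInPlaceEntrySize(ulog);
    }
}
```

Centralizing the rule in one method keeps the two expected sizes from drifting apart if more call sites need the same check.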




[jira] [Commented] (SOLR-12078) Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart

2018-03-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399389#comment-16399389
 ] 

ASF subversion and git services commented on SOLR-12078:


Commit cb453ce110b1a0b03373909c36d3d1bc25983b71 in lucene-solr's branch 
refs/heads/master from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cb453ce ]

SOLR-12078: Fixed reproducable Failure in 
TestReplicationHandler.doTestIndexFetchOnMasterRestart that happened due to 
using stale http connections


> Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart
> --
>
> Key: SOLR-12078
> URL: https://issues.apache.org/jira/browse/SOLR-12078
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.3, master (8.0)
> Environment: Building on Ubuntu 17.04
> openjdk version "1.8.0_151"
> OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.17.04.2-b12)
> OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
>Reporter: Gus Heck
>Priority: Major
> Attachments: SOLR-12078.patch
>
>
> With the recent focus on bad tests, I decided to inspect some failures that 
> occurred in tests unrelated to my present task when I ran the tests in 
> preparation for a pull request, and found this failure, which reproduces:
> ant test -Dtestcase=TestReplicationHandler -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=884DCF71D210D14A -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=et -Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>  
> Key excerpt of the log:
> {code:java}
>    [junit4]   2> NOTE: reproduce with: ant test -Dtestcase=TestReplicationHandler -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=884DCF71D210D14A -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=et -Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] ERROR   2.35s | TestReplicationHandler.doTestIndexFetchOnMasterRestart <<<
>    [junit4]    > Throwable #1: org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://127.0.0.1:37753/solr/collection1
>    [junit4]    >    at __randomizedtesting.SeedInfo.seed([884DCF71D210D14A:50BA0B9579CB1316]:0)
>    [junit4]    >    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657)
>    [junit4]    >    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>    [junit4]    >    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>    [junit4]    >    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>    [junit4]    >    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
>    [junit4]    >    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
>    [junit4]    >    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
>    [junit4]    >    at org.apache.solr.handler.TestReplicationHandler.index(TestReplicationHandler.java:180)
>    [junit4]    >    at org.apache.solr.handler.TestReplicationHandler.doTestIndexFetchOnMasterRestart(TestReplicationHandler.java:643)
>    [junit4]    >    at java.lang.Thread.run(Thread.java:748)
>    [junit4]    > Caused by: org.apache.http.NoHttpResponseException: 127.0.0.1:37753 failed to respond
>    [junit4]    >    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
>    [junit4]    >    at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>    [junit4]    >    at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>    [junit4]    >    at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>    [junit4]    >    at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
>    [junit4]    >    at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>    [junit4]    >    at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>    [junit4]    >    at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>    [junit4]    >    at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
>    [junit4]    >    at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
>    [junit4]    >    at 
> 

[jira] [Commented] (SOLR-12078) Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart

2018-03-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399390#comment-16399390
 ] 

ASF subversion and git services commented on SOLR-12078:


Commit 46f1b7f654259e9191dc5282e1d58a12ea9a4025 in lucene-solr's branch 
refs/heads/branch_7x from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=46f1b7f ]

SOLR-12078: Fixed reproducable Failure in 
TestReplicationHandler.doTestIndexFetchOnMasterRestart that happened due to 
using stale http connections

(cherry picked from commit cb453ce)



[jira] [Commented] (SOLR-12078) Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart

2018-03-14 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399396#comment-16399396
 ] 

ASF subversion and git services commented on SOLR-12078:


Commit ace126bb0ed6fbfec12e133c2e15633a68e84aad in lucene-solr's branch 
refs/heads/branch_7_3 from [~shalinmangar]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ace126b ]

SOLR-12078: Fixed reproducable Failure in 
TestReplicationHandler.doTestIndexFetchOnMasterRestart that happened due to 
using stale http connections

(cherry picked from commit cb453ce)

(cherry picked from commit 46f1b7f)



[jira] [Commented] (SOLR-12088) Shards with dead replicas cause increased write latency

2018-03-14 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399238#comment-16399238
 ] 

Erick Erickson commented on SOLR-12088:
---

Jerry:

It Depends (tm). Most timeouts are reasonably short, 15 seconds to a couple of 
minutes. So if you're seeing this last much longer than that, the timeouts are 
a red herring.

Solr itself should be able to clean up dead replicas: you can re-issue 
DELETEREPLICA and it "should" work. What version are you using?

There's a bit of a rough patch if you have legacyCloud set. Prior to 7.x this 
was the default, and nodes could reconstruct themselves in ZK. The key is 
whether your ZooKeeper tree has partial collection representations in 
/clusterstate.json, likely corresponding to these deleted replicas. If that's 
the case, you can:

- stop the Solr instance,
- manually remove the dead replicas,
- start Solr back up.

Once all that's done for the dead replicas, replace /clusterstate.json with a 
single pair of empty braces {}, but ONLY if your 
/collections/whatever/state.json has the complete, accurate picture of the 
collection in question. This caveat is _very_ important because if you _don't_ 
have a valid state.json (i.e. you're not in "state format 2"), then you'll 
lose your collections, so be _very_ cautious.
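The caveat above boils down to a precondition: only empty /clusterstate.json when every collection it still describes also has a usable state.json of its own. A hedged sketch of that check follows. `ClusterStateCheck.safeToClear` is a hypothetical helper, a plain `Map` of paths to contents stands in for the ZooKeeper tree, and the check does not verify that a state.json is accurate, only that it exists and is not empty.

```java
import java.util.Map;
import java.util.Set;

class ClusterStateCheck {
    // collectionsInClusterState: collection names still present in /clusterstate.json.
    // zkTree: path -> node contents, standing in for the ZooKeeper tree.
    static boolean safeToClear(Set<String> collectionsInClusterState, Map<String, String> zkTree) {
        for (String collection : collectionsInClusterState) {
            String state = zkTree.get("/collections/" + collection + "/state.json");
            if (state == null || state.trim().isEmpty() || state.trim().equals("{}")) {
                return false; // clearing /clusterstate.json would lose this collection
            }
        }
        return true;
    }
}
```

Even when this returns true, the contents of each state.json still need a manual look before anything is overwritten.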

Now, all that said, if performance is still slow after many minutes, it's a bug 
we should fix. The cluster maintenance stuff is steadily improving BTW.

Erick


[jira] [Commented] (SOLR-12078) Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart

2018-03-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399373#comment-16399373
 ] 

Shalin Shekhar Mangar commented on SOLR-12078:
--

This was interesting. The masterClient has an internal connection pool, which 
holds HTTP connections opened to the master Jetty during the 
clearIndexWithReplication() method. The test attempts to make a request after 
Jetty is restarted. At this point the pool returns a stale connection, which 
results in the NoHttpResponseException. The connection pool's stale check is 
set to 3 seconds by default, so the fix is either to sleep for 3 seconds or to 
close and re-create the masterClient. I opted for the latter to fix the test.
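The failure mode described above can be reproduced in miniature with a standard-library simulation. `SimServer`, `SimConnection`, and `SimPool` are hypothetical classes, not SolrJ or Apache HttpClient: the pool keeps an idle connection from before the server restart, hands it back on the next request, and that request fails; a newly created pool, like a re-created masterClient, only opens fresh connections.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simulated server: each restart bumps the generation, invalidating old connections.
class SimServer {
    int generation = 0;

    void restart() {
        generation++;
    }
}

// A pooled connection remembers which server generation it was opened against.
class SimConnection {
    final int generation;

    SimConnection(int generation) {
        this.generation = generation;
    }
}

// Minimal pool that reuses idle connections without any stale check.
class SimPool {
    private final SimServer server;
    private final Deque<SimConnection> idle = new ArrayDeque<>();

    SimPool(SimServer server) {
        this.server = server;
    }

    // Sends one request; throws if the reused connection predates a restart
    // (the analogue of NoHttpResponseException). Stale connections are discarded.
    String request() {
        SimConnection c = idle.isEmpty() ? new SimConnection(server.generation) : idle.pop();
        if (c.generation != server.generation) {
            throw new IllegalStateException("stale connection: server failed to respond");
        }
        idle.push(c); // return the healthy connection to the pool for reuse
        return "ok";
    }
}
```

Re-creating the pool is deterministic, whereas sleeping past the stale-check interval depends on timing, which is presumably why it is the more robust choice for a test.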

This was not the case when I wrote this test. In SOLR-4509, the HttpClient 
behavior was changed from performing a stale check every time a connection is 
leased to a periodic stale check, but since the test was marked AwaitsFix, we 
never ran into this problem. The problem was exposed when Erick marked the 
test BadApple instead of AwaitsFix.
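The behavior change can be modeled as two validation policies. In this hypothetical sketch (`StaleCheckPolicy` is not HttpClient's code), "check on every lease" validates each connection as it is handed out, while the periodic variant is approximated as checking only connections idle longer than an interval. Under the second policy, a connection that went stale shortly before reuse slips through unchecked.

```java
// Decides whether a pooled connection is checked for staleness before reuse.
class StaleCheckPolicy {
    private final boolean checkOnEveryLease;
    private final long intervalMillis;

    StaleCheckPolicy(boolean checkOnEveryLease, long intervalMillis) {
        this.checkOnEveryLease = checkOnEveryLease;
        this.intervalMillis = intervalMillis;
    }

    boolean shouldCheck(long lastUsedMillis, long nowMillis) {
        return checkOnEveryLease || nowMillis - lastUsedMillis > intervalMillis;
    }

    // A stale connection causes a failed request only if it is reused without a check.
    boolean requestFails(boolean connectionIsStale, long lastUsedMillis, long nowMillis) {
        return connectionIsStale && !shouldCheck(lastUsedMillis, nowMillis);
    }
}
```

With a 3-second interval, a connection reused 1 second after a restart fails under the periodic policy but not under the lease-time policy; waiting longer than the interval lets the periodic policy catch it.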


[jira] [Updated] (SOLR-12078) Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart

2018-03-14 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-12078:
-
Attachment: SOLR-12078.patch


[jira] [Resolved] (SOLR-12078) Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart

2018-03-14 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-12078.
--
   Resolution: Fixed
 Assignee: Shalin Shekhar Mangar
Fix Version/s: master (8.0)
   7.3

> Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart
> --
>
> Key: SOLR-12078
> URL: https://issues.apache.org/jira/browse/SOLR-12078
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.3, master (8.0)
> Environment: Building on Ubuntu 17.04
> openjdk version "1.8.0_151"
> OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.17.04.2-b12)
> OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
>Reporter: Gus Heck
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Fix For: 7.3, master (8.0)
>
> Attachments: SOLR-12078.patch
>
>
> With the recent focus on bad tests, I decided to inspect some failures 
> that occurred in tests unrelated to my present task when I ran the tests 
> in preparation for a pull request, and found this failure, which reproduces:
> ant test -Dtestcase=TestReplicationHandler -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=884DCF71D210D14A -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=et -Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>  
> Key excerpt of the log:
> {code:java}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestReplicationHandler 
> -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=884DCF71D210D14A 
> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=et 
> -Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] ERROR   2.35s | 
> TestReplicationHandler.doTestIndexFetchOnMasterRestart <<<
>    [junit4]    > Throwable #1: 
> org.apache.solr.client.solrj.SolrServerException: IOException occured when 
> talking to server at: http://127.0.0.1:37753/solr/collection1
>    [junit4]    >    at 
> __randomizedtesting.SeedInfo.seed([884DCF71D210D14A:50BA0B9579CB1316]:0)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
>    [junit4]    >    at 
> org.apache.solr.handler.TestReplicationHandler.index(TestReplicationHandler.java:180)
>    [junit4]    >    at 
> org.apache.solr.handler.TestReplicationHandler.doTestIndexFetchOnMasterRestart(TestReplicationHandler.java:643)
>    [junit4]    >    at java.lang.Thread.run(Thread.java:748)
>    [junit4]    > Caused by: org.apache.http.NoHttpResponseException: 
> 127.0.0.1:37753 failed to respond
>    [junit4]    >    at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
>    [junit4]    >    at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>    [junit4]    >    at 
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>    [junit4]    >    at 
> org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>    [junit4]    >    at 
> org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
>    [junit4]    >    at 
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>    [junit4]    >    at 
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
>    [junit4]    >    at 
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>    

Re: TestReplicationHandler is failing 100% (master and 7.x / 7.3)

2018-03-14 Thread Shalin Shekhar Mangar
This is fixed. I committed the fix to master, branch_7x and branch_7_3
branches.

On Wed, Mar 14, 2018 at 9:44 PM, Alan Woodward  wrote:

> Thanks Shalin!
>
>
> On 14 Mar 2018, at 15:50, Shalin Shekhar Mangar 
> wrote:
>
> I'll take a look at it tomorrow morning my time.
>
> On Wed, Mar 14, 2018 at 9:07 PM, Andrzej Białecki <
> andrzej.biale...@lucidworks.com> wrote:
>
>> Well … I looked at it briefly but I have no idea what’s going on there. I
>> could dig into it nonetheless, but if there’s someone who already knows the
>> replication handler’s ins and outs, it would probably get fixed sooner...
>>
>>
>> On 14 Mar 2018, at 14:23, Alan Woodward  wrote:
>>
>> I’m happy either way, but if it’s a bug can we get it fixed quickly?  Can
>> you take ownership of this one Andrzej?
>>
>> On 14 Mar 2018, at 11:24, Andrzej Białecki  wrote:
>>
>> Hi,
>>
>> This test has always been fragile, but recently it’s been failing 100%,
>> most often in ‘doTestIndexFetchOnMasterRestart’.
>>
>> I don’t know the replication handler well enough to be able to find the real
>> reason behind these failures, but there are two possibilities that I see:
>>
>> * the test has a bug and needs to be fixed - and if we can’t fix it soon,
>> then with the 7.3 release imminent we could BadApple it until it’s properly
>> fixed
>>
>> * or actually the replication handler has a bug, which needs to be fixed
>> - in which case I propose to bump up SOLR-12078 to Blocker.
>>
>> I’m open to suggestions.
>>
>> —
>>
>> Andrzej Białecki
>>
>>
>>
>>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
>
>


-- 
Regards,
Shalin Shekhar Mangar.


[jira] [Created] (SOLR-12095) AutoScalingHandler should validate triggers before updating zookeeper

2018-03-14 Thread Shalin Shekhar Mangar (JIRA)
Shalin Shekhar Mangar created SOLR-12095:


 Summary: AutoScalingHandler should validate triggers before 
updating zookeeper
 Key: SOLR-12095
 URL: https://issues.apache.org/jira/browse/SOLR-12095
 Project: Solr
  Issue Type: Sub-task
  Security Level: Public (Default Security Level. Issues are Public)
  Components: AutoScaling, SolrCloud
Reporter: Shalin Shekhar Mangar
 Fix For: 7.4, master (8.0)


We validate policy and preferences before updating the configuration in 
ZooKeeper, but today we don't do that for triggers. So users can set wrong or 
unknown parameters without any complaint from the API, and the exceptions are 
only thrown/logged at runtime.

We should change the trigger API to have a validation step. The catch here is 
that it may require us to instantiate the trigger class.
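A minimal sketch of the validation step proposed above. It assumes a hypothetical `Trigger` interface whose implementations report the property names they accept. The sketch instantiates the configured class by reflection, which is the catch mentioned above, and rejects unknown properties before anything would be written to ZooKeeper. `Trigger`, `ExampleTrigger`, `validProperties()` and `validate()` are illustrative names and are not Solr's actual AutoScaling API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TriggerValidationSketch {

  /** Hypothetical trigger contract: each implementation declares the properties it accepts. */
  public interface Trigger {
    Set<String> validProperties();
  }

  /** Hypothetical trigger that accepts only "event" and "waitFor". */
  public static class ExampleTrigger implements Trigger {
    public ExampleTrigger() {}

    @Override
    public Set<String> validProperties() {
      return new HashSet<>(Arrays.asList("event", "waitFor"));
    }
  }

  /**
   * Validates a trigger definition before it would be persisted.
   * Returns all problems found; an empty list means the definition passed.
   */
  public static List<String> validate(String className, Map<String, Object> props) {
    List<String> errors = new ArrayList<>();
    Trigger trigger;
    try {
      // Instantiating the class surfaces unknown or broken classes now, not at runtime.
      Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
      if (!(instance instanceof Trigger)) {
        errors.add(className + " does not implement Trigger");
        return errors;
      }
      trigger = (Trigger) instance;
    } catch (ReflectiveOperationException e) {
      errors.add("Cannot instantiate " + className + ": " + e);
      return errors;
    }
    // Reject any property the trigger does not declare.
    Set<String> allowed = trigger.validProperties();
    for (String key : props.keySet()) {
      if (!allowed.contains(key)) {
        errors.add("Unknown property '" + key + "' for " + className);
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    Map<String, Object> props = new HashMap<>();
    props.put("event", "nodeLost");
    props.put("waitForr", "5s"); // misspelled key: should be reported
    System.out.println(validate(ExampleTrigger.class.getName(), props));
  }
}
```

The sketch returns a list of errors instead of throwing on the first one. That lets an API handler report every bad parameter in a single response.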



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12063) Fix tlog entry indexes in UpdateLog for CDCR to work smoothly.

2018-03-14 Thread Amrit Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16399425#comment-16399425
 ] 

Amrit Sarkar commented on SOLR-12063:
-

Updated patch, which now includes {{TestStressRecovery}}; precommit and 10 rounds 
of beasting pass.

> Fix tlog entry indexes in UpdateLog for CDCR to work smoothly.
> --
>
> Key: SOLR-12063
> URL: https://issues.apache.org/jira/browse/SOLR-12063
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2, 7.2.1
>Reporter: Amrit Sarkar
>Assignee: Varun Thacker
>Priority: Major
> Attachments: SOLR-12063.patch, SOLR-12063.patch, SOLR-12063.patch, 
> SOLR-12063.patch, test-report-PeerSyncTest, test-report-TestStressRecovery
>
>
> In *UpdateLog*, {{RecentUpdates}} reads tlog entries. Throughout the project 
> the entry indexes for the various operations are consistent, except in this 
> part. Since we added a new element to the TransactionLog entry for CDCR, read 
> operations in the {{update()}} method of {{RecentUpdates}} rightfully throw 
> errors, because elements are read from the wrong indexes of the tlog entry. The 
> entry indexes of the tlog should be consistent throughout.
> {code}
>   [beaster]   2> 27394 WARN  (qtp97093533-72) [n:127.0.0.1:44658_solr 
> c:cdcr-cluster1 s:shard1 r:core_node3 x:cdcr-cluster1_shard1_replica_n1] 
> o.a.s.u.UpdateLog Unexpected log entry or corrupt log.  Entry=[2, 
> -1594312216007409664, [B@28e6859c, true]
>   [beaster]   2> java.lang.ClassCastException: java.lang.Boolean cannot be 
> cast to [B
>   [beaster]   2>  at 
> org.apache.solr.update.UpdateLog$RecentUpdates.update(UpdateLog.java:1443)
>   [beaster]   2>  at 
> org.apache.solr.update.UpdateLog$RecentUpdates.(UpdateLog.java:1340)
>   [beaster]   2>  at 
> org.apache.solr.update.UpdateLog.getRecentUpdates(UpdateLog.java:1513)
>   [beaster]   2>  at 
> org.apache.solr.handler.CdcrRequestHandler.handleShardCheckpointAction(CdcrRequestHandler.java:448)
>   [beaster]   2>  at 
> org.apache.solr.handler.CdcrRequestHandler.handleRequestBody(CdcrRequestHandler.java:198)
>   [beaster]   2>  at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:195)
>   [beaster]   2>  at 
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
>   [beaster]   2>  at 
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:711)
>   [beaster]   2>  at 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:517)
>   [beaster]   2>  at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:384)
>   [beaster]   2>  at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:330)
>   [beaster]   2>  at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
>   [beaster]   2>  at 
> org.apache.solr.client.solrj.embedded.JettySolrRunner$DebugFilter.doFilter(JettySolrRunner.java:139)
>   [beaster]   2>  at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1637)
>   [beaster]   2>  at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
>   [beaster]   2>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
>   [beaster]   2>  at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
>   [beaster]   2>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:188)
>   [beaster]   2>  at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
>   [beaster]   2>  at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:168)
>   [beaster]   2>  at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
>   [beaster]   2>  at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
> {code}
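The sketch below illustrates how an entry-index mismatch can produce the ClassCastException in the log above. It models a tlog entry as a `List<Object>` with the same shape as the logged `Entry=[2, -1594312216007409664, [B@28e6859c, true]`. For illustration only, it assumes that the reader expects the id bytes as the last element and that the trailing `true` is the element added for CDCR. The index constants and method names are hypothetical and are not the actual `UpdateLog` code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class TlogEntryIndexSketch {
  // Hypothetical fixed positions inside a tlog entry, shared by writer and readers.
  public static final int FLAGS_IDX = 0;
  public static final int VERSION_IDX = 1;
  public static final int ID_IDX = 2;

  /** Fragile read: assumes the id bytes are always the last element. */
  public static byte[] idFromEnd(List<Object> entry) {
    return (byte[]) entry.get(entry.size() - 1);
  }

  /** Consistent read: uses the shared index, so appended fields do not shift it. */
  public static byte[] idAtFixedIndex(List<Object> entry) {
    return (byte[]) entry.get(ID_IDX);
  }

  public static void main(String[] args) {
    byte[] id = "doc1".getBytes(StandardCharsets.UTF_8);
    // Same shape as the logged entry: [flags, version, id bytes, trailing boolean].
    List<Object> entry = Arrays.<Object>asList(2, -1594312216007409664L, id, true);

    int flags = (Integer) entry.get(FLAGS_IDX);
    long version = (Long) entry.get(VERSION_IDX);
    String docId = new String(idAtFixedIndex(entry), StandardCharsets.UTF_8);
    System.out.println("flags=" + flags + " version=" + version + " id=" + docId);

    try {
      idFromEnd(entry); // the trailing Boolean is not a byte[], so the cast fails
    } catch (ClassCastException e) {
      System.out.println("ClassCastException: " + e.getMessage());
    }
  }
}
```

One way to keep the entry indexes consistent throughout, as the issue asks, is to share a single set of index constants between the writer and every reader.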






[jira] [Updated] (SOLR-12063) Fix tlog entry indexes in UpdateLog for CDCR to work smoothly.

2018-03-14 Thread Amrit Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amrit Sarkar updated SOLR-12063:

Attachment: SOLR-12063.patch

> Fix tlog entry indexes in UpdateLog for CDCR to work smoothly.
> --
>
> Key: SOLR-12063
> URL: https://issues.apache.org/jira/browse/SOLR-12063
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: CDCR
>Affects Versions: 7.2, 7.2.1
>Reporter: Amrit Sarkar
>Assignee: Varun Thacker
>Priority: Major
> Attachments: SOLR-12063.patch, SOLR-12063.patch, SOLR-12063.patch, 
> SOLR-12063.patch, SOLR-12063.patch, test-report-PeerSyncTest, 
> test-report-TestStressRecovery






Re: TestReplicationHandler is failing 100% (master and 7.x / 7.3)

2018-03-14 Thread Erick Erickson
Shalin:

Should we remove (actually, comment out with a date?) the BadApple
annotation for doTestIndexFetchOnMasterRestart? And do you think your
fixes have any influence on the other BadApple
(doTestIndexAndConfigReplication)?

There's no problem with un-BadApple-ing tests that have been or are
being worked on, and we'd get more test coverage that way.

Or I can do that on Saturday if you'd prefer, assuming the Jenkins
BadApple tests don't show failures.



On Wed, Mar 14, 2018 at 1:57 PM, Shalin Shekhar Mangar
 wrote:
> This is fixed. I committed the fix to master, branch_7x and branch_7_3
> branches.
>
> On Wed, Mar 14, 2018 at 9:44 PM, Alan Woodward  wrote:
>>
>> Thanks Shalin!
>>
>>
>> On 14 Mar 2018, at 15:50, Shalin Shekhar Mangar 
>> wrote:
>>
>> I'll take a look at it tomorrow morning my time.
>>
>> On Wed, Mar 14, 2018 at 9:07 PM, Andrzej Białecki
>>  wrote:
>>>
>>> Well … I looked at it briefly but I have no idea what’s going on there. I
>>> could dig into it nonetheless, but if there’s someone who already knows the
>>> replication handler’s ins and outs, it would probably get fixed sooner...
>>>
>>>
>>> On 14 Mar 2018, at 14:23, Alan Woodward  wrote:
>>>
>>> I’m happy either way, but if it’s a bug can we get it fixed quickly?  Can
>>> you take ownership of this one Andrzej?
>>>
>>> On 14 Mar 2018, at 11:24, Andrzej Białecki  wrote:
>>>
>>> Hi,
>>>
>>> This test has always been fragile, but recently it’s been failing 100%,
>>> most often in ‘doTestIndexFetchOnMasterRestart’.
>>>
>>> I don’t know the replication handler well enough to be able to find the real
>>> reason behind these failures, but there are two possibilities that I see:
>>>
>>> * the test has a bug and needs to be fixed - and if we can’t fix it soon,
>>> then with the 7.3 release imminent we could BadApple it until it’s properly
>>> fixed
>>>
>>> * or actually the replication handler has a bug, which needs to be fixed
>>> - in which case I propose to bump up SOLR-12078 to Blocker.
>>>
>>> I’m open to suggestions.
>>>
>>> —
>>>
>>> Andrzej Białecki
>>>
>>>
>>>
>>
>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.



