[JENKINS] Lucene-Solr-SmokeRelease-master - Build # 998 - Still Failing

2018-04-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-master/998/

No tests ran.

Build Log:
[...truncated 23735 lines...]
[asciidoctor:convert] asciidoctor: ERROR: about-this-guide.adoc: line 1: 
invalid part, must have at least one section (e.g., chapter, appendix, etc.)
[asciidoctor:convert] asciidoctor: ERROR: solr-glossary.adoc: line 1: invalid 
part, must have at least one section (e.g., chapter, appendix, etc.)
 [java] Processed 2190 links (1746 relative) to 3004 anchors in 243 files
 [echo] Validated Links & Anchors via: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/solr/build/solr-ref-guide/bare-bones-html/

-dist-changes:
 [copy] Copying 4 files to 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/solr/package/changes

-dist-keys:
  [get] Getting: http://home.apache.org/keys/group/lucene.asc
  [get] To: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/solr/package/KEYS

package:

-unpack-solr-tgz:

-ensure-solr-tgz-exists:
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/solr/build/solr.tgz.unpacked
[untar] Expanding: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/solr/package/solr-8.0.0.tgz
 into 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/solr/build/solr.tgz.unpacked

generate-maven-artifacts:

resolve:

resolve:

ivy-availability-check:
[loadresource] Do not set property disallowed.ivy.jars.list as its length is 0.

-ivy-fail-disallowed-ivy-version:

ivy-fail:

ivy-configure:
[ivy:configure] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/top-level-ivy-settings.xml

[...repeated resolve / ivy-availability-check / ivy-configure cycles truncated...]

[JENKINS] Lucene-Solr-Tests-7.x - Build # 551 - Unstable

2018-04-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-7.x/551/

1 tests failed.
FAILED:  
org.apache.solr.cloud.autoscaling.sim.TestTriggerIntegration.testEventQueue

Error Message:
action did not start

Stack Trace:
java.lang.AssertionError: action did not start
at 
__randomizedtesting.SeedInfo.seed([49C7821C2C0FC23D:8072C0B2256804C8]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.cloud.autoscaling.sim.TestTriggerIntegration.testEventQueue(TestTriggerIntegration.java:640)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.lang.Thread.run(Thread.java:748)




Build Log:
[...truncated 12133 lines...]
   [junit4] Suite: org.apache.solr.cloud.autoscaling.sim.TestTriggerIntegration
   [junit4]   2> 13803 INFO  
(SUITE-TestTriggerIntegration-seed#[49C7821C2C0FC23D]-worker) [] 
o.a.s.SolrTestCaseJ4 SecureRandom sanity checks: 
test.solr.allowed.securerandom=null & 

[JENKINS] Lucene-Solr-BadApples-Tests-7.x - Build # 34 - Still Unstable

2018-04-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-BadApples-Tests-7.x/34/

2 tests failed.
FAILED:  org.apache.solr.cloud.TestTlogReplica.testAddDocs

Error Message:
Could not load collection from ZK: tlog_replica_test_add_docs

Stack Trace:
org.apache.solr.common.SolrException: Could not load collection from ZK: 
tlog_replica_test_add_docs
at 
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1250)
at 
org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(ZkStateReader.java:679)
at 
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(ClusterState.java:148)
at 
org.apache.solr.common.cloud.ClusterState.getCollectionOrNull(ClusterState.java:131)
at 
org.apache.solr.cloud.TestTlogReplica.tearDown(TestTlogReplica.java:122)
at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:992)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
KeeperErrorCode = Session expired for 
/collections/tlog_replica_test_add_docs/state.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1215)
 

[jira] [Commented] (SOLR-12201) TestReplicationHandler.doTestIndexFetchOnMasterRestart(): unexpected replication failures

2018-04-06 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429174#comment-16429174
 ] 

Steve Rowe commented on SOLR-12201:
---

I temporarily put another assertion at the top of the failing test to make sure 
that the replication failure count is zero, and it is in all the test runs I've 
done; these failures are not the result of other test methods failing to clean 
up properly.
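
A minimal sketch of the kind of guard assertion described above, reusing the 
{{getSlaveDetails()}} helper and the null-tolerant parse that already appear in 
the test excerpts below (placement and message are illustrative, not the 
committed code):

{code:java}
// Hypothetical guard at the top of the test method: verify the slave reports
// zero replication failures before the test body runs, so a later failure
// can't be attributed to leftover state from earlier test methods.
String timesFailed = getSlaveDetails("timesFailed");
assertEquals("replication failure count should be zero at test start",
    0, Integer.parseInt(timesFailed != null ? timesFailed : "0"));
{code}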

> TestReplicationHandler.doTestIndexFetchOnMasterRestart(): unexpected 
> replication failures
> -
>
> Key: SOLR-12201
> URL: https://issues.apache.org/jira/browse/SOLR-12201
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>Reporter: Steve Rowe
>Priority: Major
>
> This is a BadApple'd test, and in local beasting it failed 31/100 iterations.
> E.g. from 
> [https://builds.apache.org/job/Lucene-Solr-BadApples-Tests-master/24/]:
> {noformat}
>[junit4]   1> SHALIN: 
> {responseHeader={status=0,QTime=150},details={indexSize=11.2 
> KB,indexPath=/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/solr/build/solr-core/test/J1/temp/solr.handler.TestReplicationHandler_C1A11EE85E7B0C57-001/solr-instance-008/./collection1/data/index/,commits=[{indexVersion=1523043739675,generation=2,filelist=[_0.fdt,
>  _0.fdx, _0.fnm, _0.nvd, _0.nvm, _0.si, _0_FSTOrd50_0.doc, _0_FSTOrd50_0.tbk, 
> _0_FSTOrd50_0.tix, 
> segments_2]}],isMaster=false,isSlave=true,indexVersion=1523043739675,generation=2,slave={masterDetails={indexSize=11.27
>  
> KB,indexPath=/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/solr/build/solr-core/test/J1/temp/solr.handler.TestReplicationHandler_C1A11EE85E7B0C57-001/solr-instance-007/./collection1/data/index/,commits=[{indexVersion=0,generation=1,filelist=[segments_1]},
>  {indexVersion=1523043739675,generation=2,filelist=[_0.fdt, _0.fdx, _0.fnm, 
> _0.nvd, _0.nvm, _0.si, _0_FSTOrd50_0.doc, _0_FSTOrd50_0.tbk, 
> _0_FSTOrd50_0.tix, 
> segments_2]}],isMaster=true,isSlave=false,indexVersion=1523043739675,generation=2,master={confFiles=schema.xml,replicateAfter=[commit,
>  
> startup],replicationEnabled=true,replicableVersion=1523043739675,replicableGeneration=2}},masterUrl=http://127.0.0.1:36880/solr/collection1,pollInterval=00:00:01,nextExecutionAt=Fri
>  Apr 06 20:42:21 BST 2018,indexReplicatedAt=Fri Apr 06 20:42:20 BST 
> 2018,indexReplicatedAtList=[Fri Apr 06 20:42:20 BST 2018, Fri Apr 06 20:42:17 
> BST 2018],replicationFailedAtList=[Fri Apr 06 20:42:17 BST 
> 2018],timesIndexReplicated=2,lastCycleBytesDownloaded=11650,timesFailed=1,replicationFailedAt=Fri
>  Apr 06 20:42:17 BST 2018,previousCycleTimeInSeconds=0,currentDate=Fri Apr 06 
> 20:42:21 BST 2018,isPollingDisabled=false,isReplicating=false}}}
> [...]
>[junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestReplicationHandler 
> -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=C1A11EE85E7B0C57 
> -Dtests.multiplier=2 -Dtests.slow=true -Dtests.badapples=true 
> -Dtests.locale=en -Dtests.timezone=Europe/Isle_of_Man -Dtests.asserts=true 
> -Dtests.file.encoding=UTF-8
>[junit4] FAILURE 9.39s J1 | 
> TestReplicationHandler.doTestIndexFetchOnMasterRestart <<<
>[junit4]> Throwable #1: java.lang.AssertionError: expected:<1> but 
> was:<2>
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([C1A11EE85E7B0C57:1956DA0CF5A0CE0B]:0)
>[junit4]>  at 
> org.apache.solr.handler.TestReplicationHandler.doTestIndexFetchOnMasterRestart(TestReplicationHandler.java:666)
>[junit4]>  at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The failed assertion is on line 666:
> {code:java|title=TestReplicationHandler.java}
> 666:assertEquals(1, 
> Integer.parseInt(getSlaveDetails("timesIndexReplicated")));
> 667:String timesFailed = getSlaveDetails("timesFailed");
> 668:assertEquals(0, Integer.parseInt(timesFailed != null ?  timesFailed : 
> "0"));
> {code}
> {{getSlaveDetails()}} prints out the properties it retrieves as JSON 
> following "SHALIN:" -- see the log excerpt above.  {{timesIndexReplicated}} 
> is 2 instead of 1 because there was an unexpected replication failure: 
> {{timesFailed}} is 1; if the assertion at line 666 were not there, then the 
> one asserting zero replication failures, at line 668, would fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12078) Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart

2018-04-06 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429169#comment-16429169
 ] 

Steve Rowe commented on SOLR-12078:
---

I opened SOLR-12201 for the other failure [~mkhludnev] mentioned above.

> Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart
> --
>
> Key: SOLR-12078
> URL: https://issues.apache.org/jira/browse/SOLR-12078
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>Affects Versions: 7.3, master (8.0)
> Environment: Building on Ubuntu 17.04
> openjdk version "1.8.0_151"
> OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.17.04.2-b12)
> OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
>Reporter: Gus Heck
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Fix For: 7.3, master (8.0)
>
> Attachments: SOLR-12078.patch
>
>
> With the recent focus on bad tests, I decided to inspect some failures that 
> occurred in tests unrelated to my present task while running the tests in 
> preparation for a pull request, and found this failure, which reproduces:
> ant test  -Dtestcase=TestReplicationHandler 
> -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=884DCF71D210D14A 
> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=et 
> -Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>  
> Key excerpt of the log:
> {code:java}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestReplicationHandler 
> -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=884DCF71D210D14A 
> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=et 
> -Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] ERROR   2.35s | 
> TestReplicationHandler.doTestIndexFetchOnMasterRestart <<<
>    [junit4]    > Throwable #1: 
> org.apache.solr.client.solrj.SolrServerException: IOException occured when 
> talking to server at: http://127.0.0.1:37753/solr/collection1
>    [junit4]    >    at 
> __randomizedtesting.SeedInfo.seed([884DCF71D210D14A:50BA0B9579CB1316]:0)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
>    [junit4]    >    at 
> org.apache.solr.handler.TestReplicationHandler.index(TestReplicationHandler.java:180)
>    [junit4]    >    at 
> org.apache.solr.handler.TestReplicationHandler.doTestIndexFetchOnMasterRestart(TestReplicationHandler.java:643)
>    [junit4]    >    at java.lang.Thread.run(Thread.java:748)
>    [junit4]    > Caused by: org.apache.http.NoHttpResponseException: 
> 127.0.0.1:37753 failed to respond
>    [junit4]    >    at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
>    [junit4]    >    at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>    [junit4]    >    at 
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>    [junit4]    >    at 
> org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>    [junit4]    >    at 
> org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
>    [junit4]    >    at 
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>    [junit4]    >    at 
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
>    [junit4]    >    at 
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>    [junit4]    >    at 
> 

[jira] [Created] (SOLR-12201) TestReplicationHandler.doTestIndexFetchOnMasterRestart(): unexpected replication failures

2018-04-06 Thread Steve Rowe (JIRA)
Steve Rowe created SOLR-12201:
-

 Summary: TestReplicationHandler.doTestIndexFetchOnMasterRestart(): 
unexpected replication failures
 Key: SOLR-12201
 URL: https://issues.apache.org/jira/browse/SOLR-12201
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Steve Rowe


This is a BadApple'd test, and in local beasting it failed 31/100 iterations.

E.g. from 
[https://builds.apache.org/job/Lucene-Solr-BadApples-Tests-master/24/]:

{noformat}
   [junit4]   1> SHALIN: 
{responseHeader={status=0,QTime=150},details={indexSize=11.2 
KB,indexPath=/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/solr/build/solr-core/test/J1/temp/solr.handler.TestReplicationHandler_C1A11EE85E7B0C57-001/solr-instance-008/./collection1/data/index/,commits=[{indexVersion=1523043739675,generation=2,filelist=[_0.fdt,
 _0.fdx, _0.fnm, _0.nvd, _0.nvm, _0.si, _0_FSTOrd50_0.doc, _0_FSTOrd50_0.tbk, 
_0_FSTOrd50_0.tix, 
segments_2]}],isMaster=false,isSlave=true,indexVersion=1523043739675,generation=2,slave={masterDetails={indexSize=11.27
 
KB,indexPath=/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/solr/build/solr-core/test/J1/temp/solr.handler.TestReplicationHandler_C1A11EE85E7B0C57-001/solr-instance-007/./collection1/data/index/,commits=[{indexVersion=0,generation=1,filelist=[segments_1]},
 {indexVersion=1523043739675,generation=2,filelist=[_0.fdt, _0.fdx, _0.fnm, 
_0.nvd, _0.nvm, _0.si, _0_FSTOrd50_0.doc, _0_FSTOrd50_0.tbk, _0_FSTOrd50_0.tix, 
segments_2]}],isMaster=true,isSlave=false,indexVersion=1523043739675,generation=2,master={confFiles=schema.xml,replicateAfter=[commit,
 
startup],replicationEnabled=true,replicableVersion=1523043739675,replicableGeneration=2}},masterUrl=http://127.0.0.1:36880/solr/collection1,pollInterval=00:00:01,nextExecutionAt=Fri
 Apr 06 20:42:21 BST 2018,indexReplicatedAt=Fri Apr 06 20:42:20 BST 
2018,indexReplicatedAtList=[Fri Apr 06 20:42:20 BST 2018, Fri Apr 06 20:42:17 
BST 2018],replicationFailedAtList=[Fri Apr 06 20:42:17 BST 
2018],timesIndexReplicated=2,lastCycleBytesDownloaded=11650,timesFailed=1,replicationFailedAt=Fri
 Apr 06 20:42:17 BST 2018,previousCycleTimeInSeconds=0,currentDate=Fri Apr 06 
20:42:21 BST 2018,isPollingDisabled=false,isReplicating=false}}}
[...]
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestReplicationHandler 
-Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=C1A11EE85E7B0C57 
-Dtests.multiplier=2 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=en 
-Dtests.timezone=Europe/Isle_of_Man -Dtests.asserts=true 
-Dtests.file.encoding=UTF-8
   [junit4] FAILURE 9.39s J1 | 
TestReplicationHandler.doTestIndexFetchOnMasterRestart <<<
   [junit4]> Throwable #1: java.lang.AssertionError: expected:<1> but 
was:<2>
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([C1A11EE85E7B0C57:1956DA0CF5A0CE0B]:0)
   [junit4]>at 
org.apache.solr.handler.TestReplicationHandler.doTestIndexFetchOnMasterRestart(TestReplicationHandler.java:666)
   [junit4]>at java.lang.Thread.run(Thread.java:748)
{noformat}

The failed assertion is on line 666:

{code:java|title=TestReplicationHandler.java}
666:assertEquals(1, 
Integer.parseInt(getSlaveDetails("timesIndexReplicated")));
667:String timesFailed = getSlaveDetails("timesFailed");
668:assertEquals(0, Integer.parseInt(timesFailed != null ?  timesFailed : 
"0"));
{code}

{{getSlaveDetails()}} prints out the properties it retrieves as JSON following 
"SHALIN:" -- see the log excerpt above.  {{timesIndexReplicated}} is 2 instead 
of 1 because there was an unexpected replication failure: {{timesFailed}} is 1; 
if the assertion at line 666 were not there, then the one asserting zero 
replication failures, at line 668, would fail.
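
To make the masking concrete: below is a sketch of the same two assertions 
with the diagnostic check hoisted first, so a run with an unexpected 
replication failure reports the root cause ({{timesFailed}} is 1) instead of 
the derived symptom ({{timesIndexReplicated}} is 2). An illustration, not a 
proposed patch:

{code:java}
// Check the root cause (no replication failures) before the derived count.
String timesFailed = getSlaveDetails("timesFailed");
assertEquals(0, Integer.parseInt(timesFailed != null ? timesFailed : "0"));
assertEquals(1, Integer.parseInt(getSlaveDetails("timesIndexReplicated")));
{code}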






[JENKINS] Lucene-Solr-7.x-Linux (64bit/jdk-10) - Build # 1659 - Unstable!

2018-04-06 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-Linux/1659/
Java: 64bit/jdk-10 -XX:-UseCompressedOops -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.testTrigger

Error Message:
expected:<3> but was:<2>

Stack Trace:
java.lang.AssertionError: expected:<3> but was:<2>
at 
__randomizedtesting.SeedInfo.seed([EE21BE2C1D225D75:8DEA88AE84ED2E58]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.scheduledTriggerTest(ScheduledTriggerTest.java:111)
at 
org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.testTrigger(ScheduledTriggerTest.java:64)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.base/java.lang.Thread.run(Thread.java:844)




Build Log:
[...truncated 

[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

2018-04-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429059#comment-16429059
 ] 

Michael McCandless commented on LUCENE-7976:


I think this change might be cleaner if we can reformulate the desired outcomes 
using the existing "generate candidates and score them" approach?

E.g., for singleton merges ... I wonder if we could just relax TMP to allow it 
to consider merges with fewer than {{maxMergeAtOnce}} segments, and then 
"improve" the scoring function to give a good score to cases that would 
reclaim > X% deletions?
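
A rough standalone sketch of that idea (not actual TieredMergePolicy code; the 
20% threshold, the field names, and the lower-score-is-better convention are 
assumptions for illustration):

{code:java}
// Illustrative scorer: a candidate merge that would reclaim a large fraction
// of deleted docs gets its score improved (lowered), so even a "singleton"
// merge of one huge segment can win against conventional candidates.
class DeletesAwareScorer {
  static double score(long liveDocs, long deletedDocs, double baseScore) {
    double delPct = (double) deletedDocs / Math.max(1, liveDocs + deletedDocs);
    // Assumed threshold: reclaiming > 20% deletions earns a bonus.
    return delPct > 0.20 ? baseScore * (1.0 - delPct) : baseScore;
  }
}
{code}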

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -
>
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
>  (no, that's not a serious name, suggestions 
> welcome) which would default to 100 (or the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of  shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.






[jira] [Commented] (SOLR-12078) Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart

2018-04-06 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429056#comment-16429056
 ] 

Steve Rowe commented on SOLR-12078:
---

[~mkhludnev]: the test is (still) BadApple'd, and has shown up a couple of 
times on BadApple runs.  Locally I ran 100 beasting iterations and 31/100 
failed with the error you listed.  I'm looking into it, but will likely open a 
new issue for it, since it looks different from the problem this issue was 
opened for.

> Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart
> --
>
> Key: SOLR-12078
> URL: https://issues.apache.org/jira/browse/SOLR-12078
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>Affects Versions: 7.3, master (8.0)
> Environment: Building on Ubuntu 17.04
> openjdk version "1.8.0_151"
> OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.17.04.2-b12)
> OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
>Reporter: Gus Heck
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Fix For: 7.3, master (8.0)
>
> Attachments: SOLR-12078.patch
>
>
> With the recent focus on bad tests, I decided to inspect some failures that 
> occurred in tests unrelated to my present task while running the tests in 
> preparation for a pull request, and found this failure, which reproduces:
> ant test  -Dtestcase=TestReplicationHandler 
> -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=884DCF71D210D14A 
> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=et 
> -Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>  
> Key excerpt of the log:
> {code:java}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestReplicationHandler 
> -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=884DCF71D210D14A 
> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=et 
> -Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] ERROR   2.35s | 
> TestReplicationHandler.doTestIndexFetchOnMasterRestart <<<
>    [junit4]    > Throwable #1: 
> org.apache.solr.client.solrj.SolrServerException: IOException occured when 
> talking to server at: http://127.0.0.1:37753/solr/collection1
>    [junit4]    >    at 
> __randomizedtesting.SeedInfo.seed([884DCF71D210D14A:50BA0B9579CB1316]:0)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
>    [junit4]    >    at 
> org.apache.solr.handler.TestReplicationHandler.index(TestReplicationHandler.java:180)
>    [junit4]    >    at 
> org.apache.solr.handler.TestReplicationHandler.doTestIndexFetchOnMasterRestart(TestReplicationHandler.java:643)
>    [junit4]    >    at java.lang.Thread.run(Thread.java:748)
>    [junit4]    > Caused by: org.apache.http.NoHttpResponseException: 
> 127.0.0.1:37753 failed to respond
>    [junit4]    >    at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
>    [junit4]    >    at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>    [junit4]    >    at 
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>    [junit4]    >    at 
> org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>    [junit4]    >    at 
> org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
>    [junit4]    >    at 
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>    [junit4]    >    at 
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
>    [junit4]    >    

[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

2018-04-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429054#comment-16429054
 ] 

Michael McCandless commented on LUCENE-7976:


I'm confused about what this means:
{quote}I further propose that this be an _optional_ argument to the command 
that would override the setting in solrconfig.xml (if any). WDYT?
{quote}
We are in Lucene, not Solr, here – I think what you mean is that you want to 
change the {{forceMerge}} and {{forceMergeDeletes}} APIs in {{IndexWriter}} 
and the merge policy to optionally accept a parameter (unbounded by default) 
that sets {{maxMergedSegmentSizeMB}}?
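
Read concretely, the suggested API shape might look like the following sketch; 
these methods do not exist on {{IndexWriter}} as of this thread and only 
illustrate the proposal:

{code:java}
import java.io.IOException;

// Hypothetical API sketch for the proposal above -- NOT existing Lucene API.
interface SizeCappedForceMerges {
  // forceMerge that additionally caps the size of any segment it produces;
  // the cap would flow through to the merge policy's forced-merge selection.
  void forceMerge(int maxNumSegments, double maxMergedSegmentSizeMB) throws IOException;

  // forceMergeDeletes with the same optional cap (default: unbounded).
  void forceMergeDeletes(double maxMergedSegmentSizeMB) throws IOException;
}
{code}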

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -
>
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
>  (no, that's not a serious name, suggestions 
> welcome) which would default to 100 (or the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of  shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.






[jira] [Commented] (SOLR-12078) Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart

2018-04-06 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429040#comment-16429040
 ] 

Mikhail Khludnev commented on SOLR-12078:
-

https://builds.apache.org/job/PreCommit-SOLR-Build/42/testReport/org.apache.solr.handler/TestReplicationHandler/doTestIndexFetchOnMasterRestart/

{quote}
java.lang.AssertionError: expected:<1> but was:<2>
at 
__randomizedtesting.SeedInfo.seed([C1A11EE85E7B0C57:1956DA0CF5A0CE0B]:0)
at 
org.apache.solr.handler.TestReplicationHandler.doTestIndexFetchOnMasterRestart(TestReplicationHandler.java:666)
   <-- well, I see 
{quote}

WDYT? 

> Reproducable Failure in TestReplicationHandler.doTestIndexFetchOnMasterRestart
> --
>
> Key: SOLR-12078
> URL: https://issues.apache.org/jira/browse/SOLR-12078
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>Affects Versions: 7.3, master (8.0)
> Environment: Building on Ubuntu 17.04
> openjdk version "1.8.0_151"
> OpenJDK Runtime Environment (build 1.8.0_151-8u151-b12-0ubuntu0.17.04.2-b12)
> OpenJDK 64-Bit Server VM (build 25.151-b12, mixed mode)
>Reporter: Gus Heck
>Assignee: Shalin Shekhar Mangar
>Priority: Major
> Fix For: 7.3, master (8.0)
>
> Attachments: SOLR-12078.patch
>
>
> With the recent focus on bad tests, I decided to inspect some failures that 
> occurred in tests unrelated to my present task while running the tests in 
> preparation for a pull request, and found this failure, which reproduces:
> ant test  -Dtestcase=TestReplicationHandler 
> -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=884DCF71D210D14A 
> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=et 
> -Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>  
> Key excerpt of the log:
> {code:java}
>    [junit4]   2> NOTE: reproduce with: ant test  
> -Dtestcase=TestReplicationHandler 
> -Dtests.method=doTestIndexFetchOnMasterRestart -Dtests.seed=884DCF71D210D14A 
> -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=et 
> -Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] ERROR   2.35s | 
> TestReplicationHandler.doTestIndexFetchOnMasterRestart <<<
>    [junit4]    > Throwable #1: 
> org.apache.solr.client.solrj.SolrServerException: IOException occured when 
> talking to server at: http://127.0.0.1:37753/solr/collection1
>    [junit4]    >    at 
> __randomizedtesting.SeedInfo.seed([884DCF71D210D14A:50BA0B9579CB1316]:0)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:657)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:173)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:138)
>    [junit4]    >    at 
> org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:152)
>    [junit4]    >    at 
> org.apache.solr.handler.TestReplicationHandler.index(TestReplicationHandler.java:180)
>    [junit4]    >    at 
> org.apache.solr.handler.TestReplicationHandler.doTestIndexFetchOnMasterRestart(TestReplicationHandler.java:643)
>    [junit4]    >    at java.lang.Thread.run(Thread.java:748)
>    [junit4]    > Caused by: org.apache.http.NoHttpResponseException: 
> 127.0.0.1:37753 failed to respond
>    [junit4]    >    at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
>    [junit4]    >    at 
> org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
>    [junit4]    >    at 
> org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
>    [junit4]    >    at 
> org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
>    [junit4]    >    at 
> org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
>    [junit4]    >    at 
> org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
>    [junit4]    >    at 
> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
>    [junit4]    >    at 
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
>    [junit4]    >    at 
> 

[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

2018-04-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429038#comment-16429038
 ] 

Michael McCandless commented on LUCENE-7976:


Phew, suddenly a lot of action here!  I'll review the patch soon, but wanted 
to answer:
{quote}[~mikemccand] do you know the condition in TieredMergePolicy that 
segmentsToMerge is used? 
{quote}
Right, the idea here is that if you call {{forceMerge}}, we will only merge 
those segments present in the index at the moment {{forceMerge}} started.  Any 
newly written segments due to concurrent indexing will not participate in that 
{{forceMerge}} (unless you go and call {{forceMerge}} again).  I think it's a 
bug that {{forceMergeDeletes}} doesn't do the same thing?  Otherwise the 
operation can run endlessly?
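
A compressed stand-in for that behavior (simplified types; the real code keys 
the {{segmentsToMerge}} map by segment metadata, not by name):

{code:java}
import java.util.Map;

// Sketch of the snapshot check described above: only segments present when
// forceMerge() started (the segmentsToMerge snapshot) are merge-eligible;
// segments flushed by concurrent indexing afterwards are simply skipped.
class ForcedMergeEligibility {
  static boolean eligible(String segmentName, Map<String, Boolean> segmentsToMerge) {
    return segmentsToMerge.containsKey(segmentName);
  }
}
{code}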

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -
>
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
>  (no, that's not a serious name, suggestions 
> welcome) which would default to 100 (or the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of  shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.






[jira] [Updated] (SOLR-12200) ZkControllerTest failure probably caused by #testReadConfigName

2018-04-06 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-12200:

Attachment: zk.fail.txt.gz

> ZkControllerTest failure probably caused by #testReadConfigName
> ---
>
> Key: SOLR-12200
> URL: https://issues.apache.org/jira/browse/SOLR-12200
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Mikhail Khludnev
>Priority: Major
> Attachments: zk.fail.txt.gz
>
>
> Failure seems suspiciously the same. 
>[junit4]   2> 499919 INFO  
> (TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77]) 
> [n:127.0.0.1:8983_solr] o.a.s.c.Overseer Overseer 
> (id=73578760132362243-127.0.0.1:8983_solr-n_00) closing
>[junit4]   2> 499920 INFO  
> (OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_00) [
> ] o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:8983_solr
>[junit4]   2> 499920 ERROR 
> (OverseerCollectionConfigSetProcessor-73578760132362243-127.0.0.1:8983_solr-n_00)
>  [] o.a.s.c.OverseerTaskProcessor Unable to prioritize overseer
>[junit4]   2> java.lang.InterruptedException: null
>[junit4]   2>at java.lang.Object.wait(Native Method) ~[?:1.8.0_152]
>[junit4]   2>at java.lang.Object.wait(Object.java:502) 
> ~[?:1.8.0_152]
>[junit4]   2>at 
> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409) 
> ~[zookeeper-3.4.11.jar:3.4
> then it spins in SessionExpiredException; all tests pass, but the suite 
> fails due to a leaking Overseer. 






[jira] [Updated] (SOLR-12200) ZkControllerTest failure probably caused by #testReadConfigName

2018-04-06 Thread Mikhail Khludnev (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-12200:

Summary: ZkControllerTest failure probably caused by #testReadConfigName  
(was: ZkControllerTest failure. probably #testReadConfigName)

> ZkControllerTest failure probably caused by #testReadConfigName
> ---
>
> Key: SOLR-12200
> URL: https://issues.apache.org/jira/browse/SOLR-12200
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public) 
>  Components: SolrCloud
>Reporter: Mikhail Khludnev
>Priority: Major
>
> Failure seems suspiciously the same. 
>[junit4]   2> 499919 INFO  
> (TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77]) 
> [n:127.0.0.1:8983_solr] o.a.s.c.Overseer Overseer 
> (id=73578760132362243-127.0.0.1:8983_solr-n_00) closing
>[junit4]   2> 499920 INFO  
> (OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_00) [
> ] o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:8983_solr
>[junit4]   2> 499920 ERROR 
> (OverseerCollectionConfigSetProcessor-73578760132362243-127.0.0.1:8983_solr-n_00)
>  [] o.a.s.c.OverseerTaskProcessor Unable to prioritize overseer
>[junit4]   2> java.lang.InterruptedException: null
>[junit4]   2>at java.lang.Object.wait(Native Method) ~[?:1.8.0_152]
>[junit4]   2>at java.lang.Object.wait(Object.java:502) 
> ~[?:1.8.0_152]
>[junit4]   2>at 
> org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409) 
> ~[zookeeper-3.4.11.jar:3.4
> Then it spins on SessionExpiredException; all tests pass, but the suite fails 
> due to a leaking Overseer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-12200) ZkControllerTest failure. probably #testReadConfigName

2018-04-06 Thread Mikhail Khludnev (JIRA)
Mikhail Khludnev created SOLR-12200:
---

 Summary: ZkControllerTest failure. probably #testReadConfigName
 Key: SOLR-12200
 URL: https://issues.apache.org/jira/browse/SOLR-12200
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Reporter: Mikhail Khludnev


Failure seems suspiciously the same. 
   [junit4]   2> 499919 INFO  
(TEST-ZkControllerTest.testReadConfigName-seed#[BC856CC565039E77]) 
[n:127.0.0.1:8983_solr] o.a.s.c.Overseer Overseer 
(id=73578760132362243-127.0.0.1:8983_solr-n_00) closing
   [junit4]   2> 499920 INFO  
(OverseerStateUpdate-73578760132362243-127.0.0.1:8983_solr-n_00) [] 
o.a.s.c.Overseer Overseer Loop exiting : 127.0.0.1:8983_solr
   [junit4]   2> 499920 ERROR 
(OverseerCollectionConfigSetProcessor-73578760132362243-127.0.0.1:8983_solr-n_00)
 [] o.a.s.c.OverseerTaskProcessor Unable to prioritize overseer
   [junit4]   2> java.lang.InterruptedException: null
   [junit4]   2>at java.lang.Object.wait(Native Method) ~[?:1.8.0_152]
   [junit4]   2>at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_152]
   [junit4]   2>at 
org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409) 
~[zookeeper-3.4.11.jar:3.4

Then it spins on SessionExpiredException; all tests pass, but the suite fails due to 
a leaking Overseer. 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

2018-04-06 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428991#comment-16428991
 ] 

Erick Erickson commented on LUCENE-7976:


[~mikemccand] re: do you know the condition in TieredMergePolicy under which 
segmentsToMerge is used

Figured it out. Basically, forceMerge doesn't want to merge more than 
maxMergeAtOnceExplicit segments at once, so there may be multiple passes (gosh, 
it almost looks like Map/Reduce).

bq: I started looking at this and we already have maxSegments as a parameter to 
optimize and there's a really hacky way to use that (if it's not present on the 
command, set it to Integer.MAX_VALUE) and that's just ugly. So changing that 
to maxMergeSegmentSizeMB seems cleaner.

Changing my mind about this. I found a better way to deal with an external 
(Solr-level) optimize command. For the update command, default maxSegments to 
MAX_INT, i.e. assume there's no limit, and always respect maxMergedSegmentMB. 
If the user _does_ specify the max number of segments allowed, do what they say 
even if it means creating one giant segment.
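
A minimal sketch of that defaulting, assuming a hypothetical handler-side 
helper (the {{maxSegments}} parameter is real; the class and method below are 
illustrative, not actual Solr code):

{code:java}
// Sketch only: default maxSegments to "no limit" when the optimize request
// omits it, so maxMergedSegmentMB still governs; an explicit value is honored
// as-is. The class and method are hypothetical, not actual Solr code.
import org.apache.solr.common.params.SolrParams;

public final class OptimizeDefaultsSketch {
  /** Returns the effective maxSegments for an optimize/forceMerge request. */
  public static int effectiveMaxSegments(SolrParams params) {
    // Absent => Integer.MAX_VALUE: no segment-count limit, size cap applies.
    // Present => do what the user says, even if it means one giant segment.
    return params.getInt("maxSegments", Integer.MAX_VALUE);
  }
}
{code}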



> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -
>
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate (think many 
> TB), solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
>  (no, that's not a serious name, suggestions 
> welcome), which would default to 100 (i.e. the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 
> >> It would be rewritten into a single segment removing all deleted docs no 
> >> matter how big it is to start. The 100G example above would be rewritten 
> >> to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the 
> default would be the same behavior we see now. As it stands now, though, 
> there's no way to recover from an optimize/forceMerge except to re-index from 
> scratch. We routinely see 200G-300G Lucene indexes at this point "in the 
> wild" with 10s of  shards replicated 3 or more times. And that doesn't even 
> include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A 
> new merge policy is certainly an alternative.
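
For concreteness, a hypothetical sketch of the eligibility rule proposed above; 
the new parameter was deliberately left unnamed, so every name below is 
invented and nothing like it exists in TieredMergePolicy today:

{code:java}
// Hypothetical sketch of the proposed rule: past the deleted-docs threshold, a
// segment becomes eligible for merging/rewriting no matter how large it is.
// All names here are invented for illustration.
final class DeletesPctSketch {
  private final double maxAllowedPctDeletedDocs; // invented knob; 100 = current behavior
  private final long maxMergedSegmentBytes;      // e.g. the default 5G cap

  DeletesPctSketch(double maxAllowedPctDeletedDocs, long maxMergedSegmentBytes) {
    this.maxAllowedPctDeletedDocs = maxAllowedPctDeletedDocs;
    this.maxMergedSegmentBytes = maxMergedSegmentBytes;
  }

  enum Action { LEAVE_ALONE, MERGE_WITH_SMALLER, SINGLETON_REWRITE }

  Action decide(int maxDoc, int delCount, long liveBytes) {
    double pctDeleted = 100.0 * delCount / maxDoc;
    if (pctDeleted <= maxAllowedPctDeletedDocs) {
      return Action.LEAVE_ALONE;          // today's size-based rules apply
    }
    return liveBytes < maxMergedSegmentBytes
        ? Action.MERGE_WITH_SMALLER       // pack live docs up toward the cap
        : Action.SINGLETON_REWRITE;       // e.g. the 100G -> 80G rewrite above
  }
}
{code}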



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-7627) Either automate or stop listing the "Versions of Major Components" in CHANGES.txt

2018-04-06 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428987#comment-16428987
 ] 

Steve Rowe edited comment on SOLR-7627 at 4/6/18 8:58 PM:
--

FYI {{get_solr_init_changes()}} in {{addVersion.py}}, which is run by the 
release manager after branching for a release, automates initial population of 
this {{CHANGES.txt}} section, pulling versions from 
{{ivy-versions.properties}}.  This won't keep the versions in sync though.


was (Author: steve_rowe):
FYI {{get_solr_init_changes()}} in {{addVersion.py}}, which is run by the 
release manager after branching for a release, automates filling out this 
{{CHANGES.txt}} section, pulling versions from {{ivy-versions.properties}}.  
This won't keep the versions in sync though.

> Either automate or stop listing the "Versions of Major Components" in 
> CHANGES.txt
> -
>
> Key: SOLR-7627
> URL: https://issues.apache.org/jira/browse/SOLR-7627
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Priority: Major
>
> At some point along the way, we got into the practice of having a "Versions of 
> Major Components" sub-section in changes.txt for each release ... in addition 
> to the normal practice of recording the individual Jiras when deps are 
> upgraded.
> Maintaining this sub-section accurately seems very tedious and error-prone 
> (see SOLR-7626) so it seems like we should either:
> * stop doing this completely and trust the users to look at the ivy files for 
> the dependencies
> * find a way to automate this so we don't have to do it manually.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7627) Either automate or stop listing the "Versions of Major Components" in CHANGES.txt

2018-04-06 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428987#comment-16428987
 ] 

Steve Rowe commented on SOLR-7627:
--

FYI {{get_solr_init_changes()}} in {{addVersion.py}}, which is run by the 
release manager after branching for a release, automates filling out this 
{{CHANGES.txt}} section, pulling versions from {{ivy-versions.properties}}.  
This won't keep the versions in sync though.
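
As a rough illustration of what that automation amounts to, here is a sketch in 
Java that loads the dependency versions and emits the section (the real logic 
is Python in {{addVersion.py}}; the property key below is an assumption about 
the file's format, not verified):

{code:java}
// Sketch only: build a "Versions of Major Components" block from
// ivy-versions.properties. The real implementation is get_solr_init_changes()
// in addVersion.py; the property key below is an assumption.
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

public final class ComponentVersionsSketch {
  public static void main(String[] args) throws IOException {
    Properties versions = new Properties();
    try (FileReader reader = new FileReader("lucene/ivy-versions.properties")) {
      versions.load(reader);
    }
    // Assumed key format; check the actual file for the real keys.
    String zk = versions.getProperty("/org.apache.zookeeper/zookeeper", "unknown");
    System.out.println("Versions of Major Components");
    System.out.println("----------------------------");
    System.out.println("Apache ZooKeeper " + zk);
  }
}
{code}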

> Either automate or stop listing the "Versions of Major Components" in 
> CHANGES.txt
> -
>
> Key: SOLR-7627
> URL: https://issues.apache.org/jira/browse/SOLR-7627
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Priority: Major
>
> At some point along the way, we got into the practice of having a "Versions of 
> Major Components" sub-section in changes.txt for each release ... in addition 
> to the normal practice of recording the individual Jiras when deps are 
> upgraded.
> Maintaining this sub-section accurately seems very tedious and error-prone 
> (see SOLR-7626) so it seems like we should either:
> * stop doing this completely and trust the users to look at the ivy files for 
> the dependencies
> * find a way to automate this so we don't have to do it manually.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12155) Solr 7.2.1 deadlock in UnInvertedField.getUnInvertedField()

2018-04-06 Thread Lucene/Solr QA (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428934#comment-16428934
 ] 

Lucene/Solr QA commented on SOLR-12155:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  0m 54s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} Check forbidden APIs {color} | {color:red} 
 0m 54s{color} | {color:red} Check forbidden APIs check-forbidden-apis failed 
{color} |
| {color:red}-1{color} | {color:red} Validate source patterns {color} | 
{color:red}  0m 54s{color} | {color:red} Check forbidden APIs 
check-forbidden-apis failed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 77m  8s{color} 
| {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 79m 27s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | solr.cloud.autoscaling.ComputePlanActionTest |
|   | solr.cloud.TestTlogReplica |
|   | solr.handler.TestReplicationHandler |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-12155 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917770/SOLR-12155.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-112-generic #135-Ubuntu SMP 
Fri Jan 19 11:48:36 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 0f53adb |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 8 2015 |
| Default Java | 1.8.0_152 |
| Check forbidden APIs | 
https://builds.apache.org/job/PreCommit-SOLR-Build/42/artifact/out/patch-check-forbidden-apis-solr.txt
 |
| Validate source patterns | 
https://builds.apache.org/job/PreCommit-SOLR-Build/42/artifact/out/patch-check-forbidden-apis-solr.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-SOLR-Build/42/artifact/out/patch-unit-solr_core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/42/testReport/ |
| modules | C: solr solr/core U: solr |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/42/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Solr 7.2.1 deadlock in UnInvertedField.getUnInvertedField() 
> 
>
> Key: SOLR-12155
> URL: https://issues.apache.org/jira/browse/SOLR-12155
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public (Default Security Level. Issues are Public)
>Affects Versions: 7.2.1
>Reporter: Kishor gandham
>Assignee: Mikhail Khludnev
>Priority: Major
> Fix For: 7.4
>
> Attachments: SOLR-12155.patch, SOLR-12155.patch, SOLR-12155.patch, 
> SOLR-12155.patch, stack.txt
>
>
> I am attaching a stack trace from our production Solr (7.2.1). Occasionally, 
> we are seeing Solr become unresponsive. We are then forced to kill the JVM 
> and start Solr again.
> We have a lot of facet queries and our index has approximately 15 million 
> documents. We have recently started using json.facet queries and some of the 
> facet fields use DocValues.
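
For reference, a json.facet request of the kind described takes this shape; a 
minimal sketch using only JDK plumbing (the collection name, field name, and 
URL are made up for illustration):

{code:java}
// Sketch only: issues a simple json.facet terms-facet request with JDK
// classes. The collection name, field name, and URL are made up.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;

public final class JsonFacetSketch {
  public static void main(String[] args) throws Exception {
    String jsonFacet = URLEncoder.encode(
        "{categories:{type:terms,field:cat_s}}", "UTF-8");
    URL url = new URL("http://localhost:8983/solr/mycollection/select"
        + "?q=*:*&rows=0&json.facet=" + jsonFacet);
    try (BufferedReader in = new BufferedReader(
        new InputStreamReader(url.openStream(), "UTF-8"))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line); // prints the facet response
      }
    }
  }
}
{code}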



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8244) SearcherTaxonomyManager.refreshIfNeeded leaks file handles on exception

2018-04-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428909#comment-16428909
 ] 

Michael McCandless commented on LUCENE-8244:


I attached patch w/ test case and fix.
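
In outline it's the usual close-on-failure pattern; a minimal sketch of that 
shape, not the attached patch (the taxonomy-refresh call is a placeholder):

{code:java}
// Close-on-failure sketch: if the taxonomy refresh throws after a new
// main-index reader was opened, close that reader before propagating the
// exception so its file handles aren't leaked. Not the attached patch.
import java.io.IOException;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.util.IOUtils;

final class RefreshSketch {
  DirectoryReader refreshBoth(DirectoryReader current) throws IOException {
    DirectoryReader newReader = DirectoryReader.openIfChanged(current);
    if (newReader == null) {
      return null; // nothing changed
    }
    boolean success = false;
    try {
      refreshTaxonomy(); // placeholder: the step that may throw
      success = true;
      return newReader;
    } finally {
      if (success == false) {
        IOUtils.closeWhileHandlingException(newReader); // don't leak handles
      }
    }
  }

  private void refreshTaxonomy() throws IOException {
    // placeholder body
  }
}
{code}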

> SearcherTaxonomyManager.refreshIfNeeded leaks file handles on exception
> ---
>
> Key: LUCENE-8244
> URL: https://issues.apache.org/jira/browse/LUCENE-8244
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Major
> Attachments: LUCENE-8244.patch
>
>
> This method first refreshes the main index, and then the taxonomy, but if an 
> exception is hit while refreshing the taxonomy, it fails to close the new 
> reader it opened from the main index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8244) SearcherTaxonomyManager.refreshIfNeeded leaks file handles on exception

2018-04-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-8244:
---
Attachment: LUCENE-8244.patch

> SearcherTaxonomyManager.refreshIfNeeded leaks file handles on exception
> ---
>
> Key: LUCENE-8244
> URL: https://issues.apache.org/jira/browse/LUCENE-8244
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/facet
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Major
> Attachments: LUCENE-8244.patch
>
>
> This method first refreshes the main index, and then the taxonomy, but if an 
> exception is hit while refreshing the taxonomy, it fails to close the new 
> reader it opened from the main index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8244) SearcherTaxonomyManager.refreshIfNeeded leaks file handles on exception

2018-04-06 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-8244:
--

 Summary: SearcherTaxonomyManager.refreshIfNeeded leaks file 
handles on exception
 Key: LUCENE-8244
 URL: https://issues.apache.org/jira/browse/LUCENE-8244
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless


This method first refreshes the main index, and then the taxonomy, but if an 
exception is hit while refreshing the taxonomy, it fails to close the new 
reader it opened from the main index.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-BadApples-NightlyTests-master - Build # 6 - Still Failing

2018-04-06 Thread Apache Jenkins Server
Build: 
https://builds.apache.org/job/Lucene-Solr-BadApples-NightlyTests-master/6/

8 tests failed.
FAILED:  org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
__randomizedtesting.SeedInfo.seed([3AD4C155A2C8DB65:B280FE8F0C34B69D]:0)
at java.util.Arrays.copyOf(Arrays.java:3332)
at 
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at 
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:649)
at java.lang.StringBuilder.append(StringBuilder.java:202)
at 
org.apache.http.client.utils.URLEncodedUtils.urlEncode(URLEncodedUtils.java:536)
at 
org.apache.http.client.utils.URLEncodedUtils.encodeFormFields(URLEncodedUtils.java:652)
at 
org.apache.http.client.utils.URLEncodedUtils.format(URLEncodedUtils.java:404)
at 
org.apache.http.client.utils.URLEncodedUtils.format(URLEncodedUtils.java:382)
at 
org.apache.http.client.entity.UrlEncodedFormEntity.<init>(UrlEncodedFormEntity.java:75)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.fillContentStream(HttpSolrClient.java:513)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.createMethod(HttpSolrClient.java:420)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:253)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483)
at 
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1106)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:886)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:819)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:974)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:990)
at 
org.apache.solr.cloud.CloudInspectUtil.compareResults(CloudInspectUtil.java:228)
at 
org.apache.solr.cloud.CloudInspectUtil.compareResults(CloudInspectUtil.java:167)
at 
org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.testIndexingBatchPerRequestWithHttpSolrClient(FullSolrCloudDistribCmdsTest.java:668)
at 
org.apache.solr.cloud.FullSolrCloudDistribCmdsTest.test(FullSolrCloudDistribCmdsTest.java:152)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)


FAILED:  org.apache.solr.cloud.TestTlogReplica.testRecovery

Error Message:
Can not find doc 8 in https://127.0.0.1:39842/solr

Stack Trace:
java.lang.AssertionError: Can not find doc 8 in https://127.0.0.1:39842/solr
at 
__randomizedtesting.SeedInfo.seed([3AD4C155A2C8DB65:FB24B8F98F9811C2]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at 
org.apache.solr.cloud.TestTlogReplica.checkRTG(TestTlogReplica.java:889)
at 
org.apache.solr.cloud.TestTlogReplica.testRecovery(TestTlogReplica.java:603)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 

[jira] [Commented] (LUCENE-8243) IndexWriter might delete DV update files if addIndices are involved

2018-04-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428896#comment-16428896
 ] 

Simon Willnauer commented on LUCENE-8243:
-

LGTM thanks mike!

> IndexWriter might delete DV update files if addIndices are involved
> ---
>
> Key: LUCENE-8243
> URL: https://issues.apache.org/jira/browse/LUCENE-8243
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8243.patch, broken_dv_update.patch
>
>
> the attached test fails with this output:
> {noformat}
> /Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/bin/java -ea 
> -Djava.security.egd=file:/dev/./urandom 
> -Didea.test.cyclic.buffer.size=1048576 -Dfile.encoding=UTF-8 -classpath 
> "/Applications/IntelliJ 
> IDEA.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit-rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit5-rt.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/test:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/java:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/junit-4.10.jar:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/randomizedtesting-runner-2.5.3.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/codecs/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/test"
>  com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 
> org.apache.lucene.index.TestAddIndexes,testAddIndexesDVUpdate
> IFD 0 [2018-04-06T19:27:27.176036Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> current segments file is "segments_1"; 
> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@27cf18f0
> IFD 0 [2018-04-06T19:27:27.188066Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> load commit "segments_1"
> IFD 0 [2018-04-06T19:27:27.189800Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> seg=_0 set nextWriteDelGen=2 vs current=1
> IFD 0 [2018-04-06T19:27:27.190053Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvd"
> IFD 0 [2018-04-06T19:27:27.190224Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1.fnm"
> IFD 0 [2018-04-06T19:27:27.190371Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvm"
> IFD 0 [2018-04-06T19:27:27.190528Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: delete 
> [_0_1_Lucene70_0.dvd, _0_1.fnm, _0_1_Lucene70_0.dvm]
> IFD 0 [2018-04-06T19:27:27.192558Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: now 
> checkpoint "_0(8.0.0):C1:fieldInfosGen=1:dvGen=1" [1 segments ; isCommit = 
> false]
> IFD 0 [2018-04-06T19:27:27.192806Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 0 msec 
> to checkpoint
> IW 0 [2018-04-06T19:27:27.193012Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> create=false
> IW 0 [2018-04-06T19:27:27.193428Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 
> dir=MockDirectoryWrapper(RAMDirectory@79d3c690 
> lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@795a0c8b)
> index=_0(8.0.0):C1:fieldInfosGen=1:dvGen=1
> version=8.0.0
> analyzer=org.apache.lucene.analysis.MockAnalyzer
> ramBufferSizeMB=16.0
> maxBufferedDocs=503
> mergedSegmentWarmer=null
> delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
> commit=null
> openMode=CREATE_OR_APPEND
> similarity=org.apache.lucene.search.similarities.AssertingSimilarity
> mergeScheduler=org.apache.lucene.index.SerialMergeScheduler@2f3feff6
> codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, maxDocsPerChunk=6, blockSize=201), 
> termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, blockSize=201))
> infoStream=org.apache.lucene.util.PrintStreamInfoStream
> mergePolicy=[TieredMergePolicy: maxMergeAtOnce=41, maxMergeAtOnceExplicit=44, 
> maxMergedSegmentMB=6.255859375, floorSegmentMB=0.38671875, 
> forceMergeDeletesPctAllowed=4.456652110760543, segmentsPerTier=31.0, 
> maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.877330376985384
> 

[jira] [Commented] (LUCENE-8243) IndexWriter might delete DV update files if addIndices are involved

2018-04-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428866#comment-16428866
 ] 

Michael McCandless commented on LUCENE-8243:


I just attached a patch that I think fixes the issue: the problem was that 
{{IW.addIndexes(Directory[])}} was creating a new {{SegmentCommitInfo}} for the 
copied segment but failing to preserve the files associated with doc-values 
updates.  All Lucene tests and {{ant precommit}} pass.
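
In outline, the repair carries the per-commit files over when the segment is 
copied; a sketch under the assumption that {{SegmentCommitInfo}} exposes 
matching getter/setter pairs (the committed patch may differ in detail):

{code:java}
// Sketch of the fix described above: when addIndexes(Directory...) copies a
// segment, carry over the field-infos and doc-values-update files so the
// deletion policy doesn't treat them as unreferenced and delete them (as the
// quoted IFD log shows happening). Assumes these accessors exist on
// SegmentCommitInfo; details may differ from the committed patch.
import org.apache.lucene.index.SegmentCommitInfo;

final class CopySegmentSketch {
  static void preserveUpdateFiles(SegmentCommitInfo source, SegmentCommitInfo copy) {
    copy.setFieldInfosFiles(source.getFieldInfosFiles());             // e.g. _0_1.fnm
    copy.setDocValuesUpdatesFiles(source.getDocValuesUpdatesFiles()); // e.g. _0_1_Lucene70_0.dvd/.dvm
  }
}
{code}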

> IndexWriter might delete DV update files if addIndices are involved
> ---
>
> Key: LUCENE-8243
> URL: https://issues.apache.org/jira/browse/LUCENE-8243
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8243.patch, broken_dv_update.patch
>
>
> the attached test fails with this output:
> {noformat}
> /Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/bin/java -ea 
> -Djava.security.egd=file:/dev/./urandom 
> -Didea.test.cyclic.buffer.size=1048576 -Dfile.encoding=UTF-8 -classpath 
> "/Applications/IntelliJ 
> IDEA.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit-rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit5-rt.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/test:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/java:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/junit-4.10.jar:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/randomizedtesting-runner-2.5.3.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/codecs/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/test"
>  com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 
> org.apache.lucene.index.TestAddIndexes,testAddIndexesDVUpdate
> IFD 0 [2018-04-06T19:27:27.176036Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> current segments file is "segments_1"; 
> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@27cf18f0
> IFD 0 [2018-04-06T19:27:27.188066Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> load commit "segments_1"
> IFD 0 [2018-04-06T19:27:27.189800Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> seg=_0 set nextWriteDelGen=2 vs current=1
> IFD 0 [2018-04-06T19:27:27.190053Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvd"
> IFD 0 [2018-04-06T19:27:27.190224Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1.fnm"
> IFD 0 [2018-04-06T19:27:27.190371Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvm"
> IFD 0 [2018-04-06T19:27:27.190528Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: delete 
> [_0_1_Lucene70_0.dvd, _0_1.fnm, _0_1_Lucene70_0.dvm]
> IFD 0 [2018-04-06T19:27:27.192558Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: now 
> checkpoint "_0(8.0.0):C1:fieldInfosGen=1:dvGen=1" [1 segments ; isCommit = 
> false]
> IFD 0 [2018-04-06T19:27:27.192806Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 0 msec 
> to checkpoint
> IW 0 [2018-04-06T19:27:27.193012Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> create=false
> IW 0 [2018-04-06T19:27:27.193428Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 
> dir=MockDirectoryWrapper(RAMDirectory@79d3c690 
> lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@795a0c8b)
> index=_0(8.0.0):C1:fieldInfosGen=1:dvGen=1
> version=8.0.0
> analyzer=org.apache.lucene.analysis.MockAnalyzer
> ramBufferSizeMB=16.0
> maxBufferedDocs=503
> mergedSegmentWarmer=null
> delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
> commit=null
> openMode=CREATE_OR_APPEND
> similarity=org.apache.lucene.search.similarities.AssertingSimilarity
> mergeScheduler=org.apache.lucene.index.SerialMergeScheduler@2f3feff6
> codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, maxDocsPerChunk=6, blockSize=201), 
> termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, blockSize=201))
> infoStream=org.apache.lucene.util.PrintStreamInfoStream
> 

[jira] [Updated] (LUCENE-8243) IndexWriter might delete DV update files if addIndices are involved

2018-04-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-8243:
---
Attachment: LUCENE-8243.patch

> IndexWriter might delete DV update files if addIndices are involved
> ---
>
> Key: LUCENE-8243
> URL: https://issues.apache.org/jira/browse/LUCENE-8243
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: LUCENE-8243.patch, broken_dv_update.patch
>
>
> the attached test fails with this output:
> {noformat}
> /Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/bin/java -ea 
> -Djava.security.egd=file:/dev/./urandom 
> -Didea.test.cyclic.buffer.size=1048576 -Dfile.encoding=UTF-8 -classpath 
> "/Applications/IntelliJ 
> IDEA.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit-rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit5-rt.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/test:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/java:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/junit-4.10.jar:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/randomizedtesting-runner-2.5.3.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/codecs/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/test"
>  com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 
> org.apache.lucene.index.TestAddIndexes,testAddIndexesDVUpdate
> IFD 0 [2018-04-06T19:27:27.176036Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> current segments file is "segments_1"; 
> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@27cf18f0
> IFD 0 [2018-04-06T19:27:27.188066Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> load commit "segments_1"
> IFD 0 [2018-04-06T19:27:27.189800Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> seg=_0 set nextWriteDelGen=2 vs current=1
> IFD 0 [2018-04-06T19:27:27.190053Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvd"
> IFD 0 [2018-04-06T19:27:27.190224Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1.fnm"
> IFD 0 [2018-04-06T19:27:27.190371Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvm"
> IFD 0 [2018-04-06T19:27:27.190528Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: delete 
> [_0_1_Lucene70_0.dvd, _0_1.fnm, _0_1_Lucene70_0.dvm]
> IFD 0 [2018-04-06T19:27:27.192558Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: now 
> checkpoint "_0(8.0.0):C1:fieldInfosGen=1:dvGen=1" [1 segments ; isCommit = 
> false]
> IFD 0 [2018-04-06T19:27:27.192806Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 0 msec 
> to checkpoint
> IW 0 [2018-04-06T19:27:27.193012Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> create=false
> IW 0 [2018-04-06T19:27:27.193428Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 
> dir=MockDirectoryWrapper(RAMDirectory@79d3c690 
> lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@795a0c8b)
> index=_0(8.0.0):C1:fieldInfosGen=1:dvGen=1
> version=8.0.0
> analyzer=org.apache.lucene.analysis.MockAnalyzer
> ramBufferSizeMB=16.0
> maxBufferedDocs=503
> mergedSegmentWarmer=null
> delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
> commit=null
> openMode=CREATE_OR_APPEND
> similarity=org.apache.lucene.search.similarities.AssertingSimilarity
> mergeScheduler=org.apache.lucene.index.SerialMergeScheduler@2f3feff6
> codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, maxDocsPerChunk=6, blockSize=201), 
> termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, blockSize=201))
> infoStream=org.apache.lucene.util.PrintStreamInfoStream
> mergePolicy=[TieredMergePolicy: maxMergeAtOnce=41, maxMergeAtOnceExplicit=44, 
> maxMergedSegmentMB=6.255859375, floorSegmentMB=0.38671875, 
> forceMergeDeletesPctAllowed=4.456652110760543, segmentsPerTier=31.0, 
> maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.877330376985384
> 

[jira] [Commented] (LUCENE-8238) WordDelimiterFilter javadocs reference nonexistent parameters

2018-04-06 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428860#comment-16428860
 ] 

Mike Sokolov commented on LUCENE-8238:
--

Thanks!

On Fri, Apr 6, 2018 at 3:35 PM, Michael McCandless (JIRA) 



> WordDelimiterFilter javadocs reference nonexistent parameters
> -
>
> Key: LUCENE-8238
> URL: https://issues.apache.org/jira/browse/LUCENE-8238
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Mike Sokolov
>Priority: Minor
> Fix For: trunk
>
> Attachments: WDGF.patch
>
>
> The javadocs for both WDF and WDGF include a pretty detailed discussion about 
> the proper use of the "combinations" parameter, but no such parameter exists. 
> I don't know the history here, but it sounds as if the docs might be 
> referring to some previous incarnation of this filter, perhaps in the context 
> of some (now-defunct) Solr configuration.
> The docs should be updated to reference the actual option names that are 
> provided by the class today.
>  
> I've attached a patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8240) Make TokenStreamComponents.setReader public

2018-04-06 Thread Mike Sokolov (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428854#comment-16428854
 ] 

Mike Sokolov commented on LUCENE-8240:
--

Well, I don't have much more to say, but perhaps this background from our use 
case will sway you :) We did try breaking up our large catch-all field into 
separate fields, since that is more natural for Lucene than having these 
sub-fields. However, we have so many of them (100s) that query performance was 
poor due to the zillions of term queries we had to generate, and in the end 
smooshing all these little fields together into one big one, with this 
switchable analyzer, ended up being the best tradeoff.

> Make TokenStreamComponents.setReader public
> ---
>
> Key: LUCENE-8240
> URL: https://issues.apache.org/jira/browse/LUCENE-8240
> Project: Lucene - Core
>  Issue Type: Wish
>  Components: modules/analysis
>Reporter: Mike Sokolov
>Priority: Major
> Attachments: SubFieldAnalyzer.java
>
>
> The simplest change for this would be to make 
> TokenStreamComponents.setReader() public. Another alternative would be to 
> provide a SubFieldAnalyzer along the lines of what is attached, although for 
> reasons given below I think this implementation is a little hacky and would 
> ideally be supported in a different way before making *that* part of a public 
> Lucene API.
> Exposing this method would allow a third-party extension to access it in 
> order to wrap TokenStreamComponents. My use case is a SubFieldAnalyzer 
> (attached, for reference) that applies different analysis to different 
> instances of a field. This supports a big "catch-all" field that has 
> different (index-time) text processing. The way we implement that is by 
> creating a TokenStreamComponents that wraps separate per-subfield components 
> and switches among them when setReader() is called.
> Why setReader()? This is the only part of the API where we can inject this 
> notion of subfields. setReader() is called with a Reader for each field 
> instance, and we supply a special Reader that identifies its subfield.
> This is a bit hacky – ideally subfields would be first-class citizens in the 
> Analyzer API, so eg there would be methods like 
> Analyzer.createComponents(String fieldName, String subFieldName), etc. 
> However this seems like a pretty big change for an experimental feature, so 
> it seems like an OK tradeoff to live with the Reader-per-subfield hack for 
> now.
> Currently SubFieldAnalyzer has to live in the org.apache.lucene.analysis 
> package in order to call TokenStreamComponents.setReader (on a separate 
> instance) and satisfy Java's access rules, which is awkward.
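
To make the mechanism concrete, here is a concept-only sketch of the 
marker-Reader dispatch in plain Java. It deliberately avoids Lucene's 
TokenStreamComponents API (whose protected setReader() is the point of this 
issue), and all names are illustrative, not the attached SubFieldAnalyzer:

{code:java}
// Concept-only sketch: the Reader passed in identifies the subfield, and the
// wrapper switches to the matching delegate. Plain Java on purpose; the real
// SubFieldAnalyzer has to do this inside TokenStreamComponents.setReader().
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

final class SubFieldSwitchSketch {
  /** Marker Reader carrying the subfield name for one field instance. */
  static final class SubFieldReader extends StringReader {
    final String subField;
    SubFieldReader(String subField, String text) {
      super(text);
      this.subField = subField;
    }
  }

  /** Stand-in for a per-subfield TokenStreamComponents. */
  interface Components {
    void setReader(Reader r);
  }

  private final Map<String, Components> perSubField = new HashMap<>();
  private final Components fallback;

  SubFieldSwitchSketch(Components fallback) {
    this.fallback = fallback;
  }

  void register(String subField, Components components) {
    perSubField.put(subField, components);
  }

  /** The switch the issue wants to perform inside setReader(). */
  void setReader(Reader reader) {
    Components target = fallback;
    if (reader instanceof SubFieldReader) {
      target = perSubField.getOrDefault(((SubFieldReader) reader).subField, fallback);
    }
    target.setReader(reader);
  }
}
{code}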



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8243) IndexWriter might delete DV update files if addIndices are involved

2018-04-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428842#comment-16428842
 ] 

Michael McCandless commented on LUCENE-8243:


I'll dig ...

> IndexWriter might delete DV update files if addIndices are involved
> ---
>
> Key: LUCENE-8243
> URL: https://issues.apache.org/jira/browse/LUCENE-8243
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: broken_dv_update.patch
>
>
> the attached test fails with this output:
> {noformat}
> /Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/bin/java -ea 
> -Djava.security.egd=file:/dev/./urandom 
> -Didea.test.cyclic.buffer.size=1048576 -Dfile.encoding=UTF-8 -classpath 
> "/Applications/IntelliJ 
> IDEA.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit-rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit5-rt.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/test:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/java:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/junit-4.10.jar:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/randomizedtesting-runner-2.5.3.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/codecs/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/test"
>  com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 
> org.apache.lucene.index.TestAddIndexes,testAddIndexesDVUpdate
> IFD 0 [2018-04-06T19:27:27.176036Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> current segments file is "segments_1"; 
> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@27cf18f0
> IFD 0 [2018-04-06T19:27:27.188066Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> load commit "segments_1"
> IFD 0 [2018-04-06T19:27:27.189800Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> seg=_0 set nextWriteDelGen=2 vs current=1
> IFD 0 [2018-04-06T19:27:27.190053Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvd"
> IFD 0 [2018-04-06T19:27:27.190224Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1.fnm"
> IFD 0 [2018-04-06T19:27:27.190371Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvm"
> IFD 0 [2018-04-06T19:27:27.190528Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: delete 
> [_0_1_Lucene70_0.dvd, _0_1.fnm, _0_1_Lucene70_0.dvm]
> IFD 0 [2018-04-06T19:27:27.192558Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: now 
> checkpoint "_0(8.0.0):C1:fieldInfosGen=1:dvGen=1" [1 segments ; isCommit = 
> false]
> IFD 0 [2018-04-06T19:27:27.192806Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 0 msec 
> to checkpoint
> IW 0 [2018-04-06T19:27:27.193012Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> create=false
> IW 0 [2018-04-06T19:27:27.193428Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 
> dir=MockDirectoryWrapper(RAMDirectory@79d3c690 
> lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@795a0c8b)
> index=_0(8.0.0):C1:fieldInfosGen=1:dvGen=1
> version=8.0.0
> analyzer=org.apache.lucene.analysis.MockAnalyzer
> ramBufferSizeMB=16.0
> maxBufferedDocs=503
> mergedSegmentWarmer=null
> delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
> commit=null
> openMode=CREATE_OR_APPEND
> similarity=org.apache.lucene.search.similarities.AssertingSimilarity
> mergeScheduler=org.apache.lucene.index.SerialMergeScheduler@2f3feff6
> codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, maxDocsPerChunk=6, blockSize=201), 
> termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, blockSize=201))
> infoStream=org.apache.lucene.util.PrintStreamInfoStream
> mergePolicy=[TieredMergePolicy: maxMergeAtOnce=41, maxMergeAtOnceExplicit=44, 
> maxMergedSegmentMB=6.255859375, floorSegmentMB=0.38671875, 
> forceMergeDeletesPctAllowed=4.456652110760543, segmentsPerTier=31.0, 
> maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.877330376985384
> 

[jira] [Assigned] (LUCENE-8243) IndexWriter might delete DV update files if addIndices are involved

2018-04-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-8243:
--

Assignee: Michael McCandless

> IndexWriter might delete DV update files if addIndices are involved
> ---
>
> Key: LUCENE-8243
> URL: https://issues.apache.org/jira/browse/LUCENE-8243
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>Reporter: Simon Willnauer
>Assignee: Michael McCandless
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: broken_dv_update.patch
>
>
> the attached test fails with this output:
> {noformat}
> /Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/bin/java -ea 
> -Djava.security.egd=file:/dev/./urandom 
> -Didea.test.cyclic.buffer.size=1048576 -Dfile.encoding=UTF-8 -classpath 
> "/Applications/IntelliJ 
> IDEA.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit-rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit5-rt.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/test:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/java:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/junit-4.10.jar:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/randomizedtesting-runner-2.5.3.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/codecs/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/test"
>  com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 
> org.apache.lucene.index.TestAddIndexes,testAddIndexesDVUpdate
> IFD 0 [2018-04-06T19:27:27.176036Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> current segments file is "segments_1"; 
> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@27cf18f0
> IFD 0 [2018-04-06T19:27:27.188066Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> load commit "segments_1"
> IFD 0 [2018-04-06T19:27:27.189800Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> seg=_0 set nextWriteDelGen=2 vs current=1
> IFD 0 [2018-04-06T19:27:27.190053Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvd"
> IFD 0 [2018-04-06T19:27:27.190224Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1.fnm"
> IFD 0 [2018-04-06T19:27:27.190371Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvm"
> IFD 0 [2018-04-06T19:27:27.190528Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: delete 
> [_0_1_Lucene70_0.dvd, _0_1.fnm, _0_1_Lucene70_0.dvm]
> IFD 0 [2018-04-06T19:27:27.192558Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: now 
> checkpoint "_0(8.0.0):C1:fieldInfosGen=1:dvGen=1" [1 segments ; isCommit = 
> false]
> IFD 0 [2018-04-06T19:27:27.192806Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 0 msec 
> to checkpoint
> IW 0 [2018-04-06T19:27:27.193012Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> create=false
> IW 0 [2018-04-06T19:27:27.193428Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 
> dir=MockDirectoryWrapper(RAMDirectory@79d3c690 
> lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@795a0c8b)
> index=_0(8.0.0):C1:fieldInfosGen=1:dvGen=1
> version=8.0.0
> analyzer=org.apache.lucene.analysis.MockAnalyzer
> ramBufferSizeMB=16.0
> maxBufferedDocs=503
> mergedSegmentWarmer=null
> delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
> commit=null
> openMode=CREATE_OR_APPEND
> similarity=org.apache.lucene.search.similarities.AssertingSimilarity
> mergeScheduler=org.apache.lucene.index.SerialMergeScheduler@2f3feff6
> codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, maxDocsPerChunk=6, blockSize=201), 
> termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, blockSize=201))
> infoStream=org.apache.lucene.util.PrintStreamInfoStream
> mergePolicy=[TieredMergePolicy: maxMergeAtOnce=41, maxMergeAtOnceExplicit=44, 
> maxMergedSegmentMB=6.255859375, floorSegmentMB=0.38671875, 
> forceMergeDeletesPctAllowed=4.456652110760543, segmentsPerTier=31.0, 
> maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.877330376985384
> 

[jira] [Commented] (LUCENE-8243) IndexWriter might delete DV update files if addIndices are involved

2018-04-06 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428833#comment-16428833
 ] 

Simon Willnauer commented on LUCENE-8243:
-

This was found by Nhat Nguyen (https://github.com/dnhatn/); I just opened the issue.

> IndexWriter might delete DV update files if addIndices are involved
> ---
>
> Key: LUCENE-8243
> URL: https://issues.apache.org/jira/browse/LUCENE-8243
> Project: Lucene - Core
>  Issue Type: Bug
>Affects Versions: 7.4, master (8.0)
>Reporter: Simon Willnauer
>Priority: Major
> Fix For: 7.4, master (8.0)
>
> Attachments: broken_dv_update.patch
>
>
> the attached test fails with this output:
> {noformat}
> /Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/bin/java -ea 
> -Djava.security.egd=file:/dev/./urandom 
> -Didea.test.cyclic.buffer.size=1048576 -Dfile.encoding=UTF-8 -classpath 
> "/Applications/IntelliJ 
> IDEA.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit-rt.jar:/Applications/IntelliJ 
> IDEA.app/Contents/plugins/junit/lib/junit5-rt.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/test:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/java:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/junit-4.10.jar:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/randomizedtesting-runner-2.5.3.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/codecs/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/test"
>  com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 
> org.apache.lucene.index.TestAddIndexes,testAddIndexesDVUpdate
> IFD 0 [2018-04-06T19:27:27.176036Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> current segments file is "segments_1"; 
> deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@27cf18f0
> IFD 0 [2018-04-06T19:27:27.188066Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> load commit "segments_1"
> IFD 0 [2018-04-06T19:27:27.189800Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> seg=_0 set nextWriteDelGen=2 vs current=1
> IFD 0 [2018-04-06T19:27:27.190053Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvd"
> IFD 0 [2018-04-06T19:27:27.190224Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1.fnm"
> IFD 0 [2018-04-06T19:27:27.190371Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> removing unreferenced file "_0_1_Lucene70_0.dvm"
> IFD 0 [2018-04-06T19:27:27.190528Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: delete 
> [_0_1_Lucene70_0.dvd, _0_1.fnm, _0_1_Lucene70_0.dvm]
> IFD 0 [2018-04-06T19:27:27.192558Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: now 
> checkpoint "_0(8.0.0):C1:fieldInfosGen=1:dvGen=1" [1 segments ; isCommit = 
> false]
> IFD 0 [2018-04-06T19:27:27.192806Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 0 msec 
> to checkpoint
> IW 0 [2018-04-06T19:27:27.193012Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
> create=false
> IW 0 [2018-04-06T19:27:27.193428Z; 
> TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 
> dir=MockDirectoryWrapper(RAMDirectory@79d3c690 
> lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@795a0c8b)
> index=_0(8.0.0):C1:fieldInfosGen=1:dvGen=1
> version=8.0.0
> analyzer=org.apache.lucene.analysis.MockAnalyzer
> ramBufferSizeMB=16.0
> maxBufferedDocs=503
> mergedSegmentWarmer=null
> delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
> commit=null
> openMode=CREATE_OR_APPEND
> similarity=org.apache.lucene.search.similarities.AssertingSimilarity
> mergeScheduler=org.apache.lucene.index.SerialMergeScheduler@2f3feff6
> codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, maxDocsPerChunk=6, blockSize=201), 
> termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION,
>  chunkSize=8, blockSize=201))
> infoStream=org.apache.lucene.util.PrintStreamInfoStream
> mergePolicy=[TieredMergePolicy: maxMergeAtOnce=41, maxMergeAtOnceExplicit=44, 
> maxMergedSegmentMB=6.255859375, floorSegmentMB=0.38671875, 
> forceMergeDeletesPctAllowed=4.456652110760543, segmentsPerTier=31.0, 
> maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.877330376985384
> 

[jira] [Resolved] (LUCENE-8238) WordDelimiterFilter javadocs reference nonexistent parameters

2018-04-06 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-8238.

Resolution: Fixed

Thanks [~sokolov]!

> WordDelimiterFilter javadocs reference nonexistent parameters
> -
>
> Key: LUCENE-8238
> URL: https://issues.apache.org/jira/browse/LUCENE-8238
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Mike Sokolov
>Priority: Minor
> Fix For: trunk
>
> Attachments: WDGF.patch
>
>
> The javadocs for both WDF and WDGF include a pretty detailed discussion about 
> the proper use of the "combinations" parameter, but no such parameter exists. 
> I don't know the history here, but it sounds as if the docs might be 
> referring to some previous incarnation of this filter, perhaps in the context 
> of some (now-defunct) Solr configuration.
> The docs should be updated to reference the actual option names that are 
> provided by the class today.
>  
> I've attached a patch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8238) WordDelimiterFilter javadocs reference nonexistent parameters

2018-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428830#comment-16428830
 ] 

ASF subversion and git services commented on LUCENE-8238:
-

Commit 77e2ed277aa2e606fcd679d7f26e90225b7d3b4f in lucene-solr's branch 
refs/heads/branch_7x from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=77e2ed2 ]

LUCENE-8238: improve javadocs for WordDelimiterFilter and 
WordDelimiterGraphFilter


> WordDelimiterFilter javadocs reference nonexistent parameters
> -
>
> Key: LUCENE-8238
> URL: https://issues.apache.org/jira/browse/LUCENE-8238
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Mike Sokolov
>Priority: Minor
> Fix For: trunk
>
> Attachments: WDGF.patch
>
>
> The javadocs for both WDF and WDGF include a pretty detailed discussion about 
> the proper use of the "combinations" parameter, but no such parameter exists. 
> I don't know the history here, but it sounds as if the docs might be 
> referring to some previous incarnation of this filter, perhaps in the context 
> of some (now-defunct) Solr configuration.
> The docs should be updated to reference the actual option names that are 
> provided by the class today.
>  
> I've attached a patch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8243) IndexWriter might delete DV update files if addIndices are involved

2018-04-06 Thread Simon Willnauer (JIRA)
Simon Willnauer created LUCENE-8243:
---

 Summary: IndexWriter might delete DV update files if addIndices 
are involved
 Key: LUCENE-8243
 URL: https://issues.apache.org/jira/browse/LUCENE-8243
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 7.4, master (8.0)
Reporter: Simon Willnauer
 Fix For: 7.4, master (8.0)
 Attachments: broken_dv_update.patch
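
In sketch form, the scenario the test exercises looks roughly like this (hypothetical code, not the attached patch; {{newDirectory()}}/{{newIndexWriterConfig()}} are test-framework helpers, and {{otherDir}} stands in for a second index):

{code:java}
// Hypothetical sketch (NOT the attached patch): a doc-values update writes
// per-generation files (_0_1.fnm, _0_1_Lucene70_0.dvd/.dvm), and a later
// IndexWriter on the same directory wrongly treats them as unreferenced,
// as the IFD log below shows.
try (Directory dir = newDirectory()) {
  try (IndexWriter w = new IndexWriter(dir, newIndexWriterConfig())) {
    Document doc = new Document();
    doc.add(new StringField("id", "1", Field.Store.NO));
    doc.add(new NumericDocValuesField("dv", 1L));
    w.addDocument(doc);
    w.commit();
    w.updateNumericDocValue(new Term("id", "1"), "dv", 2L); // creates dvGen=1 files
    w.commit();
  }
  // Expectation: the dvGen=1 files stay referenced by segments_N. Instead,
  // the log shows IFD deleting them when the next writer initializes.
  try (IndexWriter w2 = new IndexWriter(dir, newIndexWriterConfig())) {
    w2.addIndexes(otherDir); // otherDir: some other index; hypothetical here
  }
}
{code}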

The attached test fails with this output:


{noformat}
/Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/bin/java -ea 
-Djava.security.egd=file:/dev/./urandom -Didea.test.cyclic.buffer.size=1048576 
-Dfile.encoding=UTF-8 -classpath "/Applications/IntelliJ 
IDEA.app/Contents/lib/idea_rt.jar:/Applications/IntelliJ 
IDEA.app/Contents/plugins/junit/lib/junit-rt.jar:/Applications/IntelliJ 
IDEA.app/Contents/plugins/junit/lib/junit5-rt.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/test:/Users/simonw/projects/lucene-solr/idea-build/lucene/test-framework/classes/java:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/junit-4.10.jar:/Users/simonw/projects/lucene-solr/lucene/test-framework/lib/randomizedtesting-runner-2.5.3.jar:/Users/simonw/projects/lucene-solr/idea-build/lucene/codecs/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/java:/Users/simonw/projects/lucene-solr/idea-build/lucene/core/classes/test"
 com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 
org.apache.lucene.index.TestAddIndexes,testAddIndexesDVUpdate
IFD 0 [2018-04-06T19:27:27.176036Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
current segments file is "segments_1"; 
deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@27cf18f0
IFD 0 [2018-04-06T19:27:27.188066Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: load 
commit "segments_1"
IFD 0 [2018-04-06T19:27:27.189800Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
seg=_0 set nextWriteDelGen=2 vs current=1
IFD 0 [2018-04-06T19:27:27.190053Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
removing unreferenced file "_0_1_Lucene70_0.dvd"
IFD 0 [2018-04-06T19:27:27.190224Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
removing unreferenced file "_0_1.fnm"
IFD 0 [2018-04-06T19:27:27.190371Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
removing unreferenced file "_0_1_Lucene70_0.dvm"
IFD 0 [2018-04-06T19:27:27.190528Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: delete 
[_0_1_Lucene70_0.dvd, _0_1.fnm, _0_1_Lucene70_0.dvm]
IFD 0 [2018-04-06T19:27:27.192558Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: now 
checkpoint "_0(8.0.0):C1:fieldInfosGen=1:dvGen=1" [1 segments ; isCommit = 
false]
IFD 0 [2018-04-06T19:27:27.192806Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 0 msec to 
checkpoint
IW 0 [2018-04-06T19:27:27.193012Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: init: 
create=false
IW 0 [2018-04-06T19:27:27.193428Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 
dir=MockDirectoryWrapper(RAMDirectory@79d3c690 
lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@795a0c8b)
index=_0(8.0.0):C1:fieldInfosGen=1:dvGen=1
version=8.0.0
analyzer=org.apache.lucene.analysis.MockAnalyzer
ramBufferSizeMB=16.0
maxBufferedDocs=503
mergedSegmentWarmer=null
delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
commit=null
openMode=CREATE_OR_APPEND
similarity=org.apache.lucene.search.similarities.AssertingSimilarity
mergeScheduler=org.apache.lucene.index.SerialMergeScheduler@2f3feff6
codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION,
 chunkSize=8, maxDocsPerChunk=6, blockSize=201), 
termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION,
 chunkSize=8, blockSize=201))
infoStream=org.apache.lucene.util.PrintStreamInfoStream
mergePolicy=[TieredMergePolicy: maxMergeAtOnce=41, maxMergeAtOnceExplicit=44, 
maxMergedSegmentMB=6.255859375, floorSegmentMB=0.38671875, 
forceMergeDeletesPctAllowed=4.456652110760543, segmentsPerTier=31.0, 
maxCFSSegmentSizeMB=8.796093022207999E12, noCFSRatio=0.877330376985384
indexerThreadPool=org.apache.lucene.index.DocumentsWriterPerThreadPool@257ebcdb
readerPooling=true
perThreadHardLimitMB=1945
useCompoundFile=true
commitOnClose=true
indexSort=null
checkPendingFlushOnUpdate=true
softDeletesField=null
writer=org.apache.lucene.index.IndexWriter@17b77ed5

IW 0 [2018-04-06T19:27:27.194085Z; 
TEST-TestAddIndexes.testAddIndexesDVUpdate-seed#[9F04EE6B720B6BFD]]: 
MMapDirectory.UNMAP_SUPPORTED=true
IW 0 [2018-04-06T19:27:27.194347Z; 

[jira] [Commented] (LUCENE-8238) WordDelimiterFilter javadocs reference nonexistent parameters

2018-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428817#comment-16428817
 ] 

ASF subversion and git services commented on LUCENE-8238:
-

Commit 0f53adbee49015aa01e8f66945f82e88a9172c7c in lucene-solr's branch 
refs/heads/master from Mike McCandless
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0f53adb ]

LUCENE-8238: improve javadocs for WordDelimiterFilter and 
WordDelimiterGraphFilter


> WordDelimiterFilter javadocs reference nonexistent parameters
> -
>
> Key: LUCENE-8238
> URL: https://issues.apache.org/jira/browse/LUCENE-8238
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Mike Sokolov
>Priority: Minor
> Fix For: trunk
>
> Attachments: WDGF.patch
>
>
> The javadocs for both WDF and WDGF include a pretty detailed discussion about 
> the proper use of the "combinations" parameter, but no such parameter exists. 
> I don't know the history here, but it sounds as if the docs might be 
> referring to some previous incarnation of this filter, perhaps in the context 
> of some (now-defunct) Solr configuration.
> The docs should be updated to reference the actual option names that are 
> provided by the class today.
>  
> I've attached a patch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-master - Build # 1523 - Still Unstable

2018-04-06 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1523/

5 tests failed.
FAILED:  org.apache.solr.cloud.DistribCursorPagingTest.test

Error Message:
Could not load collection from ZK: collection1

Stack Trace:
org.apache.solr.common.SolrException: Could not load collection from ZK: 
collection1
at 
__randomizedtesting.SeedInfo.seed([A5DE8909876BEF99:2D8AB6D329978261]:0)
at 
org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1250)
at 
org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(ZkStateReader.java:679)
at 
org.apache.solr.common.cloud.ClusterState$CollectionRef.get(ClusterState.java:386)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1208)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:851)
at 
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:819)
at 
org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:957)
at 
org.apache.solr.cloud.DistribCursorPagingTest.assertFullWalkNoDups(DistribCursorPagingTest.java:718)
at 
org.apache.solr.cloud.DistribCursorPagingTest.doRandomSortsOnLargeIndex(DistribCursorPagingTest.java:593)
at 
org.apache.solr.cloud.DistribCursorPagingTest.test(DistribCursorPagingTest.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:993)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:968)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 

[jira] [Commented] (LUCENE-8226) Don't use MemoryCodec for nightly runs of TestIndexSorting

2018-04-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428801#comment-16428801
 ] 

Michael McCandless commented on LUCENE-8226:


[~romseygeek] I think it's fine to always reduce the doc count to the 
non-nightly value; I don't see why we need 100K docs.

> Don't use MemoryCodec for nightly runs of TestIndexSorting
> --
>
> Key: LUCENE-8226
> URL: https://issues.apache.org/jira/browse/LUCENE-8226
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8226.patch, LUCENE-8226.patch
>
>
> Nightly runs of TestIndexSorting fail occasionally with OOM (see 
> [https://builds.apache.org/job/Lucene-Solr-NightlyTests-7.x/183/] for a 
> recent example, and it's been appearing in Erick's BadApple report too).  It 
> looks as if this is normally due to the combination of a large docset and 
> MemoryCodec.  We should suppress MemoryCodec for these tests (on nightly runs 
> only, if possible).
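
For reference, the test framework already has a per-class switch for this; a minimal sketch, assuming it is applied unconditionally rather than only on nightly runs:

{code:java}
// Minimal sketch: LuceneTestCase's @SuppressCodecs removes the named codecs
// from randomized codec selection for the whole class. Gating it on nightly
// runs only would need extra plumbing; this version is unconditional.
@LuceneTestCase.SuppressCodecs("Memory")
public class TestIndexSorting extends LuceneTestCase {
  // ... existing test methods unchanged ...
}
{code}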



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Welcome Cao Mạnh Đạt to the PMC

2018-04-06 Thread Michael McCandless
Welcome Đạt!

Mike McCandless

http://blog.mikemccandless.com

On Mon, Apr 2, 2018 at 3:50 PM, Adrien Grand  wrote:

> Fixing the subject of the email.
>
> On Mon, Apr 2, 2018 at 21:48, Adrien Grand wrote:
>
>> I am pleased to announce that Cao Mạnh Đạt has accepted the PMC's
>> invitation to join.
>>
>> Welcome Đạt!
>>
>


[jira] [Commented] (SOLR-12199) TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: Server refused connection at: http://127.0.0.1:TEST_PORT/solr

2018-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428792#comment-16428792
 ] 

ASF subversion and git services commented on SOLR-12199:


Commit 5c37b07a3d53e64c2f0cebd33eb7024d693d62f5 in lucene-solr's branch 
refs/heads/master from [~steve_rowe]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5c37b07 ]

SOLR-12199: TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation 
failure: Server refused connection at: http://127.0.0.1:TEST_PORT/solr


> TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: 
> Server refused connection at: http://127.0.0.1:TEST_PORT/solr 
> 
>
> Key: SOLR-12199
> URL: https://issues.apache.org/jira/browse/SOLR-12199
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Major
> Fix For: 7.4
>
> Attachments: SOLR-12199.patch
>
>
> From [https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-MacOSX/551/]:
> {noformat}
>[junit4]   2> 750759 INFO  
> (TEST-TestReplicationHandler.doTestRepeater-seed#[7078A21248E0962E]) [] 
> o.a.s.h.TestReplicationHandler Waited for 0ms and found 3 docs
>[junit4]   2> 750760 INFO  (qtp351238853-8844) [x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
> params={_trace=getDetails=/replication=javabin=2=details}
>  status=0 QTime=1
>[junit4]   2> 750761 INFO  (qtp351238853-8846) [x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
> params={_trace=getDetails=/replication=javabin=2=details}
>  status=0 QTime=0
>[junit4]   2> 750769 WARN  (qtp738580099-8901) [x:collection1] 
> o.a.s.h.ReplicationHandler Exception while invoking 'details' method for 
> replication on master 
>[junit4]   2> org.apache.solr.client.solrj.SolrServerException: Server 
> refused connection at: http://127.0.0.1:TEST_PORT/solr
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:650)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) 
> ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.handler.IndexFetcher.getDetails(IndexFetcher.java:1852) 
> ~[java/:?]
> {noformat}
> I looked at {{TestReplicationHandler}} Jenkins failure logs from the last 
> couple days, and every single one has the same pattern: a WARN message from 
> {{doTestRepeater()}} about failure to connect with a URL containing port 
> {{TEST_PORT}} (rather than a numeric value).
> On the dev list Dawid Weiss 
> [wrote|https://lists.apache.org/thread.html/b9606be4ae70e58b4be8c3438e92f69361d59b4de566ec707dda3f24@%3Cdev.lucene.apache.org%3E]:
> {quote}
> I see this in TestReplicationHandler:
> {code:java}
>  /**
>   * character copy of file using UTF-8. If port is non-null, will be
> substituted any time "TEST_PORT" is found.
>   */
>  private static void copyFile(File src, File dst, Integer port,
> boolean internalCompression) throws IOException {
>BufferedReader in = new BufferedReader(new InputStreamReader(new
> FileInputStream(src), StandardCharsets.UTF_8));
>Writer out = new OutputStreamWriter(new FileOutputStream(dst),
> StandardCharsets.UTF_8);
>for (String line = in.readLine(); null != line; line = in.readLine()) {
>  if (null != port)
>line = line.replace("TEST_PORT", port.toString());
> {code}
> So it seems port is allowed to be null and then won't be substituted.
> This looks like a bug in the test scaffolding: this situation
> shouldn't be allowed; if a port cannot be acquired the test should
> fail much sooner?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12199) TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: Server refused connection at: http://127.0.0.1:TEST_PORT/solr

2018-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428791#comment-16428791
 ] 

ASF subversion and git services commented on SOLR-12199:


Commit 1d8313ca8de2b9f5297b337f3156079be270dc6d in lucene-solr's branch 
refs/heads/branch_7x from [~steve_rowe]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1d8313c ]

SOLR-12199: TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation 
failure: Server refused connection at: http://127.0.0.1:TEST_PORT/solr


> TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: 
> Server refused connection at: http://127.0.0.1:TEST_PORT/solr 
> 
>
> Key: SOLR-12199
> URL: https://issues.apache.org/jira/browse/SOLR-12199
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Major
> Fix For: 7.4
>
> Attachments: SOLR-12199.patch
>
>
> From [https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-MacOSX/551/]:
> {noformat}
>[junit4]   2> 750759 INFO  
> (TEST-TestReplicationHandler.doTestRepeater-seed#[7078A21248E0962E]) [] 
> o.a.s.h.TestReplicationHandler Waited for 0ms and found 3 docs
>[junit4]   2> 750760 INFO  (qtp351238853-8844) [x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
> params={_trace=getDetails=/replication=javabin=2=details}
>  status=0 QTime=1
>[junit4]   2> 750761 INFO  (qtp351238853-8846) [x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
> params={_trace=getDetails=/replication=javabin=2=details}
>  status=0 QTime=0
>[junit4]   2> 750769 WARN  (qtp738580099-8901) [x:collection1] 
> o.a.s.h.ReplicationHandler Exception while invoking 'details' method for 
> replication on master 
>[junit4]   2> org.apache.solr.client.solrj.SolrServerException: Server 
> refused connection at: http://127.0.0.1:TEST_PORT/solr
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:650)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) 
> ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.handler.IndexFetcher.getDetails(IndexFetcher.java:1852) 
> ~[java/:?]
> {noformat}
> I looked at {{TestReplicationHandler}} Jenkins failure logs from the last 
> couple days, and every single one has the same pattern: a WARN message from 
> {{doTestRepeater()}} about failure to connect with a URL containing port 
> {{TEST_PORT}} (rather than a numeric value).
> On the dev list Dawid Weiss 
> [wrote|https://lists.apache.org/thread.html/b9606be4ae70e58b4be8c3438e92f69361d59b4de566ec707dda3f24@%3Cdev.lucene.apache.org%3E]:
> {quote}
> I see this in TestReplicationHandler:
> {code:java}
>  /**
>   * character copy of file using UTF-8. If port is non-null, will be
> substituted any time "TEST_PORT" is found.
>   */
>  private static void copyFile(File src, File dst, Integer port,
> boolean internalCompression) throws IOException {
>BufferedReader in = new BufferedReader(new InputStreamReader(new
> FileInputStream(src), StandardCharsets.UTF_8));
>Writer out = new OutputStreamWriter(new FileOutputStream(dst),
> StandardCharsets.UTF_8);
>for (String line = in.readLine(); null != line; line = in.readLine()) {
>  if (null != port)
>line = line.replace("TEST_PORT", port.toString());
> {code}
> So it seems port is allowed to be null and then won't be substituted.
> This looks like a bug in the test scaffolding: this situation
> shouldn't be allowed; if a port cannot be acquired the test should
> fail much sooner?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-12199) TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: Server refused connection at: http://127.0.0.1:TEST_PORT/solr

2018-04-06 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved SOLR-12199.
---
   Resolution: Fixed
Fix Version/s: 7.4

> TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: 
> Server refused connection at: http://127.0.0.1:TEST_PORT/solr 
> 
>
> Key: SOLR-12199
> URL: https://issues.apache.org/jira/browse/SOLR-12199
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Major
> Fix For: 7.4
>
> Attachments: SOLR-12199.patch
>
>
> From [https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-MacOSX/551/]:
> {noformat}
>[junit4]   2> 750759 INFO  
> (TEST-TestReplicationHandler.doTestRepeater-seed#[7078A21248E0962E]) [] 
> o.a.s.h.TestReplicationHandler Waited for 0ms and found 3 docs
>[junit4]   2> 750760 INFO  (qtp351238853-8844) [x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
> params={_trace=getDetails=/replication=javabin=2=details}
>  status=0 QTime=1
>[junit4]   2> 750761 INFO  (qtp351238853-8846) [x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
> params={_trace=getDetails=/replication=javabin=2=details}
>  status=0 QTime=0
>[junit4]   2> 750769 WARN  (qtp738580099-8901) [x:collection1] 
> o.a.s.h.ReplicationHandler Exception while invoking 'details' method for 
> replication on master 
>[junit4]   2> org.apache.solr.client.solrj.SolrServerException: Server 
> refused connection at: http://127.0.0.1:TEST_PORT/solr
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:650)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) 
> ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.handler.IndexFetcher.getDetails(IndexFetcher.java:1852) 
> ~[java/:?]
> {noformat}
> I looked at {{TestReplicationHandler}} Jenkins failure logs from the last 
> couple days, and every single one has the same pattern: a WARN message from 
> {{doTestRepeater()}} about failure to connect with a URL containing port 
> {{TEST_PORT}} (rather than a numeric value).
> On the dev list Dawid Weiss 
> [wrote|https://lists.apache.org/thread.html/b9606be4ae70e58b4be8c3438e92f69361d59b4de566ec707dda3f24@%3Cdev.lucene.apache.org%3E]:
> {quote}
> I see this in TestReplicationHandler:
> {code:java}
>  /**
>   * character copy of file using UTF-8. If port is non-null, will be
> substituted any time "TEST_PORT" is found.
>   */
>  private static void copyFile(File src, File dst, Integer port,
> boolean internalCompression) throws IOException {
>BufferedReader in = new BufferedReader(new InputStreamReader(new
> FileInputStream(src), StandardCharsets.UTF_8));
>Writer out = new OutputStreamWriter(new FileOutputStream(dst),
> StandardCharsets.UTF_8);
>for (String line = in.readLine(); null != line; line = in.readLine()) {
>  if (null != port)
>line = line.replace("TEST_PORT", port.toString());
> {code}
> So it seems port is allowed to be null and then won't be substituted.
> This looks like a bug in the test scaffolding: this situation
> shouldn't be allowed; if a port cannot be acquired the test should
> fail much sooner?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8238) WordDelimiterFilter javadocs reference nonexistent parameters

2018-04-06 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428788#comment-16428788
 ] 

Michael McCandless commented on LUCENE-8238:


Thanks [~sokolov], this looks like a nice doc improvement ... I'll run 
precommit and push.

> WordDelimiterFilter javadocs reference nonexistent parameters
> -
>
> Key: LUCENE-8238
> URL: https://issues.apache.org/jira/browse/LUCENE-8238
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/analysis
>Reporter: Mike Sokolov
>Priority: Minor
> Fix For: trunk
>
> Attachments: WDGF.patch
>
>
> The javadocs for both WDF and WDGF include a pretty detailed discussion about 
> the proper use of the "combinations" parameter, but no such parameter exists. 
> I don't know the history here, but it sounds as if the docs might be 
> referring to some previous incarnation of this filter, perhaps in the context 
> of some (now-defunct) Solr configuration.
> The docs should be updated to reference the actual option names that are 
> provided by the class today.
>  
> I've attached a patch



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12199) TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: Server refused connection at: http://127.0.0.1:TEST_PORT/solr

2018-04-06 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428766#comment-16428766
 ] 

Steve Rowe commented on SOLR-12199:
---

I attached a patch that passes the test port to the repeater {{SolrInstance}} 
constructor (previously it was specified as null), and also ensures that the 
master port is set as the test port prior to copying all non-master config 
files (this was already done in a few places, but not all).

I beasted 100 iterations of {{TestReplicationHandler}} with the patch, 31 of 
which failed for various reasons, and the WARN log entries about {{TEST_PORT}} 
did not occur in any of the logs.

Committing shortly.
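
A minimal fail-fast sketch along the lines Dawid suggested, reusing the variable names from the {{copyFile()}} snippet quoted below (hypothetical; the attached patch instead fixes the callers):

{code:java}
// Hypothetical hardening of copyFile()'s loop: refuse to emit an
// un-substituted TEST_PORT placeholder instead of letting the test fail
// much later with a connection error.
for (String line = in.readLine(); null != line; line = in.readLine()) {
  if (line.contains("TEST_PORT")) {
    if (null == port) {
      throw new IllegalStateException(
          "TEST_PORT found in " + src + " but no port was acquired");
    }
    line = line.replace("TEST_PORT", port.toString());
  }
  out.write(line);
  out.write(System.lineSeparator());
}
{code}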

> TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: 
> Server refused connection at: http://127.0.0.1:TEST_PORT/solr 
> 
>
> Key: SOLR-12199
> URL: https://issues.apache.org/jira/browse/SOLR-12199
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Major
> Attachments: SOLR-12199.patch
>
>
> From [https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-MacOSX/551/]:
> {noformat}
>[junit4]   2> 750759 INFO  
> (TEST-TestReplicationHandler.doTestRepeater-seed#[7078A21248E0962E]) [] 
> o.a.s.h.TestReplicationHandler Waited for 0ms and found 3 docs
>[junit4]   2> 750760 INFO  (qtp351238853-8844) [x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
> params={_trace=getDetails=/replication=javabin=2=details}
>  status=0 QTime=1
>[junit4]   2> 750761 INFO  (qtp351238853-8846) [x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
> params={_trace=getDetails=/replication=javabin=2=details}
>  status=0 QTime=0
>[junit4]   2> 750769 WARN  (qtp738580099-8901) [x:collection1] 
> o.a.s.h.ReplicationHandler Exception while invoking 'details' method for 
> replication on master 
>[junit4]   2> org.apache.solr.client.solrj.SolrServerException: Server 
> refused connection at: http://127.0.0.1:TEST_PORT/solr
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:650)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) 
> ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.handler.IndexFetcher.getDetails(IndexFetcher.java:1852) 
> ~[java/:?]
> {noformat}
> I looked at {{TestReplicationHandler}} Jenkins failure logs from the last 
> couple days, and every single one has the same pattern: a WARN message from 
> {{doTestRepeater()}} about failure to connect with a URL containing port 
> {{TEST_PORT}} (rather than a numeric value).
> On the dev list Dawid Weiss 
> [wrote|https://lists.apache.org/thread.html/b9606be4ae70e58b4be8c3438e92f69361d59b4de566ec707dda3f24@%3Cdev.lucene.apache.org%3E]:
> {quote}
> I see this in TestReplicationHandler:
> {code:java}
>  /**
>   * character copy of file using UTF-8. If port is non-null, will be
> substituted any time "TEST_PORT" is found.
>   */
>  private static void copyFile(File src, File dst, Integer port,
> boolean internalCompression) throws IOException {
>BufferedReader in = new BufferedReader(new InputStreamReader(new
> FileInputStream(src), StandardCharsets.UTF_8));
>Writer out = new OutputStreamWriter(new FileOutputStream(dst),
> StandardCharsets.UTF_8);
>for (String line = in.readLine(); null != line; line = in.readLine()) {
>  if (null != port)
>line = line.replace("TEST_PORT", port.toString());
> {code}
> So it seems port is allowed to be null and then won't be substituted.
> This looks like a bug in the test scaffolding: this situation
> shouldn't be allowed; if a port cannot be acquired the test should
> fail much sooner?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12199) TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: Server refused connection at: http://127.0.0.1:TEST_PORT/solr

2018-04-06 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-12199:
--
Attachment: SOLR-12199.patch

> TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: 
> Server refused connection at: http://127.0.0.1:TEST_PORT/solr 
> 
>
> Key: SOLR-12199
> URL: https://issues.apache.org/jira/browse/SOLR-12199
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Major
> Attachments: SOLR-12199.patch
>
>
> From [https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-MacOSX/551/]:
> {noformat}
>[junit4]   2> 750759 INFO  
> (TEST-TestReplicationHandler.doTestRepeater-seed#[7078A21248E0962E]) [] 
> o.a.s.h.TestReplicationHandler Waited for 0ms and found 3 docs
>[junit4]   2> 750760 INFO  (qtp351238853-8844) [x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
> params={_trace=getDetails=/replication=javabin=2=details}
>  status=0 QTime=1
>[junit4]   2> 750761 INFO  (qtp351238853-8846) [x:collection1] 
> o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
> params={_trace=getDetails=/replication=javabin=2=details}
>  status=0 QTime=0
>[junit4]   2> 750769 WARN  (qtp738580099-8901) [x:collection1] 
> o.a.s.h.ReplicationHandler Exception while invoking 'details' method for 
> replication on master 
>[junit4]   2> org.apache.solr.client.solrj.SolrServerException: Server 
> refused connection at: http://127.0.0.1:TEST_PORT/solr
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:650)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
>  ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) 
> ~[java/:?]
>[junit4]   2>  at 
> org.apache.solr.handler.IndexFetcher.getDetails(IndexFetcher.java:1852) 
> ~[java/:?]
> {noformat}
> I looked at {{TestReplicationHandler}} Jenkins failure logs from the last 
> couple days, and every single one has the same pattern: a WARN message from 
> {{doTestRepeater()}} about failure to connect with a URL containing port 
> {{TEST_PORT}} (rather than a numeric value).
> On the dev list Dawid Weiss 
> [wrote|https://lists.apache.org/thread.html/b9606be4ae70e58b4be8c3438e92f69361d59b4de566ec707dda3f24@%3Cdev.lucene.apache.org%3E]:
> {quote}
> I see this in TestReplicationHandler:
> {code:java}
>  /**
>   * character copy of file using UTF-8. If port is non-null, will be
> substituted any time "TEST_PORT" is found.
>   */
>  private static void copyFile(File src, File dst, Integer port,
> boolean internalCompression) throws IOException {
>BufferedReader in = new BufferedReader(new InputStreamReader(new
> FileInputStream(src), StandardCharsets.UTF_8));
>Writer out = new OutputStreamWriter(new FileOutputStream(dst),
> StandardCharsets.UTF_8);
>for (String line = in.readLine(); null != line; line = in.readLine()) {
>  if (null != port)
>line = line.replace("TEST_PORT", port.toString());
> {code}
> So it seems port is allowed to be null and then won't be substituted.
> This looks like a bug in the test scaffolding: this situation
> shouldn't be allowed; if a port cannot be acquired the test should
> fail much sooner?
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-7.x-MacOSX (64bit/jdk-9) - Build # 551 - Unstable!

2018-04-06 Thread Steve Rowe
I created SOLR-12199.

--
Steve
www.lucidworks.com

> On Apr 2, 2018, at 2:31 PM, Dawid Weiss  wrote:
> 
> I see this in TestReplicationHandler:
> 
>  /**
>   * character copy of file using UTF-8. If port is non-null, will be
> substituted any time "TEST_PORT" is found.
>   */
>  private static void copyFile(File src, File dst, Integer port,
> boolean internalCompression) throws IOException {
>BufferedReader in = new BufferedReader(new InputStreamReader(new
> FileInputStream(src), StandardCharsets.UTF_8));
>Writer out = new OutputStreamWriter(new FileOutputStream(dst),
> StandardCharsets.UTF_8);
> 
>for (String line = in.readLine(); null != line; line = in.readLine()) {
> 
>  if (null != port)
>line = line.replace("TEST_PORT", port.toString());
> 
> So it seems port is allowed to be null and then won't be substituted.
> This looks like a bug in the test scaffolding: this situation
> shouldn't be allowed; if a port cannot be acquired the test should
> fail much sooner?
> 
> Dawid
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-12199) TestReplicationHandler.doTestRepeater(): TEST_PORT interpolation failure: Server refused connection at: http://127.0.0.1:TEST_PORT/solr

2018-04-06 Thread Steve Rowe (JIRA)
Steve Rowe created SOLR-12199:
-

 Summary: TestReplicationHandler.doTestRepeater(): TEST_PORT 
interpolation failure: Server refused connection at: 
http://127.0.0.1:TEST_PORT/solr 
 Key: SOLR-12199
 URL: https://issues.apache.org/jira/browse/SOLR-12199
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Steve Rowe
Assignee: Steve Rowe


From [https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-MacOSX/551/]:

{noformat}
   [junit4]   2> 750759 INFO  
(TEST-TestReplicationHandler.doTestRepeater-seed#[7078A21248E0962E]) [] 
o.a.s.h.TestReplicationHandler Waited for 0ms and found 3 docs
   [junit4]   2> 750760 INFO  (qtp351238853-8844) [x:collection1] 
o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
params={_trace=getDetails=/replication=javabin=2=details} 
status=0 QTime=1
   [junit4]   2> 750761 INFO  (qtp351238853-8846) [x:collection1] 
o.a.s.c.S.Request [collection1]  webapp=/solr path=/replication 
params={_trace=getDetails=/replication=javabin=2=details} 
status=0 QTime=0
   [junit4]   2> 750769 WARN  (qtp738580099-8901) [x:collection1] 
o.a.s.h.ReplicationHandler Exception while invoking 'details' method for 
replication on master 
   [junit4]   2> org.apache.solr.client.solrj.SolrServerException: Server 
refused connection at: http://127.0.0.1:TEST_PORT/solr
   [junit4]   2>at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:650)
 ~[java/:?]
   [junit4]   2>at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
 ~[java/:?]
   [junit4]   2>at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
 ~[java/:?]
   [junit4]   2>at 
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219) ~[java/:?]
   [junit4]   2>at 
org.apache.solr.handler.IndexFetcher.getDetails(IndexFetcher.java:1852) 
~[java/:?]
{noformat}

I looked at {{TestReplicationHandler}} Jenkins failure logs from the last 
couple days, and every single one has the same pattern: a WARN message from 
{{doTestRepeater()}} about failure to connect with a URL containing port 
{{TEST_PORT}} (rather than a numeric value).

On the dev list Dawid Weiss 
[wrote|https://lists.apache.org/thread.html/b9606be4ae70e58b4be8c3438e92f69361d59b4de566ec707dda3f24@%3Cdev.lucene.apache.org%3E]:

{quote}
I see this in TestReplicationHandler:

{code:java}
 /**
  * character copy of file using UTF-8. If port is non-null, will be
substituted any time "TEST_PORT" is found.
  */
 private static void copyFile(File src, File dst, Integer port,
boolean internalCompression) throws IOException {
   BufferedReader in = new BufferedReader(new InputStreamReader(new
FileInputStream(src), StandardCharsets.UTF_8));
   Writer out = new OutputStreamWriter(new FileOutputStream(dst),
StandardCharsets.UTF_8);

   for (String line = in.readLine(); null != line; line = in.readLine()) {

 if (null != port)
   line = line.replace("TEST_PORT", port.toString());
{code}

So it seems port is allowed to be null and then won't be substituted.
This looks like a bug in the test scaffolding: this situation
shouldn't be allowed; if a port cannot be acquired the test should
fail much sooner?
{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8221) MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger indexes

2018-04-06 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428670#comment-16428670
 ] 

Dawid Weiss commented on LUCENE-8221:
-

I don't mind changing the formula (even if I disagree that catering for 
internal representation of deleted documents justifies this), but not as part 
of this issue. Changing the formula will change the results people get from 
MLT: this should go into a major release, not a point release; what I patched 
was a trivial overflow problem that doesn't touch any internals.

bq. And so is the range check in your patch, because percentage can be larger 
than 100% with the broken numDocs formula used here. When a percentage can be 
bigger than 100, man that's your first sign that shit is wrong!

To me the percentage remains within 0-100% with numDocs; you compute the 
threshold against the current state of your index (live documents). The 
computed value of the cutoff threshold is correct; it is the comparison against 
docFreq that isn't sound here, because docFreq doesn't carry deleted-documents 
information. I don't quite understand why you perceive only one of those as 
"correct" vs. "utter shit", and I don't think I want to explore this subject 
further.

Is it OK if I apply the overflow fix against 7.x and master, and create a new 
issue for cutting over to maxDoc (everywhere in MLT) that applies to master 
only? If not, speak up.

> MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger indexes
> ---
>
> Key: LUCENE-8221
> URL: https://issues.apache.org/jira/browse/LUCENE-8221
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Minor
> Attachments: LUCENE-8221.patch
>
>
> {code}
>   public void setMaxDocFreqPct(int maxPercentage) {
> this.maxDocFreq = maxPercentage * ir.numDocs() / 100;
>   }
> {code}
> The above overflows integer range into negative numbers on even fairly small 
> indexes (for maxPercentage = 75, it happens at just over 28 million 
> documents).
> We should do the computation in long range so that it doesn't overflow, and 
> add stricter argument validation.
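
A minimal overflow-safe sketch of that idea (illustrative only; the committed fix may differ in details):

{code:java}
// Widen to long before multiplying so maxPercentage * numDocs cannot wrap
// negative, and validate the argument up front. The quotient is <= numDocs,
// so narrowing back to int is safe.
public void setMaxDocFreqPct(int maxPercentage) {
  if (maxPercentage < 0 || maxPercentage > 100) {
    throw new IllegalArgumentException(
        "maxPercentage must be in [0, 100], got " + maxPercentage);
  }
  this.maxDocFreq = (int) (maxPercentage * (long) ir.numDocs() / 100);
}
{code}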



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2018-04-06 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428665#comment-16428665
 ] 

Lance Norskog commented on LUCENE-2899:
---

I apologize, [~Fatalityap], but I cannot help here. I have not worked with Solr 
for a few years.



> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12190) need to properly escape output in GraphMLResponseWriter

2018-04-06 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428633#comment-16428633
 ] 

Yonik Seeley commented on SOLR-12190:
-

Here's a patch. It also fixes what looks like a problem with mixing writers 
(the writer vs. the printWriter that wraps it) that could cause issues.
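
A sketch of the escaping side of such a fix, using Solr's XML utility (illustrative, not the attached patch):

{code:java}
import java.io.IOException;
import java.io.Writer;
import org.apache.solr.common.util.XML;

// Illustrative only: emit everything through a single Writer and escape
// user-supplied values before they land inside the GraphML/XML output.
class GraphMLEscapingSketch {
  static void writeNode(Writer out, String nodeId) throws IOException {
    out.write("<node id=\"");
    XML.escapeAttributeValue(nodeId, out); // escapes &, <, >, quotes
    out.write("\"/>\n");
  }
}
{code}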

> need to properly escape output in GraphMLResponseWriter
> ---
>
> Key: SOLR-12190
> URL: https://issues.apache.org/jira/browse/SOLR-12190
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
>Priority: Major
> Attachments: SOLR-12190.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12190) need to properly escape output in GraphMLResponseWriter

2018-04-06 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-12190:

Attachment: SOLR-12190.patch

> need to properly escape output in GraphMLResponseWriter
> ---
>
> Key: SOLR-12190
> URL: https://issues.apache.org/jira/browse/SOLR-12190
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
>Assignee: Yonik Seeley
>Priority: Major
> Attachments: SOLR-12190.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7976) Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments

2018-04-06 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428605#comment-16428605
 ] 

Erick Erickson commented on LUCENE-7976:


Marc:

Thanks for looking, especially at how jumbled the code is right now!

I collected some preliminary stats on total bytes written, admittedly 
unscientific and hacky. I set a low maxMergedSegmentSizeMB and reindexed the 
same docs randomly. To my great surprise the new code wrote _fewer_ bytes than 
the current code. My expectation was just what you're pointing out, I expected 
to see the new stuff write a lot more bytes. This was with an index that 
respected max segment sizes.

On my plate today is to reconcile my expectations and measurements. What I 
_think_ happened is that Mike's clever cost measurements are getting in here. 

The singleton merge is not intended (I'll have to ensure I didn't screw this 
up, thanks for drawing attention to it) to be run against segments that respect 
the max segment size. It's supposed to be there to allow recovery from the case 
where someone optimized down to 1 huge segment. If it leads to a lot of extra 
writes in that case, I think that's acceptable. If it leads to a lot more bytes 
written in the case where the segments respect max segment size, I worry a lot.

In the normal case, it's not that a segment is merged when it has > 20% 
deleted docs; it's that it becomes _eligible_ for merging even if it has > 50% 
of maxSegmentSize in "live" docs. What I have to figure out (all help 
appreciated!) is how Mike's scoring algorithm influences this. The code starting with
 // Consider all merge starts:
is key here. Let's say I have 100 possible eligible segments and 30 
"maxMergeAtOnce". The code starts at 0, collects up to 30 segments, and 
scores that merge. Then it starts at 1, collects up to 30 segments, and scores 
that. Repeat until you start at 70, keeping the "best" merge as determined by 
the scoring method. What I _think_ is happening is that the large segments do 
grow past 20% before they're merged, due to the scoring.
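
In sketch form, that scan looks roughly like this ({{scoreMerge()}} and {{MergeCandidate}} are hypothetical stand-ins, not the actual TieredMergePolicy code):

{code:java}
// For 100 eligible segments and maxMergeAtOnce=30, start runs 0..70; each
// window of 30 consecutive segments is scored and the best-scoring one wins.
MergeCandidate best = null;
for (int start = 0; start <= eligible.size() - maxMergeAtOnce; start++) {
  List<SegmentCommitInfo> window = eligible.subList(start, start + maxMergeAtOnce);
  double score = scoreMerge(window); // lower score == better merge
  if (best == null || score < best.score) {
    best = new MergeCandidate(window, score);
  }
}
{code}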

And there's a whole discussion to be had here about what's a "good" number and 
whether it should be user-configurable; I chose 20% semi-randomly (and 
hard-coded it!) just to get something going.

All that said, performance is the next big chunk of this I need to tackle, 
ensuring that this doesn't become horribly I/O intensive. Or, as you suggest, 
we figure out a way to throttle it.

Or throw out the idea of singleton merges in the first place and, now that 
expungeDeletes respects max segment size too, tell users who've optimized down 
to single segments that they should occasionally run expungeDeletes as they 
replace documents.

> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of 
> very large segments
> -
>
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch
>
>
> We're seeing situations "in the wild" where there are very large indexes (on 
> disk) handled quite easily in a single Lucene index. This is particularly 
> true as features like docValues move data into MMapDirectory space. The 
> current TMP algorithm allows on the order of 50% deleted documents as per a 
> dev list conversation with Mike McCandless (and his blog here:  
> https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many 
> TB) solutions like "you need to distribute your collection over more shards" 
> become very costly. Additionally, the tempting "optimize" button exacerbates 
> the issue since once you form, say, a 100G segment (by 
> optimizing/forceMerging) it is not eligible for merging until 97.5G of the 
> docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like 
>  (no, that's not a serious name; suggestions 
> welcome), which would default to 100 (i.e. the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 
> 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO 
> > MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with 
> >> smaller segments to bring the resulting segment up to 5G. If no smaller 
> >> segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). 

[jira] [Updated] (SOLR-12198) Stream Evaluators should not copy matrices needlessly

2018-04-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-12198:
--
Description: Currently several of the Stream Evaluators that work with 
matrices are creating multiple copies of the underlying multi-dimensional 
arrays. This can lead to excessive memory usage. This ticket will change these 
implementations so the multi-dimensional arrays that back a matrix 
are only copied when the *copyOf* function is used.  (was: Currently a few of 
the Stream Evaluators that work with matrices are creating multiple copies of 
the underlying multi-dimensional arrays. This can lead to excessive memory 
usage. This ticket will change these implementations so the 
multi-dimensional arrays that back a matrix are only copied when the *copyOf* 
function is used.)
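
In sketch form, the intent is roughly the following (a hypothetical Matrix type, not Solr's actual class):

{code:java}
// Sketch: evaluators share the backing double[][] and only copyOf() pays
// for a deep copy.
public class Matrix {
  private final double[][] data;

  public Matrix(double[][] data) {
    this.data = data; // shared, not defensively copied
  }

  public double[][] getData() {
    return data; // callers must not mutate
  }

  public Matrix copyOf() {
    double[][] copy = new double[data.length][];
    for (int i = 0; i < data.length; i++) {
      copy[i] = data[i].clone(); // deep-copy each row
    }
    return new Matrix(copy);
  }
}
{code}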

> Stream Evaluators should not copy matrices needlessly
> -
>
> Key: SOLR-12198
> URL: https://issues.apache.org/jira/browse/SOLR-12198
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> Currently several of the Stream Evaluators that work with matrices are 
> creating multiple copies of the underlying multi-dimensional arrays. This can 
> lead to excessive memory usage. This ticket will change these implementations 
> so the multi-dimensional arrays that back a matrix are only copied 
> when the *copyOf* function is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12198) Stream Evaluators should not copy matrices needlessly

2018-04-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-12198:
--
Description: Currently a few of the Stream Evaluators that work with 
matrices are creating multiple copies of the underlying multi-dimensional 
arrays. This can lead to excessive memory usage. This ticket will change these 
implementations so the multi-dimensional arrays that back a matrix 
are only copied when the *copyOf* function is used.  (was: Currently many of 
the Stream Evaluators that are working with matrices are creating multiple 
copies of the underlying multi-dimensional arrays. This can lead to excessive 
memory usage. This ticket will change the implementations so the 
multi-dimensional arrays that back a matrix are only copied when the *copyOf* 
function is used.)

> Stream Evaluators should not copy matrices needlessly
> -
>
> Key: SOLR-12198
> URL: https://issues.apache.org/jira/browse/SOLR-12198
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> Currently a few of the Stream Evaluators that work with matrices are creating 
> multiple copies of the underlying multi-dimensional arrays. This can lead to 
> excessive memory usage. This ticket will change these implementations so 
> the multi-dimensional arrays that back a matrix are copied only 
> when the *copyOf* function is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12194) Deprecate SolrRequest#setBasicAuthCredentials

2018-04-06 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428586#comment-16428586
 ] 

Hoss Man commented on SOLR-12194:
-

bq. the only way forward will be using the ClientBuilderFactory.

I thought the entire point of being able to specify credentials on the requests 
was so you could have a client application that used a single client, but 
specified different credentials as needed based on use case -- ex: pass through 
credentials from the upstream user?
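
For example, a minimal sketch of that pass-through pattern with the current 
API (the URL, collection name, and credentials are placeholders, not from the 
ticket):

{code:java}
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;

public class PerRequestAuthSketch {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      // One shared client, different credentials per request -- the usage
      // pattern that deprecating setBasicAuthCredentials would remove.
      QueryRequest asAlice = new QueryRequest(new SolrQuery("*:*"));
      asAlice.setBasicAuthCredentials("alice", "alicePassword"); // upstream user A
      asAlice.process(client, "collection1");

      QueryRequest asBob = new QueryRequest(new SolrQuery("*:*"));
      asBob.setBasicAuthCredentials("bob", "bobPassword"); // upstream user B
      asBob.process(client, "collection1");
    }
  }
}
{code}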

> Deprecate SolrRequest#setBasicAuthCredentials
> -
>
> Key: SOLR-12194
> URL: https://issues.apache.org/jira/browse/SOLR-12194
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Reporter: Jan Høydahl
>Priority: Major
> Fix For: 7.4
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should deprecate these methods in {{SolrRequest}}:
> {code:java}
>   public SolrRequest setBasicAuthCredentials(String user, String password)
>   public String getBasicAuthPassword()
>   public String getBasicAuthUser()
> {code}
> The only way forward will be using the ClientBuilderFactory.
> For 7.4 we should deprecate these, and for 8.0 (master) remove them. First we 
> need to migrate some tests etc. that use the old methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12194) Deprecate SolrRequest#setBasicAuthCredentials

2018-04-06 Thread Jason Gerlowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428579#comment-16428579
 ] 

Jason Gerlowski commented on SOLR-12194:


1. I'm a little concerned that this API change makes basic-auth setup (a pretty 
commonly used SolrClient feature) more arcane than it needs to be for users.  
Say what you want about the warts surrounding {{set/getBasicAuth*}} but at 
least the methods are easy for novice users to discover and use.  Maybe I'm 
missing some easy way to set up auth and debug it at runtime without these 
getters/setters though.

2. If we're building basic-auth on top of HttpClientBuilderFactory, does it 
make sense to remove the lucene.experimental designation on that interface?

> Deprecate SolrRequest#setBasicAuthCredentials
> -
>
> Key: SOLR-12194
> URL: https://issues.apache.org/jira/browse/SOLR-12194
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Reporter: Jan Høydahl
>Priority: Major
> Fix For: 7.4
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should deprecate these methods in {{SolrRequest}}:
> {code:java}
>   public SolrRequest setBasicAuthCredentials(String user, String password)
>   public String getBasicAuthPassword()
>   public String getBasicAuthUser()
> {code}
> The only way forward will be using the ClientBuilderFactory.
> For 7.4 we should deprecate these, and for 8.0 (master) remove them. First we 
> need to migrate some tests etc. that use the old methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-12198) Stream Evaluators should not copy matrices needlessly

2018-04-06 Thread Joel Bernstein (JIRA)
Joel Bernstein created SOLR-12198:
-

 Summary: Stream Evaluators should not copy matrices needlessly
 Key: SOLR-12198
 URL: https://issues.apache.org/jira/browse/SOLR-12198
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein


Currently many of the Stream Evaluators that are working with matrices are 
creating multiple copies of the underlying multi-dimensional arrays. This can 
lead to excessive memory usage. This ticket will change the implementations so 
the multi-dimensional arrays that back a matrix are copied only when 
the *copyOf* function is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12198) Stream Evaluators should not copy matrices needlessly

2018-04-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-12198:
--
Fix Version/s: 7.4

> Stream Evaluators should not copy matrices needlessly
> -
>
> Key: SOLR-12198
> URL: https://issues.apache.org/jira/browse/SOLR-12198
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> Currently many of the Stream Evaluators that are working with matrices are 
> creating multiple copies of the underlying multi-dimensional arrays. This can 
> lead to excessive memory usage. This ticket will change the implementations 
> so the multi-dimensional arrays that back a matrix are copied only 
> when the *copyOf* function is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-12198) Stream Evaluators should not copy matrices needlessly

2018-04-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein reassigned SOLR-12198:
-

Assignee: Joel Bernstein

> Stream Evaluators should not copy matrices needlessly
> -
>
> Key: SOLR-12198
> URL: https://issues.apache.org/jira/browse/SOLR-12198
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> Currently many of the Stream Evaluators that are working with matrices are 
> creating multiple copies of the underlying multi-dimensional arrays. This can 
> lead to excessive memory usage. This ticket will change the implementations 
> so the multi-dimensional arrays that back a matrix are copied only 
> when the *copyOf* function is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-10453) setBasicAuthHeader should be deprecated in favor of SolrClientBuilder methods

2018-04-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl resolved SOLR-10453.

Resolution: Duplicate

> setBasicAuthHeader should be deprecated in favor of SolrClientBuilder methods
> -
>
> Key: SOLR-10453
> URL: https://issues.apache.org/jira/browse/SOLR-10453
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrJ
>Reporter: Jason Gerlowski
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 7.0
>
>
> Now that builders are in place for {{SolrClients}}, the setters used in each 
> {{SolrClient}} can be deprecated, and their functionality moved over to the 
> Builders. This change brings a few benefits:
> - unifies {{SolrClient}} configuration under the new Builders. It'll be nice 
> to have all the knobs, and levers used to tweak {{SolrClient}}s available in 
> a single place (the Builders).
> - reduces {{SolrClient}} thread-safety concerns. Currently, clients are 
> mutable. Using some {{SolrClient}} setters can result in erratic and "trappy" 
> behavior when the clients are used across multiple threads.
> This subtask endeavors to change this behavior for the {{setBasicAuthHeader}} 
> setter on all {{SolrClient}} implementations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-10453) setBasicAuthHeader should be deprecated in favor of SolrClientBuilder methods

2018-04-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl reassigned SOLR-10453:
--

Assignee: Jan Høydahl

> setBasicAuthHeader should be deprecated in favor of SolrClientBuilder methods
> -
>
> Key: SOLR-10453
> URL: https://issues.apache.org/jira/browse/SOLR-10453
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrJ
>Reporter: Jason Gerlowski
>Assignee: Jan Høydahl
>Priority: Minor
> Fix For: 7.0
>
>
> Now that builders are in place for {{SolrClients}}, the setters used in each 
> {{SolrClient}} can be deprecated, and their functionality moved over to the 
> Builders. This change brings a few benefits:
> - unifies {{SolrClient}} configuration under the new Builders. It'll be nice 
> to have all the knobs, and levers used to tweak {{SolrClient}}s available in 
> a single place (the Builders).
> - reduces {{SolrClient}} thread-safety concerns. Currently, clients are 
> mutable. Using some {{SolrClient}} setters can result in erratic and "trappy" 
> behavior when the clients are used across multiple threads.
> This subtask endeavors to change this behavior for the {{setBasicAuthHeader}} 
> setter on all {{SolrClient}} implementations.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12197) Implement sampling for logistic regression classifier

2018-04-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-12197:
--
Description: 
Currently the *train* Streaming Expression trains a logistic regression model 
by iterating over the entire distributed training set on each training 
iteration. Each training iteration involves building a matrix on each shard 
with the number of rows equal to the size of the training set contained on the 
shard. The number of columns will be the number of features. This scenario can 
create very large matrices when working with large training sets and feature 
sets.

This ticket will add a *sample* parameter which will limit the size of the 
training set on each iteration to a random sample of the training set. This 
will allow for much larger training sets.

  was:
Currently the *train* Streaming Expression trains a logistic regression model 
by iterating over the entire distributed training set on each pass. Each 
iteration involves building a matrix on each shard with the number of rows 
equal to the size of the training set contained on the shard. The number of 
columns will be the number of features. This scenario can create very large 
matrices when working with large training sets and feature sets.

This ticket will add a *sample* parameter which will limit the size of the 
training set on each iteration to a random sample of the training set. This 
will allow for much larger training sets.
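
The core idea, as a minimal sketch (the names and the sampling strategy are 
illustrative, not the actual *train* implementation): cap each iteration's 
shard-local matrix by drawing a random subset of rows.

{code:java}
import java.util.Random;

// Minimal sketch of the proposed *sample* behavior; not Solr's train code.
public class SampledTrainingSet {
  private final double[][] rows; // shard-local training set: rows x features
  private final Random random = new Random();

  public SampledTrainingSet(double[][] rows) {
    this.rows = rows;
  }

  // Draws sampleSize rows (with replacement, for simplicity) so the matrix
  // built per training iteration stays bounded regardless of training-set size.
  public double[][] nextIterationSample(int sampleSize) {
    int n = Math.min(sampleSize, rows.length);
    double[][] sample = new double[n][];
    for (int i = 0; i < n; i++) {
      sample[i] = rows[random.nextInt(rows.length)];
    }
    return sample;
  }
}
{code}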


> Implement sampling for logistic regression classifier
> -
>
> Key: SOLR-12197
> URL: https://issues.apache.org/jira/browse/SOLR-12197
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> Currently the *train* Streaming Expression trains a logistic regression model 
> by iterating over the entire distributed training set on each training 
> iteration. Each training iteration involves building a matrix on each shard 
> with the number of rows equal to the size of the training set contained on 
> the shard. The number of columns will be the number of features. This 
> scenario can create very large matrices when working with large training sets 
> and feature sets.
> This ticket will add a *sample* parameter which will limit the size of the 
> training set on each iteration to a random sample of the training set. This 
> will allow for much larger training sets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12197) Implement sampling for logistic regression classifier

2018-04-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-12197:
--
Description: 
Currently the *train* Streaming Expression trains a logistic regression model 
by iterating over the entire distributed training set on each pass. Each 
iteration involves building a matrix on each shard with the number of rows 
equal to the size of the training set contained on the shard. The number of 
columns will be the number of features. This scenario can create very large 
matrices when working with large training sets and feature sets.

This ticket will add a *sample* parameter which will limit the size of the 
training set on each iteration to a random sample of the training set. This 
will allow for much larger training sets.

  was:
Currently the *train* Streaming Expression trains a logistic regression model 
by iterating over the entire distributed training set on each pass. Each 
iteration involves building a matrix on each shard with the number of rows 
being the size of the training set contained on the shard. The number of 
columns will be the number of features. This scenario can create very large 
matrices when working with large training sets and feature sets.

This ticket will add a *sample* parameter which will limit the size of the 
training set on each iteration to a random sample of the training set.


> Implement sampling for logistic regression classifier
> -
>
> Key: SOLR-12197
> URL: https://issues.apache.org/jira/browse/SOLR-12197
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> Currently the *train* Streaming Expression trains a logistic regression model 
> by iterating over the entire distributed training set on each pass. Each 
> iteration involves building a matrix on each shard with the number of rows 
> equal to the size of the training set contained on the shard. The number of 
> columns will be the number of features. This scenario can create very large 
> matrices when working with large training sets and feature sets.
> This ticket will add a *sample* parameter which will limit the size of the 
> training set on each iteration to a random sample of the training set. This 
> will allow for much larger training sets.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12197) Implement sampling for logistic regression classifier

2018-04-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-12197:
--
Component/s: streaming expressions

> Implement sampling for logistic regression classifier
> -
>
> Key: SOLR-12197
> URL: https://issues.apache.org/jira/browse/SOLR-12197
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> Currently the *train* Streaming Expression trains a logistic regression model 
> by iterating over the entire distributed training set on each pass. Each 
> iteration involves building a matrix on each shard with the number of rows 
> being the size of the training set contained on the shard. The number of 
> columns will be the number of features. This scenario can create very large 
> matrices when working with large training sets and feature sets.
> This ticket will add a *sample* parameter which will limit the size of the 
> training set on each iteration to a random sample of the training set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12197) Implement sampling for logistic regression classifier

2018-04-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-12197:
--
Description: 
Currently the *train* Streaming Expression trains a logistic regression model 
by iterating over the entire distributed training set on each pass. Each 
iteration involves building a matrix on each shard with the number of rows 
being the size of the training set contained on the shard. The number of 
columns will be the number of features. This scenario can create very large 
matrices when working with large training sets and feature sets.

This ticket will add a *sample* parameter which will limit the size of the 
training set on each iteration to a random sample of the training set.

  was:
Currently the *train* function trains a logistic regression model by iterating 
over the entire distributed training set on each pass. Each iteration involves 
building a matrix on each shard with the number of rows being the size of the 
training set contained on the shard. The number of columns will be the number 
of features. This scenario can create very large matrices when working with 
large training sets and feature sets.

This ticket will add a *sample* parameter which will limit the size of the 
training set on each iteration to a random sample of the training set.


> Implement sampling for logistic regression classifier
> -
>
> Key: SOLR-12197
> URL: https://issues.apache.org/jira/browse/SOLR-12197
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> Currently the *train* Streaming Expression trains a logistic regression model 
> by iterating over the entire distributed training set on each pass. Each 
> iteration involves building a matrix on each shard with the number of rows 
> being the size of the training set contained on the shard. The number of 
> columns will be the number of features. This scenario can create very large 
> matrices when working with large training sets and feature sets.
> This ticket will add a *sample* parameter which will limit the size of the 
> training set on each iteration to a random sample of the training set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-12197) Implement sampling for logistic regression classifier

2018-04-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein reassigned SOLR-12197:
-

Assignee: Joel Bernstein

> Implement sampling for logistic regression classifier
> -
>
> Key: SOLR-12197
> URL: https://issues.apache.org/jira/browse/SOLR-12197
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> Currently the *train* Streaming Expression trains a logistic regression model 
> by iterating over the entire distributed training set on each pass. Each 
> iteration involves building a matrix on each shard with the number of rows 
> being the size of the training set contained on the shard. The number of 
> columns will be the number of features. This scenario can create very large 
> matrices when working with large training sets and feature sets.
> This ticket will add a *sample* parameter which will limit the size of the 
> training set on each iteration to a random sample of the training set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-12197) Implement sampling for logistic regression classifier

2018-04-06 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-12197:
--
Fix Version/s: 7.4

> Implement sampling for logistic regression classifier
> -
>
> Key: SOLR-12197
> URL: https://issues.apache.org/jira/browse/SOLR-12197
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> Currently the *train* Streaming Expression trains a logistic regression model 
> by iterating over the entire distributed training set on each pass. Each 
> iteration involves building a matrix on each shard with the number of rows 
> being the size of the training set contained on the shard. The number of 
> columns will be the number of features. This scenario can create very large 
> matrices when working with large training sets and feature sets.
> This ticket will add a *sample* parameter which will limit the size of the 
> training set on each iteration to a random sample of the training set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-12197) Implement sampling for logistic regression classifier

2018-04-06 Thread Joel Bernstein (JIRA)
Joel Bernstein created SOLR-12197:
-

 Summary: Implement sampling for logistic regression classifier
 Key: SOLR-12197
 URL: https://issues.apache.org/jira/browse/SOLR-12197
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Joel Bernstein


Currently the *train* function trains a logistic regression model by iterating 
over the entire distributed training set on each pass. Each iteration involves 
building a matrix on each shard with the number of rows being the size of the 
training set contained on the shard. The number of columns will be the number 
of features. This scenario can create very large matrices when working with 
large training sets and feature sets.

This ticket will add a *sample* parameter which will limit the size of the 
training set on each iteration to a random sample of the training set.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-7.x-MacOSX (64bit/jdk1.8.0) - Build # 563 - Unstable!

2018-04-06 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-7.x-MacOSX/563/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseG1GC

7 tests failed.
FAILED:  org.apache.solr.handler.TestSQLHandler.doTest

Error Message:
--> https://127.0.0.1:53654/collection1_shard2_replica_n41:Failed to execute 
sqlQuery 'select id, field_i, str_s from collection1 where (text='()' OR 
text='') AND text='' order by field_i desc' against JDBC connection 
'jdbc:calcitesolr:'. Error while executing SQL "select id, field_i, str_s from 
collection1 where (text='()' OR text='') AND text='' order by 
field_i desc": java.io.IOException: java.util.concurrent.ExecutionException: 
java.io.IOException: --> 
https://127.0.0.1:53654/collection1_shard2_replica_n41/:id must have DocValues 
to use this feature.

Stack Trace:
java.io.IOException: --> 
https://127.0.0.1:53654/collection1_shard2_replica_n41:Failed to execute 
sqlQuery 'select id, field_i, str_s from collection1 where (text='()' OR 
text='') AND text='' order by field_i desc' against JDBC connection 
'jdbc:calcitesolr:'.
Error while executing SQL "select id, field_i, str_s from collection1 where 
(text='()' OR text='') AND text='' order by field_i desc": 
java.io.IOException: java.util.concurrent.ExecutionException: 
java.io.IOException: --> 
https://127.0.0.1:53654/collection1_shard2_replica_n41/:id must have DocValues 
to use this feature.
at 
__randomizedtesting.SeedInfo.seed([879FDBFDF9EFF4A1:20DB63599454E718]:0)
at 
org.apache.solr.client.solrj.io.stream.SolrStream.read(SolrStream.java:222)
at 
org.apache.solr.handler.TestSQLHandler.getTuples(TestSQLHandler.java:2522)
at 
org.apache.solr.handler.TestSQLHandler.testBasicSelect(TestSQLHandler.java:124)
at org.apache.solr.handler.TestSQLHandler.doTest(TestSQLHandler.java:82)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:993)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:968)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 

[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2018-04-06 Thread Alexey Ponomarenko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428471#comment-16428471
 ] 

Alexey Ponomarenko commented on LUCENE-2899:


[~steve_rowe] Thanks, I will try your solution. But I will also wait to hear 
from [~lancenorskog] about network-based solutions.

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12196) Prepare Admin UI for migrating to Angular.io

2018-04-06 Thread Upayavira (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428448#comment-16428448
 ] 

Upayavira commented on SOLR-12196:
--

I don't have a major investment in Solr at this time, but I am certainly game 
to follow on and add what bits I might. Whilst I am far from a front-end 
developer, I have recently played with Webpack and React with some success. The 
transition from the old JS UI to Angular was made simpler because of how 
Angular manages its (whole page) templates. React breaks things down into 
smaller components, and whilst this could be better in the long run in terms of 
component reuse, it means that a conversion could be a substantial piece of 
work. I like your idea of breaking the task down into smaller steps.

> Prepare Admin UI for migrating to Angular.io
> 
>
> Key: SOLR-12196
> URL: https://issues.apache.org/jira/browse/SOLR-12196
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Jan Høydahl
>Priority: Major
>  Labels: Angular, AngularJS, angular-migration
> Fix For: master (8.0)
>
>
> AngularJS is nearing end of life; it [enters LTS in July 
> 2018|https://docs.angularjs.org/misc/version-support-status], whereupon it 
> will only receive fixes to serious bugs. Solr uses AngularJS 1.3 (the latest 
> AngularJS will be 1.7).
> This issue is *not* for upgrading to Angular5/6, but to start preparing the 
> existing UI for easier migration later on. See 
> [https://angular.io/guide/upgrade].
> This JIRA will likely get multiple sub tasks such as
>  * Change to [Folders-by-Feature 
> Structure|https://angular.io/guide/upgrade#follow-the-angularjs-style-guide], 
> i.e. mix html, css, js in a folder based on feature
>  * Use a [Module 
> Loader|https://angular.io/guide/upgrade#using-a-module-loader] like 
> [Webpack|https://webpack.js.org/]
>  * Use [Component 
> Directives|https://angular.io/guide/upgrade#using-component-directives] 
> (requires first move from AngularJS 1.3 to 1.5)
> The rationale for this JIRA is recognising how central the Admin UI is to 
> Solr, not letting it rot on top of a dying framework. Better to start moving 
> step by step and [perhaps write all new views in Angular 
> 5|https://angular.io/guide/upgrade#upgrading-with-ngupgrade], than to fall 
> further and further behind.
> This effort of course assumes that Angular.io is the path we want to go, and 
> not React, VueJS or some other new kid on the block :)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12183) Refactor Streaming Expression test cases

2018-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428439#comment-16428439
 ] 

ASF subversion and git services commented on SOLR-12183:


Commit 03461d8c8f7529f063929a0dec1935e30683c0ca in lucene-solr's branch 
refs/heads/branch_7x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=03461d8 ]

SOLR-12183: Remove dead code


> Refactor Streaming Expression test cases
> 
>
> Key: SOLR-12183
> URL: https://issues.apache.org/jira/browse/SOLR-12183
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> This ticket will break up the StreamExpressionTest into multiple smaller files 
> based on the following areas:
> 1) Stream Sources
> 2) Stream Decorators
> 3) Stream Evaluators (This may have to be broken up more in the future)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12183) Refactor Streaming Expression test cases

2018-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428438#comment-16428438
 ] 

ASF subversion and git services commented on SOLR-12183:


Commit 8a73d38936d5346adcd50924d782f2098e71725d in lucene-solr's branch 
refs/heads/branch_7x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=8a73d38 ]

SOLR-12183: Refactor Streaming Expression test cases


> Refactor Streaming Expression test cases
> 
>
> Key: SOLR-12183
> URL: https://issues.apache.org/jira/browse/SOLR-12183
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> This ticket will break up the StreamExpressionTest into multiple smaller files 
> based on the following areas:
> 1) Stream Sources
> 2) Stream Decorators
> 3) Stream Evaluators (This may have to be broken up more in the future)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12183) Refactor Streaming Expression test cases

2018-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428441#comment-16428441
 ] 

ASF subversion and git services commented on SOLR-12183:


Commit eddcb9894a05ad531588d88cabdab69e8048bcb9 in lucene-solr's branch 
refs/heads/branch_7x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=eddcb98 ]

SOLR-12183: Fix precommit


> Refactor Streaming Expression test cases
> 
>
> Key: SOLR-12183
> URL: https://issues.apache.org/jira/browse/SOLR-12183
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 7.4
>
>
> This ticket will break up the StreamExpressionTest into multiple smaller files 
> based on the following areas:
> 1) Stream Sources
> 2) Stream Decorators
> 3) Stream Evaluators (This may have to be broken up more in the future)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12175) Add random field type and dynamic field to the default managed-schema

2018-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428437#comment-16428437
 ] 

ASF subversion and git services commented on SOLR-12175:


Commit 65e07852d76e8b2859f15fc67c5c0fd0580be5a3 in lucene-solr's branch 
refs/heads/branch_7x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=65e0785 ]

SOLR-12175: Add random field type and dynamic field to the default 
managed-schema


> Add random field type and dynamic field to the default managed-schema
> -
>
> Key: SOLR-12175
> URL: https://issues.apache.org/jira/browse/SOLR-12175
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Priority: Major
> Attachments: SOLR-12175.patch
>
>
> Currently the default managed-schema file doesn't have the random field 
> configured. Both the techproducts and example managed-schema files have it 
> configured. This ticket will add the random dynamic field and field type to 
> the default managed-schema so this functionality is available out of the box 
> when using the default schema.
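
For reference, the techproducts-style configuration the ticket points at wires 
this up along these lines (a sketch of what would be copied into the default 
managed-schema):

{code:xml}
<!-- Field type backed by solr.RandomSortField, plus a matching dynamic
     field so any random_* field name sorts in a stable random order. -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random"/>
{code}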



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-12175) Add random field type and dynamic field to the default managed-schema

2018-04-06 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428440#comment-16428440
 ] 

ASF subversion and git services commented on SOLR-12175:


Commit 6a5d6880ea039ad66b99ebeb5e5ee875d2bed274 in lucene-solr's branch 
refs/heads/branch_7x from [~joel.bernstein]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6a5d688 ]

SOLR-12175: Fix TestConfigSetsAPI


> Add random field type and dynamic field to the default managed-schema
> -
>
> Key: SOLR-12175
> URL: https://issues.apache.org/jira/browse/SOLR-12175
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Joel Bernstein
>Priority: Major
> Attachments: SOLR-12175.patch
>
>
> Currently the default managed-schema file doesn't have the random field 
> configured. Both the techproducts and example managed-schema files have it 
> configured. This ticket will add the random dynamic field and field type to 
> the default managed-schema so this functionality is available out of the box 
> when using the default schema.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2018-04-06 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428434#comment-16428434
 ] 

Steve Rowe commented on LUCENE-2899:


I should mention that the ideal hosting location for OpenNLP models would be 
the [Blob Store|https://lucene.apache.org/solr/guide/7_3/blob-store-api.html], 
but that is not currently possible for schema-loaded classes - see SOLR-9175.

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2018-04-06 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428422#comment-16428422
 ] 

Steve Rowe commented on LUCENE-2899:


{quote}[~steve_rowe] Thanks for pointing me in the right direction. But maybe 
you know about putting the model files somewhere on the network? This was my 
previous question. Maybe you know something about this, as [~lancenorskog] said?
{quote}

Sorry, I haven't tested this, but I believe you'll have to use locally attached 
storage on each server, and specify an absolute path.

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2018-04-06 Thread Alexey Ponomarenko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428420#comment-16428420
 ] 

Alexey Ponomarenko commented on LUCENE-2899:


[~steve_rowe] Thanks for pointing me in the right direction. But maybe 
you know about putting the model files somewhere on the network? This was my 
previous question. Maybe you know something about this, as [~lancenorskog] said?

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #312: Solr 11898

2018-04-06 Thread millerjeff0
Github user millerjeff0 closed the pull request at:

https://github.com/apache/lucene-solr/pull/312


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (SOLR-10442) ExtendedDismaxQParser (edismax) makes pf* require search term exactly

2018-04-06 Thread Nikolay Martynov (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nikolay Martynov reopened SOLR-10442:
-

Sorry for the misinformation; this is still happening on 6.6.1.

> ExtendedDismaxQParser (edismax) makes pf* require search term exactly
> 
>
> Key: SOLR-10442
> URL: https://issues.apache.org/jira/browse/SOLR-10442
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 6.5
>Reporter: Nikolay Martynov
>Priority: Major
>
> Request like:
> {code}
> "params":{
>   "q": "cat AND dog",
>   "q.op": "AND",
>   "defType":"edismax",
>   "qf":"description",
>   "pf2":"description"
> }
> {code}
> produces query like this:
> {code}
> "parsedquery_toString":"+(+(description.en:cat) +(description.en:dog)) 
> (+(description.en:\"cat dog\"))"
> {code}
> Solr 4.6.1 produces different parsing of this query:
> {code}
> "parsedquery_toString": "+(+(description.en:cat) +(description.en:dog)) 
> (description.en:\"cat dog\")",
> {code}
> Replacing {{q.op=AND}} with {{q.op=OR}} in newer Solr produces same query as 
> old Solr despite the fact that it would seem that this change should not make 
> a difference.
> This issue is probably related to SOLR-8812 - looks like it is just one more 
> case of same problem. It also would mean that change occurred in version 
> range specified there - unfortunately I would not be able to test that.
> This looks like a change in behaviour that is not quite expected: now 
> introducing pf2 searches for documents that must contain the 'cat dog' phrase 
> instead of just boosting such documents.
> Please let me know if more information is required.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11336) DocBasedVersionConstraintsProcessor should be more extensible and support multiple version fields

2018-04-06 Thread Lucene/Solr QA (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428369#comment-16428369
 ] 

Lucene/Solr QA commented on SOLR-11336:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m 16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m 16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 48m 
43s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 32s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-11336 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12913940/SOLR-11336.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 3.13.0-88-generic #135-Ubuntu SMP Wed Jun 8 
21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 73d7410 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on April 8 2014 |
| Default Java | 1.8.0_152 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/41/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/41/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> DocBasedVersionConstraintsProcessor should be more extensible and support 
> multiple version fields
> -
>
> Key: SOLR-11336
> URL: https://issues.apache.org/jira/browse/SOLR-11336
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Michael Braun
>Assignee: David Smiley
>Priority: Minor
> Attachments: SOLR-11336.patch, SOLR-11336.patch, SOLR-11336.patch, 
> SOLR-11336.patch
>
>
> DocBasedVersionConstraintsProcessor supports allowing document updates only 
> if the new version is greater than the old. However, if any behavior wants to 
> be extended / changed in minor ways, the entire class will need to be copied 
> and slightly modified rather than extending and changing the method in 
> question. 
> It would be nice if DocBasedVersionConstraintsProcessor stood on its own as a 
> non-private class. In addition, certain methods (such as pieces of 
> isVersionNewEnough) should be broken out into separate methods so they can be 
> extended such that someone can extend the processor class and override what 
> it means for a new version to be accepted (allowing equal versions through? 
> What if new is a lower not greater number?). 
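
A minimal sketch of the kind of hook being asked for (the class and method 
names are illustrative, not the actual processor class):

{code:java}
// Illustrative only -- not the real DocBasedVersionConstraintsProcessor.
public class VersionConstraintSketch {

  static class VersionConstraintsProcessorSketch {
    // isVersionNewEnough-style logic broken out into an overridable hook.
    protected boolean versionAcceptable(long newVersion, long oldVersion) {
      return newVersion > oldVersion; // default: strictly greater wins
    }
  }

  // A subclass redefining "new enough", e.g. letting equal versions through.
  static class AllowEqualVersions extends VersionConstraintsProcessorSketch {
    @Override
    protected boolean versionAcceptable(long newVersion, long oldVersion) {
      return newVersion >= oldVersion;
    }
  }

  public static void main(String[] args) {
    VersionConstraintsProcessorSketch p = new AllowEqualVersions();
    System.out.println(p.versionAcceptable(5L, 5L)); // true with the override
  }
}
{code}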



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2018-04-06 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428348#comment-16428348
 ] 

Steve Rowe edited comment on LUCENE-2899 at 4/6/18 1:52 PM:


Note the workaround on SOLR-4793 for ZK resources larger than 1M; from the [ZK 
admin manual|https://zookeeper.apache.org/doc/r3.4.11/zookeeperAdmin.html]:

{quote}
h3. Unsafe Options

The following options can be useful, but be careful when you use them. The risk 
of each is explained along with the explanation of what the variable does.

[...]

jute.maxbuffer:
(Java system property: jute.maxbuffer)
This option can only be set as a Java system property. There is no zookeeper 
prefix on it. It specifies the maximum size of the data that can be stored in a 
znode. The default is 0xfffff, or just under 1M. If this option is changed, the 
system property must be set on all servers and clients otherwise problems will 
arise. This is really a sanity check. ZooKeeper is designed to store data on 
the order of kilobytes in size.
{quote}

 This is spelled out a little more here: 
https://www.shi-gmbh.com/tutorials/increase-file-size-zookeeper/
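
For example (the value is illustrative only; as the manual warns, 
jute.maxbuffer must be set identically on every ZooKeeper server and every 
client JVM):

{code}
# ZooKeeper server side (e.g. conf/java.env), raising the limit to ~2MB:
JVMFLAGS="$JVMFLAGS -Djute.maxbuffer=2097152"
# Solr (client) side:
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=2097152"
{code}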


was (Author: steve_rowe):
Note the workaround on SOLR-4793 for ZK resources larger than 1M; from the ZK 
admin manual:

{quote}
This option can only be set as a Java system property. There is no zookeeper 
prefix on it. It specifies the maximum size of the data that can be stored in a 
znode. The default is 0xfffff, or just under 1M. If this option is changed, the 
system property must be set on all servers and clients otherwise problems will 
arise. This is really a sanity check. ZooKeeper is designed to store data on 
the order of kilobytes in size.
{quote}

 This is spelled out a little more here: 
https://www.shi-gmbh.com/tutorials/increase-file-size-zookeeper/

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2018-04-06 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428348#comment-16428348
 ] 

Steve Rowe commented on LUCENE-2899:


Note the workaround on SOLR-4793 for ZK resources larger than 1M; from the ZK 
admin manual:

{quote}
This option can only be set as a Java system property. There is no zookeeper 
prefix on it. It specifies the maximum size of the data that can be stored in a 
znode. The default is 0xfffff, or just under 1M. If this option is changed, the 
system property must be set on all servers and clients otherwise problems will 
arise. This is really a sanity check. ZooKeeper is designed to store data on 
the order of kilobytes in size.
{quote}

 This is spelled out a little more here: 
https://www.shi-gmbh.com/tutorials/increase-file-size-zookeeper/

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp






[jira] [Updated] (SOLR-12196) Prepare Admin UI for migrating to Angular.io

2018-04-06 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jan Høydahl updated SOLR-12196:
---
Description: 
AngularJS is soon end of life; it [enters LTS in July 
2018|https://docs.angularjs.org/misc/version-support-status], whereupon it will 
only receive fixes to serious bugs. Solr uses AngularJS 1.3 (the latest 
AngularJS will be 1.7).

This issue is *not* for upgrading to Angular5/6, but to start preparing the 
existing UI for easier migration later on. See 
[https://angular.io/guide/upgrade].

This JIRA will likely get multiple sub-tasks, such as
 * Change to [Folders-by-Feature 
Structure|https://angular.io/guide/upgrade#follow-the-angularjs-style-guide], 
i.e. mix html, css, js in a folder based on feature
 * Use a [Module Loader|https://angular.io/guide/upgrade#using-a-module-loader] 
like [Webpack|https://webpack.js.org/]
 * Use [Component 
Directives|https://angular.io/guide/upgrade#using-component-directives] 
(requires first move from AngularJS 1.3 to 1.5)

The rationale for this JIRA is recognising how central the Admin UI is to Solr, 
not letting it rot on top of a dying framework. Better to start moving step by 
step and [perhaps write all new views in Angular 
5|https://angular.io/guide/upgrade#upgrading-with-ngupgrade], than to fall 
further and further behind.

This effort of course assumes that Angular.io is the path we want to go, and 
not React, VueJS or some other new kid on the block :)

  was:
AngularJS is soon end of life; it enters LTS in July 2018, whereupon it will 
only receive fixes to serious bugs. Solr uses AngularJS 1.3 (the latest 
AngularJS is 1.7).

This issue is *not* for upgrading to Angular5/6, but to start preparing the 
existing UI for easier migration later on. See 
https://angular.io/guide/upgrade. 

This JIRA will likely get multiple sub-tasks, such as
* Change to [Folders-by-Feature 
Structure|https://angular.io/guide/upgrade#follow-the-angularjs-style-guide], 
i.e. mix html, css, js in a folder based on feature
* Use a [Module Loader|https://angular.io/guide/upgrade#using-a-module-loader] 
like [Webpack|https://webpack.js.org]
* Use [Component 
Directives|https://angular.io/guide/upgrade#using-component-directives] 
(requires first move from AngularJS 1.3 to 1.5)

The rationale for this JIRA is recognising how central the Admin UI is to Solr, 
not letting it rot on top of a dying framework. Better to start moving step by 
step and [perhaps write all new views in Angular 
5|https://angular.io/guide/upgrade#upgrading-with-ngupgrade], than to fall 
further and further behind.

This effort of course assumes that Angular.io is the path we want to go, and 
not React, VueJS or some other new kid on the block :-)


> Prepare Admin UI for migrating to Angular.io
> 
>
> Key: SOLR-12196
> URL: https://issues.apache.org/jira/browse/SOLR-12196
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Jan Høydahl
>Priority: Major
>  Labels: Angular, AngularJS, angular-migration
> Fix For: master (8.0)
>
>
> AngularJS is soon end of life; it [enters LTS in July 
> 2018|https://docs.angularjs.org/misc/version-support-status], whereupon it 
> will only receive fixes to serious bugs. Solr uses AngularJS 1.3 (the latest 
> AngularJS will be 1.7).
> This issue is *not* for upgrading to Angular5/6, but to start preparing the 
> existing UI for easier migration later on. See 
> [https://angular.io/guide/upgrade].
> This JIRA will likely get multiple sub-tasks, such as
>  * Change to [Folders-by-Feature 
> Structure|https://angular.io/guide/upgrade#follow-the-angularjs-style-guide], 
> i.e. mix html, css, js in a folder based on feature
>  * Use a [Module 
> Loader|https://angular.io/guide/upgrade#using-a-module-loader] like 
> [Webpack|https://webpack.js.org/]
>  * Use [Component 
> Directives|https://angular.io/guide/upgrade#using-component-directives] 
> (requires first move from AngularJS 1.3 to 1.5)
> The rationale for this JIRA is recognising how central the Admin UI is to 
> Solr, not letting it rot on top of a dying framework. Better to start moving 
> step by step and [perhaps write all new views in Angular 
> 5|https://angular.io/guide/upgrade#upgrading-with-ngupgrade], than to fall 
> further and further behind.
> This effort of course assumes that Angular.io is the path we want to go, and 
> not React, VueJS or some other new kid on the block :)






[jira] [Commented] (SOLR-12196) Prepare Admin UI for migrating to Angular.io

2018-04-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-12196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428335#comment-16428335
 ] 

Jan Høydahl commented on SOLR-12196:


Not sure if this is realistic if we don't have a dedicated frontend committer 
who can lead the effort. I know enough JS, HTML and Angular to find my way 
around the Admin UI and do small stuff, but doing major restructuring would 
benefit from someone who has done it before... Hope to get some input on this 
thinking from [~upayavira], [~steffkes], [~erickerickson]. The 
[ngUpgrade|https://angular.io/guide/upgrade#upgrading-with-ngupgrade] approach 
looks attractive, as we can migrate one view at a time, assuming that our app 
is large enough that doing it all in one step would be a major undertaking. So 
I guess this Jira is about getting us to a position where such a move is even 
thinkable :)

> Prepare Admin UI for migrating to Angular.io
> 
>
> Key: SOLR-12196
> URL: https://issues.apache.org/jira/browse/SOLR-12196
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Jan Høydahl
>Priority: Major
>  Labels: Angular, AngularJS, angular-migration
> Fix For: master (8.0)
>
>
> AngularJS is soon end of life; it enters LTS in July 2018, whereupon it will 
> only receive fixes to serious bugs. Solr uses AngularJS 1.3 (the latest 
> AngularJS is 1.7).
> This issue is *not* for upgrading to Angular5/6, but to start preparing the 
> existing UI for easier migration later on. See 
> https://angular.io/guide/upgrade. 
> This JIRA will likely get multiple sub-tasks, such as
> * Change to [Folders-by-Feature 
> Structure|https://angular.io/guide/upgrade#follow-the-angularjs-style-guide], 
> i.e. mix html, css, js in a folder based on feature
> * Use a [Module 
> Loader|https://angular.io/guide/upgrade#using-a-module-loader] like 
> [Webpack|https://webpack.js.org]
> * Use [Component 
> Directives|https://angular.io/guide/upgrade#using-component-directives] 
> (requires first move from AngularJS 1.3 to 1.5)
> The rationale for this JIRA is recognising how central the Admin UI is to 
> Solr, not letting it rot on top of a dying framework. Better to start moving 
> step by step and [perhaps write all new views in Angular 
> 5|https://angular.io/guide/upgrade#upgrading-with-ngupgrade], than to fall 
> further and further behind.
> This effort of course assumes that Angular.io is the path we want to go, and 
> not React, VueJS or some other new kid on the block :-)






[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2018-04-06 Thread Alexey Ponomarenko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428331#comment-16428331
 ] 

Alexey Ponomarenko commented on LUCENE-2899:


BTW here is part of my managed-schema config: 


{code:java}

  
   


   
  

{code}
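
For context, a typical OpenNLP analysis chain in managed-schema looks roughly 
like the sketch below. Factory and attribute names follow the Solr Ref Guide's 
OpenNLP examples; the model file names are assumptions matching the ones 
discussed earlier in this thread, not necessarily the exact config above.

{code:xml}
<!-- Sketch of an OpenNLP analysis chain; model files are resolved
     relative to the configset (in ZooKeeper when running SolrCloud). -->
<fieldType name="text_opennlp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.OpenNLPTokenizerFactory"
               sentenceModel="en-sent.bin"
               tokenizerModel="en-tokenizer.bin"/>
    <filter class="solr.OpenNLPPOSFilterFactory" posTaggerModel="en-pos-maxent.bin"/>
  </analyzer>
</fieldType>
{code}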


> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp






[jira] [Comment Edited] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2018-04-06 Thread Alexey Ponomarenko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428310#comment-16428310
 ] 

Alexey Ponomarenko edited comment on LUCENE-2899 at 4/6/18 1:37 PM:


[~lancenorskog] One more question. How can I use SMB and/or scp with SolrCloud 
correctly?

Even if I use something like this: 
{code:java}
// smb://DESKTOP-LMQI80K/opennlp/en-tokenizer.bin or 
\\DESKTOP-LMQI80K/opennlp/en-tokenizer.bin or 
file://DESKTOP-LMQI80K/opennlp/en-pos-maxent.bin
{code}

Solr is throwing a strange error:


{code:java}
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Could not load conf for core numberplate_shard2_replica_n6: Can't load schema 
managed-schema: java.io.IOException: Error opening 
/configs/numberplate/smb://DESKTOP-LMQI80K/opennlp/en-pos-maxent.bin
{code}

It seems that it "want to find" files inside of Zookeeper.  



was (Author: fatalityap):
[~lancenorskog] One more question. How can I use SMB and/or scp with SolrCloud 
correctly?

Even if I use something like this: 
{code:java}
// smb://DESKTOP-LMQI80K/opennlp/en-tokenizer.bin or 
\\DESKTOP-LMQI80K/opennlp/en-tokenizer.bin or 
file://DESKTOP-LMQI80K/opennlp/en-tokenizer.bin
{code}

Solr is throwing a strange error:


{code:java}
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Could not load conf for core numberplate_shard2_replica_n6: Can't load schema 
managed-schema: java.io.IOException: Error opening 
/configs/numberplate/smb://DESKTOP-LMQI80K/opennlp/en-tokenizer.bin
{code}

It seems that it "want to find" files inside of Zookeeper.  


> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp






[jira] [Created] (SOLR-12196) Prepare Admin UI for migrating to Angular.io

2018-04-06 Thread JIRA
Jan Høydahl created SOLR-12196:
--

 Summary: Prepare Admin UI for migrating to Angular.io
 Key: SOLR-12196
 URL: https://issues.apache.org/jira/browse/SOLR-12196
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: Admin UI
Reporter: Jan Høydahl
 Fix For: master (8.0)


AngularJS is soon end of life; it enters LTS in July 2018, whereupon it will 
only receive fixes to serious bugs. Solr uses AngularJS 1.3 (the latest 
AngularJS is 1.7).

This issue is *not* for upgrading to Angular5/6, but to start preparing the 
existing UI for easier migration later on. See 
https://angular.io/guide/upgrade. 

This JIRA will likely get multiple sub-tasks, such as
* Change to [Folders-by-Feature 
Structure|https://angular.io/guide/upgrade#follow-the-angularjs-style-guide], 
i.e. mix html, css, js in a folder based on feature
* Use a [Module Loader|https://angular.io/guide/upgrade#using-a-module-loader] 
like [Webpack|https://webpack.js.org]
* Use [Component 
Directives|https://angular.io/guide/upgrade#using-component-directives] 
(requires first move from AngularJS 1.3 to 1.5)

The rationale for this JIRA is recognising how central the Admin UI is to Solr, 
not letting it rot on top of a dying framework. Better to start moving step by 
step and [perhaps write all new views in Angular 
5|https://angular.io/guide/upgrade#upgrading-with-ngupgrade], than to fall 
further and further behind.

This effort of course assumes that Angular.io is the path we want to go, and 
not React, VueJS or some other new kid on the block :-)






[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2018-04-06 Thread Alexey Ponomarenko (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428310#comment-16428310
 ] 

Alexey Ponomarenko commented on LUCENE-2899:


[~lancenorskog] One more question. How can I use SMB and/or scp with SolrCloud 
correctly?

Even if I use something like this: 
{code:java}
// smb://DESKTOP-LMQI80K/opennlp/en-tokenizer.bin or 
\\DESKTOP-LMQI80K/opennlp/en-tokenizer.bin or 
file://DESKTOP-LMQI80K/opennlp/en-tokenizer.bin
{code}

Solr is throwing a strange error:


{code:java}
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
Could not load conf for core numberplate_shard2_replica_n6: Can't load schema 
managed-schema: java.io.IOException: Error opening 
/configs/numberplate/smb://DESKTOP-LMQI80K/opennlp/en-tokenizer.bin
{code}

It seems that it "want to find" files inside of Zookeeper.  


> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 7.3, master (8.0)
>
> Attachments: LUCENE-2899-6.1.0.patch, LUCENE-2899-RJN.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, OpenNLPFilter.java, 
> OpenNLPTokenizer.java
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp






[jira] [Commented] (SOLR-11913) SolrParams ought to implement Iterable<Map.Entry<String,String[]>>

2018-04-06 Thread Lucene/Solr QA (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428288#comment-16428288
 ] 

Lucene/Solr QA commented on SOLR-11913:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
21s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
16s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m  0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  0m 
47s{color} | {color:green} dataimporthandler in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
39s{color} | {color:green} solrj in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}  9m 40s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-11913 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12917498/SOLR-11913.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 3.13.0-88-generic #135-Ubuntu SMP Wed Jun 8 
21:10:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 73d7410 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on April 8 2014 |
| Default Java | 1.8.0_152 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/40/testReport/ |
| modules | C: solr/contrib/dataimporthandler solr/solrj U: solr |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/40/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> SolrParams ought to implement Iterable<Map.Entry<String,String[]>>
> --
>
> Key: SOLR-11913
> URL: https://issues.apache.org/jira/browse/SOLR-11913
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
>  Labels: newdev
> Attachments: SOLR-11913.patch, SOLR-11913.patch, SOLR-11913.patch, 
> SOLR-11913_v2.patch
>
>
> SolrParams ought to implement {{Iterable<Map.Entry<String,String[]>>}} so that 
> it's easier to iterate on it, either using Java 5 for-each style, or Java 8 
> streams.  The implementation on ModifiableSolrParams can delegate through to 
> the underlying LinkedHashMap entry set.  The default impl can produce a 
> Map.Entry with a getValue that calls through to getParams.  
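
For illustration, the proposed interface would enable usage like this sketch 
(hypothetical usage assuming the Iterable is in place; not code from the 
attached patches):

{code:java}
import java.util.Map;
import org.apache.solr.common.params.ModifiableSolrParams;

public class ParamsIterationSketch {
  public static void main(String[] args) {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.add("q", "*:*");
    params.add("fq", "type:doc", "year:2018");
    // Java 5 for-each style: each entry maps a parameter name to its values.
    for (Map.Entry<String, String[]> entry : params) {
      System.out.println(entry.getKey() + " -> " + String.join(",", entry.getValue()));
    }
  }
}
{code}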






[jira] [Resolved] (LUCENE-8083) Give similarities better values for maxScore

2018-04-06 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8083.
--
Resolution: Invalid

This has been invalidated by the indexing of impacts.

> Give similarities better values for maxScore
> 
>
> Key: LUCENE-8083
> URL: https://issues.apache.org/jira/browse/LUCENE-8083
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8083.patch
>
>
> The benefits of LUCENE-4100 largely depend on the quality of the upper bound 
> of the scores that is provided by the similarity.






[JENKINS] Lucene-Solr-master-Linux (64bit/jdk-9.0.4) - Build # 21764 - Unstable!

2018-04-06 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/21764/
Java: 64bit/jdk-9.0.4 -XX:-UseCompressedOops -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.solr.cloud.AddReplicaTest.test

Error Message:
core_node6:{"core":"addreplicatest_coll_shard1_replica_n5","base_url":"https://127.0.0.1:40353/solr","node_name":"127.0.0.1:40353_solr","state":"active","type":"NRT"}

Stack Trace:
java.lang.AssertionError: 
core_node6:{"core":"addreplicatest_coll_shard1_replica_n5","base_url":"https://127.0.0.1:40353/solr","node_name":"127.0.0.1:40353_solr","state":"active","type":"NRT"}
at 
__randomizedtesting.SeedInfo.seed([6E4DA87F31DFE996:E61997A59F23846E]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.apache.solr.cloud.AddReplicaTest.test(AddReplicaTest.java:84)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:934)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:970)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:984)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:829)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:879)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:890)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
at java.base/java.lang.Thread.run(Thread.java:844)




Build Log:
[...truncated 13616 lines...]
   [junit4] Suite: 

[jira] [Resolved] (LUCENE-8010) fix or sandbox similarities in core with problems

2018-04-06 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8010.
--
Resolution: Fixed

> fix or sandbox similarities in core with problems
> -
>
> Key: LUCENE-8010
> URL: https://issues.apache.org/jira/browse/LUCENE-8010
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Major
> Fix For: master (8.0)
>
> Attachments: LUCENE-8010.patch
>
>
> We want to support scoring optimizations such as LUCENE-4100 and LUCENE-7993, 
> which put very minimal requirements on the similarity impl. Today 
> similarities of various quality are in core and tests. 
> The ones with problems currently have warnings in the javadocs about their 
> bugs, and if the problems are severe enough, then they are also disabled in 
> randomized testing too.
> IMO lucene core should only have practical functions that won't return 
> {{NaN}} scores at times or cause relevance to go backwards if the user's 
> stopfilter isn't configured perfectly. Also it is important for unit tests to 
> not deal with broken or semi-broken sims, and the ones in core should pass 
> all unit tests.
> I propose we move the buggy ones to sandbox and deprecate them. If they can 
> be fixed we can put them back in core, otherwise bye-bye.
> FWIW tests developed in LUCENE-7997 document the following requirements:
>* scores are non-negative and finite.
>* score matches the explanation exactly.
>* internal explanation calculations are sane (e.g. "sum of:" and so on 
> actually compute sums)
>* scores don't decrease as term frequencies increase: e.g. score(freq=N + 
> 1) >= score(freq=N)
>* scores don't decrease as documents get shorter, e.g. score(len=M) >= 
> score(len=M+1)
>* scores don't decrease as terms get rarer, e.g. score(term=N) >= 
> score(term=N+1)
>* scoring works for floating point frequencies (e.g. sloppy phrase and 
> span queries will work)
>* scoring works for reasonably large 64-bit statistic values (e.g. 
> distributed search will work)
>* scoring works for reasonably large boost values (0 .. Integer.MAX_VALUE, 
> e.g. query boosts will work)
>* scoring works for parameters randomized within valid ranges
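
For illustration, the tf-monotonicity and finiteness requirements above amount 
to property checks like the following sketch (plain Java over a stand-in 
scoring function; not the actual LUCENE-7997 test code):

{code:java}
import java.util.function.DoubleUnaryOperator;

public class SimilarityPropertyCheck {
  // Checks that scores are finite, non-negative, and non-decreasing in freq.
  static boolean tfMonotonic(DoubleUnaryOperator score, int maxFreq) {
    double prev = 0;
    for (int freq = 1; freq <= maxFreq; freq++) {
      double s = score.applyAsDouble(freq);
      if (!Double.isFinite(s) || s < 0 || s < prev) {
        return false;
      }
      prev = s;
    }
    return true;
  }

  public static void main(String[] args) {
    double k1 = 1.2, weight = 2.0;
    // BM25-style saturating tf, used only as a stand-in similarity here.
    DoubleUnaryOperator bm25Tf = freq -> weight * freq / (freq + k1);
    System.out.println(tfMonotonic(bm25Tf, 1_000_000)); // prints true
  }
}
{code}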






[GitHub] lucene-solr pull request #345: LUCENE-8229: Add Weight.matches() method

2018-04-06 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/345#discussion_r179743866
  
--- Diff: 
lucene/core/src/test/org/apache/lucene/search/TestMatchesIterator.java ---
@@ -0,0 +1,317 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.HashSet;
+import java.util.Set;
+
+import org.apache.lucene.analysis.MockAnalyzer;
+import org.apache.lucene.document.Document;
+import org.apache.lucene.document.FieldType;
+import org.apache.lucene.document.NumericDocValuesField;
+import org.apache.lucene.document.TextField;
+import org.apache.lucene.index.IndexOptions;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.RandomIndexWriter;
+import org.apache.lucene.index.ReaderUtil;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.store.Directory;
+import org.apache.lucene.util.LuceneTestCase;
+
+public class TestMatchesIterator extends LuceneTestCase {
+
+  protected IndexSearcher searcher;
+  protected Directory directory;
+  protected IndexReader reader;
+
+  public static final String FIELD_WITH_OFFSETS = "field_offsets";
+  public static final String FIELD_NO_OFFSETS = "field_no_offsets";
+
+  public static final FieldType OFFSETS = new 
FieldType(TextField.TYPE_STORED);
+  static {
+
OFFSETS.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
+  }
+
+  @Override
+  public void tearDown() throws Exception {
+reader.close();
+directory.close();
+super.tearDown();
+  }
+
+  @Override
+  public void setUp() throws Exception {
+super.setUp();
+directory = newDirectory();
+RandomIndexWriter writer = new RandomIndexWriter(random(), directory,
+newIndexWriterConfig(new 
MockAnalyzer(random())).setMergePolicy(newLogMergePolicy()));
+for (int i = 0; i < docFields.length; i++) {
+  Document doc = new Document();
+  doc.add(newField(FIELD_WITH_OFFSETS, docFields[i], OFFSETS));
+  doc.add(newField(FIELD_NO_OFFSETS, docFields[i], 
TextField.TYPE_STORED));
+  doc.add(new NumericDocValuesField("id", i));
+  doc.add(newField("id", Integer.toString(i), TextField.TYPE_STORED));
+  writer.addDocument(doc);
+}
+writer.forceMerge(1);
+reader = writer.getReader();
+writer.close();
+searcher = newSearcher(getOnlyLeafReader(reader));
+  }
+
+  protected String[] docFields = {
+  "w1 w2 w3 w4 w5",
+  "w1 w3 w2 w3 zz",
+  "w1 xx w2 yy w4",
+  "w1 w2 w1 w4 w2 w3",
+  "nothing matches this document"
+  };
+
+  void checkMatches(Query q, String field, int[][] expected) throws 
IOException {
+Weight w = searcher.createNormalizedWeight(q, 
ScoreMode.COMPLETE_NO_SCORES);
+for (int i = 0; i < expected.length; i++) {
+  LeafReaderContext ctx = 
searcher.leafContexts.get(ReaderUtil.subIndex(expected[i][0], 
searcher.leafContexts));
+  int doc = expected[i][0] - ctx.docBase;
+  Matches matches = w.matches(ctx, doc);
+  if (matches == null) {
+assertEquals(expected[i].length, 1);
+continue;
+  }
+  MatchesIterator it = matches.getMatches(field);
+  checkFieldMatches(it, expected[i]);
+}
+  }
+
+  void checkFieldMatches(MatchesIterator it, int[] expected) throws 
IOException {
+int pos = 1;
+while (it.next()) {
+  //System.out.println(expected[i][pos] + "->" + expected[i][pos + 1] 
+ "[" + expected[i][pos + 2] + "->" + expected[i][pos + 3] + "]");
+  assertEquals(expected[pos], it.startPosition());
+  assertEquals(expected[pos + 1], it.endPosition());
+  assertEquals(expected[pos + 2], it.startOffset());
+  assertEquals(expected[pos 

[GitHub] lucene-solr pull request #345: LUCENE-8229: Add Weight.matches() method

2018-04-06 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/345#discussion_r179744351
  
--- Diff: lucene/core/src/java/org/apache/lucene/search/Weight.java ---
@@ -69,6 +69,21 @@ protected Weight(Query query) {
*/
   public abstract void extractTerms(Set<Term> terms);
 
+  /**
+   * Returns {@link Matches} for a specific document, or {@code null} if 
the document
+   * does not match the parent query
--- End diff --

maybe mention that a match without positions will be reported as an empty 
instance?


---




[GitHub] lucene-solr pull request #345: LUCENE-8229: Add Weight.matches() method

2018-04-06 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/345#discussion_r179741828
  
--- Diff: 
lucene/core/src/java/org/apache/lucene/search/DisjunctionMatchesIterator.java 
---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.BytesRefIterator;
+import org.apache.lucene.util.PriorityQueue;
+
+/**
+ * A {@link MatchesIterator} that combines matches from a set of 
sub-iterators
+ *
+ * Matches are sorted by their start positions, and then by their end 
positions, so that
+ * prefixes sort first.  Matches may overlap, or be duplicated if they 
appear in more
+ * than one of the sub-iterators.
+ */
+public final class DisjunctionMatchesIterator implements MatchesIterator {
--- End diff --

can we reduce visibility?


---




[GitHub] lucene-solr pull request #345: LUCENE-8229: Add Weight.matches() method

2018-04-06 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/345#discussion_r179742139
  
--- Diff: 
lucene/core/src/java/org/apache/lucene/search/DisjunctionMatchesIterator.java 
---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.BytesRefIterator;
+import org.apache.lucene.util.PriorityQueue;
+
+/**
+ * A {@link MatchesIterator} that combines matches from a set of 
sub-iterators
+ *
+ * Matches are sorted by their start positions, and then by their end 
positions, so that
+ * prefixes sort first.  Matches may overlap, or be duplicated if they 
appear in more
+ * than one of the sub-iterators.
+ */
+public final class DisjunctionMatchesIterator implements MatchesIterator {
+
+  /**
+   * Create a {@link DisjunctionMatchesIterator} over a list of terms
+   *
+   * Only terms that have at least one match in the given document will be 
included
+   */
+  public static DisjunctionMatchesIterator fromTerms(LeafReaderContext 
context, int doc, String field, List<Term> terms) throws IOException {
+return fromTermsEnum(context, doc, field, asBytesRefIterator(terms));
+  }
+
+  private static BytesRefIterator asBytesRefIterator(List<Term> terms) {
+return new BytesRefIterator() {
+  int i = 0;
+  @Override
+  public BytesRef next() {
+if (i >= terms.size())
+  return null;
+return terms.get(i++).bytes();
+  }
+};
+  }
+
+  /**
+   * Create a {@link DisjunctionMatchesIterator} over a list of terms 
extracted from a {@link BytesRefIterator}
+   *
+   * Only terms that have at least one match in the given document will be 
included
+   */
+  public static DisjunctionMatchesIterator fromTermsEnum(LeafReaderContext 
context, int doc, String field, BytesRefIterator terms) throws IOException {
+List<MatchesIterator> mis = new ArrayList<>();
+Terms t = context.reader().terms(field);
+if (t == null)
+  return null;
+TermsEnum te = t.iterator();
+PostingsEnum reuse = null;
+for (BytesRef term = terms.next(); term != null; term = terms.next()) {
+  if (te.seekExact(term)) {
+PostingsEnum pe = te.postings(reuse, PostingsEnum.OFFSETS);
+if (pe.advance(doc) == doc) {
+  // TODO do we want to use the copied term here, or instead 
create a label that associates all of the TMIs with a single term?
+  mis.add(new TermMatchesIterator(BytesRef.deepCopyOf(term), pe));
+  reuse = null;
+}
+else {
+  reuse = pe;
+}
+  }
+}
+if (mis.size() == 0)
+  return null;
+return new DisjunctionMatchesIterator(mis);
--- End diff --

should we specialize the size==1 case as well?
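
One way that could look (a sketch; it assumes the factory's return type is 
widened to {{MatchesIterator}}, as suggested elsewhere on this PR):

{code:java}
if (mis.size() == 0)
  return null;
if (mis.size() == 1)
  return mis.get(0);  // no need to wrap a single iterator in a disjunction
return new DisjunctionMatchesIterator(mis);
{code}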


---




[GitHub] lucene-solr pull request #345: LUCENE-8229: Add Weight.matches() method

2018-04-06 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/345#discussion_r179737921
  
--- Diff: 
lucene/core/src/java/org/apache/lucene/search/DisjunctionMatchesIterator.java 
---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.BytesRefIterator;
+import org.apache.lucene.util.PriorityQueue;
+
+/**
+ * A {@link MatchesIterator} that combines matches from a set of 
sub-iterators
+ *
+ * Matches are sorted by their start positions, and then by their end 
positions, so that
+ * prefixes sort first.  Matches may overlap, or be duplicated if they 
appear in more
+ * than one of the sub-iterators.
+ */
+public final class DisjunctionMatchesIterator implements MatchesIterator {
+
+  /**
+   * Create a {@link DisjunctionMatchesIterator} over a list of terms
+   *
+   * Only terms that have at least one match in the given document will be 
included
+   */
+  public static DisjunctionMatchesIterator fromTerms(LeafReaderContext 
context, int doc, String field, List<Term> terms) throws IOException {
+return fromTermsEnum(context, doc, field, asBytesRefIterator(terms));
--- End diff --

let's validate that all terms have `field` as a field, or directly take a 
list of BytesRefs?
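
A sketch of that validation (illustrative; the exact exception type and wording 
are assumptions):

{code:java}
for (Term term : terms) {
  if (field.equals(term.field()) == false) {
    throw new IllegalArgumentException("Tried to generate iterator from terms in multiple fields");
  }
}
{code}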


---




[GitHub] lucene-solr pull request #345: LUCENE-8229: Add Weight.matches() method

2018-04-06 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/345#discussion_r179737753
  
--- Diff: lucene/core/src/java/org/apache/lucene/search/Matches.java ---
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.stream.Collectors;
+
+/**
+ * Reports the positions and optionally offsets of all matching terms in a 
query
+ * for a single document
+ *
+ * To obtain a {@link MatchesIterator} for a particular field, call {@link 
#getMatches(String)}
+ */
+public class Matches implements Iterable<String> {
+
+  /**
+   * Indicates a match with no term positions, for example on a Point or 
DocValues field
+   */
+  public static final Matches MATCH_WITH_NO_TERMS = new 
Matches(Collections.emptyMap());
+
+  private final Map<String, MatchesIterator> matches;
+
+  /**
+   * Create a simple {@link Matches} for a single field
+   */
+  public static Matches fromField(String field, MatchesIterator it) {
+if (it == null) {
+  return null;
+}
+return new Matches(field, it);
+  }
+
+  /**
+   * Amalgamate a collection of {@link Matches} into a single object
+   */
+  public static Matches fromSubMatches(List<Matches> subMatches) throws 
IOException {
+if (subMatches == null || subMatches.size() == 0) {
+  return null;
+}
+subMatches = subMatches.stream().filter(m -> m != 
MATCH_WITH_NO_TERMS).collect(Collectors.toList());
+if (subMatches.size() == 0) {
+  return MATCH_WITH_NO_TERMS;
+}
+if (subMatches.size() == 1) {
+  return subMatches.get(0);
+}
+Map<String, MatchesIterator> matches = new HashMap<>();
+Set<String> allFields = new HashSet<>();
+for (Matches m : subMatches) {
+  for (String field : m) {
+allFields.add(field);
+  }
+}
+for (String field : allFields) {
+  List<MatchesIterator> mis = new ArrayList<>();
+  for (Matches m : subMatches) {
+MatchesIterator mi = m.getMatches(field);
+if (mi != null) {
+  mis.add(mi);
+}
+  }
+  matches.put(field, DisjunctionMatchesIterator.fromSubIterators(mis));
+}
+return new Matches(matches);
+  }
+
+  /**
+   * Create a {@link Matches} from a map of fields to iterators
+   */
+  protected Matches(Map<String, MatchesIterator> matches) {
+this.matches = matches;
+  }
+
+  private Matches(String field, MatchesIterator iterator) {
+this.matches = Collections.singletonMap(field, iterator);
+  }
+
+  /**
+   * Returns a {@link MatchesIterator} over the matches for a single field,
+   * or {@code null} if there are no matches in that field.
+   *
+   * This method always returns the same iterator, so clients should only
+   * call it once per field
--- End diff --

I find these semantics a bit error-prone. I'd prefer something like `Fields`, 
where each call creates a new iterator.


---




[GitHub] lucene-solr pull request #345: LUCENE-8229: Add Weight.matches() method

2018-04-06 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/345#discussion_r179742203
  
--- Diff: 
lucene/core/src/java/org/apache/lucene/search/DisjunctionMatchesIterator.java 
---
@@ -0,0 +1,160 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.index.Term;
+import org.apache.lucene.index.Terms;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.BytesRefIterator;
+import org.apache.lucene.util.PriorityQueue;
+
+/**
+ * A {@link MatchesIterator} that combines matches from a set of 
sub-iterators
+ *
+ * Matches are sorted by their start positions, and then by their end 
positions, so that
+ * prefixes sort first.  Matches may overlap, or be duplicated if they 
appear in more
+ * than one of the sub-iterators.
+ */
+public final class DisjunctionMatchesIterator implements MatchesIterator {
+
+  /**
+   * Create a {@link DisjunctionMatchesIterator} over a list of terms
+   *
+   * Only terms that have at least one match in the given document will be 
included
+   */
+  public static DisjunctionMatchesIterator fromTerms(LeafReaderContext 
context, int doc, String field, List<Term> terms) throws IOException {
+return fromTermsEnum(context, doc, field, asBytesRefIterator(terms));
+  }
+
+  private static BytesRefIterator asBytesRefIterator(List<Term> terms) {
+return new BytesRefIterator() {
+  int i = 0;
+  @Override
+  public BytesRef next() {
+if (i >= terms.size())
+  return null;
+return terms.get(i++).bytes();
+  }
+};
+  }
+
+  /**
+   * Create a {@link DisjunctionMatchesIterator} over a list of terms 
extracted from a {@link BytesRefIterator}
+   *
+   * Only terms that have at least one match in the given document will be 
included
+   */
+  public static DisjunctionMatchesIterator fromTermsEnum(LeafReaderContext 
context, int doc, String field, BytesRefIterator terms) throws IOException {
--- End diff --

maybe return MatchesIterator? the impl shouldn't matter


---



