[jira] [Commented] (SOLR-7275) Pluggable authorization module in Solr

2016-10-28 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617446#comment-15617446
 ] 

Noble Paul commented on SOLR-7275:
--

This issue did not eliminate any feature. The configuration you mentioned was 
not a feature of Solr, and it was not even documented. Please verify whether it 
actually works, and open a separate discussion.

> Pluggable authorization module in Solr
> --
>
> Key: SOLR-7275
> URL: https://issues.apache.org/jira/browse/SOLR-7275
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Anshum Gupta
>Assignee: Anshum Gupta
> Fix For: 5.2
>
> Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, 
> SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, 
> SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, 
> SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, 
> SOLR-7275.patch, SOLR-7275.patch
>
>
> Solr needs an interface that makes it easy for different authorization 
> systems to be plugged into it. Here's what I plan on doing:
> Define an interface {{SolrAuthorizationPlugin}} with a single method 
> {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and 
> return a {{SolrAuthorizationResponse}} object. For now the response object 
> would contain only a single boolean value, but in the future it could carry 
> more information, e.g. ACLs for document filtering.
> The reason we need a context object is so that the plugin doesn't need to 
> understand Solr internals, e.g. how to extract the collection name or other 
> information from the incoming request, since there are multiple ways to 
> specify the target collection for a request. Similarly, the request type can 
> be specified via {{qt}} or {{/handler_name}}.
> Flow:
> Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return.
> {code}
> public interface SolrAuthorizationPlugin {
>   public SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
> }
> {code}
> {code}
> public class SolrRequestContext {
>   UserInfo userInfo;           // user context from the authentication layer
>   HTTPRequest request;         // the incoming HTTP request
>   OperationType operationType; // enum value, correlated with user roles
>   String[] collectionsAccessed;
>   String[] fieldsAccessed;
>   String resource;
> }
> {code}
> {code}
> public class SolrAuthorizationResponse {
>   boolean authorized;
>   public boolean isAuthorized();
> }
> {code}
> User Roles: 
> * Admin
> * Collection Level:
>   * Query
>   * Update
>   * Admin
> Using this framework, an implementation could be written for specific 
> security systems, e.g. Apache Ranger or Sentry. It would keep all of the 
> security-system-specific code out of Solr.
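
To make the proposed contract concrete, here is a minimal sketch of a plugin 
implementation against the interfaces above (the class and accessor names 
{{AllowAdminsPlugin}}, {{getUserInfo}}, and {{hasRole}} are illustrative, not 
part of the final committed API):

{code:java}
public class AllowAdminsPlugin implements SolrAuthorizationPlugin {
  @Override
  public SolrAuthorizationResponse isAuthorized(SolrRequestContext context) {
    SolrAuthorizationResponse response = new SolrAuthorizationResponse();
    // Grant access only when the authentication layer supplied a user
    // carrying the admin role; everything else is rejected.
    response.authorized = context.getUserInfo() != null
        && context.getUserInfo().hasRole("admin");
    return response;
  }
}
{code}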



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7275) Pluggable authorization module in Solr

2016-10-28 Thread Thomas Quinot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15617430#comment-15617430
 ] 

Thomas Quinot commented on SOLR-7275:
-

Back in the time of Solr 4, it was possible to control access using the Java 
security service, loading LoginService modules provided by Jetty. For example

   <New class="org.eclipse.jetty.security.HashLoginService">
     <Set name="name">Infosys</Set>
     <Set name="config">/myapp/auth/webauth.properties</Set>
   </New>

allowed user authentication against a list of UNIX crypt(3) hashes.

Is this officially gone? If so, this seems to be a significant regression.

If this is still supported, could the 
org.eclipse.jetty.plus.jaas.JAASLoginService class be added to the 
Jetty instance packaged with Solr? JAAS provides a lot of flexibility without 
requiring Solr to reinvent the wheel (for example allowing authentication 
against an LDAP server).
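
For context, a minimal, hedged sketch of what JAAS buys here: the application 
code stays the same no matter which LoginModule (LDAP, properties file, ...) 
the JAAS login configuration names. The entry name "solr" and the credentials 
below are illustrative only:

{code:java}
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

public class JaasLoginSketch {
  public static void main(String[] args) throws LoginException {
    CallbackHandler handler = (Callback[] callbacks) -> {
      for (Callback cb : callbacks) {
        if (cb instanceof NameCallback) {
          ((NameCallback) cb).setName("solradmin");  // illustrative user
        } else if (cb instanceof PasswordCallback) {
          ((PasswordCallback) cb).setPassword("secret".toCharArray());
        }
      }
    };
    // "solr" must match an entry in the JAAS login configuration file
    // (-Djava.security.auth.login.config=...); that entry decides whether
    // the check goes against LDAP, a properties file, etc.
    LoginContext lc = new LoginContext("solr", handler);
    lc.login();  // delegates to whatever LoginModules the config lists
    System.out.println("Authenticated subject: " + lc.getSubject());
  }
}
{code}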

> Pluggable authorization module in Solr
> --
>
> Key: SOLR-7275
> URL: https://issues.apache.org/jira/browse/SOLR-7275
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Anshum Gupta
>Assignee: Anshum Gupta
> Fix For: 5.2
>
> Attachments: SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, 
> SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, 
> SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, 
> SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, SOLR-7275.patch, 
> SOLR-7275.patch, SOLR-7275.patch
>
>
> Solr needs an interface that makes it easy for different authorization 
> systems to be plugged into it. Here's what I plan on doing:
> Define an interface {{SolrAuthorizationPlugin}} with a single method 
> {{isAuthorized}}. This would take in a {{SolrRequestContext}} object and 
> return a {{SolrAuthorizationResponse}} object. For now the response object 
> would contain only a single boolean value, but in the future it could carry 
> more information, e.g. ACLs for document filtering.
> The reason we need a context object is so that the plugin doesn't need to 
> understand Solr internals, e.g. how to extract the collection name or other 
> information from the incoming request, since there are multiple ways to 
> specify the target collection for a request. Similarly, the request type can 
> be specified via {{qt}} or {{/handler_name}}.
> Flow:
> Request -> SolrDispatchFilter -> isAuthorized(context) -> Process/Return.
> {code}
> public interface SolrAuthorizationPlugin {
>   public SolrAuthorizationResponse isAuthorized(SolrRequestContext context);
> }
> {code}
> {code}
> public class SolrRequestContext {
>   UserInfo userInfo;           // user context from the authentication layer
>   HTTPRequest request;         // the incoming HTTP request
>   OperationType operationType; // enum value, correlated with user roles
>   String[] collectionsAccessed;
>   String[] fieldsAccessed;
>   String resource;
> }
> {code}
> {code}
> public class SolrAuthorizationResponse {
>   boolean authorized;
>   public boolean isAuthorized();
> }
> {code}
> User Roles: 
> * Admin
> * Collection Level:
>   * Query
>   * Update
>   * Admin
> Using this framework, an implementation could be written for specific 
> security systems, e.g. Apache Ranger or Sentry. It would keep all of the 
> security-system-specific code out of Solr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-Solaris (64bit/jdk1.8.0) - Build # 935 - Unstable!

2016-10-28 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/935/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

2 tests failed.
FAILED:  org.apache.solr.security.BasicAuthIntegrationTest.testBasicAuth

Error Message:
expected:<200> but was:<403>

Stack Trace:
java.lang.AssertionError: expected:<200> but was:<403>
at 
__randomizedtesting.SeedInfo.seed([64128F033964E134:D87CF9119D37624E]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.security.BasicAuthIntegrationTest.executeCommand(BasicAuthIntegrationTest.java:231)
at 
org.apache.solr.security.BasicAuthIntegrationTest.testBasicAuth(BasicAuthIntegrationTest.java:143)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 

[JENKINS] Lucene-Solr-SmokeRelease-master - Build # 606 - Failure

2016-10-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-SmokeRelease-master/606/

No tests ran.

Build Log:
[...truncated 40573 lines...]
prepare-release-no-sign:
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/dist
 [copy] Copying 476 files to 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/dist/lucene
 [copy] Copying 245 files to 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/dist/solr
   [smoker] Java 1.8 JAVA_HOME=/home/jenkins/tools/java/latest1.8
   [smoker] NOTE: output encoding is UTF-8
   [smoker] 
   [smoker] Load release URL 
"file:/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/dist/"...
   [smoker] 
   [smoker] Test Lucene...
   [smoker]   test basics...
   [smoker]   get KEYS
   [smoker] 0.2 MB in 0.01 sec (18.7 MB/sec)
   [smoker]   check changes HTML...
   [smoker]   download lucene-7.0.0-src.tgz...
   [smoker] 30.0 MB in 0.04 sec (728.9 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download lucene-7.0.0.tgz...
   [smoker] 64.6 MB in 0.09 sec (754.5 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download lucene-7.0.0.zip...
   [smoker] 75.3 MB in 0.10 sec (733.3 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   unpack lucene-7.0.0.tgz...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] test demo with 1.8...
   [smoker]   got 6088 hits for query "lucene"
   [smoker] checkindex with 1.8...
   [smoker] check Lucene's javadoc JAR
   [smoker]   unpack lucene-7.0.0.zip...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] test demo with 1.8...
   [smoker]   got 6088 hits for query "lucene"
   [smoker] checkindex with 1.8...
   [smoker] check Lucene's javadoc JAR
   [smoker]   unpack lucene-7.0.0-src.tgz...
   [smoker] make sure no JARs/WARs in src dist...
   [smoker] run "ant validate"
   [smoker] run tests w/ Java 8 and testArgs='-Dtests.slow=false'...
   [smoker] test demo with 1.8...
   [smoker]   got 213 hits for query "lucene"
   [smoker] checkindex with 1.8...
   [smoker] generate javadocs w/ Java 8...
   [smoker] 
   [smoker] Crawl/parse...
   [smoker] 
   [smoker] Verify...
   [smoker]   confirm all releases have coverage in TestBackwardsCompatibility
   [smoker] find all past Lucene releases...
   [smoker] run TestBackwardsCompatibility..
   [smoker] success!
   [smoker] 
   [smoker] Test Solr...
   [smoker]   test basics...
   [smoker]   get KEYS
   [smoker] 0.2 MB in 0.00 sec (205.4 MB/sec)
   [smoker]   check changes HTML...
   [smoker]   download solr-7.0.0-src.tgz...
   [smoker] 39.4 MB in 0.05 sec (809.6 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download solr-7.0.0.tgz...
   [smoker] 139.3 MB in 0.17 sec (811.6 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   download solr-7.0.0.zip...
   [smoker] 148.4 MB in 0.18 sec (822.7 MB/sec)
   [smoker] verify md5/sha1 digests
   [smoker]   unpack solr-7.0.0.tgz...
   [smoker] verify JAR metadata/identity/no javax.* or java.* classes...
   [smoker] unpack lucene-7.0.0.tgz...
   [smoker]   **WARNING**: skipping check of 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/tmp/unpack/solr-7.0.0/contrib/dataimporthandler-extras/lib/javax.mail-1.5.1.jar:
 it has javax.* classes
   [smoker]   **WARNING**: skipping check of 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/tmp/unpack/solr-7.0.0/contrib/dataimporthandler-extras/lib/activation-1.1.1.jar:
 it has javax.* classes
   [smoker] copying unpacked distribution for Java 8 ...
   [smoker] test solr example w/ Java 8...
   [smoker]   start Solr instance 
(log=/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/tmp/unpack/solr-7.0.0-java8/solr-example.log)...
   [smoker] No process found for Solr node running on port 8983
   [smoker]   Running techproducts example on port 8983 from 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/tmp/unpack/solr-7.0.0-java8
   [smoker] Creating Solr home directory 
/x1/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/lucene/build/smokeTestRelease/tmp/unpack/solr-7.0.0-java8/example/techproducts/solr
   [smoker] 
   [smoker] Starting up Solr on port 8983 using command:
   [smoker] bin/solr start -p 8983 -s "example/techproducts/solr"
   [smoker] 
   [smoker] Waiting up to 180 seconds to see Solr running on port 8983 [|]  
 [/]   [-]   [\]  
   [smoker] Started Solr server on port 8983 (pid=5321). Happy searching!
   [smoker] 
 

[JENKINS] Lucene-Solr-6.x-Linux (32bit/jdk1.8.0_102) - Build # 2063 - Failure!

2016-10-28 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/2063/
Java: 32bit/jdk1.8.0_102 -server -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 6105 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/home/jenkins/tools/java/32bit/jdk1.8.0_102/jre/bin/java -server 
-XX:+UseConcMarkSweepGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/home/jenkins/workspace/Lucene-Solr-6.x-Linux/heapdumps -ea 
-esa -Dtests.prefix=tests -Dtests.seed=BE68A44A02F87297 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=6.3.0 
-Dtests.cleanthreads=perMethod 
-Djava.util.logging.config.file=/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.monster=false 
-Dtests.slow=true -Dtests.asserts=true -Dtests.multiplier=3 -DtempDir=./temp 
-Djava.io.tmpdir=./temp 
-Djunit4.tempDir=/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/build/codecs/test/temp
 -Dcommon.dir=/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene 
-Dclover.db.dir=/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/build/clover/db
 
-Djava.security.policy=/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/tools/junit4/tests.policy
 -Dtests.LUCENE_VERSION=6.3.0 -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Djdk.map.althashing.threshold=0 
-Djunit4.childvm.cwd=/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/build/codecs/test/J0
 -Djunit4.childvm.id=0 -Djunit4.childvm.count=3 -Dtests.leaveTemporary=false 
-Dtests.filterstacks=true -Dtests.disableHdfs=true 
-Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Dfile.encoding=ISO-8859-1 -classpath 
/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/build/codecs/classes/test:/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/build/test-framework/classes/java:/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/build/codecs/classes/java:/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/build/core/classes/java:/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/test-framework/lib/junit-4.10.jar:/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/test-framework/lib/randomizedtesting-runner-2.4.0.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-launcher.jar:/home/jenkins/.ant/lib/ivy-2.3.0.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-jdepend.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-apache-bcel.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-jmf.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-junit4.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-apache-xalan2.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-javamail.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-jai.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-apache-bsf.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-commons-logging.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-commons-net.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-apache-resolver.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-apache-log4j.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-junit.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-apache-oro.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-antlr.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-jsch.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-apache-regexp.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-swing.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-testutil.jar:/var/lib/jenkins/tools/hudson.tasks.Ant_AntInstallation/ANT_1.8.2/lib/ant-netrexx.jar:/home/jenkins/tools/java/32bit/jdk1.8.0_102/lib/tools.jar:/home/jenkins/.ivy2/cache/com.carrotsearch.randomizedtesting/junit4-ant/jars/junit4-ant-2.4.0.jar
 com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe -eventsfile 
/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/build/codecs/test/temp/junit4-J0-20161029_025543_432.events
 
@/home/jenkins/workspace/Lucene-Solr-6.x-Linux/lucene/build/codecs/test/temp/junit4-J0-20161029_025543_432.suites
 -stdin
   [junit4] ERROR: JVM J0 ended with an exception: Quit 

[JENKINS] Lucene-Solr-master-Linux (32bit/jdk1.8.0_102) - Build # 18165 - Unstable!

2016-10-28 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Linux/18165/
Java: 32bit/jdk1.8.0_102 -client -XX:+UseSerialGC

1 tests failed.
FAILED:  
org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigAliasReplication

Error Message:
expected:<1> but was:<0>

Stack Trace:
java.lang.AssertionError: expected:<1> but was:<0>
at 
__randomizedtesting.SeedInfo.seed([3C65A73C9568FCBC:CB1649645380535A]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigAliasReplication(TestReplicationHandler.java:1329)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at java.lang.Thread.run(Thread.java:745)




Build Log:
[...truncated 11246 lines...]
   [junit4] Suite: 

[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616986#comment-15616986
 ] 

Julian Hyde commented on SOLR-8593:
---

Ah, I think I see what's going on. You're using avatica-1.9-SNAPSHOT with 
calcite-1.10. calcite-1.10 requires avatica-1.8, so you should use that. (Or is 
there a good reason why you need avatica-1.9?)

By the way, avatica-1.9 is less than a week from release. calcite-1.11 is maybe 
a month to six weeks away. The exact compatibility issues you describe are 
covered in CALCITE-1270 (and see the PR attached to that case).

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work, though, will require an optimizer. This is where 
> Apache Calcite comes into play. It has a battle-tested cost-based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Julian Hyde (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616963#comment-15616963
 ] 

Julian Hyde commented on SOLR-8593:
---

Is there a Calcite issue logged for the AbstractMethodError relating to 
CalciteConnectionProperty? I see [others are running into the same 
problem|http://stackoverflow.com/questions/39318653/create-a-streaming-example-with-calcite-using-csv]
 and I want to document the solution (or fix the bug in Calcite/Avatica if it 
is a bug).

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work, though, will require an optimizer. This is where 
> Apache Calcite comes into play. It has a battle-tested cost-based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Cao Manh Dat (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cao Manh Dat updated SOLR-8593:
---
Description: 
   The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
nicely split off from the larger Presto project and it did everything that was 
needed for the initial implementation.

Phase two of the SQL work, though, will require an optimizer. This is where 
Apache Calcite comes into play. It has a battle-tested cost-based optimizer and 
has been integrated into Apache Drill and Hive.

This work can begin in trunk following the 6.0 release. The final query plans 
will continue to be translated to Streaming API objects (TupleStreams), so 
continued work on the JDBC driver should plug in nicely with the Calcite work.

  was:
The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
nicely split off from the larger Presto project and it did everything that was 
needed for the initial implementation.

Phase two of the SQL work, though, will require an optimizer. This is where 
Apache Calcite comes into play. It has a battle-tested cost-based optimizer and 
has been integrated into Apache Drill and Hive.

This work can begin in trunk following the 6.0 release. The final query plans 
will continue to be translated to Streaming API objects (TupleStreams), so 
continued work on the JDBC driver should plug in nicely with the Calcite work.


> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
>The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work, though, will require an optimizer. This is where 
> Apache Calcite comes into play. It has a battle-tested cost-based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7525) ASCIIFoldingFilter.foldToASCII performance issue due to large compiled method size

2016-10-28 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616922#comment-15616922
 ] 

Ahmet Arslan commented on LUCENE-7525:
--

Can the workings of ICUFoldingFilter give any insight here?

> ASCIIFoldingFilter.foldToASCII performance issue due to large compiled method 
> size
> --
>
> Key: LUCENE-7525
> URL: https://issues.apache.org/jira/browse/LUCENE-7525
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 6.2.1
>Reporter: Karl von Randow
> Attachments: ASCIIFolding.java, ASCIIFoldingFilter.java, 
> TestASCIIFolding.java
>
>
> The {{ASCIIFoldingFilter.foldToASCII}} method has an enormous switch 
> statement and is too large for the HotSpot compiler to compile, causing a 
> performance problem.
> The method is about 13 KB compiled, versus the 8 KB HotSpot limit, so 
> splitting the method in half works around the problem.
> In my tests, splitting the method in half resulted in a 5x performance 
> increase.
> In the test code below you can see how slow the fold method is, even when it 
> uses the shortcut for characters below 0x80, compared to an inline 
> implementation of the same shortcut.
> So a workaround is to split the method. I'm happy to provide a patch. It's a 
> hack, of course. Perhaps using the {{MappingCharFilterFactory}} with an input 
> file as per SOLR-2013 would be a better replacement for this method in this 
> class?
> {code:java}
> import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
> import org.junit.Assert;
> import org.junit.Test;
>
> public class ASCIIFoldingFilterPerformanceTest {
>   private static final int ITERATIONS = 1_000_000;
>
>   @Test
>   public void testFoldShortString() {
>     char[] input = "testing".toCharArray();
>     char[] output = new char[input.length * 4];
>     for (int i = 0; i < ITERATIONS; i++) {
>       ASCIIFoldingFilter.foldToASCII(input, 0, output, 0, input.length);
>     }
>   }
>
>   @Test
>   public void testFoldShortAccentedString() {
>     char[] input = "éúéúøßüäéúéúøßüä".toCharArray();
>     char[] output = new char[input.length * 4];
>     for (int i = 0; i < ITERATIONS; i++) {
>       ASCIIFoldingFilter.foldToASCII(input, 0, output, 0, input.length);
>     }
>   }
>
>   @Test
>   public void testManualFoldTinyString() {
>     char[] input = "t".toCharArray();
>     char[] output = new char[input.length * 4];
>     for (int i = 0; i < ITERATIONS; i++) {
>       int k = 0;
>       for (int j = 0; j < 1; ++j) {
>         final char c = input[j];
>         if (c < '\u0080') {
>           output[k++] = c;
>         } else {
>           Assert.assertTrue(false);
>         }
>       }
>     }
>   }
> }
> {code}
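
To make the proposed workaround concrete, here is a hedged sketch of the split 
({{foldNonASCII}} and its tiny switch are illustrative stand-ins, not the 
contents of the attached patch): keep the ASCII fast path in a small outer 
method and move the huge accent switch into a second method, so each compiles 
below HotSpot's ~8 KB bytecode threshold.

{code:java}
public final class SplitASCIIFoldingSketch {

  public static int foldToASCII(char[] input, int inputPos, char[] output, int outputPos, int length) {
    final int end = inputPos + length;
    for (int pos = inputPos; pos < end; ++pos) {
      final char c = input[pos];
      if (c < '\u0080') {
        output[outputPos++] = c;                         // hot ASCII fast path stays small
      } else {
        outputPos = foldNonASCII(c, output, outputPos);  // the enormous switch lives here
      }
    }
    return outputPos;
  }

  // Illustrative stand-in: in a real split, this method (or two of them)
  // would carry the remainder of the original ~13 KB switch statement.
  private static int foldNonASCII(char c, char[] output, int outputPos) {
    switch (c) {
      case '\u00C0':  // À
      case '\u00C1':  // Á
      case '\u00C2':  // Â
        output[outputPos++] = 'A';
        break;
      default:
        output[outputPos++] = c;  // unmapped characters pass through unchanged
    }
    return outputPos;
  }
}
{code}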



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616778#comment-15616778
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85619786
  
--- Diff: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldOffsetStrategy.java ---
@@ -65,58 +65,88 @@ public String getField() {
    */
   public abstract List<OffsetsEnum> getOffsetsEnums(IndexReader reader, int docId, String content) throws IOException;
 
-  protected List<OffsetsEnum> createOffsetsEnums(LeafReader leafReader, int doc, TokenStream tokenStream) throws IOException {
-    List<OffsetsEnum> offsetsEnums = createOffsetsEnumsFromReader(leafReader, doc);
-    if (automata.length > 0) {
-      offsetsEnums.add(createOffsetsEnumFromTokenStream(doc, tokenStream));
+  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader leafReader, int doc) throws IOException {
+    final Terms termsIndex = leafReader.terms(field);
+    if (termsIndex == null) {
+      return Collections.emptyList();
     }
-    return offsetsEnums;
-  }
 
-  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader atomicReader, int doc) throws IOException {
     // For strict positions, get a Map of term to Spans:
     //    note: ScriptPhraseHelper.NONE does the right thing for these method calls
     final Map<BytesRef, Spans> strictPhrasesTermToSpans =
-        strictPhrases.getTermToSpans(atomicReader, doc);
+        phraseHelper.getTermToSpans(leafReader, doc);
     // Usually simply wraps terms in a List; but if willRewrite() then can be expanded
     final List<BytesRef> sourceTerms =
-        strictPhrases.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
+        phraseHelper.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
 
-    final List<OffsetsEnum> offsetsEnums = new ArrayList<>(sourceTerms.size() + 1);
+    final List<OffsetsEnum> offsetsEnums = new ArrayList<>(sourceTerms.size() + automata.length);
 
-    Terms termsIndex = atomicReader == null || sourceTerms.isEmpty() ? null : atomicReader.terms(field);
-    if (termsIndex != null) {
+    // Handle sourceTerms:
+    if (!sourceTerms.isEmpty()) {
       TermsEnum termsEnum = termsIndex.iterator();//does not return null
       for (BytesRef term : sourceTerms) {
-        if (!termsEnum.seekExact(term)) {
-          continue; // term not found
-        }
-        PostingsEnum postingsEnum = termsEnum.postings(null, PostingsEnum.OFFSETS);
-        if (postingsEnum == null) {
-          // no offsets or positions available
-          throw new IllegalArgumentException("field '" + field + "' was indexed without offsets, cannot highlight");
-        }
-        if (doc != postingsEnum.advance(doc)) { // now it's positioned, although may be exhausted
-          continue;
+        if (termsEnum.seekExact(term)) {
+          PostingsEnum postingsEnum = termsEnum.postings(null, PostingsEnum.OFFSETS);
+
+          if (postingsEnum == null) {
+            // no offsets or positions available
+            throw new IllegalArgumentException("field '" + field + "' was indexed without offsets, cannot highlight");
+          }
+
+          if (doc == postingsEnum.advance(doc)) { // now it's positioned, although may be exhausted
+            postingsEnum = phraseHelper.filterPostings(term, postingsEnum, strictPhrasesTermToSpans.get(term));
+            if (postingsEnum != null) {
+              offsetsEnums.add(new OffsetsEnum(term, postingsEnum));
+            }
+          }
         }
-        postingsEnum = strictPhrases.filterPostings(term, postingsEnum, strictPhrasesTermToSpans.get(term));
-        if (postingsEnum == null) {
-          continue;// completely filtered out
+      }
+    }
+
+    // Handle automata
+    if (automata.length > 0) {
+      offsetsEnums.addAll(createAutomataOffsetsFromTerms(termsIndex, doc));
+    }
+
+    return offsetsEnums;
+  }
+
+  protected List<OffsetsEnum> createAutomataOffsetsFromTerms(Terms termsIndex, int doc) throws IOException {
+    Map<CharacterRunAutomaton, List<PostingsEnum>> automataPostings = new IdentityHashMap<>(automata.length);
--- End diff --

Pushed that for now to see what you think.


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: 

[jira] [Comment Edited] (SOLR-5260) Facet search on a docvalue field in a multi shard collection

2016-10-28 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616772#comment-15616772
 ] 

Erick Erickson edited comment on SOLR-5260 at 10/28/16 10:22 PM:
-

This is on trunk, fresh pull.

I was looking at this today and it's a problem indeed. Since we're advocating 
using docValues for faceting, failing sometimes and succeeding at other times 
is disconcerting.

I have a field with indexed="false" docValues="true". Sometimes it works and 
sometimes it doesn't; it depends, as Trym says, on how many docs are in the 
result set and the number of shards. The error is reported as Trym indicated, 
even on a current trunk.

Caused by: java.lang.IllegalStateException: Cannot use facet.mincount=0 on 
field eoe which is not indexed
at 
org.apache.solr.request.NumericFacets.getCounts(NumericFacets.java:256)
at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:465)

JSON facets work with mincount>0. If mincount=0, it fails with an error message 
(on both the client and the server) something like:
"Numeric fields do not support facet mincount=0; try indexing as terms".

Anyway, I'm surely not going to get to this in the near future, so I'm 
unassigning it from myself. Not something for 6.3, as it's been around for a 
long time.

I guess there are two workarounds at present:
1> use JSON facets (see the SolrJ sketch below)
or
2> set indexed=true
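
For reference, a hedged SolrJ sketch of workaround 1> (the field name 
{{fieldA}} and collection name {{trym}} come from this issue; the URL is 
illustrative):

{code:java}
import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class JsonFacetWorkaround {
  public static void main(String[] args) throws SolrServerException, IOException {
    HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
    SolrQuery query = new SolrQuery("*:*");
    query.setRows(0);
    // The JSON Facet API works here as long as mincount >= 1;
    // mincount=0 still fails on numeric docValues fields, as noted above.
    query.set("json.facet", "{cats:{type:terms, field:fieldA, mincount:1}}");
    QueryResponse rsp = client.query("trym", query);
    System.out.println(rsp.getResponse().get("facets"));
  }
}
{code}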




was (Author: erickerickson):
This is on trunk, fresh pull.

I was looking at this today and it's a problem indeed. Since we're advocating 
using docValues for faceting, failing sometimes and succeeding at other times 
is disconcerting.

I have a field with indexed="false" docValues="true". Sometimes it works and 
sometimes it doesn't; it depends, as Trym says, on how many docs are in the 
result set and the number of shards. The error is reported as Trym indicated, 
even on a current trunk.

Caused by: java.lang.IllegalStateException: Cannot use facet.mincount=0 on 
field eoe which is not indexed
at 
org.apache.solr.request.NumericFacets.getCounts(NumericFacets.java:256)
at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:465)

JSON facets work with mincount>0. If mincount=0, it fails with an error message 
(on both the client and the server) something like:
"Numeric fields do not support facet mincount=0; try indexing as terms".

Anyway, I'm surely not going to get to this in the near future, so I'm 
unassigning it from myself. Not something for 6.3, as it's been around for a 
long time.



> Facet search on a docvalue field in a multi shard collection
> 
>
> Key: SOLR-5260
> URL: https://issues.apache.org/jira/browse/SOLR-5260
> Project: Solr
>  Issue Type: Bug
>  Components: search, SolrCloud
>Affects Versions: 4.4
>Reporter: Trym Møller
>
> I have a problem doing facet search on a doc value field in a multi shard 
> collection.
> My Solr schema specifies fieldA as a docvalue type and I have created a two 
> shard collection using Solr 4.4.0 (and the unreleased 4.5 branch).
> When I do a facet search on fieldA with a "large" facet.limit, the query 
> fails with the exception below.
> A "large" facet.limit seems to be when (10 + (facet.limit * 1.5)) * number of 
> shards > rows matching my query.
> The exception does not occur when I run with a single shard collection.
> It can easily be reproduced by indexing a single row and querying it, as the 
> default facet.limit is 100.
> The facet query received by Solr looks as follows:
> {noformat}
> 576793 [qtp170860084-18] INFO  org.apache.solr.core.SolrCore  ¦ 
> [trym_shard2_replica1] webapp=/solr path=/select 
>  
> params={facet=true=0=*:*=true=trym=fieldA=javabin=2=0}
>  
>  status=500 QTime=20
> {noformat}
> One of the internal queries sent by Solr to its shards looks like
> {noformat}
> 576783 [qtp170860084-19] INFO  org.apache.solr.core.SolrCore  ¦ 
> [trym_shard1_replica1] webapp=/solr path=/select 
>  
> params={facet=true=false=trym=javabin=2=0=1379855011787
> 
>
> =192.168.56.1:8501/solr/trym_shard1_replica1/=text=id,score=160
>=0=*:*=fieldA=true=true} 
>  hits=1 status=500 QTime=2
> {noformat}
> The exception thrown by Solr is as follows
> {noformat}
> 576784 [qtp170860084-17] ERROR org.apache.solr.servlet.SolrDispatchFilter  ¦ 
> null:java.lang.IllegalStateException: 
>  Cannot use facet.mincount=0 on a field which is not indexed
> at 
> org.apache.solr.request.NumericFacets.getCounts(NumericFacets.java:257)
> at 
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:423)
> at 
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:530)
> at 
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:259)
> at 
> 

[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread Timothy055
Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85619786
  
--- Diff: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldOffsetStrategy.java ---
@@ -65,58 +65,88 @@ public String getField() {
    */
   public abstract List<OffsetsEnum> getOffsetsEnums(IndexReader reader, int docId, String content) throws IOException;
 
-  protected List<OffsetsEnum> createOffsetsEnums(LeafReader leafReader, int doc, TokenStream tokenStream) throws IOException {
-    List<OffsetsEnum> offsetsEnums = createOffsetsEnumsFromReader(leafReader, doc);
-    if (automata.length > 0) {
-      offsetsEnums.add(createOffsetsEnumFromTokenStream(doc, tokenStream));
+  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader leafReader, int doc) throws IOException {
+    final Terms termsIndex = leafReader.terms(field);
+    if (termsIndex == null) {
+      return Collections.emptyList();
     }
-    return offsetsEnums;
-  }
 
-  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader atomicReader, int doc) throws IOException {
     // For strict positions, get a Map of term to Spans:
     //    note: ScriptPhraseHelper.NONE does the right thing for these method calls
     final Map<BytesRef, Spans> strictPhrasesTermToSpans =
-        strictPhrases.getTermToSpans(atomicReader, doc);
+        phraseHelper.getTermToSpans(leafReader, doc);
     // Usually simply wraps terms in a List; but if willRewrite() then can be expanded
     final List<BytesRef> sourceTerms =
-        strictPhrases.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
+        phraseHelper.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
 
-    final List<OffsetsEnum> offsetsEnums = new ArrayList<>(sourceTerms.size() + 1);
+    final List<OffsetsEnum> offsetsEnums = new ArrayList<>(sourceTerms.size() + automata.length);
 
-    Terms termsIndex = atomicReader == null || sourceTerms.isEmpty() ? null : atomicReader.terms(field);
-    if (termsIndex != null) {
+    // Handle sourceTerms:
+    if (!sourceTerms.isEmpty()) {
       TermsEnum termsEnum = termsIndex.iterator();//does not return null
       for (BytesRef term : sourceTerms) {
-        if (!termsEnum.seekExact(term)) {
-          continue; // term not found
-        }
-        PostingsEnum postingsEnum = termsEnum.postings(null, PostingsEnum.OFFSETS);
-        if (postingsEnum == null) {
-          // no offsets or positions available
-          throw new IllegalArgumentException("field '" + field + "' was indexed without offsets, cannot highlight");
-        }
-        if (doc != postingsEnum.advance(doc)) { // now it's positioned, although may be exhausted
-          continue;
+        if (termsEnum.seekExact(term)) {
+          PostingsEnum postingsEnum = termsEnum.postings(null, PostingsEnum.OFFSETS);
+
+          if (postingsEnum == null) {
+            // no offsets or positions available
+            throw new IllegalArgumentException("field '" + field + "' was indexed without offsets, cannot highlight");
+          }
+
+          if (doc == postingsEnum.advance(doc)) { // now it's positioned, although may be exhausted
+            postingsEnum = phraseHelper.filterPostings(term, postingsEnum, strictPhrasesTermToSpans.get(term));
+            if (postingsEnum != null) {
+              offsetsEnums.add(new OffsetsEnum(term, postingsEnum));
+            }
+          }
        }
-        postingsEnum = strictPhrases.filterPostings(term, postingsEnum, strictPhrasesTermToSpans.get(term));
-        if (postingsEnum == null) {
-          continue;// completely filtered out
+      }
+    }
+
+    // Handle automata
+    if (automata.length > 0) {
+      offsetsEnums.addAll(createAutomataOffsetsFromTerms(termsIndex, doc));
+    }
+
+    return offsetsEnums;
+  }
+
+  protected List<OffsetsEnum> createAutomataOffsetsFromTerms(Terms termsIndex, int doc) throws IOException {
+    Map<CharacterRunAutomaton, List<PostingsEnum>> automataPostings = new IdentityHashMap<>(automata.length);
--- End diff --

Pushed that for now to see what you think.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5260) Facet search on a docvalue field in a multi shard collection

2016-10-28 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5260:
-
Assignee: (was: Erick Erickson)

> Facet search on a docvalue field in a multi shard collection
> 
>
> Key: SOLR-5260
> URL: https://issues.apache.org/jira/browse/SOLR-5260
> Project: Solr
>  Issue Type: Bug
>  Components: search, SolrCloud
>Affects Versions: 4.4
>Reporter: Trym Møller
>
> I have a problem doing facet search on a doc value field in a multi shard 
> collection.
> My Solr schema specifies fieldA as a docvalue type and I have created a two 
> shard collection using Solr 4.4.0 (and the unreleased 4.5 branch).
> When I do a facet search on fieldA with a "large" facet.limit, the query 
> fails with the exception below.
> A "large" facet.limit seems to be when (10 + (facet.limit * 1.5)) * number of 
> shards > rows matching my query.
> The exception does not occur when I run with a single shard collection.
> It can easily be reproduced by indexing a single row and querying it, as the 
> default facet.limit is 100.
> The facet query received by Solr looks as follows:
> {noformat}
> 576793 [qtp170860084-18] INFO  org.apache.solr.core.SolrCore  ¦ 
> [trym_shard2_replica1] webapp=/solr path=/select 
>  
> params={facet=true=0=*:*=true=trym=fieldA=javabin=2=0}
>  
>  status=500 QTime=20
> {noformat}
> One of the internal queries sent by Solr to its shards looks like
> {noformat}
> 576783 [qtp170860084-19] INFO  org.apache.solr.core.SolrCore  ¦ 
> [trym_shard1_replica1] webapp=/solr path=/select 
>  
> params={facet=true=false=trym=javabin=2=0=1379855011787
> 
>
> =192.168.56.1:8501/solr/trym_shard1_replica1/=text=id,score=160
>=0=*:*=fieldA=true=true} 
>  hits=1 status=500 QTime=2
> {noformat}
> The exception thrown by Solr is as follows
> {noformat}
> 576784 [qtp170860084-17] ERROR org.apache.solr.servlet.SolrDispatchFilter  ¦ 
> null:java.lang.IllegalStateException: 
>  Cannot use facet.mincount=0 on a field which is not indexed
> at 
> org.apache.solr.request.NumericFacets.getCounts(NumericFacets.java:257)
> at 
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:423)
> at 
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:530)
> at 
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:259)
> at 
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
> at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
> at 
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
> at org.eclipse.jetty.server.Server.handle(Server.java:368)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
> at 
> org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
> at 
> org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
> at 
> 

[jira] [Commented] (SOLR-5260) Facet search on a docvalue field in a multi shard collection

2016-10-28 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616772#comment-15616772
 ] 

Erick Erickson commented on SOLR-5260:
--

This is on trunk, fresh pull.

I was looking at this today and it's a problem indeed. Since we're advocating 
using docValues for faceting, failing sometimes and succeeding at other times 
is disconcerting.

I have a field with indexed="false" docValues="true". Sometimes it works and 
sometimes it doesn't; it depends, as Trym says, on how many docs are in the 
result set and the number of shards. The error is reported as Trym indicated, 
even on a current trunk.

Caused by: java.lang.IllegalStateException: Cannot use facet.mincount=0 on 
field eoe which is not indexed
at 
org.apache.solr.request.NumericFacets.getCounts(NumericFacets.java:256)
at 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:465)

JSON facets work with mincount>0. If mincount=0, it fails with an error message 
(on both the client and the server) something like:
"Numeric fields do not support facet mincount=0; try indexing as terms".

Anyway, I'm surely not going to get to this in the near future, so I'm 
unassigning it from myself. Not something for 6.3, as it's been around for a 
long time.



> Facet search on a docvalue field in a multi shard collection
> 
>
> Key: SOLR-5260
> URL: https://issues.apache.org/jira/browse/SOLR-5260
> Project: Solr
>  Issue Type: Bug
>  Components: search, SolrCloud
>Affects Versions: 4.4
>Reporter: Trym Møller
>Assignee: Erick Erickson
>
> I have a problem doing facet search on a doc value field in a multi shard 
> collection.
> My Solr schema specifies fieldA as a docvalue type and I have created a two 
> shard collection using Solr 4.4.0 (and the unreleased 4.5 branch).
> When I do a facet search on fieldA with a "large" facet.limit, the query 
> fails with the exception below.
> A "large" facet.limit seems to be when (10 + (facet.limit * 1.5)) * number of 
> shards > rows matching my query.
> The exception does not occur when I run with a single shard collection.
> It can easily be reproduced by indexing a single row and querying it, as the 
> default facet.limit is 100.
> The facet query received by Solr looks as follows:
> {noformat}
> 576793 [qtp170860084-18] INFO  org.apache.solr.core.SolrCore  ¦ 
> [trym_shard2_replica1] webapp=/solr path=/select 
>  
> params={facet=true=0=*:*=true=trym=fieldA=javabin=2=0}
>  
>  status=500 QTime=20
> {noformat}
> One of the internal queries sent by Solr to its shards looks like
> {noformat}
> 576783 [qtp170860084-19] INFO  org.apache.solr.core.SolrCore  ¦ 
> [trym_shard1_replica1] webapp=/solr path=/select 
>  
> params={facet=true=false=trym=javabin=2=0=1379855011787
> 
>
> =192.168.56.1:8501/solr/trym_shard1_replica1/=text=id,score=160
>=0=*:*=fieldA=true=true} 
>  hits=1 status=500 QTime=2
> {noformat}
> The exception thrown by Solr is as follows
> {noformat}
> 576784 [qtp170860084-17] ERROR org.apache.solr.servlet.SolrDispatchFilter  ¦ 
> null:java.lang.IllegalStateException: 
>  Cannot use facet.mincount=0 on a field which is not indexed
> at 
> org.apache.solr.request.NumericFacets.getCounts(NumericFacets.java:257)
> at 
> org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:423)
> at 
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:530)
> at 
> org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:259)
> at 
> org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
> at 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
> at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
> at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
> at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
> at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
> at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
> at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
> at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>

[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616738#comment-15616738
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85618489
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+private final PostingsEnum postingsEnum;
+private final int freq;
--- End diff --

Hmm, did you mean to calculate freq dynamically in the method and just use 
a remainingPositions count while walking over the postings? Instead of 
re-computing, it's cached in the constructor; but if we modified it, the freq 
count would change as the postings were iterated.
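
For clarity, a rough sketch of the remainingPositions alternative described 
above; the class shape mirrors the PR's BoundsCheckingPostingsEnum, but the 
name and counting scheme below are hypothetical, not the PR's code:

{code}
import java.io.IOException;

import org.apache.lucene.index.PostingsEnum;

// Hypothetical variant: snapshot freq() once, then count down as positions
// are consumed, so the wrapped enum is never asked for more than freq()
// positions even though freq itself is not re-read on each call.
final class CountingDownPostingsEnum {
  static final int NO_MORE_POSITIONS = -2;

  private final PostingsEnum postingsEnum;
  private int remainingPositions; // positions still to be consumed
  private int position = -1;

  CountingDownPostingsEnum(PostingsEnum postingsEnum) throws IOException {
    this.postingsEnum = postingsEnum;
    this.remainingPositions = postingsEnum.freq();
  }

  int nextPosition() throws IOException {
    if (remainingPositions == 0) {
      return position = NO_MORE_POSITIONS; // exhausted; leave the delegate alone
    }
    remainingPositions--;
    position = postingsEnum.nextPosition();
    return position;
  }
}
{code}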


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy
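
For orientation, a minimal usage sketch of the highlighter these strategies sit 
behind; this is generic UnifiedHighlighter usage, not specific to this PR, and 
it assumes a Directory whose "body" field was indexed with offsets in the 
postings:

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.uhighlight.UnifiedHighlighter;
import org.apache.lucene.store.Directory;

public class HighlightSketch {
  // Assumes "body" was indexed with
  // IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS.
  static String[] highlight(Directory dir) throws Exception {
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      UnifiedHighlighter highlighter =
          new UnifiedHighlighter(searcher, new StandardAnalyzer());
      TermQuery query = new TermQuery(new Term("body", "lucene"));
      TopDocs topDocs = searcher.search(query, 10);
      // One entry per hit: the best passages for that document, or null.
      return highlighter.highlight("body", query, topDocs);
    }
  }
}
{code}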



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread Timothy055
Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85618489
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+private final PostingsEnum postingsEnum;
+private final int freq;
--- End diff --

Hmm, did you mean to calculate freq dynamically in the method and just use 
a remainingPositions count while walking over the postings? Instead of 
re-computing, it's cached in the constructor; but if we modified it, the freq 
count would change as the postings were iterated.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616723#comment-15616723
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user Timothy055 commented on the issue:

https://github.com/apache/lucene-solr/pull/105
  
I've pushed some more changes now.  Still taking a look at what we might be 
able to do further with CompositePostingsEnum


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #105: LUCENE-7526 Improvements to UnifiedHighlighter Offse...

2016-10-28 Thread Timothy055
Github user Timothy055 commented on the issue:

https://github.com/apache/lucene-solr/pull/105
  
I've pushed some more changes now.  Still taking a look at what we might be 
able to do further with CompositePostingsEnum


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread Timothy055
Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85616894
  
--- Diff: 
lucene/highlighter/src/test/org/apache/lucene/search/uhighlight/visibility/TestUnifiedHighlighterExtensibility.java
 ---
@@ -79,7 +90,7 @@ public void testFieldOffsetStrategyExtensibility() {
   @Test
   public void testUnifiedHighlighterExtensibility() {
 final int maxLength = 1000;
-UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(random())){
+UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(new Random())){
--- End diff --

I'm a bit confused.  What should I change?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616710#comment-15616710
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85616651
  
--- Diff: 
lucene/highlighter/src/test/org/apache/lucene/search/uhighlight/visibility/TestUnifiedHighlighterExtensibility.java
 ---
@@ -79,7 +90,7 @@ public void testFieldOffsetStrategyExtensibility() {
   @Test
   public void testUnifiedHighlighterExtensibility() {
 final int maxLength = 1000;
-UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(random())){
+UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(new Random())){
--- End diff --

I recall now why I changed it.  I started getting this error and figured it 
was due to a change elsewhere: java.lang.IllegalStateException: No context information 
for thread: Thread[id=1, name=main, state=RUNNABLE, group=main]. Is this thread 
running under a class com.carrotsearch.randomizedtesting.RandomizedRunner 
runner context? Add @RunWith(class 
com.carrotsearch.randomizedtesting.RandomizedRunner.class) to your test class. 
Make sure your code accesses random contexts within @BeforeClass and 
@AfterClass boundary (for example, static test class initializers are not 
permitted to access random contexts).
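
For reference, a minimal sketch of what that message is asking for when a test 
does not extend LuceneTestCase (which wires this up for you); the class and 
method names below are made up:

{code}
import java.util.Random;

import com.carrotsearch.randomizedtesting.RandomizedContext;
import com.carrotsearch.randomizedtesting.RandomizedRunner;
import org.junit.Test;
import org.junit.runner.RunWith;

// The runner installs the per-thread randomness context; without it,
// looking up the context fails with the IllegalStateException quoted above.
@RunWith(RandomizedRunner.class)
public class MyRandomizedTest {
  @Test
  public void testSomething() {
    Random random = RandomizedContext.current().getRandom();
    int n = random.nextInt(100); // reproducible from the reported seed
  }
}
{code}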


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread Timothy055
Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85616651
  
--- Diff: 
lucene/highlighter/src/test/org/apache/lucene/search/uhighlight/visibility/TestUnifiedHighlighterExtensibility.java
 ---
@@ -79,7 +90,7 @@ public void testFieldOffsetStrategyExtensibility() {
   @Test
   public void testUnifiedHighlighterExtensibility() {
 final int maxLength = 1000;
-UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(random())){
+UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(new Random())){
--- End diff --

I recall now why I changed it.  I started getting this error and figured it 
was due to a change elsewhere: java.lang.IllegalStateException: No context information 
for thread: Thread[id=1, name=main, state=RUNNABLE, group=main]. Is this thread 
running under a class com.carrotsearch.randomizedtesting.RandomizedRunner 
runner context? Add @RunWith(class 
com.carrotsearch.randomizedtesting.RandomizedRunner.class) to your test class. 
Make sure your code accesses random contexts within @BeforeClass and 
@AfterClass boundary (for example, static test class initializers are not 
permitted to access random contexts).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-6.x - Build # 187 - Failure

2016-10-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-6.x/187/

4 tests failed.
FAILED:  
org.apache.lucene.spatial.geopoint.search.TestLegacyGeoPointQuery.testRandomBig

Error Message:
GC overhead limit exceeded

Stack Trace:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at 
__randomizedtesting.SeedInfo.seed([2FDE41DE24EAFBFD:A8893C51B5B3877D]:0)
at 
org.apache.lucene.util.fst.ByteSequenceOutputs.read(ByteSequenceOutputs.java:129)
at 
org.apache.lucene.util.fst.ByteSequenceOutputs.read(ByteSequenceOutputs.java:35)
at org.apache.lucene.util.fst.FST.readNextRealArc(FST.java:1088)
at org.apache.lucene.util.fst.FST.pack(FST.java:1769)
at org.apache.lucene.util.fst.Builder.finish(Builder.java:500)
at 
org.apache.lucene.codecs.memory.MemoryPostingsFormat$TermsWriter.finish(MemoryPostingsFormat.java:267)
at 
org.apache.lucene.codecs.memory.MemoryPostingsFormat$MemoryFieldsConsumer.write(MemoryPostingsFormat.java:401)
at 
org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
at 
org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:164)
at 
org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:216)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:101)
at 
org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4312)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3889)
at 
org.apache.lucene.index.SerialMergeScheduler.merge(SerialMergeScheduler.java:40)
at org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:2052)
at 
org.apache.lucene.index.IndexWriter.doAfterSegmentFlushed(IndexWriter.java:4953)
at 
org.apache.lucene.index.DocumentsWriter$MergePendingEvent.process(DocumentsWriter.java:731)
at 
org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4991)
at 
org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4982)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1565)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1307)
at 
org.apache.lucene.geo.BaseGeoPointTestCase.verifyRandomRectangles(BaseGeoPointTestCase.java:774)
at 
org.apache.lucene.geo.BaseGeoPointTestCase.verify(BaseGeoPointTestCase.java:743)
at 
org.apache.lucene.geo.BaseGeoPointTestCase.doTestRandom(BaseGeoPointTestCase.java:692)
at 
org.apache.lucene.geo.BaseGeoPointTestCase.testRandomBig(BaseGeoPointTestCase.java:623)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)


FAILED:  
org.apache.solr.cloud.CdcrReplicationDistributedZkTest.testReplicationStartStop

Error Message:
Timeout while trying to assert number of documents @ target_collection

Stack Trace:
java.lang.AssertionError: Timeout while trying to assert number of documents @ 
target_collection
at 
__randomizedtesting.SeedInfo.seed([16E038A66AE0973A:9523E6357E381CB3]:0)
at 
org.apache.solr.cloud.BaseCdcrDistributedZkTest.assertNumDocs(BaseCdcrDistributedZkTest.java:271)
at 
org.apache.solr.cloud.CdcrReplicationDistributedZkTest.testReplicationStartStop(CdcrReplicationDistributedZkTest.java:173)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:992)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:967)
at 

[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616654#comment-15616654
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85614277
  
--- Diff: 
lucene/highlighter/src/test/org/apache/lucene/search/uhighlight/visibility/TestUnifiedHighlighterExtensibility.java
 ---
@@ -79,7 +90,7 @@ public void testFieldOffsetStrategyExtensibility() {
   @Test
   public void testUnifiedHighlighterExtensibility() {
 final int maxLength = 1000;
-UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(random())){
+UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(new Random())){
--- End diff --

fixed


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread Timothy055
Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85614277
  
--- Diff: 
lucene/highlighter/src/test/org/apache/lucene/search/uhighlight/visibility/TestUnifiedHighlighterExtensibility.java
 ---
@@ -79,7 +90,7 @@ public void testFieldOffsetStrategyExtensibility() {
   @Test
   public void testUnifiedHighlighterExtensibility() {
 final int maxLength = 1000;
-UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(random())){
+UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(new Random())){
--- End diff --

fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-9621) Remove several guava, apache commons calls in favor of java 8 alternatives

2016-10-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-9621.

Resolution: Fixed

> Remove several guava, apache commons calls in favor of java 8 alternatives
> --
>
> Key: SOLR-9621
> URL: https://issues.apache.org/jira/browse/SOLR-9621
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Michael Braun
>Assignee: David Smiley
>Priority: Trivial
> Fix For: 6.4
>
> Attachments: SOLR-9621.patch, SOLR-9621.patch
>
>
> Now that Solr is built against Java 8, we can take advantage of replacing some 
> Guava and Apache Commons calls with JDK standards. I'd like to start by 
> replacing the following:
> com.google.common.base.Supplier  -> java.util.function.Supplier
> com.google.common.base.Predicate -> java.util.function.Predicate
> com.google.common.base.Charsets -> java.nio.charset.StandardCharsets
> org.apache.commons.codec.Charsets -> java.nio.charset.StandardCharsets
> com.google.common.collect.Ordering -> java.util.Comparator
> com.google.common.base.Joiner -> java.util.stream.Collectors::joining
> com.google.common.base.Function -> java.util.function.Function
> com.google.common.base.Preconditions::checkNotNull -> 
> java.util.Objects::requireNonNull
> com.google.common.base.Objects::equals -> java.util.Objects::equals
> com.google.common.base.Objects::hashCode -> java.util.Objects::hashCode
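
As an illustration of the kind of mechanical change involved, a small 
before/after sketch for two of these swaps; the class below is made up for the 
example, not taken from the Solr codebase:

{code}
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class Example {
  private final String name;

  Example(String name) {
    // Before: this.name = com.google.common.base.Preconditions.checkNotNull(name);
    this.name = Objects.requireNonNull(name, "name must not be null");
  }

  static String joinNames(List<Example> examples) {
    // Before: com.google.common.base.Joiner.on(", ").join(...);
    return examples.stream()
        .map(e -> e.name)
        .collect(Collectors.joining(", "));
  }
}
{code}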



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616646#comment-15616646
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85613814
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldOffsetStrategy.java
 ---
@@ -65,58 +65,88 @@ public String getField() {
*/
   public abstract List<OffsetsEnum> getOffsetsEnums(IndexReader reader, 
int docId, String content) throws IOException;
 
-  protected List<OffsetsEnum> createOffsetsEnums(LeafReader leafReader, 
int doc, TokenStream tokenStream) throws IOException {
-List<OffsetsEnum> offsetsEnums = 
createOffsetsEnumsFromReader(leafReader, doc);
-if (automata.length > 0) {
-  offsetsEnums.add(createOffsetsEnumFromTokenStream(doc, tokenStream));
+  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader 
leafReader, int doc) throws IOException {
+final Terms termsIndex = leafReader.terms(field);
+if (termsIndex == null) {
+  return Collections.emptyList();
 }
-return offsetsEnums;
-  }
 
-  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader 
atomicReader, int doc) throws IOException {
 // For strict positions, get a Map of term to Spans:
 //note: ScriptPhraseHelper.NONE does the right thing for these 
method calls
 final Map<BytesRef, Spans> strictPhrasesTermToSpans =
-strictPhrases.getTermToSpans(atomicReader, doc);
+phraseHelper.getTermToSpans(leafReader, doc);
 // Usually simply wraps terms in a List; but if willRewrite() then can 
be expanded
 final List<BytesRef> sourceTerms =
-strictPhrases.expandTermsIfRewrite(terms, 
strictPhrasesTermToSpans);
+phraseHelper.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
 
-final List<OffsetsEnum> offsetsEnums = new 
ArrayList<>(sourceTerms.size() + 1);
+final List<OffsetsEnum> offsetsEnums = new 
ArrayList<>(sourceTerms.size() + automata.length);
 
-Terms termsIndex = atomicReader == null || sourceTerms.isEmpty() ? 
null : atomicReader.terms(field);
-if (termsIndex != null) {
+// Handle sourceTerms:
+if (!sourceTerms.isEmpty()) {
   TermsEnum termsEnum = termsIndex.iterator();//does not return null
   for (BytesRef term : sourceTerms) {
-if (!termsEnum.seekExact(term)) {
-  continue; // term not found
-}
-PostingsEnum postingsEnum = termsEnum.postings(null, 
PostingsEnum.OFFSETS);
-if (postingsEnum == null) {
-  // no offsets or positions available
-  throw new IllegalArgumentException("field '" + field + "' was 
indexed without offsets, cannot highlight");
-}
-if (doc != postingsEnum.advance(doc)) { // now it's positioned, 
although may be exhausted
-  continue;
+if (termsEnum.seekExact(term)) {
+  PostingsEnum postingsEnum = termsEnum.postings(null, 
PostingsEnum.OFFSETS);
+
+  if (postingsEnum == null) {
+// no offsets or positions available
+throw new IllegalArgumentException("field '" + field + "' was 
indexed without offsets, cannot highlight");
+  }
+
+  if (doc == postingsEnum.advance(doc)) { // now it's positioned, 
although may be exhausted
+postingsEnum = phraseHelper.filterPostings(term, postingsEnum, 
strictPhrasesTermToSpans.get(term));
+if (postingsEnum != null) {
+  offsetsEnums.add(new OffsetsEnum(term, postingsEnum));
+}
+  }
 }
-postingsEnum = strictPhrases.filterPostings(term, postingsEnum, 
strictPhrasesTermToSpans.get(term));
-if (postingsEnum == null) {
-  continue;// completely filtered out
+  }
+}
+
+// Handle automata
+if (automata.length > 0) {
+  offsetsEnums.addAll(createAutomataOffsetsFromTerms(termsIndex, doc));
+}
+
+return offsetsEnums;
+  }
+
+  protected List<OffsetsEnum> createAutomataOffsetsFromTerms(Terms 
termsIndex, int doc) throws IOException {
+Map<CharacterRunAutomaton, List<PostingsEnum>> automataPostings = new 
IdentityHashMap<>(automata.length);
--- End diff --

How about List<List<PostingsEnum>>? That should give better locality without 
loss of type safety.


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: 

[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread Timothy055
Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85613814
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldOffsetStrategy.java
 ---
@@ -65,58 +65,88 @@ public String getField() {
*/
   public abstract List<OffsetsEnum> getOffsetsEnums(IndexReader reader, 
int docId, String content) throws IOException;
 
-  protected List<OffsetsEnum> createOffsetsEnums(LeafReader leafReader, 
int doc, TokenStream tokenStream) throws IOException {
-List<OffsetsEnum> offsetsEnums = 
createOffsetsEnumsFromReader(leafReader, doc);
-if (automata.length > 0) {
-  offsetsEnums.add(createOffsetsEnumFromTokenStream(doc, tokenStream));
+  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader 
leafReader, int doc) throws IOException {
+final Terms termsIndex = leafReader.terms(field);
+if (termsIndex == null) {
+  return Collections.emptyList();
 }
-return offsetsEnums;
-  }
 
-  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader 
atomicReader, int doc) throws IOException {
 // For strict positions, get a Map of term to Spans:
 //note: ScriptPhraseHelper.NONE does the right thing for these 
method calls
 final Map<BytesRef, Spans> strictPhrasesTermToSpans =
-strictPhrases.getTermToSpans(atomicReader, doc);
+phraseHelper.getTermToSpans(leafReader, doc);
 // Usually simply wraps terms in a List; but if willRewrite() then can 
be expanded
 final List<BytesRef> sourceTerms =
-strictPhrases.expandTermsIfRewrite(terms, 
strictPhrasesTermToSpans);
+phraseHelper.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
 
-final List<OffsetsEnum> offsetsEnums = new 
ArrayList<>(sourceTerms.size() + 1);
+final List<OffsetsEnum> offsetsEnums = new 
ArrayList<>(sourceTerms.size() + automata.length);
 
-Terms termsIndex = atomicReader == null || sourceTerms.isEmpty() ? 
null : atomicReader.terms(field);
-if (termsIndex != null) {
+// Handle sourceTerms:
+if (!sourceTerms.isEmpty()) {
   TermsEnum termsEnum = termsIndex.iterator();//does not return null
   for (BytesRef term : sourceTerms) {
-if (!termsEnum.seekExact(term)) {
-  continue; // term not found
-}
-PostingsEnum postingsEnum = termsEnum.postings(null, 
PostingsEnum.OFFSETS);
-if (postingsEnum == null) {
-  // no offsets or positions available
-  throw new IllegalArgumentException("field '" + field + "' was 
indexed without offsets, cannot highlight");
-}
-if (doc != postingsEnum.advance(doc)) { // now it's positioned, 
although may be exhausted
-  continue;
+if (termsEnum.seekExact(term)) {
+  PostingsEnum postingsEnum = termsEnum.postings(null, 
PostingsEnum.OFFSETS);
+
+  if (postingsEnum == null) {
+// no offsets or positions available
+throw new IllegalArgumentException("field '" + field + "' was 
indexed without offsets, cannot highlight");
+  }
+
+  if (doc == postingsEnum.advance(doc)) { // now it's positioned, 
although may be exhausted
+postingsEnum = phraseHelper.filterPostings(term, postingsEnum, 
strictPhrasesTermToSpans.get(term));
+if (postingsEnum != null) {
+  offsetsEnums.add(new OffsetsEnum(term, postingsEnum));
+}
+  }
 }
-postingsEnum = strictPhrases.filterPostings(term, postingsEnum, 
strictPhrasesTermToSpans.get(term));
-if (postingsEnum == null) {
-  continue;// completely filtered out
+  }
+}
+
+// Handle automata
+if (automata.length > 0) {
+  offsetsEnums.addAll(createAutomataOffsetsFromTerms(termsIndex, doc));
+}
+
+return offsetsEnums;
+  }
+
+  protected List<OffsetsEnum> createAutomataOffsetsFromTerms(Terms 
termsIndex, int doc) throws IOException {
+Map<CharacterRunAutomaton, List<PostingsEnum>> automataPostings = new 
IdentityHashMap<>(automata.length);
--- End diff --

How about List<List<PostingsEnum>>? That should give better locality without 
loss of type safety.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9621) Remove several guava, apache commons calls in favor of java 8 alternatives

2016-10-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616644#comment-15616644
 ] 

ASF subversion and git services commented on SOLR-9621:
---

Commit a3c701e2f6f0949f5ee2f08eb1d03f962bff6eca in lucene-solr's branch 
refs/heads/branch_6x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=a3c701e ]

SOLR-9621: Remove several Guava & Apache Commons calls in favor of java 8 
alternatives.

(cherry picked from commit 2e21511)


> Remove several guava, apache commons calls in favor of java 8 alternatives
> --
>
> Key: SOLR-9621
> URL: https://issues.apache.org/jira/browse/SOLR-9621
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Michael Braun
>Assignee: David Smiley
>Priority: Trivial
> Fix For: 6.4
>
> Attachments: SOLR-9621.patch, SOLR-9621.patch
>
>
> Now that Solr is built against Java 8, we can take advantage of replacing some 
> Guava and Apache Commons calls with JDK standards. I'd like to start by 
> replacing the following:
> com.google.common.base.Supplier  -> java.util.function.Supplier
> com.google.common.base.Predicate -> java.util.function.Predicate
> com.google.common.base.Charsets -> java.nio.charset.StandardCharsets
> org.apache.commons.codec.Charsets -> java.nio.charset.StandardCharsets
> com.google.common.collect.Ordering -> java.util.Comparator
> com.google.common.base.Joiner -> java.util.stream.Collectors::joining
> com.google.common.base.Function -> java.util.function.Function
> com.google.common.base.Preconditions::checkNotNull -> 
> java.util.Objects::requireNonNull
> com.google.common.base.Objects::equals -> java.util.Objects::equals
> com.google.common.base.Objects::hashCode -> java.util.Objects::hashCode



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616638#comment-15616638
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85613130
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldOffsetStrategy.java
 ---
@@ -65,58 +65,88 @@ public String getField() {
*/
   public abstract List<OffsetsEnum> getOffsetsEnums(IndexReader reader, 
int docId, String content) throws IOException;
 
-  protected List<OffsetsEnum> createOffsetsEnums(LeafReader leafReader, 
int doc, TokenStream tokenStream) throws IOException {
-List<OffsetsEnum> offsetsEnums = 
createOffsetsEnumsFromReader(leafReader, doc);
-if (automata.length > 0) {
-  offsetsEnums.add(createOffsetsEnumFromTokenStream(doc, tokenStream));
+  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader 
leafReader, int doc) throws IOException {
+final Terms termsIndex = leafReader.terms(field);
+if (termsIndex == null) {
+  return Collections.emptyList();
 }
-return offsetsEnums;
-  }
 
-  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader 
atomicReader, int doc) throws IOException {
 // For strict positions, get a Map of term to Spans:
 //note: ScriptPhraseHelper.NONE does the right thing for these 
method calls
 final Map<BytesRef, Spans> strictPhrasesTermToSpans =
-strictPhrases.getTermToSpans(atomicReader, doc);
+phraseHelper.getTermToSpans(leafReader, doc);
 // Usually simply wraps terms in a List; but if willRewrite() then can 
be expanded
 final List<BytesRef> sourceTerms =
-strictPhrases.expandTermsIfRewrite(terms, 
strictPhrasesTermToSpans);
+phraseHelper.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
 
-final List<OffsetsEnum> offsetsEnums = new 
ArrayList<>(sourceTerms.size() + 1);
+final List<OffsetsEnum> offsetsEnums = new 
ArrayList<>(sourceTerms.size() + automata.length);
 
-Terms termsIndex = atomicReader == null || sourceTerms.isEmpty() ? 
null : atomicReader.terms(field);
-if (termsIndex != null) {
+// Handle sourceTerms:
+if (!sourceTerms.isEmpty()) {
   TermsEnum termsEnum = termsIndex.iterator();//does not return null
   for (BytesRef term : sourceTerms) {
-if (!termsEnum.seekExact(term)) {
-  continue; // term not found
-}
-PostingsEnum postingsEnum = termsEnum.postings(null, 
PostingsEnum.OFFSETS);
-if (postingsEnum == null) {
-  // no offsets or positions available
-  throw new IllegalArgumentException("field '" + field + "' was 
indexed without offsets, cannot highlight");
-}
-if (doc != postingsEnum.advance(doc)) { // now it's positioned, 
although may be exhausted
-  continue;
+if (termsEnum.seekExact(term)) {
+  PostingsEnum postingsEnum = termsEnum.postings(null, 
PostingsEnum.OFFSETS);
+
+  if (postingsEnum == null) {
+// no offsets or positions available
+throw new IllegalArgumentException("field '" + field + "' was 
indexed without offsets, cannot highlight");
+  }
+
+  if (doc == postingsEnum.advance(doc)) { // now it's positioned, 
although may be exhausted
+postingsEnum = phraseHelper.filterPostings(term, postingsEnum, 
strictPhrasesTermToSpans.get(term));
+if (postingsEnum != null) {
+  offsetsEnums.add(new OffsetsEnum(term, postingsEnum));
+}
+  }
 }
-postingsEnum = strictPhrases.filterPostings(term, postingsEnum, 
strictPhrasesTermToSpans.get(term));
-if (postingsEnum == null) {
-  continue;// completely filtered out
+  }
+}
+
+// Handle automata
+if (automata.length > 0) {
+  offsetsEnums.addAll(createAutomataOffsetsFromTerms(termsIndex, doc));
+}
+
+return offsetsEnums;
+  }
+
+  protected List<OffsetsEnum> createAutomataOffsetsFromTerms(Terms 
termsIndex, int doc) throws IOException {
+Map<CharacterRunAutomaton, List<PostingsEnum>> automataPostings = new 
IdentityHashMap<>(automata.length);
--- End diff --

One minor problem with that was that the code would no longer be type-safe 
because of the lack of generic arrays in Java.  I wouldn't be able to do 
`List<PostingsEnum>[] = new ArrayList<PostingsEnum>[automata.length];` but 
could do `List[] = new ArrayList[automata.length];` with 
unchecked casts.  Seem worth it?
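
To make the trade-off concrete, a small sketch under hypothetical names; Java 
rejects `new ArrayList<PostingsEnum>[n]`, so the array form needs an unchecked 
cast, while the list-of-lists form stays type-safe:

{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.PostingsEnum;

class GenericArrayTradeoff {
  @SuppressWarnings({"unchecked", "rawtypes"})
  static List<PostingsEnum>[] arrayForm(int n) {
    // new List<PostingsEnum>[n] does not compile; only a raw array does,
    // so an unchecked cast is unavoidable here.
    return (List<PostingsEnum>[]) new ArrayList[n];
  }

  static List<List<PostingsEnum>> listForm(int n) {
    // Type-safe alternative: a list of lists, at the cost of one more
    // indirection per bucket.
    List<List<PostingsEnum>> buckets = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      buckets.add(new ArrayList<>());
    }
    return buckets;
  }
}
{code}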


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> 

[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread Timothy055
Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85613130
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldOffsetStrategy.java
 ---
@@ -65,58 +65,88 @@ public String getField() {
*/
   public abstract List<OffsetsEnum> getOffsetsEnums(IndexReader reader, 
int docId, String content) throws IOException;
 
-  protected List<OffsetsEnum> createOffsetsEnums(LeafReader leafReader, 
int doc, TokenStream tokenStream) throws IOException {
-List<OffsetsEnum> offsetsEnums = 
createOffsetsEnumsFromReader(leafReader, doc);
-if (automata.length > 0) {
-  offsetsEnums.add(createOffsetsEnumFromTokenStream(doc, tokenStream));
+  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader 
leafReader, int doc) throws IOException {
+final Terms termsIndex = leafReader.terms(field);
+if (termsIndex == null) {
+  return Collections.emptyList();
 }
-return offsetsEnums;
-  }
 
-  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader 
atomicReader, int doc) throws IOException {
 // For strict positions, get a Map of term to Spans:
 //note: ScriptPhraseHelper.NONE does the right thing for these 
method calls
 final Map<BytesRef, Spans> strictPhrasesTermToSpans =
-strictPhrases.getTermToSpans(atomicReader, doc);
+phraseHelper.getTermToSpans(leafReader, doc);
 // Usually simply wraps terms in a List; but if willRewrite() then can 
be expanded
 final List<BytesRef> sourceTerms =
-strictPhrases.expandTermsIfRewrite(terms, 
strictPhrasesTermToSpans);
+phraseHelper.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
 
-final List<OffsetsEnum> offsetsEnums = new 
ArrayList<>(sourceTerms.size() + 1);
+final List<OffsetsEnum> offsetsEnums = new 
ArrayList<>(sourceTerms.size() + automata.length);
 
-Terms termsIndex = atomicReader == null || sourceTerms.isEmpty() ? 
null : atomicReader.terms(field);
-if (termsIndex != null) {
+// Handle sourceTerms:
+if (!sourceTerms.isEmpty()) {
   TermsEnum termsEnum = termsIndex.iterator();//does not return null
   for (BytesRef term : sourceTerms) {
-if (!termsEnum.seekExact(term)) {
-  continue; // term not found
-}
-PostingsEnum postingsEnum = termsEnum.postings(null, 
PostingsEnum.OFFSETS);
-if (postingsEnum == null) {
-  // no offsets or positions available
-  throw new IllegalArgumentException("field '" + field + "' was 
indexed without offsets, cannot highlight");
-}
-if (doc != postingsEnum.advance(doc)) { // now it's positioned, 
although may be exhausted
-  continue;
+if (termsEnum.seekExact(term)) {
+  PostingsEnum postingsEnum = termsEnum.postings(null, 
PostingsEnum.OFFSETS);
+
+  if (postingsEnum == null) {
+// no offsets or positions available
+throw new IllegalArgumentException("field '" + field + "' was 
indexed without offsets, cannot highlight");
+  }
+
+  if (doc == postingsEnum.advance(doc)) { // now it's positioned, 
although may be exhausted
+postingsEnum = phraseHelper.filterPostings(term, postingsEnum, 
strictPhrasesTermToSpans.get(term));
+if (postingsEnum != null) {
+  offsetsEnums.add(new OffsetsEnum(term, postingsEnum));
+}
+  }
 }
-postingsEnum = strictPhrases.filterPostings(term, postingsEnum, 
strictPhrasesTermToSpans.get(term));
-if (postingsEnum == null) {
-  continue;// completely filtered out
+  }
+}
+
+// Handle automata
+if (automata.length > 0) {
+  offsetsEnums.addAll(createAutomataOffsetsFromTerms(termsIndex, doc));
+}
+
+return offsetsEnums;
+  }
+
+  protected List<OffsetsEnum> createAutomataOffsetsFromTerms(Terms 
termsIndex, int doc) throws IOException {
+Map<CharacterRunAutomaton, List<PostingsEnum>> automataPostings = new 
IdentityHashMap<>(automata.length);
--- End diff --

One minor problem with that was that the code would no longer be type-safe 
because of the lack of generic arrays in Java.  I wouldn't be able to do 
`List<PostingsEnum>[] = new ArrayList<PostingsEnum>[automata.length];` but 
could do `List[] = new ArrayList[automata.length];` with 
unchecked casts.  Seem worth it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-

[jira] [Updated] (SOLR-9621) Remove several guava, apache commons calls in favor of java 8 alternatives

2016-10-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-9621:
---
Fix Version/s: 6.4
   Issue Type: Improvement  (was: Bug)

> Remove several guava, apache commons calls in favor of java 8 alternatives
> --
>
> Key: SOLR-9621
> URL: https://issues.apache.org/jira/browse/SOLR-9621
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Michael Braun
>Assignee: David Smiley
>Priority: Trivial
> Fix For: 6.4
>
> Attachments: SOLR-9621.patch, SOLR-9621.patch
>
>
> Now that Solr is built against Java 8, we can take advantage of replacing some 
> Guava and Apache Commons calls with JDK standards. I'd like to start by 
> replacing the following:
> com.google.common.base.Supplier  -> java.util.function.Supplier
> com.google.common.base.Predicate -> java.util.function.Predicate
> com.google.common.base.Charsets -> java.nio.charset.StandardCharsets
> org.apache.commons.codec.Charsets -> java.nio.charset.StandardCharsets
> com.google.common.collect.Ordering -> java.util.Comparator
> com.google.common.base.Joiner -> java.util.stream.Collectors::joining
> com.google.common.base.Function -> java.util.function.Function
> com.google.common.base.Preconditions::checkNotNull -> 
> java.util.Objects::requireNonNull
> com.google.common.base.Objects::equals -> java.util.Objects::equals
> com.google.common.base.Objects::hashCode -> java.util.Objects::hashCode



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9621) Remove several guava, apache commons calls in favor of java 8 alternatives

2016-10-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616623#comment-15616623
 ] 

ASF subversion and git services commented on SOLR-9621:
---

Commit 2e21511cd37310044e7d167fd80b5277cb942603 in lucene-solr's branch 
refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2e21511 ]

SOLR-9621: Remove several Guava & Apache Commons calls in favor of java 8 
alternatives.


> Remove several guava, apache commons calls in favor of java 8 alternatives
> --
>
> Key: SOLR-9621
> URL: https://issues.apache.org/jira/browse/SOLR-9621
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Michael Braun
>Assignee: David Smiley
>Priority: Trivial
> Attachments: SOLR-9621.patch, SOLR-9621.patch
>
>
> Now that Solr is built against Java 8, we can take advantage of replacing some 
> Guava and Apache Commons calls with JDK standards. I'd like to start by 
> replacing the following:
> com.google.common.base.Supplier  -> java.util.function.Supplier
> com.google.common.base.Predicate -> java.util.function.Predicate
> com.google.common.base.Charsets -> java.nio.charset.StandardCharsets
> org.apache.commons.codec.Charsets -> java.nio.charset.StandardCharsets
> com.google.common.collect.Ordering -> java.util.Comparator
> com.google.common.base.Joiner -> java.util.stream.Collectors::joining
> com.google.common.base.Function -> java.util.function.Function
> com.google.common.base.Preconditions::checkNotNull -> 
> java.util.Objects::requireNonNull
> com.google.common.base.Objects::equals -> java.util.Objects::equals
> com.google.common.base.Objects::hashCode -> java.util.Objects::hashCode



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616618#comment-15616618
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85611978
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+private final PostingsEnum postingsEnum;
+private final int freq;
+private int position;
+private int nextPosition;
+private int positionInc = 1;
+
+private int startOffset;
+private int endOffset;
+
+BoundsCheckingPostingsEnum(PostingsEnum postingsEnum) throws 
IOException {
+  this.postingsEnum = postingsEnum;
+  this.freq = postingsEnum.freq();
+  nextPosition = postingsEnum.nextPosition();
+  position = nextPosition;
+  startOffset = postingsEnum.startOffset();
+  endOffset = postingsEnum.endOffset();
+}
+
+private boolean hasMorePositions() throws IOException {
+  return positionInc < freq;
+}
+
+/**
+ * Returns the next position of the underlying postings enum unless
+ * it cannot iterate further and returns NO_MORE_POSITIONS;
+ * @return
+ * @throws IOException
+ */
+private int nextPosition() throws IOException {
+  position = nextPosition;
+  startOffset = postingsEnum.startOffset();
+  endOffset = postingsEnum.endOffset();
+  if (hasMorePositions()) {
+positionInc++;
+nextPosition = postingsEnum.nextPosition();
+  } else {
+nextPosition = NO_MORE_POSITIONS;
+  }
+  return position;
+}
+
+  }
+
+  CompositePostingsEnum(BytesRef term, List<PostingsEnum> postingsEnums) 
throws IOException {
+this.term = term;
+queue = new 
PriorityQueue<BoundsCheckingPostingsEnum>(postingsEnums.size()) {
+  @Override
+  protected boolean lessThan(BoundsCheckingPostingsEnum a, 
BoundsCheckingPostingsEnum b) {
+return a.position < b.position;
+  }
+};
+
+int freqAdd = 0;
+for (PostingsEnum postingsEnum : postingsEnums) {
+  queue.add(new BoundsCheckingPostingsEnum(postingsEnum));
+  freqAdd += postingsEnum.freq();
+}
+freq = freqAdd;
+  }
+
+  @Override
+  public int freq() throws IOException {
+return freq;
+  }
+
+  @Override
+  public int nextPosition() throws IOException {
+int position = NO_MORE_POSITIONS;
+while (queue.size() >= 1) {
+  queue.top().nextPosition();
+  queue.updateTop(); //the new position may be behind another 
postingsEnum in the queue
+  position = queue.top().position;
+
+  if (position == NO_MORE_POSITIONS) {
+queue.pop(); //this postingsEnum is consumed, let's 

[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616620#comment-15616620
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85612011
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/AnalysisOffsetStrategy.java
 ---
@@ -17,174 +17,28 @@
 package org.apache.lucene.search.uhighlight;
 
 import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.List;
 
 import org.apache.lucene.analysis.Analyzer;
-import org.apache.lucene.analysis.FilteringTokenFilter;
 import org.apache.lucene.analysis.TokenStream;
-import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
-import org.apache.lucene.index.IndexReader;
-import org.apache.lucene.index.LeafReader;
-import org.apache.lucene.index.Terms;
-import org.apache.lucene.index.memory.MemoryIndex;
-import org.apache.lucene.search.spans.SpanQuery;
 import org.apache.lucene.util.BytesRef;
-import org.apache.lucene.util.automaton.Automata;
 import org.apache.lucene.util.automaton.CharacterRunAutomaton;
 
+public abstract class AnalysisOffsetStrategy extends FieldOffsetStrategy {
--- End diff --

Thanks


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread Timothy055
Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85612011
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/AnalysisOffsetStrategy.java
 ---
@@ -17,174 +17,28 @@
 package org.apache.lucene.search.uhighlight;
 
 import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.List;
 
 import org.apache.lucene.analysis.Analyzer;
-import org.apache.lucene.analysis.FilteringTokenFilter;
 import org.apache.lucene.analysis.TokenStream;
-import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
-import org.apache.lucene.index.IndexReader;
-import org.apache.lucene.index.LeafReader;
-import org.apache.lucene.index.Terms;
-import org.apache.lucene.index.memory.MemoryIndex;
-import org.apache.lucene.search.spans.SpanQuery;
 import org.apache.lucene.util.BytesRef;
-import org.apache.lucene.util.automaton.Automata;
 import org.apache.lucene.util.automaton.CharacterRunAutomaton;
 
+public abstract class AnalysisOffsetStrategy extends FieldOffsetStrategy {
--- End diff --

Thanks


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread Timothy055
Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85611978
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over-iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+    private final PostingsEnum postingsEnum;
+    private final int freq;
+    private int position;
+    private int nextPosition;
+    private int positionInc = 1;
+
+    private int startOffset;
+    private int endOffset;
+
+    BoundsCheckingPostingsEnum(PostingsEnum postingsEnum) throws IOException {
+      this.postingsEnum = postingsEnum;
+      this.freq = postingsEnum.freq();
+      nextPosition = postingsEnum.nextPosition();
+      position = nextPosition;
+      startOffset = postingsEnum.startOffset();
+      endOffset = postingsEnum.endOffset();
+    }
+
+    private boolean hasMorePositions() throws IOException {
+      return positionInc < freq;
+    }
+
+    /**
+     * Returns the next position of the underlying postings enum, or
+     * NO_MORE_POSITIONS if it cannot iterate further.
+     */
+    private int nextPosition() throws IOException {
+      position = nextPosition;
+      startOffset = postingsEnum.startOffset();
+      endOffset = postingsEnum.endOffset();
+      if (hasMorePositions()) {
+        positionInc++;
+        nextPosition = postingsEnum.nextPosition();
+      } else {
+        nextPosition = NO_MORE_POSITIONS;
+      }
+      return position;
+    }
+
+  }
+
+  CompositePostingsEnum(BytesRef term, List<PostingsEnum> postingsEnums) throws IOException {
+    this.term = term;
+    queue = new PriorityQueue<BoundsCheckingPostingsEnum>(postingsEnums.size()) {
+      @Override
+      protected boolean lessThan(BoundsCheckingPostingsEnum a, BoundsCheckingPostingsEnum b) {
+        return a.position < b.position;
+      }
+    };
+
+    int freqAdd = 0;
+    for (PostingsEnum postingsEnum : postingsEnums) {
+      queue.add(new BoundsCheckingPostingsEnum(postingsEnum));
+      freqAdd += postingsEnum.freq();
+    }
+    freq = freqAdd;
+  }
+
+  @Override
+  public int freq() throws IOException {
+    return freq;
+  }
+
+  @Override
+  public int nextPosition() throws IOException {
+    int position = NO_MORE_POSITIONS;
+    while (queue.size() >= 1) {
+      queue.top().nextPosition();
+      queue.updateTop(); //the new position may be behind another postingsEnum in the queue
+      position = queue.top().position;
+
+      if (position == NO_MORE_POSITIONS) {
+        queue.pop(); //this postingsEnum is consumed, let's get rid of it
+      } else {
+        break; //we got a new position
+      }
+
+    }
+    return position;
+  }
+
+  @Override
+  public int startOffset() throws IOException {
+    return queue.top().startOffset;

[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616611#comment-15616611
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user Timothy055 commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85611673
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over-iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+    private final PostingsEnum postingsEnum;
+    private final int freq;
+    private int position;
+    private int nextPosition;
+    private int positionInc = 1;
+
+    private int startOffset;
--- End diff --

I thought the same, but unfortunately there's no way to check the current 
position of the underlying PostingsEnum before calling nextPosition.  Since we 
call nextPosition, the PostingsEnum has moved on before the highlighter calls 
startOffset or endOffset.
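
A minimal sketch of the constraint, using names from the patch (the calling
sequence here is illustrative, not code from the PR):

{code}
// Once nextPosition() advances the enum, startOffset()/endOffset() report
// the offsets of the *new* position, so they must be captured eagerly:
int position = postingsEnum.nextPosition();
int startOffset = postingsEnum.startOffset(); // offsets of `position`
int endOffset = postingsEnum.endOffset();
postingsEnum.nextPosition(); // startOffset()/endOffset() now describe the next position
{code}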


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: 

[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616577#comment-15616577
 ] 

David Smiley commented on LUCENE-7526:
--

It seems the ASF's GitHub -> JIRA comment integration doesn't incorporate 
overall review comments (i.e. comments not attached to a line number).  Please 
look in GH for that.

After a round of edits or two, I'll run the benchmarks to see how this fares.  
I expect it'll be a little slower for the analysis case when no phrases are 
present, due to the change of default to prefer MemoryIndex for the benefit of 
better passage relevancy (now a toggleable option).  Hopefully very little.  
If the difference seems imperceptible, then there's a case for dropping the 
TokenStreamOffsetStrategy along with some quirks related to it, which would be 
nice in terms of code maintenance.

> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616556#comment-15616556
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85604667
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over-iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+    private final PostingsEnum postingsEnum;
+    private final int freq;
+    private int position;
+    private int nextPosition;
+    private int positionInc = 1;
+
+    private int startOffset;
+    private int endOffset;
+
+    BoundsCheckingPostingsEnum(PostingsEnum postingsEnum) throws IOException {
+      this.postingsEnum = postingsEnum;
+      this.freq = postingsEnum.freq();
+      nextPosition = postingsEnum.nextPosition();
+      position = nextPosition;
+      startOffset = postingsEnum.startOffset();
+      endOffset = postingsEnum.endOffset();
+    }
+
+    private boolean hasMorePositions() throws IOException {
+      return positionInc < freq;
+    }
+
+    /**
+     * Returns the next position of the underlying postings enum, or
+     * NO_MORE_POSITIONS if it cannot iterate further.
+     */
+    private int nextPosition() throws IOException {
+      position = nextPosition;
+      startOffset = postingsEnum.startOffset();
+      endOffset = postingsEnum.endOffset();
+      if (hasMorePositions()) {
+        positionInc++;
+        nextPosition = postingsEnum.nextPosition();
+      } else {
+        nextPosition = NO_MORE_POSITIONS;
+      }
+      return position;
+    }
+
+  }
+
+  CompositePostingsEnum(BytesRef term, List<PostingsEnum> postingsEnums) throws IOException {
+    this.term = term;
+    queue = new PriorityQueue<BoundsCheckingPostingsEnum>(postingsEnums.size()) {
+      @Override
+      protected boolean lessThan(BoundsCheckingPostingsEnum a, BoundsCheckingPostingsEnum b) {
+        return a.position < b.position;
+      }
+    };
+
+    int freqAdd = 0;
+    for (PostingsEnum postingsEnum : postingsEnums) {
+      queue.add(new BoundsCheckingPostingsEnum(postingsEnum));
+      freqAdd += postingsEnum.freq();
+    }
+    freq = freqAdd;
+  }
+
+  @Override
+  public int freq() throws IOException {
+    return freq;
+  }
+
+  @Override
+  public int nextPosition() throws IOException {
+    int position = NO_MORE_POSITIONS;
+    while (queue.size() >= 1) {
+      queue.top().nextPosition();
+      queue.updateTop(); //the new position may be behind another postingsEnum in the queue
+      position = queue.top().position;
+
+      if (position == NO_MORE_POSITIONS) {
+        queue.pop(); //this postingsEnum is consumed, let's get rid of it

[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616554#comment-15616554
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85603003
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over-iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+    private final PostingsEnum postingsEnum;
+    private final int freq;
+    private int position;
+    private int nextPosition;
+    private int positionInc = 1;
+
+    private int startOffset;
--- End diff --

Don't need these.  They can be fetched on-demand from the head of the 
queue, easily & cheaply enough.
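
A hypothetical sketch of that on-demand alternative, with the composite
delegating to the head of the queue instead of caching offsets in each
wrapper (the reply elsewhere in this thread explains the catch: the
underlying enum may already have advanced past the reported position):

{code}
@Override
public int startOffset() throws IOException {
  return queue.top().postingsEnum.startOffset();
}

@Override
public int endOffset() throws IOException {
  return queue.top().postingsEnum.endOffset();
}
{code}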


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616546#comment-15616546
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85602404
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
--- End diff --

A comment would be helpful to explain the scope/purpose.


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616551#comment-15616551
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85608645
  
--- Diff: 
lucene/highlighter/src/test/org/apache/lucene/search/uhighlight/visibility/TestUnifiedHighlighterExtensibility.java
 ---
@@ -79,7 +90,7 @@ public void testFieldOffsetStrategyExtensibility() {
   @Test
   public void testUnifiedHighlighterExtensibility() {
 final int maxLength = 1000;
-    UnifiedHighlighter uh = new UnifiedHighlighter(null, new MockAnalyzer(random())){
+    UnifiedHighlighter uh = new UnifiedHighlighter(null, new MockAnalyzer(new Random())){
--- End diff --

? why not `random()` ?  This will likely fail precommit.


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616549#comment-15616549
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85607262
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/TokenStreamOffsetStrategy.java
 ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.automaton.CharacterRunAutomaton;
+
+public class TokenStreamOffsetStrategy extends AnalysisOffsetStrategy {
+
+  private static final BytesRef[] ZERO_LEN_BYTES_REF_ARRAY = new BytesRef[0];
+
+  public TokenStreamOffsetStrategy(String field, BytesRef[] terms, PhraseHelper phraseHelper, CharacterRunAutomaton[] automata, Analyzer indexAnalyzer) {
+    super(field, terms, phraseHelper, automata, indexAnalyzer);
+    this.automata = convertTermsToAutomata(terms, automata);
+    this.terms = ZERO_LEN_BYTES_REF_ARRAY;
+  }
+
+  @Override
+  public List<OffsetsEnum> getOffsetsEnums(IndexReader reader, int docId, String content) throws IOException {
+    TokenStream tokenStream = tokenStream(content);
+    PostingsEnum mtqPostingsEnum = MultiTermHighlighting.getDocsEnum(tokenStream, automata);
--- End diff --

I think there's a case to be made for moving `MultiTermHighlighting.getDocsEnum` 
into this class, to keep the TokenStream aspect more isolated?


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: 

[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616555#comment-15616555
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85606333
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldOffsetStrategy.java
 ---
@@ -65,58 +65,88 @@ public String getField() {
*/
   public abstract List<OffsetsEnum> getOffsetsEnums(IndexReader reader, int docId, String content) throws IOException;
 
-  protected List<OffsetsEnum> createOffsetsEnums(LeafReader leafReader, int doc, TokenStream tokenStream) throws IOException {
-    List<OffsetsEnum> offsetsEnums = createOffsetsEnumsFromReader(leafReader, doc);
-    if (automata.length > 0) {
-      offsetsEnums.add(createOffsetsEnumFromTokenStream(doc, tokenStream));
+  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader leafReader, int doc) throws IOException {
+    final Terms termsIndex = leafReader.terms(field);
+    if (termsIndex == null) {
+      return Collections.emptyList();
     }
-    return offsetsEnums;
-  }
 
-  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader atomicReader, int doc) throws IOException {
     // For strict positions, get a Map of term to Spans:
     //note: ScriptPhraseHelper.NONE does the right thing for these method calls
     final Map<BytesRef, Spans> strictPhrasesTermToSpans =
-        strictPhrases.getTermToSpans(atomicReader, doc);
+        phraseHelper.getTermToSpans(leafReader, doc);
     // Usually simply wraps terms in a List; but if willRewrite() then can be expanded
     final List<BytesRef> sourceTerms =
-        strictPhrases.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
+        phraseHelper.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
 
-    final List<OffsetsEnum> offsetsEnums = new ArrayList<>(sourceTerms.size() + 1);
+    final List<OffsetsEnum> offsetsEnums = new ArrayList<>(sourceTerms.size() + automata.length);
 
-    Terms termsIndex = atomicReader == null || sourceTerms.isEmpty() ? null : atomicReader.terms(field);
-    if (termsIndex != null) {
+    // Handle sourceTerms:
+    if (!sourceTerms.isEmpty()) {
       TermsEnum termsEnum = termsIndex.iterator();//does not return null
       for (BytesRef term : sourceTerms) {
-        if (!termsEnum.seekExact(term)) {
-          continue; // term not found
-        }
-        PostingsEnum postingsEnum = termsEnum.postings(null, PostingsEnum.OFFSETS);
-        if (postingsEnum == null) {
-          // no offsets or positions available
-          throw new IllegalArgumentException("field '" + field + "' was indexed without offsets, cannot highlight");
-        }
-        if (doc != postingsEnum.advance(doc)) { // now it's positioned, although may be exhausted
-          continue;
+        if (termsEnum.seekExact(term)) {
+          PostingsEnum postingsEnum = termsEnum.postings(null, PostingsEnum.OFFSETS);
+
+          if (postingsEnum == null) {
+            // no offsets or positions available
+            throw new IllegalArgumentException("field '" + field + "' was indexed without offsets, cannot highlight");
+          }
+
+          if (doc == postingsEnum.advance(doc)) { // now it's positioned, although may be exhausted
+            postingsEnum = phraseHelper.filterPostings(term, postingsEnum, strictPhrasesTermToSpans.get(term));
+            if (postingsEnum != null) {
+              offsetsEnums.add(new OffsetsEnum(term, postingsEnum));
+            }
+          }
         }
-        postingsEnum = strictPhrases.filterPostings(term, postingsEnum, strictPhrasesTermToSpans.get(term));
-        if (postingsEnum == null) {
-          continue;// completely filtered out
+      }
+    }
+
+    // Handle automata
+    if (automata.length > 0) {
+      offsetsEnums.addAll(createAutomataOffsetsFromTerms(termsIndex, doc));
+    }
+
+    return offsetsEnums;
+  }
+
+  protected List<OffsetsEnum> createAutomataOffsetsFromTerms(Terms termsIndex, int doc) throws IOException {
+    Map<CharacterRunAutomaton, List<PostingsEnum>> automataPostings = new IdentityHashMap<>(automata.length);
--- End diff --

I suggest a parallel array to automata, so that later you can avoid a map 
lookup on each matching term.  Also, I suggest lazy-initializing the array 
later... perhaps some wildcards in a disjunction might never match.
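
A rough sketch of the parallel-array idea (the helper name and shape are
assumptions for illustration, not code from the patch):

{code}
// Lazily allocated array parallel to `automata`: wildcards that never
// match cost nothing, and matching terms avoid a map lookup.
@SuppressWarnings("unchecked")
private List<PostingsEnum>[] addToAutomataPostings(List<PostingsEnum>[] automataPostings,
                                                   BytesRef term, PostingsEnum postingsEnum) {
  String termString = term.utf8ToString();
  for (int i = 0; i < automata.length; i++) {
    if (automata[i].run(termString)) {
      if (automataPostings == null) {
        automataPostings = (List<PostingsEnum>[]) new List[automata.length];
      }
      if (automataPostings[i] == null) {
        automataPostings[i] = new ArrayList<>();
      }
      automataPostings[i].add(postingsEnum);
    }
  }
  return automataPostings;
}
{code}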


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
>

[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616547#comment-15616547
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85602210
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/AnalysisOffsetStrategy.java
 ---
@@ -17,174 +17,28 @@
 package org.apache.lucene.search.uhighlight;
 
 import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.List;
 
 import org.apache.lucene.analysis.Analyzer;
-import org.apache.lucene.analysis.FilteringTokenFilter;
 import org.apache.lucene.analysis.TokenStream;
-import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
-import org.apache.lucene.index.IndexReader;
-import org.apache.lucene.index.LeafReader;
-import org.apache.lucene.index.Terms;
-import org.apache.lucene.index.memory.MemoryIndex;
-import org.apache.lucene.search.spans.SpanQuery;
 import org.apache.lucene.util.BytesRef;
-import org.apache.lucene.util.automaton.Automata;
 import org.apache.lucene.util.automaton.CharacterRunAutomaton;
 
+public abstract class AnalysisOffsetStrategy extends FieldOffsetStrategy {
--- End diff --

All public classes need a javadoc comment.  Remember lucene.internal for 
this one.
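
Something along these lines (the wording is assumed, not prescribed by the
review):

{code}
/**
 * Provides offsets for highlighting by analyzing the content.
 * @lucene.internal
 */
public abstract class AnalysisOffsetStrategy extends FieldOffsetStrategy {
{code}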


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616548#comment-15616548
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85603812
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over-iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+    private final PostingsEnum postingsEnum;
+    private final int freq;
+    private int position;
+    private int nextPosition;
+    private int positionInc = 1;
+
+    private int startOffset;
+    private int endOffset;
+
+    BoundsCheckingPostingsEnum(PostingsEnum postingsEnum) throws IOException {
+      this.postingsEnum = postingsEnum;
+      this.freq = postingsEnum.freq();
+      nextPosition = postingsEnum.nextPosition();
+      position = nextPosition;
+      startOffset = postingsEnum.startOffset();
+      endOffset = postingsEnum.endOffset();
+    }
+
+    private boolean hasMorePositions() throws IOException {
+      return positionInc < freq;
+    }
+
+    /**
+     * Returns the next position of the underlying postings enum, or
+     * NO_MORE_POSITIONS if it cannot iterate further.
+     */
+    private int nextPosition() throws IOException {
+      position = nextPosition;
+      startOffset = postingsEnum.startOffset();
+      endOffset = postingsEnum.endOffset();
+      if (hasMorePositions()) {
+        positionInc++;
+        nextPosition = postingsEnum.nextPosition();
+      } else {
+        nextPosition = NO_MORE_POSITIONS;
+      }
+      return position;
+    }
+
+  }
+
+  CompositePostingsEnum(BytesRef term, List<PostingsEnum> postingsEnums) throws IOException {
+    this.term = term;
+    queue = new PriorityQueue<BoundsCheckingPostingsEnum>(postingsEnums.size()) {
+      @Override
+      protected boolean lessThan(BoundsCheckingPostingsEnum a, BoundsCheckingPostingsEnum b) {
+        return a.position < b.position;
--- End diff --

In the event the positions are equal (e.g. two terms at the same position, 
both of which the wildcard matches), we might want to fall back on 
startOffset, then endOffset?  Or maybe simply ignore position altogether and 
just do offsets, so then you needn't even track the position?
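
A sketch of the first option, a tie-break in the queue's comparator
(illustrative, not code from the patch):

{code}
@Override
protected boolean lessThan(BoundsCheckingPostingsEnum a, BoundsCheckingPostingsEnum b) {
  if (a.position != b.position) {
    return a.position < b.position;
  }
  if (a.startOffset != b.startOffset) {
    return a.startOffset < b.startOffset;
  }
  return a.endOffset < b.endOffset;
}
{code}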


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the 

[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616550#comment-15616550
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85602867
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over-iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+    private final PostingsEnum postingsEnum;
+    private final int freq;
--- End diff --

Instead of holding `freq` and `nextPosition`, why not just 
`remainingPositions`?
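
A sketch of the simplification (assumed to be equivalent bookkeeping, not
code from the patch); nextPosition() would then decrement the counter
instead of incrementing positionInc:

{code}
private int remainingPositions; // replaces freq + positionInc

BoundsCheckingPostingsEnum(PostingsEnum postingsEnum) throws IOException {
  this.postingsEnum = postingsEnum;
  this.remainingPositions = postingsEnum.freq() - 1; // first position consumed here
  nextPosition = postingsEnum.nextPosition();
  position = nextPosition;
  startOffset = postingsEnum.startOffset();
  endOffset = postingsEnum.endOffset();
}

private boolean hasMorePositions() {
  return remainingPositions > 0;
}
{code}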


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616552#comment-15616552
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85606885
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/Passage.java ---
@@ -40,7 +40,7 @@
 BytesRef matchTerms[] = new BytesRef[8];
 int numMatches = 0;
 
-void addMatch(int startOffset, int endOffset, BytesRef term) {
+public void addMatch(int startOffset, int endOffset, BytesRef term) {
--- End diff --

Excellent; now it's possible for someone to override 
`FieldHighlighter.highlightOffsetsEnums()` to make Passages.  But do add 
@lucene.experimental.
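
A hypothetical illustration of what the widened visibility permits from
outside the package (values made up for the example):

{code}
Passage passage = new Passage();
BytesRef term = new BytesRef("lucene");
passage.addMatch(10, 16, term); // start/end offsets of a hit for `term`
{code}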


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616553#comment-15616553
 ] 

ASF GitHub Bot commented on LUCENE-7526:


Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85607859
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
 ---
@@ -116,6 +116,8 @@
 
   private boolean defaultHighlightPhrasesStrictly = true; // AKA "accuracy" or "query debugging"
 
+  private boolean defaultPassageRelevancyOverSpeed = true; //Prefer using a memory index
--- End diff --

Suggest the comment be: For analysis, prefer MemoryIndex approach


> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy






[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85606885
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/Passage.java ---
@@ -40,7 +40,7 @@
 BytesRef matchTerms[] = new BytesRef[8];
 int numMatches = 0;
 
-void addMatch(int startOffset, int endOffset, BytesRef term) {
+public void addMatch(int startOffset, int endOffset, BytesRef term) {
--- End diff --

Excellent; now it's possible for someone to override 
`FieldHighlighter.highlightOffsetsEnums()` to make Passages.  But do add 
@lucene.experimental.





[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85607859
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/UnifiedHighlighter.java
 ---
@@ -116,6 +116,8 @@
 
   private boolean defaultHighlightPhrasesStrictly = true; // AKA "accuracy" or "query debugging"
 
+  private boolean defaultPassageRelevancyOverSpeed = true; //Prefer using a memory index
--- End diff --

Suggest the comment be: For analysis, prefer MemoryIndex approach





[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85603812
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+private final PostingsEnum postingsEnum;
+private final int freq;
+private int position;
+private int nextPosition;
+private int positionInc = 1;
+
+private int startOffset;
+private int endOffset;
+
+BoundsCheckingPostingsEnum(PostingsEnum postingsEnum) throws 
IOException {
+  this.postingsEnum = postingsEnum;
+  this.freq = postingsEnum.freq();
+  nextPosition = postingsEnum.nextPosition();
+  position = nextPosition;
+  startOffset = postingsEnum.startOffset();
+  endOffset = postingsEnum.endOffset();
+}
+
+private boolean hasMorePositions() throws IOException {
+  return positionInc < freq;
+}
+
+/**
+ * Returns the next position of the underlying postings enum unless
+ * it cannot iterate further and returns NO_MORE_POSITIONS;
+ * @return
+ * @throws IOException
+ */
+private int nextPosition() throws IOException {
+  position = nextPosition;
+  startOffset = postingsEnum.startOffset();
+  endOffset = postingsEnum.endOffset();
+  if (hasMorePositions()) {
+positionInc++;
+nextPosition = postingsEnum.nextPosition();
+  } else {
+nextPosition = NO_MORE_POSITIONS;
+  }
+  return position;
+}
+
+  }
+
+  CompositePostingsEnum(BytesRef term, List<PostingsEnum> postingsEnums) 
throws IOException {
+this.term = term;
+queue = new 
PriorityQueue<BoundsCheckingPostingsEnum>(postingsEnums.size()) {
+  @Override
+  protected boolean lessThan(BoundsCheckingPostingsEnum a, 
BoundsCheckingPostingsEnum b) {
+return a.position < b.position;
--- End diff --

In the event the positions are equal (e.g. two terms at the same position, with 
the wildcard matching both), we might want to fall back on startOffset 
then endOffset?  Or maybe simply ignore position altogether and just do 
offsets, so then you needn't even track the position?
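
Concretely, the tie-break might look something like this (a sketch of the 
suggestion, not the committed code):

{code}
// Hypothetical comparator tie-break: order by position, then fall back
// to startOffset and endOffset when positions collide.
@Override
protected boolean lessThan(BoundsCheckingPostingsEnum a, BoundsCheckingPostingsEnum b) {
  if (a.position != b.position) {
    return a.position < b.position;
  }
  if (a.startOffset != b.startOffset) {
    return a.startOffset < b.startOffset;
  }
  return a.endOffset < b.endOffset;
}
{code}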





[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85608645
  
--- Diff: 
lucene/highlighter/src/test/org/apache/lucene/search/uhighlight/visibility/TestUnifiedHighlighterExtensibility.java
 ---
@@ -79,7 +90,7 @@ public void testFieldOffsetStrategyExtensibility() {
   @Test
   public void testUnifiedHighlighterExtensibility() {
 final int maxLength = 1000;
-UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(random())){
+UnifiedHighlighter uh = new UnifiedHighlighter(null, new 
MockAnalyzer(new Random())){
--- End diff --

Why not `random()`?  This will likely fail precommit.





[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85607262
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/TokenStreamOffsetStrategy.java
 ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.util.Collections;
+import java.util.List;
+
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.index.IndexReader;
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.automaton.CharacterRunAutomaton;
+
+public class TokenStreamOffsetStrategy extends AnalysisOffsetStrategy {
+
+  private static final BytesRef[] ZERO_LEN_BYTES_REF_ARRAY = new 
BytesRef[0];
+
+  public TokenStreamOffsetStrategy(String field, BytesRef[] terms, 
PhraseHelper phraseHelper, CharacterRunAutomaton[] automata, Analyzer 
indexAnalyzer) {
+super(field, terms, phraseHelper, automata, indexAnalyzer);
+this.automata = convertTermsToAutomata(terms, automata);
+this.terms = ZERO_LEN_BYTES_REF_ARRAY;
+  }
+
+  @Override
+  public List<OffsetsEnum> getOffsetsEnums(IndexReader reader, int docId, 
String content) throws IOException {
+TokenStream tokenStream = tokenStream(content);
+PostingsEnum mtqPostingsEnum = 
MultiTermHighlighting.getDocsEnum(tokenStream, automata);
--- End diff --

I think there's a case to be made for moving 
`MultiTermHighlighting.getDocsEnum` into this class, to keep the 
TokenStream aspect more isolated?





[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85602404
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
--- End diff --

A comment would be helpful to explain the scope/purpose.





[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85602867
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+private final PostingsEnum postingsEnum;
+private final int freq;
--- End diff --

Instead of holding `freq` and `nextPosition`, why not just 
`remainingPositions`?
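
For instance, a rough sketch of the idea (keeping the one-position lookahead 
the posted patch already has; this is not the committed code):

{code}
// Hypothetical rewrite: a single remainingPositions counter replaces
// the freq/positionInc pair.
private static final class BoundsCheckingPostingsEnum {

  private final PostingsEnum postingsEnum;
  private int remainingPositions; // lookaheads still available
  private int position;
  private int nextPosition;

  BoundsCheckingPostingsEnum(PostingsEnum postingsEnum) throws IOException {
    this.postingsEnum = postingsEnum;
    this.remainingPositions = postingsEnum.freq() - 1; // first position read below
    this.nextPosition = postingsEnum.nextPosition();
    this.position = nextPosition;
  }

  private int nextPosition() throws IOException {
    position = nextPosition;
    if (remainingPositions-- > 0) {
      nextPosition = postingsEnum.nextPosition();
    } else {
      nextPosition = NO_MORE_POSITIONS;
    }
    return position;
  }
}
{code}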





[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85606333
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/FieldOffsetStrategy.java
 ---
@@ -65,58 +65,88 @@ public String getField() {
*/
  public abstract List<OffsetsEnum> getOffsetsEnums(IndexReader reader, 
int docId, String content) throws IOException;
 
-  protected List<OffsetsEnum> createOffsetsEnums(LeafReader leafReader, 
int doc, TokenStream tokenStream) throws IOException {
-List<OffsetsEnum> offsetsEnums = 
createOffsetsEnumsFromReader(leafReader, doc);
-if (automata.length > 0) {
-  offsetsEnums.add(createOffsetsEnumFromTokenStream(doc, tokenStream));
+  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader 
leafReader, int doc) throws IOException {
+final Terms termsIndex = leafReader.terms(field);
+if (termsIndex == null) {
+  return Collections.emptyList();
 }
-return offsetsEnums;
-  }
 
-  protected List<OffsetsEnum> createOffsetsEnumsFromReader(LeafReader 
atomicReader, int doc) throws IOException {
 // For strict positions, get a Map of term to Spans:
 //note: ScriptPhraseHelper.NONE does the right thing for these 
method calls
    final Map<BytesRef, Spans> strictPhrasesTermToSpans =
-strictPhrases.getTermToSpans(atomicReader, doc);
+phraseHelper.getTermToSpans(leafReader, doc);
 // Usually simply wraps terms in a List; but if willRewrite() then can 
be expanded
    final List<BytesRef> sourceTerms =
-strictPhrases.expandTermsIfRewrite(terms, 
strictPhrasesTermToSpans);
+phraseHelper.expandTermsIfRewrite(terms, strictPhrasesTermToSpans);
 
-final List<OffsetsEnum> offsetsEnums = new 
ArrayList<>(sourceTerms.size() + 1);
+final List<OffsetsEnum> offsetsEnums = new 
ArrayList<>(sourceTerms.size() + automata.length);
 
-Terms termsIndex = atomicReader == null || sourceTerms.isEmpty() ? 
null : atomicReader.terms(field);
-if (termsIndex != null) {
+// Handle sourceTerms:
+if (!sourceTerms.isEmpty()) {
   TermsEnum termsEnum = termsIndex.iterator();//does not return null
   for (BytesRef term : sourceTerms) {
-if (!termsEnum.seekExact(term)) {
-  continue; // term not found
-}
-PostingsEnum postingsEnum = termsEnum.postings(null, 
PostingsEnum.OFFSETS);
-if (postingsEnum == null) {
-  // no offsets or positions available
-  throw new IllegalArgumentException("field '" + field + "' was 
indexed without offsets, cannot highlight");
-}
-if (doc != postingsEnum.advance(doc)) { // now it's positioned, 
although may be exhausted
-  continue;
+if (termsEnum.seekExact(term)) {
+  PostingsEnum postingsEnum = termsEnum.postings(null, 
PostingsEnum.OFFSETS);
+
+  if (postingsEnum == null) {
+// no offsets or positions available
+throw new IllegalArgumentException("field '" + field + "' was 
indexed without offsets, cannot highlight");
+  }
+
+  if (doc == postingsEnum.advance(doc)) { // now it's positioned, 
although may be exhausted
+postingsEnum = phraseHelper.filterPostings(term, postingsEnum, 
strictPhrasesTermToSpans.get(term));
+if (postingsEnum != null) {
+  offsetsEnums.add(new OffsetsEnum(term, postingsEnum));
+}
+  }
 }
-postingsEnum = strictPhrases.filterPostings(term, postingsEnum, 
strictPhrasesTermToSpans.get(term));
-if (postingsEnum == null) {
-  continue;// completely filtered out
+  }
+}
+
+// Handle automata
+if (automata.length > 0) {
+  offsetsEnums.addAll(createAutomataOffsetsFromTerms(termsIndex, doc));
+}
+
+return offsetsEnums;
+  }
+
+  protected List<OffsetsEnum> createAutomataOffsetsFromTerms(Terms 
termsIndex, int doc) throws IOException {
+Map<CharacterRunAutomaton, List<PostingsEnum>> automataPostings = new 
IdentityHashMap<>(automata.length);
--- End diff --

I suggest a parallel array to automata, so that later you can avoid a map 
lookup on each matching term.  Also, I suggest lazy-initializing the array 
later... perhaps some wildcards in a disjunction might never match.
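
Roughly, the parallel-array shape might be (a sketch with hypothetical locals; 
`termString` and `postingsEnum` stand in for whatever the surrounding loop 
provides):

{code}
// Hypothetical: one postings bucket per automaton, allocated lazily on the
// first term that automaton matches, so unmatched wildcards cost nothing.
List<PostingsEnum>[] automataPostings = null; // parallel to automata; lazy

for (int i = 0; i < automata.length; i++) {
  if (automata[i].run(termString)) { // CharacterRunAutomaton.run(String)
    if (automataPostings == null) {
      automataPostings = new List[automata.length]; // unchecked, but local
    }
    if (automataPostings[i] == null) {
      automataPostings[i] = new ArrayList<>(); // initialized on first match
    }
    automataPostings[i].add(postingsEnum);
  }
}
{code}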



[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85602210
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/AnalysisOffsetStrategy.java
 ---
@@ -17,174 +17,28 @@
 package org.apache.lucene.search.uhighlight;
 
 import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.List;
 
 import org.apache.lucene.analysis.Analyzer;
-import org.apache.lucene.analysis.FilteringTokenFilter;
 import org.apache.lucene.analysis.TokenStream;
-import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
-import org.apache.lucene.index.IndexReader;
-import org.apache.lucene.index.LeafReader;
-import org.apache.lucene.index.Terms;
-import org.apache.lucene.index.memory.MemoryIndex;
-import org.apache.lucene.search.spans.SpanQuery;
 import org.apache.lucene.util.BytesRef;
-import org.apache.lucene.util.automaton.Automata;
 import org.apache.lucene.util.automaton.CharacterRunAutomaton;
 
+public abstract class AnalysisOffsetStrategy extends FieldOffsetStrategy {
--- End diff --

All public classes need a javadoc comment.  Remember lucene.internal for 
this one.





[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85603003
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+private final PostingsEnum postingsEnum;
+private final int freq;
+private int position;
+private int nextPosition;
+private int positionInc = 1;
+
+private int startOffset;
--- End diff --

Don't need these.  They can be fetched on-demand from the head of the 
queue, easily & cheaply enough.
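
Something like the following, assuming the wrapper exposes its underlying enum 
and no longer pre-advances it past the current position (the posted patch does 
look ahead, so this is just the shape of the idea, not the committed code):

{code}
// Hypothetical on-demand reads from the head of the queue instead of
// caching startOffset/endOffset in every BoundsCheckingPostingsEnum.
@Override
public int startOffset() throws IOException {
  return queue.top().postingsEnum.startOffset();
}

@Override
public int endOffset() throws IOException {
  return queue.top().postingsEnum.endOffset();
}
{code}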





[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread dsmiley
Github user dsmiley commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/105#discussion_r85604667
  
--- Diff: 
lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/CompositePostingsEnum.java
 ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.uhighlight;
+
+import java.io.IOException;
+import java.util.List;
+
+import org.apache.lucene.index.PostingsEnum;
+import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.PriorityQueue;
+
+
+final class CompositePostingsEnum extends PostingsEnum {
+
+  private static final int NO_MORE_POSITIONS = -2;
+  private final BytesRef term;
+  private final int freq;
+  private final PriorityQueue<BoundsCheckingPostingsEnum> queue;
+
+
+  /**
+   * This class is used to ensure we don't over iterate the underlying
+   * postings enum by keeping track of the position relative to the
+   * frequency.
+   * Ideally this would've been an implementation of a PostingsEnum
+   * but it would have to delegate most methods and it seemed easier
+   * to just wrap the tweaked method.
+   */
+  private static final class BoundsCheckingPostingsEnum {
+
+
+private final PostingsEnum postingsEnum;
+private final int freq;
+private int position;
+private int nextPosition;
+private int positionInc = 1;
+
+private int startOffset;
+private int endOffset;
+
+BoundsCheckingPostingsEnum(PostingsEnum postingsEnum) throws 
IOException {
+  this.postingsEnum = postingsEnum;
+  this.freq = postingsEnum.freq();
+  nextPosition = postingsEnum.nextPosition();
+  position = nextPosition;
+  startOffset = postingsEnum.startOffset();
+  endOffset = postingsEnum.endOffset();
+}
+
+private boolean hasMorePositions() throws IOException {
+  return positionInc < freq;
+}
+
+/**
+ * Returns the next position of the underlying postings enum unless
+ * it cannot iterate further and returns NO_MORE_POSITIONS;
+ * @return
+ * @throws IOException
+ */
+private int nextPosition() throws IOException {
+  position = nextPosition;
+  startOffset = postingsEnum.startOffset();
+  endOffset = postingsEnum.endOffset();
+  if (hasMorePositions()) {
+positionInc++;
+nextPosition = postingsEnum.nextPosition();
+  } else {
+nextPosition = NO_MORE_POSITIONS;
+  }
+  return position;
+}
+
+  }
+
+  CompositePostingsEnum(BytesRef term, List<PostingsEnum> postingsEnums) 
throws IOException {
+this.term = term;
+queue = new 
PriorityQueue<BoundsCheckingPostingsEnum>(postingsEnums.size()) {
+  @Override
+  protected boolean lessThan(BoundsCheckingPostingsEnum a, 
BoundsCheckingPostingsEnum b) {
+return a.position < b.position;
+  }
+};
+
+int freqAdd = 0;
+for (PostingsEnum postingsEnum : postingsEnums) {
+  queue.add(new BoundsCheckingPostingsEnum(postingsEnum));
+  freqAdd += postingsEnum.freq();
+}
+freq = freqAdd;
+  }
+
+  @Override
+  public int freq() throws IOException {
+return freq;
+  }
+
+  @Override
+  public int nextPosition() throws IOException {
+int position = NO_MORE_POSITIONS;
+while (queue.size() >= 1) {
+  queue.top().nextPosition();
+  queue.updateTop(); //the new position may be behind another 
postingsEnum in the queue
+  position = queue.top().position;
+
+  if (position == NO_MORE_POSITIONS) {
+queue.pop(); //this postingsEnum is consumed, let's get rid of it
+  } else {
+break; //we got a new position
+  }
+
+}
+return position;
+  }
+
+  @Override
+  public int startOffset() throws IOException {
+return 

[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616458#comment-15616458
 ] 

Yonik Seeley commented on SOLR-8593:


bq. (This should include joins)

Awesome! I know many who have been waiting for that!

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.






[jira] [Updated] (LUCENE-7527) Facing unsafe memory access operation error while calling searcherManager.maybeReopen()

2016-10-28 Thread Jagmohan Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jagmohan Singh updated LUCENE-7527:
---
Description: 
We are getting the below error while calling the searcherManager.maybeReopen() 
method. We are using the MMAP implementation to read an NFS index directory 
mounted against 3 servers. We have a different process to update the indices 
and 3 other processes to read from the same index. What we believe is that this 
issue occurs when we call the maybeReopen() method during heavy writes to the 
indices and the MMap implementation is not able to cope with it.

Caused by: java.lang.InternalError: a fault occurred in a recent unsafe memory 
access operation in compiled Java code
at java.security.AccessController.doPrivileged(Native Method)
at 
org.apache.lucene.store.MMapDirectory.cleanMapping(MMapDirectory.java:158)
at 
org.apache.lucene.store.MMapDirectory$MMapIndexInput.close(MMapDirectory.java:389)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:690)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:593)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
at 
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:480)
at 
org.apache.lucene.index.DirectoryReader.isCurrent(DirectoryReader.java:901)
at 
org.apache.lucene.index.DirectoryReader.doOpenNoWriter(DirectoryReader.java:471)
at 
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:450)
at 
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:391)
at 
org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:497)
at 
org.apache.lucene.search.SearcherManager.maybeReopen(SearcherManager.java:162)


  was:
We are getting the below error while calling the searcherManager.maybeReopen() 
method. We are using the MMAP implementation to read an NFS index directory 
mounted against 3 servers. We have a different process to update the indices 
and 3 other processes to read from the same index. What we believe is that this 
issue is occurring when we call the maybeReopen() method during heavy writes to 
the indices which are happening at that time, and the MMap implementation is 
not able to cope with it.

Caused by: java.lang.InternalError: a fault occurred in a recent unsafe memory 
access operation in compiled Java code
at java.security.AccessController.doPrivileged(Native Method)
at 
org.apache.lucene.store.MMapDirectory.cleanMapping(MMapDirectory.java:158)
at 
org.apache.lucene.store.MMapDirectory$MMapIndexInput.close(MMapDirectory.java:389)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:690)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:593)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
at 
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:480)
at 
org.apache.lucene.index.DirectoryReader.isCurrent(DirectoryReader.java:901)
at 
org.apache.lucene.index.DirectoryReader.doOpenNoWriter(DirectoryReader.java:471)
at 
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:450)
at 
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:391)
at 
org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:497)
at 
org.apache.lucene.search.SearcherManager.maybeReopen(SearcherManager.java:162)



> Facing unsafe memory access operation error while calling 
> searcherManager.maybeReopen()
> ---
>
> Key: LUCENE-7527
> URL: https://issues.apache.org/jira/browse/LUCENE-7527
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.5
>Reporter: Jagmohan Singh
>
> We are getting the below error while calling the searcherManager.maybeReopen() 
> method. We are using the MMAP implementation to read an NFS index directory 
> mounted against 3 servers. We have a different process to update the indices 
> and 3 other processes to read from the same index. What we believe is that 
> this issue occurs when we call the maybeReopen() method during heavy writes 
> to the indices and the MMap implementation is not able to cope with it.
> Caused by: java.lang.InternalError: a fault occurred in a recent unsafe 
> memory access operation in compiled Java code
> at java.security.AccessController.doPrivileged(Native Method)
> at 
> org.apache.lucene.store.MMapDirectory.cleanMapping(MMapDirectory.java:158)
> at 
> org.apache.lucene.store.MMapDirectory$MMapIndexInput.close(MMapDirectory.java:389)
> at 
> 

[jira] [Updated] (LUCENE-7527) Facing unsafe memory access operation error while calling searcherManager.maybeReopen()

2016-10-28 Thread Jagmohan Singh (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jagmohan Singh updated LUCENE-7527:
---
Description: 
We are getting the below error while calling the searcherManager.maybeReopen() 
method. We are using the MMAP implementation to read an NFS index directory 
mounted against 3 servers. We have a different process to update the indices 
and 3 other processes to read from the same index. What we believe is that this 
issue is occurring when we call the maybeReopen() method during heavy writes to 
the indices which are happening at that time, and the MMap implementation is 
not able to cope with it.

Caused by: java.lang.InternalError: a fault occurred in a recent unsafe memory 
access operation in compiled Java code
at java.security.AccessController.doPrivileged(Native Method)
at 
org.apache.lucene.store.MMapDirectory.cleanMapping(MMapDirectory.java:158)
at 
org.apache.lucene.store.MMapDirectory$MMapIndexInput.close(MMapDirectory.java:389)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:690)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:593)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
at 
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:480)
at 
org.apache.lucene.index.DirectoryReader.isCurrent(DirectoryReader.java:901)
at 
org.apache.lucene.index.DirectoryReader.doOpenNoWriter(DirectoryReader.java:471)
at 
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:450)
at 
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:391)
at 
org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:497)
at 
org.apache.lucene.search.SearcherManager.maybeReopen(SearcherManager.java:162)


  was:
We are getting the below error while calling the searcherManager.maybeReopen() 
method. We are using MMAP to read an NFS index directory mounted against 3 
servers. We have a different process to update the indices and have 3 processes 
to read from the index. What we believe is that when we face this error in one 
of the readers, it is due to heavy writes in the indices happening at that 
time, and the MMap implementation is not able to cope with it.

Caused by: java.lang.InternalError: a fault occurred in a recent unsafe memory 
access operation in compiled Java code
at java.security.AccessController.doPrivileged(Native Method)
at 
org.apache.lucene.store.MMapDirectory.cleanMapping(MMapDirectory.java:158)
at 
org.apache.lucene.store.MMapDirectory$MMapIndexInput.close(MMapDirectory.java:389)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:690)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:593)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
at 
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:480)
at 
org.apache.lucene.index.DirectoryReader.isCurrent(DirectoryReader.java:901)
at 
org.apache.lucene.index.DirectoryReader.doOpenNoWriter(DirectoryReader.java:471)
at 
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:450)
at 
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:391)
at 
org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:497)
at 
org.apache.lucene.search.SearcherManager.maybeReopen(SearcherManager.java:162)



> Facing unsafe memory access operation error while calling 
> searcherManager.maybeReopen()
> ---
>
> Key: LUCENE-7527
> URL: https://issues.apache.org/jira/browse/LUCENE-7527
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 3.5
>Reporter: Jagmohan Singh
>
> We are getting the below error while calling the searcherManager.maybeReopen() 
> method. We are using the MMAP implementation to read an NFS index directory 
> mounted against 3 servers. We have a different process to update the indices 
> and 3 other processes to read from the same index. What we believe is that 
> this issue is occurring when we call the maybeReopen() method during heavy 
> writes to the indices which are happening at that time, and the MMap 
> implementation is not able to cope with it.
> Caused by: java.lang.InternalError: a fault occurred in a recent unsafe 
> memory access operation in compiled Java code
> at java.security.AccessController.doPrivileged(Native Method)
> at 
> org.apache.lucene.store.MMapDirectory.cleanMapping(MMapDirectory.java:158)
> at 
> 

[jira] [Created] (LUCENE-7527) Facing unsafe memory access operation error while calling searcherManager.maybeReopen()

2016-10-28 Thread Jagmohan Singh (JIRA)
Jagmohan Singh created LUCENE-7527:
--

 Summary: Facing unsafe memory access operation error while calling 
searcherManager.maybeReopen()
 Key: LUCENE-7527
 URL: https://issues.apache.org/jira/browse/LUCENE-7527
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 3.5
Reporter: Jagmohan Singh


We are getting the below error while calling the searcherManager.maybeReopen() 
method. We are using MMAP to read an NFS index directory mounted against 3 
servers. We have a different process to update the indices and have 3 processes 
to read from the index. What we believe is that when we face this error in one 
of the readers, it is due to heavy writes in the indices happening at that 
time, and the MMap implementation is not able to cope with it.

Caused by: java.lang.InternalError: a fault occurred in a recent unsafe memory 
access operation in compiled Java code
at java.security.AccessController.doPrivileged(Native Method)
at 
org.apache.lucene.store.MMapDirectory.cleanMapping(MMapDirectory.java:158)
at 
org.apache.lucene.store.MMapDirectory$MMapIndexInput.close(MMapDirectory.java:389)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:690)
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:593)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:359)
at 
org.apache.lucene.index.SegmentInfos.readCurrentVersion(SegmentInfos.java:480)
at 
org.apache.lucene.index.DirectoryReader.isCurrent(DirectoryReader.java:901)
at 
org.apache.lucene.index.DirectoryReader.doOpenNoWriter(DirectoryReader.java:471)
at 
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:450)
at 
org.apache.lucene.index.DirectoryReader.doOpenIfChanged(DirectoryReader.java:391)
at 
org.apache.lucene.index.IndexReader.openIfChanged(IndexReader.java:497)
at 
org.apache.lucene.search.SearcherManager.maybeReopen(SearcherManager.java:162)







[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Kevin Risden (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616318#comment-15616318
 ] 

Kevin Risden commented on SOLR-8593:


One thing that you may have to do to make the Solr server happy until Calcite 
1.11 gets released is put the solr-core jar in front of calcite-core on the 
classpath. They are ordered alphabetically right now (I fixed it by renaming 
solr-core to asolr-core). This is to make sure the CalciteConnectionProperty 
fixes go in before the original version. Otherwise you get an 
AbstractMethodError.

select count(distinct) ... might actually work right now with that patch. If it 
doesn't, then comment out the rules in the static block in SolrRules. That 
should give the full power of just regular SQL on top of Solr. It wouldn't 
really push anything down, but the SQL still works. (This should include joins)

You should be able to add/change the SolrRules to see what gets pushed down. 
The classes from the branch are the exact same as the code here: 
https://github.com/risdenk/solr-calcite-example The code there has some more 
tests that show the explain plan. It was easier for me to iterate on the 
implementation of the rules that way.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.






[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616304#comment-15616304
 ] 

Joel Bernstein commented on SOLR-8593:
--

[~risdenk], I'll pull the branch and begin working with it. My plan is to run 
the tests and see how the various Calcite pieces get triggered.

Perhaps for the 6.4 release we should shoot for the same functionality we 
currently have, just with Calcite swapped in.

If we want to add one new thing, I think the SELECT COUNT(DISTINCT) ... query 
would be a great one to add.

> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.






[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread Timothy M. Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616294#comment-15616294
 ] 

Timothy M. Rodriguez commented on LUCENE-7526:
--

Thanks [~dsmiley] :).  I've just submitted the pull request.  You're right, this 
only removes an additional use of token streams.  In the case of the Analysis 
strategies, a TokenStream is still necessary, at least initially, to analyze the 
field.  I'm glad I got to work on this during the wonderful Boston Hackday 
event (https://github.com/flaxsearch/london-hackday-2016).  Thanks [~dsmiley] 
for the tips while there and [~mbraun688] for the initial feedback on the PR.

> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy






[jira] [Commented] (SOLR-8593) Integrate Apache Calcite into the SQLHandler

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616284#comment-15616284
 ] 

ASF GitHub Bot commented on SOLR-8593:
--

Github user joel-bernstein commented on the issue:

https://github.com/apache/lucene-solr/pull/104
  
Ok, got it. I'll pull the branch and start working with it.


> Integrate Apache Calcite into the SQLHandler
> 
>
> Key: SOLR-8593
> URL: https://issues.apache.org/jira/browse/SOLR-8593
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>
> The Presto SQL Parser was perfect for phase one of the SQLHandler. It was 
> nicely split off from the larger Presto project and it did everything that 
> was needed for the initial implementation.
> Phase two of the SQL work though will require an optimizer. Here is where 
> Apache Calcite comes into play. It has a battle tested cost based optimizer 
> and has been integrated into Apache Drill and Hive.
> This work can begin in trunk following the 6.0 release. The final query plans 
> will continue to be translated to Streaming API objects (TupleStreams), so 
> continued work on the JDBC driver should plug in nicely with the Calcite work.






[GitHub] lucene-solr issue #104: SOLR-8593 - WIP

2016-10-28 Thread joel-bernstein
Github user joel-bernstein commented on the issue:

https://github.com/apache/lucene-solr/pull/104
  
Ok, got it. I'll pull the branch and start working with it.





[GitHub] lucene-solr issue #104: Jira/solr 8593

2016-10-28 Thread risdenk
Github user risdenk commented on the issue:

https://github.com/apache/lucene-solr/pull/104
  
https://github.com/apache/lucene-solr/tree/jira/solr-8593





[GitHub] lucene-solr issue #104: Jira/solr 8593

2016-10-28 Thread risdenk
Github user risdenk commented on the issue:

https://github.com/apache/lucene-solr/pull/104
  
This is on `apache/lucene-solr` already. It is branch `jira/solr-8593`. I 
opened the PR just so I could comment on some of the files.





[jira] [Commented] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616239#comment-15616239
 ] 

ASF GitHub Bot commented on LUCENE-7526:


GitHub user Timothy055 opened a pull request:

https://github.com/apache/lucene-solr/pull/105

LUCENE-7526 Improvements to UnifiedHighlighter OffsetStrategies

Pull request for LUCENE-7526

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Timothy055/lucene-solr master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #105



[GitHub] lucene-solr pull request #105: LUCENE-7526 Improvements to UnifiedHighlighte...

2016-10-28 Thread Timothy055
GitHub user Timothy055 opened a pull request:

https://github.com/apache/lucene-solr/pull/105

LUCENE-7526 Improvements to UnifiedHighlighter OffsetStrategies

Pull request for LUCENE-7526

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Timothy055/lucene-solr master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #105


commit 02e932c4a6146363680b88f4947a693c6697c955
Author: Timothy Rodriguez 
Date:   2016-09-01T19:23:50Z

Initial fork of PostingsHighlighter for UnifiedHighlighter

commit 9d88411b3985a98851384d78d681431dba710e89
Author: Timothy Rodriguez 
Date:   2016-09-01T23:17:06Z

Initial commit of the UnifiedHighlighter for OSS contribution

commit e45e39bc4b07ea33e4423b264c2fefb9aa08777a
Author: David Smiley 
Date:   2016-09-02T12:45:49Z

Fix misc issues; "ant test" now works. (#1)

commit 046a28ef31acf4cea7d2554b827e6a714e3d
Author: Timothy Rodriguez 
Date:   2016-09-02T20:58:31Z

Minor refactoring of the AnalysisFieldHighlighter

commit ccd1a2280abd4b48cfef8122696e5d9cfd12920f
Author: David Smiley 
Date:   2016-09-03T12:55:20Z

AbstractFieldHighlighter: order methods more sensibly; renamed a couple.

commit d4714a04a3e41d5e95bbe942b275c32ed69b9c2e
Author: David Smiley 
Date:   2016-09-04T01:03:29Z

Improve javadocs and @lucene.external/internal labeling & scope.
"ant precommit" now passes.

commit e0659f18a59bf2893076da6d7643ff30f2fa5a52
Author: David Smiley 
Date:   2016-09-04T01:25:55Z

Analysis: remove dubious filter() method

commit ccd7ce707bff2c06da89b31853cca9aecea72008
Author: David Smiley 
Date:   2016-09-04T01:44:01Z

getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags, and 
only call filterExtractedTerms once.

commit ffc2a22c700b8abcbf87673d5d05bb3659d177c9
Author: David Smiley 
Date:   2016-09-04T15:21:08Z

UnifiedHighlighter round 2 (#2)

* AbstractFieldHighlighter: order methods more sensibly; renamed a couple.

* Improve javadocs and @lucene.external/internal labeling & scope.
"ant precommit" now passes.

* Analysis: remove dubious filter() method

* getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags, 
and only call filterExtractedTerms once.

commit 5f95e05595db462d3ab5bffc68c2c92f70875072
Author: David Smiley 
Date:   2016-09-04T16:12:33Z

Refactor: FieldOffsetStrategy

commit 86fb6265fbbdb955ead6d4baf944bf708175715e
Author: David Smiley 
Date:   2016-09-04T16:21:32Z

stop passing maxPassages into highlightFieldForDoc()

commit f6fd80544eae9fab953b94b1e9346c0883f956eb
Author: David Smiley 
Date:   2016-09-04T16:12:33Z

Refactor: FieldOffsetStrategy

commit b335a673c2ce45904890c1e9af7cbfda2bd27b0f
Author: David Smiley 
Date:   2016-09-04T16:21:32Z

stop passing maxPassages into highlightFieldForDoc()

commit 478db9437b92214cbf459f82ba2e3a67c966a150
Author: David Smiley 
Date:   2016-09-04T18:29:44Z

Rename subclasses of FieldOffsetStrategy.

commit dbf4280755c11420a5032445cd618fadb7444b61
Author: David Smiley 
Date:   2016-09-04T18:31:34Z

Re-order and harmonize params on methods called by UH.getFieldHighlighter()

commit f0340e27e61dcda2e11992f08ec07a72fad6c24c
Author: David Smiley 
Date:   2016-09-04T18:53:51Z

FieldHighlighter: harmonize field/param order. And don't apply 
maxNoHighlightPasses twice.

commit 817f63c1d48fd523c13b9c40a2ae9b8a4047209a
Author: Timothy Rodriguez 
Date:   2016-09-06T20:43:20Z

Merge of renaming changes

commit 0f644a4f53c1ed4d41d562848f6fe51a87442a75
Author: Timothy Rodriguez 
Date:   2016-09-06T20:54:13Z

add visibility tests

commit 9171f49e117085e7d086267bb73836831ff07f8e
Author: Timothy Rodriguez 
Date:   2016-09-07T14:26:59Z

ADd additional extensibility test

commit 7ce488147cb811e15cb6e9125a835171157746f2
Author: Timothy Rodriguez 
Date:   2016-09-28T22:04:15Z

Reduce visibility of MultiTermHighlighting to package protected

commit 2f08465020448592b0e8750db568ade5a9218267
Author: Timothy M. Rodriguez 
Date:   2016-10-11T16:44:29Z

Initial commit that will use memory index to generate offsets enum if the 
tokenstream is null

commit 357f3dfb9ace4deef20787af19bc2e5a6b4ff61e
Author: Timothy M. Rodriguez 

[GitHub] lucene-solr pull request #79: LUCENE-7438 UnifiedHighlighter

2016-10-28 Thread Timothy055
Github user Timothy055 closed the pull request at:

https://github.com/apache/lucene-solr/pull/79


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7438) UnifiedHighlighter

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15616233#comment-15616233
 ] 

ASF GitHub Bot commented on LUCENE-7438:


Github user Timothy055 closed the pull request at:

https://github.com/apache/lucene-solr/pull/79


> UnifiedHighlighter
> --
>
> Key: LUCENE-7438
> URL: https://issues.apache.org/jira/browse/LUCENE-7438
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Affects Versions: 6.2
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
> Fix For: 6.3
>
> Attachments: LUCENE-7438.patch, LUCENE_7438_UH_benchmark.patch, 
> LUCENE_7438_UH_benchmark.patch, LUCENE_7438_UH_small_changes.patch
>
>
> The UnifiedHighlighter is an evolution of the PostingsHighlighter that is 
> able to highlight using offsets in either postings, term vectors, or from 
> analysis (a TokenStream). Lucene’s existing highlighters are mostly 
> demarcated along offset source lines, whereas here it is unified -- hence 
> this proposed name. In this highlighter, the offset source strategy is 
> separated from the core highlighting functionality. The UnifiedHighlighter 
> further improves on the PostingsHighlighter’s design by supporting accurate 
> phrase highlighting using an approach similar to the standard highlighter’s 
> WeightedSpanTermExtractor. The next major improvement is a hybrid offset 
> source strategy that utilizes postings and “light” term vectors (i.e. just the 
> terms) for highlighting multi-term queries (wildcards) without resorting to 
> analysis. Phrase highlighting and wildcard highlighting can both be disabled 
> if you’d rather highlight a little faster albeit not as accurately reflecting 
> the query.
> We’ve benchmarked an earlier version of this highlighter comparing it to the 
> other highlighters and the results were exciting! It’s tempting to share 
> those results but it’s definitely due for another benchmark, so we’ll work on 
> that. Performance was the main motivator for creating the UnifiedHighlighter, 
> as the standard Highlighter (the only one meeting Bloomberg Law’s accuracy 
> requirements) wasn’t fast enough, even with term vectors along with several 
> improvements we contributed back, and even after we forked it to highlight in 
> multiple threads.
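
For orientation, a minimal usage sketch of the API described above, assuming an existing {{IndexSearcher}}, {{Analyzer}}, and {{Query}} (the field name "body" is illustrative):

{code}
// Minimal UnifiedHighlighter sketch: offsets come from postings, term vectors,
// or re-analysis, depending on how the field was indexed.
UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, analyzer);
TopDocs topDocs = searcher.search(query, 10);
String[] snippets = highlighter.highlight("body", query, topDocs);
for (String snippet : snippets) {
  System.out.println(snippet); // one passage string per hit; null if nothing matched
}
{code}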



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-6.x-Linux (32bit/jdk1.8.0_102) - Build # 2060 - Unstable!

2016-10-28 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Linux/2060/
Java: 32bit/jdk1.8.0_102 -server -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED:  org.apache.solr.cloud.ShardSplitTest.testSplitAfterFailedSplit

Error Message:
expected:<1> but was:<2>

Stack Trace:
java.lang.AssertionError: expected:<1> but was:<2>
at 
__randomizedtesting.SeedInfo.seed([A78F4599CC12AA00:5EC2D636F067E78A]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at org.junit.Assert.assertEquals(Assert.java:456)
at 
org.apache.solr.cloud.ShardSplitTest.testSplitAfterFailedSplit(ShardSplitTest.java:284)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:992)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:967)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)

[JENKINS] Lucene-Solr-6.x-Solaris (64bit/jdk1.8.0) - Build # 477 - Still Unstable!

2016-10-28 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/477/
Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseSerialGC

3 tests failed.
FAILED:  junit.framework.TestSuite.org.apache.solr.core.TestCoreDiscovery

Error Message:
ObjectTracker found 5 object(s) that were not released!!! 
[MDCAwareThreadPoolExecutor, MockDirectoryWrapper, MockDirectoryWrapper, 
SolrCore, MockDirectoryWrapper] 
org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException  at 
org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:43)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:799)  at 
org.apache.solr.core.SolrCore.<init>(SolrCore.java:776)  at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:842)  at 
org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1113)  at 
org.apache.solr.core.TestCoreDiscovery.testTooManyTransientCores(TestCoreDiscovery.java:211)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)  
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)  at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
  at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
  at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
  at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
  at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
  at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
  at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
  at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
  at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
  at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
  at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
  at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
  at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
  at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
  at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
  at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
  at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
  at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
  at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
  at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
  at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
  at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54)
  at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
  at java.lang.Thread.run(Thread.java:745)  
org.apache.solr.common.util.ObjectReleaseTracker$ObjectTrackerException  at 
org.apache.solr.common.util.ObjectReleaseTracker.track(ObjectReleaseTracker.java:43)
  at 

[GitHub] lucene-solr issue #104: Jira/solr 8593

2016-10-28 Thread joel-bernstein
Github user joel-bernstein commented on the issue:

https://github.com/apache/lucene-solr/pull/104
  
Hi Kevin,
Shall we push a branch out to apache/lucene-solr for this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9621) Remove several guava, apache commons calls in favor of java 8 alternatives

2016-10-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615932#comment-15615932
 ] 

David Smiley commented on SOLR-9621:


Looks great to me Michael; thanks for doing this!  I'll commit later this 
evening.

> Remove several guava, apache commons calls in favor of java 8 alternatives
> --
>
> Key: SOLR-9621
> URL: https://issues.apache.org/jira/browse/SOLR-9621
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Michael Braun
>Assignee: David Smiley
>Priority: Trivial
> Attachments: SOLR-9621.patch, SOLR-9621.patch
>
>
> Now that Solr is against Java 8, we can take advantage of replacing some 
> guava and apache commons calls with JDK standards. I'd like to start by 
> replacing the following:
> com.google.common.base.Supplier  -> java.util.function.Supplier
> com.google.common.base.Predicate -> java.util.function.Predicate
> com.google.common.base.Charsets -> java.nio.charset.StandardCharsets
> org.apache.commons.codec.Charsets -> java.nio.charset.StandardCharsets
> com.google.common.collect.Ordering -> java.util.Comparator
> com.google.common.base.Joiner -> java.util.stream.Collectors::joining
> com.google.common.base.Function -> java.util.function.Function
> com.google.common.base.Preconditions::checkNotNull -> 
> java.util.Objects::requireNonNull
> com.google.common.base.Objects::equals -> java.util.Objects::equals
> com.google.common.base.Objects::hashCode -> java.util.Objects::hashCode
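
For illustration, one of the listed swaps in before/after form (a hypothetical snippet, not taken from the patch):

{code}
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class JoinExample {
  String describe(List<String> parts) {
    // Before: Preconditions.checkNotNull(parts, "parts") and Joiner.on(", ").join(parts)
    Objects.requireNonNull(parts, "parts");
    return parts.stream().collect(Collectors.joining(", "));
  }
}
{code}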



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #104: Jira/solr 8593

2016-10-28 Thread risdenk
Github user risdenk commented on the issue:

https://github.com/apache/lucene-solr/pull/104
  
Filter is needed for the Lucene query parsing piece. The rules aren't 
completely correct yet: there are some issues with `is not` and a few others 
that still need tests. It does compile, though, and can handle things like 
`select * ...`. I'm thinking of backing out a few of the rules to make this 
simpler, then incrementally adding from there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr issue #104: Jira/solr 8593

2016-10-28 Thread risdenk
Github user risdenk commented on the issue:

https://github.com/apache/lucene-solr/pull/104
  
@joel-bernstein FYI


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-master - Build # 1140 - Still unstable

2016-10-28 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1140/

6 tests failed.
FAILED:  org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test

Error Message:
Timeout occured while waiting response from server at: 
https://127.0.0.1:57603/j_nb/a

Stack Trace:
org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting 
response from server at: https://127.0.0.1:57603/j_nb/a
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:621)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:262)
at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:251)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.makeRequest(CollectionsAPIDistributedZkTest.java:400)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:458)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.test(CollectionsAPIDistributedZkTest.java:180)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:985)
at 
org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:960)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
at 

[jira] [Commented] (SOLR-9699) CoreStatus requests can fail if executed during a core reload

2016-10-28 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615655#comment-15615655
 ] 

Alan Woodward commented on SOLR-9699:
-

I think the fix here is probably to check for AlreadyClosedExceptions in 
StatusOp, and retry if one is hit.
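
Something shaped roughly like this (hypothetical helper names, not the actual StatusOp code):

{code}
// Retry the status read if a concurrent reload closed the core's IndexWriter
// out from under us; AlreadyClosedException is unchecked, so it can be caught here.
private NamedList<Object> indexInfoWithRetry(SolrCore core) throws IOException {
  for (int attempt = 0; ; attempt++) {
    try {
      return readIndexInfo(core); // assumed helper that consults the IndexWriter
    } catch (AlreadyClosedException e) {
      if (attempt >= 2) throw e; // give up after a few attempts
    }
  }
}
{code}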

> CoreStatus requests can fail if executed during a core reload
> -
>
> Key: SOLR-9699
> URL: https://issues.apache.org/jira/browse/SOLR-9699
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Alan Woodward
>
> CoreStatus requests delegate some of their response down to a core's 
> IndexWriter.  If the core is being reloaded, then there's a race between 
> these calls and the IndexWriter being closed, which can lead to the request 
> failing with an AlreadyClosedException.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9132) Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase

2016-10-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615642#comment-15615642
 ] 

ASF subversion and git services commented on SOLR-9132:
---

Commit 3b49705c43178fcd75dc85e56bcd2820cb35e166 in lucene-solr's branch 
refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=3b49705 ]

SOLR-9132: Don't require indexInfo from corestatus over reloads


> Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase
> -
>
> Key: SOLR-9132
> URL: https://issues.apache.org/jira/browse/SOLR-9132
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Attachments: SOLR-9132-deletereplicas.patch, 
> SOLR-9132-recovery.patch, SOLR-9132-rules.patch, SOLR-9132.patch, 
> TEST-org.apache.solr.cloud.TestDeleteCollectionOnDownNodes.xml
>
>
> We need to remove AbstractDistribZkTestCase if we want to move away from 
> legacy cloud configurations.  This issue is for migrating tests to 
> SolrCloudTestCase instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9132) Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase

2016-10-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615641#comment-15615641
 ] 

ASF subversion and git services commented on SOLR-9132:
---

Commit 9b669d72876a13221f49db09ba9f8e1a1f60487e in lucene-solr's branch 
refs/heads/branch_6x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9b669d7 ]

SOLR-9132: Don't require indexInfo from corestatus over reloads


> Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase
> -
>
> Key: SOLR-9132
> URL: https://issues.apache.org/jira/browse/SOLR-9132
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Attachments: SOLR-9132-deletereplicas.patch, 
> SOLR-9132-recovery.patch, SOLR-9132-rules.patch, SOLR-9132.patch, 
> TEST-org.apache.solr.cloud.TestDeleteCollectionOnDownNodes.xml
>
>
> We need to remove AbstractDistribZkTestCase if we want to move away from 
> legacy cloud configurations.  This issue is for migrating tests to 
> SolrCloudTestCase instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-master-MacOSX (64bit/jdk1.8.0) - Build # 3633 - Failure!

2016-10-28 Thread Policeman Jenkins Server
Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/3633/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 23684 lines...]
-validate-source-patterns:
[source-patterns] nocommit: 
solr/core/src/test/org/apache/solr/cloud/RecoveryZkTest.java
[source-patterns] invalid logging pattern [not private static final, uses 
static class name]: solr/core/src/test/org/apache/solr/cloud/RecoveryZkTest.java

BUILD FAILED
/Users/jenkins/workspace/Lucene-Solr-master-MacOSX/build.xml:765: The following 
error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-master-MacOSX/build.xml:130: Found 2 
violations in source files (invalid logging pattern [not private static final, 
uses static class name], nocommit).

Total time: 94 minutes 44 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
[WARNINGS] Skipping publisher since build result is FAILURE
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-9132) Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase

2016-10-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated SOLR-9132:
---
Attachment: TEST-org.apache.solr.cloud.TestDeleteCollectionOnDownNodes.xml

I'm attaching test failure details while I still have the file.  Feel free to 
ignore if you don't hear about this failure again.

> Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase
> -
>
> Key: SOLR-9132
> URL: https://issues.apache.org/jira/browse/SOLR-9132
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Attachments: SOLR-9132-deletereplicas.patch, 
> SOLR-9132-recovery.patch, SOLR-9132-rules.patch, SOLR-9132.patch, 
> TEST-org.apache.solr.cloud.TestDeleteCollectionOnDownNodes.xml
>
>
> We need to remove AbstractDistribZkTestCase if we want to move away from 
> legacy cloud configurations.  This issue is for migrating tests to 
> SolrCloudTestCase instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-7526:
-
   Labels:   (was: highlighter unified-highlighter)
Fix Version/s: 6.4
  Component/s: modules/highlighter

I'm looking forward to seeing this.  :-)

The summary heading "TokenStream removal" may be confusing to folks... 
TokenStreams are certainly going to be involved for the Analysis based offset 
source.  I think you mean that the changes here relating to MultiTermQuery 
processing will mean that MTQs aren't coerced into a TokenStream over the index 
any more, except for the refactored-out TokenStreamOffsetStrategy.  So this 
removes many but not all occurrences of TokenStream within the internal API.

> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/highlighter
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
> Fix For: 6.4
>
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy
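
As a rough illustration of that last point, a subclass might opt in through the {{getFlags}} hook this work introduces (a sketch only; the enum constant name is an assumption, not a confirmed API):

{code}
// Hypothetical: favor the separated TokenStreamOffsetStrategy via a flag.
UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, analyzer) {
  @Override
  protected Set<HighlightFlag> getFlags(String field) {
    Set<HighlightFlag> flags = EnumSet.noneOf(HighlightFlag.class);
    flags.addAll(super.getFlags(field));
    flags.add(HighlightFlag.PASSAGE_RELEVANCY_OVER_SPEED); // assumed constant name
    return flags;
  }
};
{code}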



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9132) Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase

2016-10-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615412#comment-15615412
 ] 

ASF subversion and git services commented on SOLR-9132:
---

Commit 6340f3b9b9902f2aaf04fd460d0ed91bd8da00e4 in lucene-solr's branch 
refs/heads/branch_6x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6340f3b ]

SOLR-9132: Fix test bug


> Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase
> -
>
> Key: SOLR-9132
> URL: https://issues.apache.org/jira/browse/SOLR-9132
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Attachments: SOLR-9132-deletereplicas.patch, 
> SOLR-9132-recovery.patch, SOLR-9132-rules.patch, SOLR-9132.patch
>
>
> We need to remove AbstractDistribZkTestCase if we want to move away from 
> legacy cloud configurations.  This issue is for migrating tests to 
> SolrCloudTestCase instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9132) Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase

2016-10-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615413#comment-15615413
 ] 

ASF subversion and git services commented on SOLR-9132:
---

Commit cff2774a3749378a040ce417f00560b95c93e10f in lucene-solr's branch 
refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cff2774 ]

SOLR-9132: Fix test bug


> Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase
> -
>
> Key: SOLR-9132
> URL: https://issues.apache.org/jira/browse/SOLR-9132
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Attachments: SOLR-9132-deletereplicas.patch, 
> SOLR-9132-recovery.patch, SOLR-9132-rules.patch, SOLR-9132.patch
>
>
> We need to remove AbstractDistribZkTestCase if we want to move away from 
> legacy cloud configurations.  This issue is for migrating tests to 
> SolrCloudTestCase instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7135) Constants check for JRE bitness causes SecurityException under WebStart

2016-10-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615391#comment-15615391
 ] 

Michael McCandless commented on LUCENE-7135:


Is {{OS_ARCH.contains("64"))}} really a safe way to determine if we are running 
in a 64 bit JVM?  Maybe we should only do this on fallback, if the security 
manager doesn't let us do {{System.getProperty("sun.arch.data.model")}}?
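
A minimal sketch of that fallback order (illustrative; not the actual {{Constants}} code):

{code}
// Prefer sun.arch.data.model; fall back to the weaker os.arch heuristic only
// if the security manager (e.g. under WebStart) denies the property read.
static boolean detectJre64Bit() {
  try {
    String model = System.getProperty("sun.arch.data.model");
    if (model != null) {
      return model.contains("64");
    }
  } catch (SecurityException e) {
    // read denied; fall through to os.arch
  }
  return System.getProperty("os.arch", "").contains("64");
}
{code}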

> Constants check for JRE bitness causes SecurityException under WebStart
> ---
>
> Key: LUCENE-7135
> URL: https://issues.apache.org/jira/browse/LUCENE-7135
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: 5.5
> Environment: OS X 10.11.4, Java 1.8.0_77-b03 (under WebStart)
>Reporter: Aaron Madlon-Kay
> Attachments: LUCENE-7135.diff
>
>
> I have an app that I deploy via WebStart that uses Lucene 5.2.1 (we are 
> locked to 5.2.1 because that's what [LanguageTool|https://languagetool.org/] 
> uses).
> When running under the WebStart security manager, there are two locations 
> where exceptions are thrown and prevent pretty much all Lucene classes from 
> initializing. This is true even when we sign everything and specify 
> {{}}.
> # In {{RamUsageEstimator}}, fixed by LUCENE-6923
> # In {{Constants}}, caused by the call 
> {{System.getProperty("sun.arch.data.model")}} (stack trace below).
> {code}
> Error: Caused by: java.security.AccessControlException: access denied 
> ("java.util.PropertyPermission" "sun.arch.data.model" "read") 
> Error:at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
>  
> Error:at 
> java.security.AccessController.checkPermission(AccessController.java:884) 
> Error:at 
> java.lang.SecurityManager.checkPermission(SecurityManager.java:549) 
> Error:at 
> com.sun.javaws.security.JavaWebStartSecurity.checkPermission(Unknown Source) 
> Error:at 
> java.lang.SecurityManager.checkPropertyAccess(SecurityManager.java:1294) 
> Error:at java.lang.System.getProperty(System.java:717) 
> Error:at org.apache.lucene.util.Constants.<clinit>(Constants.java:71) 
> Error:... 34 more 
> {code}
> The latter is still present in the latest version. My patch illustrates one 
> solution that appears to be working for us.
> (This patch, together with a backport of the fix to LUCENE-6923, seems to fix 
> the issue for our purposes. However if you really wanted to make my day you 
> could put out a maintenance release of 5.2 with both fixes included.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7523) UpgradeIndexMergePolicy: beyond one-off use, monster segment avoidance

2016-10-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615379#comment-15615379
 ] 

Michael McCandless commented on LUCENE-7523:


UIMP was really designed for one-off usage via the {{IndexUpgrader}} tool, but 
I agree it would be interesting to have it instead become a merge policy that 
also passes through to ordinary merging.

It's a somewhat complex problem, though: if the merge policy is presented with 
an index that has N old segments and M new ones, and it's in need of merging, 
how does it pick?  Is it only {{forceMerge}} that would explicitly target old 
segments first?  Would there just be an added bias to favor old ones, like 
how {{TieredMergePolicy}} biases toward segments that have more deletions?

Maybe we just fold this behavior into TMP and remove UIMP?

bq.  That extra new segment could be quite a large 'monster' segment.

Maybe we could have a {{maxMergedSegmentMB}}, like {{TieredMergePolicy}}?  Then 
UIMP could only send segments whose total size is less than that to the wrapped 
merge policy, maybe?

bq. UIMP.findMerges does not pass the mergeTrigger to the inner/delegate merge 
policy.

That seems like a bug to me.
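
For that last point, the fix would presumably be a one-line pass-through, sketched here against the 6.x {{MergePolicy}} signature (the delegate field name is an assumption):

{code}
@Override
public MergeSpecification findMerges(MergeTrigger mergeTrigger,
                                     SegmentInfos segmentInfos,
                                     IndexWriter writer) throws IOException {
  // Forward the trigger instead of passing null so the wrapped policy can
  // see why merging was requested.
  return in.findMerges(mergeTrigger, segmentInfos, writer);
}
{code}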

> UpgradeIndexMergePolicy: beyond one-off use, monster segment avoidance
> --
>
> Key: LUCENE-7523
> URL: https://issues.apache.org/jira/browse/LUCENE-7523
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Priority: Minor
> Attachments: LUCENE-7523-outline.patch
>
>
> (Was looking at UpgradeIndexMergePolicy as part of SOLR-9648 and came up with 
> these possibilities here, what do people think?)
> Currently one probably would not configure use of the 
> {{UpgradeIndexMergePolicy}} (UIMP) permanently since 
> [findForcedMerges|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java#L74]
>  becomes a no-op after all segments have been upgraded.
> * How about adding an optional {{fallbackToInnerAfterUpgrade}} flag? That way 
> UIMP.findForcedMerges could fallback to its inner/delegate merge policy's 
> findForcedMerges call after all segments have been upgraded.
> Currently UIMP.findForcedMerges identifies all the segments to be upgraded 
> and then asks its inner/delegate merge policy to come up with a 
> MergeSpecification for those segments. If the inner/delegate merge policy 
> does not supply a merge for all the segments to be upgraded then UIMP merges 
> the remaining segments into _one_ new segment. That extra new segment could 
> be quite a large 'monster' segment.
> * How about adding an optional {{upgradeUnmergedSegmentsIndividually}} flag? 
> That way UIMP.findForcedMerges could upgrade (but not merge) the remaining 
> segments.
> * Or indeed should 'upgradeUnmergedSegmentsIndividually' be the default 
> behaviour?
> Noticed whilst looking at the code:
> * 
> [UIMP.findMerges|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java#L69]
>  does not pass the mergeTrigger to the inner/delegate merge policy.
> ** If we can figure out why that is, let's add a comment to say why that is.
> ** Understanding why that is would also be needed before proceeding with 
> beyond one-off use of UIMP.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9132) Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase

2016-10-28 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615361#comment-15615361
 ] 

Alan Woodward commented on SOLR-9132:
-

Thanks David - if it fails again, any chance you could attach the test-failures 
output to this issue?

> Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase
> -
>
> Key: SOLR-9132
> URL: https://issues.apache.org/jira/browse/SOLR-9132
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Attachments: SOLR-9132-deletereplicas.patch, 
> SOLR-9132-recovery.patch, SOLR-9132-rules.patch, SOLR-9132.patch
>
>
> We need to remove AbstractDistribZkTestCase if we want to move away from 
> legacy cloud configurations.  This issue is for migrating tests to 
> SolrCloudTestCase instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9132) Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase

2016-10-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615357#comment-15615357
 ] 

ASF subversion and git services commented on SOLR-9132:
---

Commit b6e0ab01743df112dd7ad49135bd33769b7773b7 in lucene-solr's branch 
refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=b6e0ab0 ]

SOLR-9132: Fix precommit


> Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase
> -
>
> Key: SOLR-9132
> URL: https://issues.apache.org/jira/browse/SOLR-9132
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Attachments: SOLR-9132-deletereplicas.patch, 
> SOLR-9132-recovery.patch, SOLR-9132-rules.patch, SOLR-9132.patch
>
>
> We need to remove AbstractDistribZkTestCase if we want to move away from 
> legacy cloud configurations.  This issue is for migrating tests to 
> SolrCloudTestCase instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-master-Solaris (64bit/jdk1.8.0) - Build # 933 - Unstable!

2016-10-28 Thread Jan Høydahl
Hmm, this test passed several runs before failing just now.
I tried beasting with Miller’s script
  ./beast.sh ~/git/lucene-solr solr ~/beast-tmp BasicAuthStandaloneTest 
~/beast-results y y 12 4
but all tests succeed. I’m committing a fix for some unnecessary/wrong test 
code that may be the cause, though that is not verified.
See 
https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;a=commitdiff;h=1f06411;hp=f56d111adf46e127c62a3fd11fdae9b9725c1024
 


--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 28 Oct 2016, at 08:51, Policeman Jenkins Server wrote:
> 
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-Solaris/933/
> Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseParallelGC
> 
> 1 tests failed.
> FAILED:  org.apache.solr.security.BasicAuthStandaloneTest.testBasicAuth
> 
> Error Message:
> Invalid json (the response was an HTML error page):
> Error 401 - HTTP ERROR: 401
> Problem accessing /solr/admin/authentication. Reason: Bad credentials
> Powered by Jetty:// 9.3.8.v20160314 (http://eclipse.org/jetty)
> 
> Stack Trace:
> java.lang.AssertionError: Invalid json (the response was an HTML error page):
> Error 401 - HTTP ERROR: 401
> Problem accessing /solr/admin/authentication. Reason: Bad credentials
> Powered by Jetty:// 9.3.8.v20160314 (http://eclipse.org/jetty)
> 
>   at 
> __randomizedtesting.SeedInfo.seed([5538438BE1A53202:E956359945F6B178]:0)
>   at org.junit.Assert.fail(Assert.java:93)
>   at 
> org.apache.solr.security.BasicAuthIntegrationTest.verifySecurityStatus(BasicAuthIntegrationTest.java:256)
>   at 
> org.apache.solr.security.BasicAuthIntegrationTest.verifySecurityStatus(BasicAuthIntegrationTest.java:237)
>   at 
> org.apache.solr.security.BasicAuthStandaloneTest.testBasicAuth(BasicAuthStandaloneTest.java:103)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1713)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:907)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:943)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:957)
>   at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
>   at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
>   at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>   at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
>   at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
>   at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
>   at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>   at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:367)
>   at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:811)
>   at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:462)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:916)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:802)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:852)
>   at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
>   at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>   at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
>   at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
>   at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>   at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41)
>   at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
>   at 
> 

[jira] [Commented] (SOLR-9132) Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase

2016-10-28 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615348#comment-15615348
 ] 

ASF subversion and git services commented on SOLR-9132:
---

Commit bb33bb7d4c6f5547022abb2b61a844e8daaaf8fb in lucene-solr's branch 
refs/heads/branch_6x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bb33bb7 ]

SOLR-9132: Fix precommit


> Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase
> -
>
> Key: SOLR-9132
> URL: https://issues.apache.org/jira/browse/SOLR-9132
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Attachments: SOLR-9132-deletereplicas.patch, 
> SOLR-9132-recovery.patch, SOLR-9132-rules.patch, SOLR-9132.patch
>
>
> We need to remove AbstractDistribZkTestCase if we want to move away from 
> legacy cloud configurations.  This issue is for migrating tests to 
> SolrCloudTestCase instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

2016-10-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-7526:


Assignee: David Smiley

> Improvements to UnifiedHighlighter OffsetStrategies
> ---
>
> Key: LUCENE-7526
> URL: https://issues.apache.org/jira/browse/LUCENE-7526
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Timothy M. Rodriguez
>Assignee: David Smiley
>Priority: Minor
>  Labels: highlighter, unified-highlighter
>
> This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies 
> by reducing reliance on creating or re-creating TokenStreams.
> The primary changes are as follows:
> * AnalysisOffsetStrategy - split into two offset strategies
>   ** MemoryIndexOffsetStrategy - the primary analysis mode that utilizes a 
> MemoryIndex for producing Offsets
>   ** TokenStreamOffsetStrategy - an offset strategy that avoids creating a 
> MemoryIndex.  Can only be used if the query distills down to terms and 
> automata.
> * TokenStream removal 
>   ** MemoryIndexOffsetStrategy - previously a TokenStream was created to fill 
> the memory index and then once consumed a new one was generated by 
> uninverting the MemoryIndex back into a TokenStream if there were automata 
> (wildcard/mtq queries) involved.  Now this is avoided, which should save 
> memory and avoid a second pass over the data.
>   ** TermVectorOffsetStrategy - this was refactored in a similar way to avoid 
> generating a TokenStream if automata are involved.
>   ** PostingsWithTermVectorsOffsetStrategy - similar refactoring
> * CompositePostingsEnum - aggregates several underlying PostingsEnums for 
> wildcard/mtq queries.  This should improve relevancy by providing unified 
> metrics for a wildcard across all its term matches
> * Added a HighlightFlag for enabling the newly separated 
> TokenStreamOffsetStrategy since it can adversely affect passage relevancy



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7524) More detailed explanation of idf

2016-10-28 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615338#comment-15615338
 ] 

Michael McCandless commented on LUCENE-7524:


+1

> More detailed explanation of idf
> 
>
> Key: LUCENE-7524
> URL: https://issues.apache.org/jira/browse/LUCENE-7524
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7524.patch
>
>
> The explanations of idf give the docCount and docFreq, but they do not say 
> how the idf is computed even though the formula is different eg. for 
> ClassicSimilarity and BM25Similarity. Maybe it would help to put the formula 
> in the explanations?
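
For reference, the two idf formulas in question look roughly like this in the 6.x code (worth double-checking against the source before wiring them into explanations):

{code}
ClassicSimilarity:  idf(t) = 1 + ln( docCount / (docFreq + 1) )
BM25Similarity:     idf(t) = ln( 1 + (docCount - docFreq + 0.5) / (docFreq + 0.5) )
{code}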



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9132) Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase

2016-10-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615335#comment-15615335
 ] 

David Smiley commented on SOLR-9132:


[~romseygeek] you committed a "nocommit" in 
{{org.apache.solr.cloud.RecoveryZkTest#assertShardConsistency}} (line 143)

Also, separately, not sure if it's related to this but in my local test run, 
TestDeleteCollectionOnDownNodes failed due to "Timed out waiting for leader 
elections".  However it didn't reproduce for me -- seed 
B25E79F549CDBB64:2C6B1D0D6FEEF7EC

> Cut over AbstractDistribZkTestCase tests to SolrCloudTestCase
> -
>
> Key: SOLR-9132
> URL: https://issues.apache.org/jira/browse/SOLR-9132
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Attachments: SOLR-9132-deletereplicas.patch, 
> SOLR-9132-recovery.patch, SOLR-9132-rules.patch, SOLR-9132.patch
>
>
> We need to remove AbstractDistribZkTestCase if we want to move away from 
> legacy cloud configurations.  This issue is for migrating tests to 
> SolrCloudTestCase instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


