[jira] [Commented] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching
[ https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126652#comment-14126652 ]

Andi Vajda commented on PYLUCENE-31:

I did indeed eventually forget about this issue. My apologies. My reservations about maintaining more monkey patches still stand, but I need to review the patch to see how 'bad' it actually is.

JCC Parallel/Multiprocess Compilation + Caching
---
Key: PYLUCENE-31
URL: https://issues.apache.org/jira/browse/PYLUCENE-31
Project: PyLucene
Issue Type: Improvement
Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
Reporter: Lee Skillen
Priority: Minor
Labels: build, cache, ccache, distutils, jcc, parallel
Attachments: feature-parallel-build.patch

JCC uses distutils.Extension() to build both JCC itself and the packages it generates for Java wrapping. Unfortunately, distutils performs its build sequentially and doesn't take advantage of any additional free cores for parallel building. As discussed on the list, this is likely a design decision due to potential issues that may arise when building projects with awkward, cyclic or recursive dependencies. These issues shouldn't appear within JCC-based projects because of the generative nature of the build; i.e. all dependencies are resolved and generated prior to building, the build process itself consists only of compiling and linking the wrapper, and the wrapper files form a flat sequence of independent compilation units.

Enabling this requires monkey patching distutils, which was also discussed on the list as a potential source of issues, although we feel the risk is likely lower than that of the setuptools patching currently in use. This would be optional functionality that is only enabled if the monkey patching succeeds. Distutils is also part of the standard library and might be less susceptible to change than setuptools, and the area of code being patched has barely changed since 2002 (see: http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py).

In addition to the distutils changes, this patch also changes the wrapper class generation to make it more cache friendly, the goal being that no change in the wrapped code means no change in the wrapper code. Any change that minimally alters the wrapped code then means that, with a tool such as ccache, the rebuild time is significantly reduced (almost to 1/n of the full build time, where n is the number of files, when only one file has changed). Obviously the maintainers will have to assess this risk and decide whether they would like to accept the patch or not.

The code has only been tested on Linux with Python 2.7.5, but it should fail gracefully and disable parallelisation if one of the requirements isn't met (not on Linux, no multiprocessing support, or the monkey patching somehow fails). The change to caching should still benefit everyone regardless. Please note that an additional dependency on orderedset has been added to achieve the more deterministic ordering. This may not be desirable (i.e. another package might be preferred, such as ordered-set, or the code might be inlined into the package instead), as per maintainer comments.

--- [following repeated from mailing list] ---

Performance statistics: the following are some quick and dirty statistics for building PyLucene itself (incl. the Java Lucene build, which accounts for about 30 seconds upfront). The JCC files are split using --files 8, and each build is preceded by a make clean:

Serial (unpatched):
real    5m1.502s
user    5m22.887s
sys     0m7.749s

Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):
real    1m37.382s
user    7m16.658s
sys     0m8.697s

Furthermore, some additional changes were made to the wrapped file generation to make the generated code more ccache friendly (additional deterministic sorting for methods and some usage of an ordered set). With these in place, the CC and CCACHE_COMPILERCHECK environment variables set to "ccache gcc" and "content" respectively, and ccache installed, subsequent compilation time is reduced again as follows:

Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs, ccache enabled):
real    0m43.051s
user    1m10.392s
sys     0m4.547s

This was a run in which nothing changed between runs, so a realistic run in which changes occur will land somewhere between 0m43.051s and 1m37.382s, depending on how drastic the change was. If many changes are expected and you want to keep the build more cache friendly, then using a higher --files would probably work (to an extent), or ideally use --files
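The two ideas in the patch, compiling independent wrapper units in parallel and skipping units whose source content is unchanged, can be sketched standalone. This is a hypothetical illustration, not the attached feature-parallel-build.patch: the "compiler" here is a stand-in function, and the cache is a simple in-memory content-hash map (ccache does the analogous thing keyed on preprocessed source content):

```python
# Standalone sketch (NOT the actual PYLUCENE-31 patch) of parallel compilation
# of independent units plus ccache-style content-hash caching.
import hashlib
from multiprocessing.pool import ThreadPool

_cache = {}  # content hash -> "object file" result

def compile_unit(source_text):
    """Pretend compiler: returns (object name, cache-hit flag)."""
    key = hashlib.sha1(source_text.encode()).hexdigest()
    if key in _cache:
        return _cache[key], True       # unchanged content: skip recompilation
    obj = "obj-" + key[:8]             # stand-in for actually running gcc
    _cache[key] = obj
    return obj, False

def parallel_build(sources, jobs=8):
    """Compile all units concurrently; safe because units are independent,
    which is exactly the property the JCC-generated wrapper files have."""
    with ThreadPool(jobs) as pool:
        return pool.map(compile_unit, sources)

sources = ["// unit %d" % i for i in range(8)]
first = parallel_build(sources)        # cold build: every unit compiles
second = parallel_build(sources)       # nothing changed: every unit is a hit
```

A real patch would install such a parallel loop over distutils' CCompiler.compile object list; the sketch only shows why the parallelism is safe and where the caching pays off.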
RE: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_67) - Build # 11066 - Failure!
Hi,

those 2 tests fail now after the TIKA upgrade:

[junit4] - org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed
[junit4] - org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellDocumentTypes

I have no idea, sorry. It *could* be related to the new metadata items (X-Parsed-By) that are now returned by TIKA. It does not fail on my computer, but this is because it only works on Linux and Java 8 is disabled, too.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
Sent: Monday, September 08, 2014 1:45 AM
To: u...@thetaphi.de; dev@lucene.apache.org
Subject: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_67) - Build # 11066 - Failure!

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/11066/
Java: 32bit/jdk1.7.0_67 -client -XX:+UseConcMarkSweepGC

2 tests failed.

FAILED: org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed

Error Message:
Cannot execute script

Stack Trace:
org.kitesdk.morphline.api.MorphlineRuntimeException: Cannot execute script
  at __randomizedtesting.SeedInfo.seed([8A821065A1A75AD1:AE86C65E2124C91]:0)
  at org.kitesdk.morphline.stdlib.JavaBuilder$Java.doProcess(JavaBuilder.java:98)
  at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
  at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
  at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
  at org.kitesdk.morphline.stdlib.SeparateAttachmentsBuilder$SeparateAttachments.doProcess(SeparateAttachmentsBuilder.java:79)
  at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
  at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
  at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
  at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.load(AbstractSolrMorphlineTestBase.java:197)
  at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.testDocumentTypesInternal(AbstractSolrMorphlineTestBase.java:168)
  at org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed(SolrCellMorphlineTest.java:153)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
  at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
  at
[jira] [Resolved] (SOLR-4765) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.
[ https://issues.apache.org/jira/browse/SOLR-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta resolved SOLR-4765.
Resolution: Fixed

I haven't seen this issue since 4.9 (considering 4.8 was officially released on 28 Apr). Marking this as fixed.

The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.
---
Key: SOLR-4765
URL: https://issues.apache.org/jira/browse/SOLR-4765
Project: Solr
Issue Type: Test
Components: Tests
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 5.0, 4.9

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 2093 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/2093/

2 tests failed.

REGRESSION: org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed

Error Message:
Cannot execute script

Stack Trace:
org.kitesdk.morphline.api.MorphlineRuntimeException: Cannot execute script
  at __randomizedtesting.SeedInfo.seed([68C26E082DEE8164:E8A812086E5B9724]:0)
  at org.kitesdk.morphline.stdlib.JavaBuilder$Java.doProcess(JavaBuilder.java:98)
  at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
  at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
  at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
  at org.kitesdk.morphline.stdlib.SeparateAttachmentsBuilder$SeparateAttachments.doProcess(SeparateAttachmentsBuilder.java:79)
  at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
  at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
  at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
  at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.load(AbstractSolrMorphlineTestBase.java:197)
  at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.testDocumentTypesInternal(AbstractSolrMorphlineTestBase.java:168)
  at org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed(SolrCellMorphlineTest.java:153)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
  at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at
[jira] [Created] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
Uwe Schindler created SOLR-6489:
---
Summary: morphlines-cell tests fail after upgrade to TIKA 1.6
Key: SOLR-6489
URL: https://issues.apache.org/jira/browse/SOLR-6489
Project: Solr
Issue Type: Bug
Components: Tests
Affects Versions: 4.11
Reporter: Uwe Schindler
Fix For: 5.0, 4.11

After the upgrade to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests fail with scripting error messages. Because of the crazy configuration file format and my inability to figure out the test setup, I have to give up and hope that somebody else can take care of it. In addition, on my own machines, all of Hadoop does not work at all, so I cannot debug (Windows). The whole Morphlines setup is not really good, because Solr core depends on a different TIKA version than the included morphlines libraries. This is not a good situation for Solr, because we should be able to upgrade to any version of our core components and not depend on external libraries that themselves depend on older versions of Solr!
[jira] [Commented] (SOLR-6488) Upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125262#comment-14125262 ]

ASF subversion and git services commented on SOLR-6488:

Commit 1623308 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1623308 ]
SOLR-6489: Disable Morphlines-Cell tests, because Update to Tika 1.6 (SOLR-6488) broke them

Upgrade to TIKA 1.6
---
Key: SOLR-6488
URL: https://issues.apache.org/jira/browse/SOLR-6488
Project: Solr
Issue Type: Improvement
Components: contrib - Solr Cell (Tika extraction)
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 5.0, 4.11
Attachments: SOLR-6488.patch, SOLR-6488.patch, SOLR-6488.patch

Apache TIKA 1.6 came out yesterday; we should upgrade to it. The dependencies of the bundled Apache POI changed (xmlbeans upgraded, already done; dom4j is obsolete). We have to carefully verify the dependency tree!
[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125261#comment-14125261 ]

ASF subversion and git services commented on SOLR-6489:

Commit 1623308 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1623308 ]
SOLR-6489: Disable Morphlines-Cell tests, because Update to Tika 1.6 (SOLR-6488) broke them
[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125263#comment-14125263 ]

ASF subversion and git services commented on SOLR-6489:

Commit 1623309 from [~thetaphi] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1623309 ]
Merged revision(s) 1623308 from lucene/dev/trunk:
SOLR-6489: Disable Morphlines-Cell tests, because Update to Tika 1.6 (SOLR-6488) broke them
[jira] [Commented] (SOLR-6488) Upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125264#comment-14125264 ]

ASF subversion and git services commented on SOLR-6488:

Commit 1623309 from [~thetaphi] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1623309 ]
Merged revision(s) 1623308 from lucene/dev/trunk:
SOLR-6489: Disable Morphlines-Cell tests, because Update to Tika 1.6 (SOLR-6488) broke them
RE: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_67) - Build # 11066 - Failure!
I opened https://issues.apache.org/jira/browse/SOLR-6489 and disabled the Morphlines-Cell test with @AwaitsFix.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Monday, September 08, 2014 8:30 AM
To: dev@lucene.apache.org
Subject: RE: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_67) - Build # 11066 - Failure!

[...]
[jira] [Closed] (SOLR-6054) Log progress of transaction log replays
[ https://issues.apache.org/jira/browse/SOLR-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta closed SOLR-6054.
--
Resolution: Duplicate

Log progress of transaction log replays
---
Key: SOLR-6054
URL: https://issues.apache.org/jira/browse/SOLR-6054
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Priority: Minor
Fix For: 5.0, 4.9
Attachments: SOLR-6054.patch

There is zero logging of how a transaction log replay is progressing. We should add some simple checkpoint-based progress information. Logging the size of the log file at the beginning would also be useful.
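The checkpoint-based progress logging proposed in the ticket can be sketched generically. This is a hypothetical illustration, not Solr's UpdateLog code: the entry list, byte count and report callback are all stand-ins; the point is just "log the size up front, then report every N entries and at completion":

```python
# Generic sketch of checkpoint-based replay progress logging (NOT Solr code).
def replay(entries, total_bytes, report, checkpoint=1000):
    """Replay entries, emitting a progress report every `checkpoint` entries."""
    report("replaying transaction log: %d entries, %d bytes"
           % (len(entries), total_bytes))
    for i, entry in enumerate(entries, 1):
        # ... apply the entry to the index here ...
        if i % checkpoint == 0:
            report("replayed %d/%d entries (%.0f%%)"
                   % (i, len(entries), 100.0 * i / len(entries)))
    report("replay finished: %d entries" % len(entries))

messages = []
replay(["op"] * 2500, total_bytes=123456, report=messages.append)
```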
[jira] [Updated] (SOLR-6379) CloudSolrServer can query the wrong replica if a collection has a SolrCore name that matches a collection name.
[ https://issues.apache.org/jira/browse/SOLR-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anshum Gupta updated SOLR-6379:
---
Attachment: SOLR-6379.patch

A small optimization to make sure that the querystring is parsed in the DispatchFilter only once. I am still not sure we have a consensus on this one, though. I don't have a strong opinion, but I would want this resolution issue fixed, biased towards collection-name resolution in SolrCloud mode, while also having a mechanism to resolve a core if need be. We should either resolve this or at least specify, in BOLD, that a user shouldn't reuse a core name to create a collection (or vice versa) when the core doesn't belong to that very collection.

CloudSolrServer can query the wrong replica if a collection has a SolrCore name that matches a collection name.
---
Key: SOLR-6379
URL: https://issues.apache.org/jira/browse/SOLR-6379
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Hoss Man
Assignee: Anshum Gupta
Priority: Minor
Fix For: 5.0, 4.10
Attachments: SOLR-6379.patch, SOLR-6379.patch, SOLR-6379.patch, SOLR-6379.patch, SOLR-6379.patch, SOLR-6379.patch, SOLR-6379.patch, SOLR-6379.patch, SOLR-6379.pristine_collection.test.patch

Spin-off of SOLR-2894, where sarowe and miller were getting failures from TestCloudPivot that seemed unrelated to any of the distrib pivot logic itself. In particular: adding a call to waitForThingsToLevelOut at the start of the test, even before indexing any docs, seemed to work around the problem. But even if all replicas aren't yet up when the test starts, we should either get a failure when adding docs (i.e. no replica hosting the target shard), or queries should only be routed to the replicas that are up and fully caught up with the rest of the collection.

(NOTE: we're specifically talking about a situation where the set of docs in the collection is static during the query request)
[jira] [Commented] (SOLR-6154) SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
[ https://issues.apache.org/jira/browse/SOLR-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125347#comment-14125347 ]

Ronald Matamoros commented on SOLR-6154:

Thanks Erick, I am sorry that I have not been able to contribute further on the ticket. Let me know if you want me to test anything on my side.

SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
--
Key: SOLR-6154
URL: https://issues.apache.org/jira/browse/SOLR-6154
Project: Solr
Issue Type: Bug
Affects Versions: 4.5.1, 4.8.1
Environment:
Solr 4.5.1 under Linux - explicit id routing, 400,000+ documents indexed, custom schema.xml
Solr 4.8.1 under Windows+Cygwin - implicit id routing, 6 documents indexed, out-of-the-box schema
Reporter: Ronald Matamoros
Assignee: Erick Erickson
Attachments: HowToReplicate.pdf, data.xml

Attached:
- PDF with instructions on how to replicate
- data.xml to replicate the index

The f.field.facet.mincount option on a distributed search gives an inconsistent list of buckets on a range facet. Some buckets are omitted when using the option f.field.facet.mincount=1. The Solr logs do not indicate any error or warning during execution. The debug=true option and increasing the log levels for the FacetComponent do not provide any hints about the behaviour. Replicated the issue on both Solr 4.5.1 and 4.8.1.

Example: removing the f.field.facet.mincount=1 option gives the expected list of buckets for the 6 documents matched.
<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="0.0">0</int>
      <int name="50.0">1</int>
      <int name="100.0">0</int>
      <int name="150.0">3</int>
      <int name="200.0">0</int>
      <int name="250.0">1</int>
      <int name="300.0">0</int>
      <int name="350.0">0</int>
      <int name="400.0">0</int>
      <int name="450.0">0</int>
      <int name="500.0">0</int>
      <int name="550.0">0</int>
      <int name="600.0">0</int>
      <int name="650.0">0</int>
      <int name="700.0">0</int>
      <int name="750.0">1</int>
      <int name="800.0">0</int>
      <int name="850.0">0</int>
      <int name="900.0">0</int>
      <int name="950.0">0</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">2</int>
  </lst>
</lst>

Using the f.field.facet.mincount=1 option removes the 0-count buckets but will also omit the bucket <int name="250.0">1</int>:

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="50.0">1</int>
      <int name="150.0">3</int>
      <int name="750.0">1</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">4</int>
  </lst>
</lst>

Resubmitting the query renders a different bucket list (may need to resubmit a couple of times):

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="150.0">3</int>
      <int name="250.0">1</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">2</int>
  </lst>
</lst>
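For reference, a request exercising these parameters can be assembled as below. This is a hypothetical sketch: the host, port, collection name, q and rows values are placeholders, and only the range-facet parameters and f.price.facet.mincount come from the report above:

```python
# Build a Solr range-facet request matching the parameters in the report.
# Endpoint and collection name are placeholders, not taken from the ticket.
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "rows": "0",
    "facet": "true",
    "facet.range": "price",
    "f.price.facet.range.start": "0.0",
    "f.price.facet.range.end": "1000.0",
    "f.price.facet.range.gap": "50.0",
    "f.price.facet.mincount": "1",  # the option that intermittently drops buckets
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
```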
[jira] [Commented] (LUCENE-5922) DocValuesDocIdSet is not cacheable
[ https://issues.apache.org/jira/browse/LUCENE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125365#comment-14125365 ] ASF subversion and git services commented on LUCENE-5922: - Commit 1623345 from [~jpountz] in branch 'dev/trunk' [ https://svn.apache.org/r1623345 ] LUCENE-5922: DocValuesDocIdSet is not cacheable. DocValuesDocIdSet is not cacheable -- Key: LUCENE-5922 URL: https://issues.apache.org/jira/browse/LUCENE-5922 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.10 Reporter: Adrien Grand Fix For: 4.11 Attachments: LUCENE-5922.patch This DocIdSet claims it is cacheable although bad things could happen if it was cached since it is not thread-safe and keeps handles to open files. The fix is simple, especially given that this doc id set is cheap to create. But I'm wondering if there is a way we could protect ourselves from such bugs in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5922) DocValuesDocIdSet is not cacheable
[ https://issues.apache.org/jira/browse/LUCENE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125373#comment-14125373 ] ASF subversion and git services commented on LUCENE-5922: - Commit 1623349 from [~jpountz] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1623349 ] LUCENE-5922: FieldCacheDocIdSet is not cacheable. DocValuesDocIdSet is not cacheable -- Key: LUCENE-5922 URL: https://issues.apache.org/jira/browse/LUCENE-5922 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.10 Reporter: Adrien Grand Fix For: 4.11 Attachments: LUCENE-5922.patch This DocIdSet claims it is cacheable although bad things could happen if it was cached since it is not thread-safe and keeps handles to open files. The fix is simple, especially given that this doc id set is cheap to create. But I'm wondering if there is a way we could protect ourselves from such bugs in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
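The hazard described in the issue generalizes: a doc-id set that reads through live reader state (open file handles, doc values) must report itself non-cacheable, and a cache must copy such a set into an independent structure before holding on to it. A toy model in plain Java, mirroring only the idea of Lucene 4.x's `DocIdSet.isCacheable()` contract (the interface and helpers here are illustrative stand-ins, not the real Lucene classes):

```java
// Toy model (not Lucene source): why a doc-id set backed by live reader
// state must report isCacheable() == false, and how a cache should react.
public class CacheableSketch {

    public interface DocIdSet {
        boolean isCacheable();   // safe to retain across reopens and threads?
        int[] docs();            // materialized ids (toy stand-in for an iterator)
    }

    // Backed by per-request reader state: advertises itself as NOT cacheable.
    public static DocIdSet liveSet(int[] fromReader) {
        return new DocIdSet() {
            public boolean isCacheable() { return false; }
            public int[] docs() { return fromReader; }
        };
    }

    // A cache must copy a non-cacheable set into an independent structure first.
    public static DocIdSet cacheSafeCopy(DocIdSet set) {
        if (set.isCacheable()) return set;           // already safe to share
        final int[] copy = set.docs().clone();       // detach from reader state
        return new DocIdSet() {
            public boolean isCacheable() { return true; }
            public int[] docs() { return copy; }
        };
    }

    public static void main(String[] args) {
        DocIdSet live = liveSet(new int[] {1, 4, 7});
        DocIdSet cached = cacheSafeCopy(live);
        System.out.println(cached.isCacheable());    // true: the copy is independent
    }
}
```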
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #695: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/695/ 2 tests failed. REGRESSION: org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellDocumentTypes Error Message: unexpected results in /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-4.x/maven-build/solr/contrib/morphlines-cell/target/test-classes/test-documents/cars.csv.gz expected:21 but was:16 Stack Trace: java.lang.AssertionError: unexpected results in /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-4.x/maven-build/solr/contrib/morphlines-cell/target/test-classes/test-documents/cars.csv.gz expected:21 but was:16 at __randomizedtesting.SeedInfo.seed([1569E599DD021BC7:8FAA4607B8D94512]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.testDocumentTypesInternal(AbstractSolrMorphlineTestBase.java:175) at org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellDocumentTypes(SolrCellMorphlineTest.java:193) REGRESSION: org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed Error Message: Cannot execute script Stack Trace: org.kitesdk.morphline.api.MorphlineRuntimeException: Cannot execute script at org.kitesdk.morphline.stdlib.TryRulesBuilder$TryRules.doProcess(TryRulesBuilder.java:121) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.Connector.process(Connector.java:64) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at org.kitesdk.morphline.tika.DetectMimeTypeBuilder$DetectMimeType.doProcess(DetectMimeTypeBuilder.java:166) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.Connector.process(Connector.java:64) at 
org.kitesdk.morphline.scriptengine.java.scripts.MyJavaClass4.eval(MyJavaClass4.java:15) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.kitesdk.morphline.scriptengine.java.FastJavaScriptEngine$JavaCompiledScript.eval(FastJavaScriptEngine.java:82) at org.kitesdk.morphline.scriptengine.java.ScriptEvaluator.evaluate(ScriptEvaluator.java:117) at org.kitesdk.morphline.stdlib.JavaBuilder$Java.doProcess(JavaBuilder.java:96) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.Connector.process(Connector.java:64) at org.kitesdk.morphline.scriptengine.java.scripts.MyJavaClass3.eval(MyJavaClass3.java:7) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.kitesdk.morphline.scriptengine.java.FastJavaScriptEngine$JavaCompiledScript.eval(FastJavaScriptEngine.java:82) at org.kitesdk.morphline.scriptengine.java.ScriptEvaluator.evaluate(ScriptEvaluator.java:117) at org.kitesdk.morphline.stdlib.JavaBuilder$Java.doProcess(JavaBuilder.java:96) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.Connector.process(Connector.java:64) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at org.kitesdk.morphline.stdlib.SeparateAttachmentsBuilder$SeparateAttachments.doProcess(SeparateAttachmentsBuilder.java:79) at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181) at 
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156) at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.load(AbstractSolrMorphlineTestBase.java:197) at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.testDocumentTypesInternal(AbstractSolrMorphlineTestBase.java:168) at org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed(SolrCellMorphlineTest.java:153) Build Log: [...truncated 51140 lines...] BUILD FAILED /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-4.x/build.xml:514: The following error occurred while executing this line:
[jira] [Commented] (SOLR-6441) MoreLikeThis support for stopwords as in Lucene
[ https://issues.apache.org/jira/browse/SOLR-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125431#comment-14125431 ] Jeroen Steggink commented on SOLR-6441: --- Hi Ramana, That's exactly what I was thinking of. Thanks! Cheers, Jeroen MoreLikeThis support for stopwords as in Lucene --- Key: SOLR-6441 URL: https://issues.apache.org/jira/browse/SOLR-6441 Project: Solr Issue Type: Improvement Components: MoreLikeThis Affects Versions: 4.9 Reporter: Jeroen Steggink Priority: Minor Labels: difficulty-easy, impact-low, workaround-exists Fix For: 4.10, 4.11 In the Lucene implementation of MoreLikeThis, it's possible to add a list of stopwords which are considered uninteresting and are ignored. It would be a great addition to the MoreLikeThisHandler to be able to specify a list of stopwords. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5922) DocValuesDocIdSet is not cacheable
[ https://issues.apache.org/jira/browse/LUCENE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-5922. -- Resolution: Fixed Fix Version/s: 5.0 Thanks Ryan! DocValuesDocIdSet is not cacheable -- Key: LUCENE-5922 URL: https://issues.apache.org/jira/browse/LUCENE-5922 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.10 Reporter: Adrien Grand Fix For: 5.0, 4.11 Attachments: LUCENE-5922.patch This DocIdSet claims it is cacheable although bad things could happen if it was cached since it is not thread-safe and keeps handles to open files. The fix is simple, especially given that this doc id set is cheap to create. But I'm wondering if there is a way we could protect ourselves from such bugs in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-6054) Log progress of transaction log replays
[ https://issues.apache.org/jira/browse/SOLR-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reopened SOLR-6054: --- Assignee: Mark Miller Let's keep this open until my last comment is addressed. Log progress of transaction log replays --- Key: SOLR-6054 URL: https://issues.apache.org/jira/browse/SOLR-6054 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Shalin Shekhar Mangar Assignee: Mark Miller Priority: Minor Fix For: 4.9, 5.0 Attachments: SOLR-6054.patch There is zero logging of how a transaction log replay is progressing. We should add some simple checkpoint based progress information. Logging the size of the log file at the beginning would also be useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
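The requested checkpoint-based progress information can be as simple as logging the replay position every fixed percentage of the log file, plus the file size up front. A minimal sketch in plain Java (the 10% interval and message wording are illustrative, not from the eventual Solr patch):

```java
// Illustrative sketch (not the Solr patch): report transaction-log replay
// progress at fixed checkpoints instead of staying silent until the end.
public class ReplayProgressSketch {

    // Returns the progress messages that would be logged while replaying
    // records found at the given byte offsets of a log of totalBytes size.
    public static java.util.List<String> replay(long totalBytes, long[] recordOffsets) {
        java.util.List<String> log = new java.util.ArrayList<>();
        log.add("Starting log replay, size=" + totalBytes + " bytes");
        int lastCheckpoint = 0;
        for (long offset : recordOffsets) {
            // ... apply the record to the index here ...
            int percent = (int) (offset * 100 / totalBytes);
            if (percent >= lastCheckpoint + 10) {          // checkpoint every 10%
                log.add("Log replay " + percent + "% complete");
                lastCheckpoint = percent - percent % 10;   // snap to the 10% grid
            }
        }
        log.add("Log replay finished");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(replay(1000, new long[] {100, 450, 900, 1000}));
    }
}
```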
[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 614 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/614/ 1 tests failed. FAILED: org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch Error Message: createcollection the collection time out:180s Stack Trace: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: createcollection the collection time out:180s at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:550) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:614) at org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:871) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at
[jira] [Updated] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Molloy updated SOLR-5480: --- Attachment: SOLR-5480.patch Patch adapted for 4.10, unit tests pass. Make MoreLikeThisHandler distributable -- Key: SOLR-5480 URL: https://issues.apache.org/jira/browse/SOLR-5480 Project: Solr Issue Type: Improvement Reporter: Steve Molloy Assignee: Noble Paul Attachments: SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch The MoreLikeThis component, when used in the standard search handler supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents from say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and want to share, or want to contribute, any help will be welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125647#comment-14125647 ] Steve Molloy commented on SOLR-5480: [~claudenm] I just attached a version of the patch adapted for the 4.10 release. Tests are passing in my environment (Ubuntu 14.04, Oracle JDK 1.7.0_67); I will perform some more integration tests in our setup and will let you know if I see any issues. What were the failures you were seeing? Do you have logs/stack traces? Make MoreLikeThisHandler distributable -- Key: SOLR-5480 URL: https://issues.apache.org/jira/browse/SOLR-5480 Project: Solr Issue Type: Improvement Reporter: Steve Molloy Assignee: Noble Paul Attachments: SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch The MoreLikeThis component, when used in the standard search handler, supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents, say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and wants to share, or wants to contribute, any help will be welcomed.
[jira] [Commented] (SOLR-5961) Solr gets crazy on /overseer/queue state change
[ https://issues.apache.org/jira/browse/SOLR-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125675#comment-14125675 ] Frans Lawaetz commented on SOLR-5961: - Saw this issue as well. Solr filled 2T worth of disk, recording hundreds of those messages per ms. In our case it may have been related to ZooKeeper's /overseer/queue becoming overloaded with znodes such that it was unable to function normally. Solr gets crazy on /overseer/queue state change --- Key: SOLR-5961 URL: https://issues.apache.org/jira/browse/SOLR-5961 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.7.1 Environment: CentOS, 1 shard - 3 replicas, ZK cluster with 3 nodes (separate machines) Reporter: Maxim Novikov Priority: Critical No idea how to reproduce it, but sometimes Solr starts littering the log with the following messages: 419158 [localhost-startStop-1-EventThread] INFO org.apache.solr.cloud.DistributedQueue ? LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 419190 [Thread-3] INFO org.apache.solr.cloud.Overseer ? Update state numShards=1 message={ operation:state, state:recovering, base_url:http://${IP_ADDRESS}/solr, core:${CORE_NAME}, roles:null, node_name:${NODE_NAME}_solr, shard:shard1, collection:${COLLECTION_NAME}, numShards:1, core_node_name:core_node2} It continues spamming these messages with no delay, and restarting all the nodes does not help. I have even tried to stop all the nodes in the cluster first, but then when I start one, the behavior doesn't change; it goes crazy with this /overseer/queue state again. PS The only way to handle this was to stop everything, manually clean up all the data in ZooKeeper related to Solr, and then rebuild everything from scratch. As you should understand, this is kind of unbearable in a production environment.
[jira] [Updated] (LUCENE-5925) Use rename instead of segments_N fallback / segments.gen etc
[ https://issues.apache.org/jira/browse/LUCENE-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5925: Attachment: LUCENE-5925.patch Updated patch. This also solves LUCENE-2585. Today, segments.gen helps with the fact that directory listing is not point-in-time (it's a weakly consistent iterator and can reflect changes that happen during iteration). But it cannot be totally relied upon due to timing (you can get unlucky like LUCENE-2585). Instead, in FindSegmentsFile, we simply detect that contents have changed during the execution of listAll, by executing it again and doing a comparison. This way we can detect ConcurrentModificationException and just continue the loop. Use rename instead of segments_N fallback / segments.gen etc Key: LUCENE-5925 URL: https://issues.apache.org/jira/browse/LUCENE-5925 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/store Reporter: Robert Muir Labels: Java7 Attachments: LUCENE-5925.patch, LUCENE-5925.patch Our commit logic is strange: we write corrupted commit points and only in the last phase of commit do we correct them. This means the logic to get the latest commit is always scary and awkward, since it must deal with partial commits, and try to determine if it should fall back to segments_N-1 or actually relay an exception. This logic is incomplete/sheisty, e.g. I think we only fall back so far (at most one). If we somehow screw up in all this logic and do the wrong thing, then we lose data (e.g. LUCENE-4870 wiped an entire index because of TooManyOpenFiles). We now require Java 7; I think we should explore instead writing {{pending_segments_N}} and then in finishCommit() doing an atomic rename to {{segments_N}}. We could then remove all the complex fallback logic completely, since we no longer have to deal with ignoring partial commits, instead simply delivering any exception we get when trying to read the commit, and sleep better at night.
In Java 7, we have the APIs for this (ATOMIC_MOVE).
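The API referred to is java.nio.file.Files.move with StandardCopyOption.ATOMIC_MOVE (Java 7+). A minimal sketch of the write-pending-then-rename commit pattern proposed above, using the issue's {{pending_segments_N}} / {{segments_N}} naming; the helper methods themselves are illustrative, not Lucene code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the proposed commit pattern: write pending_segments_N fully,
// then atomically rename it to segments_N so a reader can never observe
// a partially written commit point.
public class AtomicCommitSketch {

    public static Path commit(Path dir, long gen, byte[] segmentsData) throws IOException {
        Path pending = dir.resolve("pending_segments_" + gen);
        Path live = dir.resolve("segments_" + gen);
        Files.write(pending, segmentsData);   // a crash here leaves only pending_*, ignored by readers
        Files.move(pending, live, StandardCopyOption.ATOMIC_MOVE);  // the point of no return
        return live;
    }

    // Convenience wrapper for the demo: runs the commit in a fresh temp dir.
    public static Path demoCommit() {
        try {
            Path dir = Files.createTempDirectory("commit-demo");
            return commit(dir, 1, new byte[] {42});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path live = demoCommit();
        System.out.println(live.getFileName());  // segments_1
    }
}
```

On filesystems that cannot honor ATOMIC_MOVE, Files.move throws AtomicMoveNotSupportedException rather than silently falling back, which fits the issue's goal of surfacing problems instead of papering over them.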
[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125687#comment-14125687 ] Mark Miller commented on SOLR-6489: --- [~thetaphi], you really should wait longer than a weekend to disable tests and say you don't understand what the problem is because you want to slam in a library upgrade real quick. It's a distributed community; give time for collaboration before you disable tests. I doubt it was critical to jam in Tika 1.6 over the weekend. I'll look into the issue. morphlines-cell tests fail after upgrade to TIKA 1.6 Key: SOLR-6489 URL: https://issues.apache.org/jira/browse/SOLR-6489 Project: Solr Issue Type: Bug Components: Tests Affects Versions: 4.11 Reporter: Uwe Schindler Fix For: 5.0, 4.11 After the upgrade to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests fail with scripting error messages. Due to my missing understanding, caused by the crazy configuration file format and an inability to figure out the test setup, I have to give up and hope that somebody else can take care. In addition, on my own machines, all of Hadoop does not work at all, so I cannot debug (Windows). The whole Morphlines setup is not really good, because Solr core depends on another TIKA version than the included morphlines libraries. This is not a good situation for Solr, because we should be able to upgrade to any version of our core components and not depend on external libraries that themselves depend on older versions of Solr!
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.9.0-ea-b28) - Build # 11071 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/11071/ Java: 32bit/jdk1.9.0-ea-b28 -client -XX:+UseG1GC 3 tests failed. FAILED: junit.framework.TestSuite.org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest Error Message: 5 threads leaked from SUITE scope at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest: 1) Thread[id=30, name=testCUSS-3-thread-5, state=TIMED_WAITING, group=TGRP-ConcurrentUpdateSolrServerTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:354) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$SendDocsRunnable.run(ConcurrentUpdateSolrServerTest.java:220) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)2) Thread[id=28, name=testCUSS-3-thread-3, state=TIMED_WAITING, group=TGRP-ConcurrentUpdateSolrServerTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:354) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$SendDocsRunnable.run(ConcurrentUpdateSolrServerTest.java:220) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)3) Thread[id=26, name=testCUSS-3-thread-1, state=TIMED_WAITING, group=TGRP-ConcurrentUpdateSolrServerTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:354) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$SendDocsRunnable.run(ConcurrentUpdateSolrServerTest.java:220) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)4) Thread[id=27, name=testCUSS-3-thread-2, state=TIMED_WAITING, group=TGRP-ConcurrentUpdateSolrServerTest] at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:354) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$SendDocsRunnable.run(ConcurrentUpdateSolrServerTest.java:220) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)5) Thread[id=29, name=testCUSS-3-thread-4, state=TIMED_WAITING, group=TGRP-ConcurrentUpdateSolrServerTest] at sun.misc.Unsafe.park(Native Method) 
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078) at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:354) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$SendDocsRunnable.run(ConcurrentUpdateSolrServerTest.java:220) at
[jira] [Updated] (SOLR-6386) make secondary ordering of facet.field values (and facet.pivot?) consistently deterministic
[ https://issues.apache.org/jira/browse/SOLR-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-6386: - Assignee: (was: Erick Erickson) make secondary ordering of facet.field values (and facet.pivot?) consistently deterministic --- Key: SOLR-6386 URL: https://issues.apache.org/jira/browse/SOLR-6386 Project: Solr Issue Type: Improvement Reporter: Hoss Man as a fluke of how the SOLR-2894 patch evolved, it wound up adding a bit of testing of distributed facet.field on date fields (see [r1617789 changes to TestDistributedSearch|https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test/org/apache/solr/TestDistributedSearch.java?r1=1617789r2=1617788pathrev=1617789]) ... but this started triggering some random failures due to facet constraints with identical values being sorted differently between the distributed query and the single node control query. We should make the facet.field (and facet.pivot) code order constraints with tied counts consistently regardless of whether it's a distrib search or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6386) make secondary ordering of facet.field values (and facet.pivot?) consistently deterministic
[ https://issues.apache.org/jira/browse/SOLR-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125694#comment-14125694 ] Erick Erickson commented on SOLR-6386: -- [~hossman_luc...@fucit.org] Some things I found out this weekend: [~markrmil...@gmail.com] Pinging you on this because I half suspect that there's something weird with the test infrastructure. Frankly I'm at a loss, but here are the outstanding things I saw. I'm pretty sure the answer to my question of whether this would just get taken care of by the stuff I'm doing for SOLR-6187 is no, so I'm assigning it back to nobody. Adding the facet.limit=1 in the test makes the problem disappear just b/c all the bogus 0 counts that get returned are removed. If I optimize the clients and control server in BaseDistributedSearchTestCase.commit, then this test case does NOT fail. But I must optimize both. If I just optimize the control, it fails. If I just optimize the clients, it fails. This really weirds me out. I suspected pilot error here frankly, so I just tried it again and I'm pretty sure I'm not hallucinating. I'd expect optimizing the distributed case would fix this up, but no. So I wonder if there's something weird here with RAMDirectory, which underpins the servers. Although just for yucks I tried using a disk-based directory and it still seemed to fail, although I won't swear that I got it right. I set up IntelliJ with the seeds etc. you provided and it's not until the third pass that it fails. But it fails every time on the third pass. Ditto with running the test from the command shell. In DocValuesFacet.getCount, around line 200 or so, I'm printing out the values added. This is near the bottom of the clause: if (sort.equals(FacetParams.FACET_SORT_COUNT) || sort.equals(FacetParams.FACET_SORT_COUNT_LEGACY)) { ... near the end } else...
On the pass that fails, I get these values:
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 0
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-04-20T11:00:00Z 1
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-03T10:59:56.032Z 0
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-03T10:57:12.192Z 0
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-02T11:00:00Z 1
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-03T07:10:00.704Z 0
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-05T11:00:00Z 1
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-04-27T16:01:01.44Z 0
[junit4] 1 QUERY DUMP 1 Adding string/count 2009-03-13T13:23:01.248Z 0
[junit4] 1 QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
[junit4] 1 QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
[junit4] 1 QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
[junit4] 1 QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
Notice the Jan 1, 1970 dates. Sure seems like a zero snuck in there somewhere. If you sum up the non-zero counts, you wind up with the right facet counts. On the pass that's optimized, I get this on the third pass, which is consistent with what the control server gives back, thus it passes:
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-04-20T11:00:00Z 1
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-02T11:00:00Z 1
[junit4] 1 QUERY DUMP 1 Adding string/count 2010-05-05T11:00:00Z 1
Anyway, this is beyond what I want to deal with just now. Let me know if there's anything else I can provide. make secondary ordering of facet.field values (and facet.pivot?)
consistently deterministic --- Key: SOLR-6386 URL: https://issues.apache.org/jira/browse/SOLR-6386 Project: Solr Issue Type: Improvement Reporter: Hoss Man Assignee: Erick Erickson As a fluke of how the SOLR-2894 patch evolved, it wound up adding a bit of testing of distributed facet.field on date fields (see [r1617789 changes to TestDistributedSearch|https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test/org/apache/solr/TestDistributedSearch.java?r1=1617789&r2=1617788&pathrev=1617789]) ... but this started triggering some random failures due to facet constraints with identical values being sorted differently between the distributed query and the single-node control query. We should make the facet.field (and facet.pivot) code order constraints with tied counts consistently, regardless of whether it's a distrib search or not. -- This message was sent by Atlassian JIRA
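The fix asked for here, a deterministic secondary ordering for tied counts, amounts to adding a tie-breaking sort key. A minimal sketch in Python for illustration (the data and the function name are hypothetical, not Solr's actual comparator):

```python
def order_facet_counts(counts):
    """Order facet constraints by descending count, breaking count ties
    by the constraint value itself, so the ordering is deterministic
    regardless of the order in which shard responses (or terms) arrive."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# The same buckets presented in two different input orders produce
# one and the same output ordering:
a = {"2010-05-03T11:00:00Z": 1, "2010-04-20T11:00:00Z": 1, "2010-05-02T11:00:00Z": 0}
b = dict(reversed(list(a.items())))
assert order_facet_counts(a) == order_facet_counts(b)
```

Without the secondary key, a sort by count alone leaves tied constraints in arrival order, which can differ between a distributed query and a single-node control query.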
[jira] [Assigned] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-6489: - Assignee: Mark Miller

morphlines-cell tests fail after upgrade to TIKA 1.6 Key: SOLR-6489 URL: https://issues.apache.org/jira/browse/SOLR-6489 Project: Solr Issue Type: Bug Components: Tests Affects Versions: 4.11 Reporter: Uwe Schindler Assignee: Mark Miller Fix For: 5.0, 4.11

After the upgrade to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests fail with scripting error messages. Because of the crazy configuration file format and my inability to figure out the test setup, I have to give up and hope that somebody else can take care of it. In addition, Hadoop does not work at all on my own machines (Windows), so I cannot debug. The whole Morphlines setup is not really good, because Solr core depends on a different TIKA version than the included morphlines libraries. This is not a good situation for Solr, because we should be able to upgrade to any version of our core components and not depend on external libraries that themselves depend on older versions of Solr! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125710#comment-14125710 ] Mark Miller commented on SOLR-6489: --- Bummer... seems like when someone re-factored the config for these tests, they stopped working from the IDE.
[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125723#comment-14125723 ] Mark Miller commented on SOLR-6489: --- Okay, test failures from the command line show that .gz files no longer appear to be getting extracted properly. If I remove those files, the rest of the tests pass.
[jira] [Commented] (SOLR-6187) facet.mincount ignored in range faceting using distributed search
[ https://issues.apache.org/jira/browse/SOLR-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125725#comment-14125725 ] ASF subversion and git services commented on SOLR-6187: --- Commit 1623429 from [~erickoerickson] in branch 'dev/trunk' [ https://svn.apache.org/r1623429 ] SOLR-6187: facet.mincount ignored in range faceting using distributed search

facet.mincount ignored in range faceting using distributed search - Key: SOLR-6187 URL: https://issues.apache.org/jira/browse/SOLR-6187 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.8, 4.8.1 Reporter: Zaccheo Bagnati Assignee: Erick Erickson Attachments: SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch

While I was trying to do range faceting with gap +1YEAR using shards, I noticed that the facet.mincount parameter seems to be ignored. The issue can be reproduced this way: create two cores, testshard1 and testshard2, with:

solrconfig.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>LUCENE_41</luceneMatchVersion>
  <lib dir="/opt/solr/dist" regex="solr-cell-.*\.jar"/>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
  <updateHandler class="solr.DirectUpdateHandler2"/>
  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">id</str>
    </lst>
  </requestHandler>
  <requestHandler name="/update" class="solr.UpdateRequestHandler"/>
  <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers"/>
  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
    <lst name="invariants">
      <str name="q">solrpingquery</str>
    </lst>
    <lst name="defaults">
      <str name="echoParams">all</str>
    </lst>
  </requestHandler>
</config>

schema.xml:

<?xml version="1.0" ?>
<schema name="${solr.core.name}" version="1.5" xmlns:xi="http://www.w3.org/2001/XInclude">
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="id" type="int" indexed="true" stored="true" multiValued="false"/>
  <field name="date" type="date" indexed="true" stored="true" multiValued="false"/>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>id</defaultSearchField>
</schema>

Insert into testshard1:

<add><doc><field name="id">1</field><field name="date">2014-06-20T12:51:00Z</field></doc></add>

Insert into testshard2:

<add><doc><field name="id">2</field><field name="date">2013-06-20T12:51:00Z</field></doc></add>

Now if I execute:

curl "http://localhost:8983/solr/testshard1/select?q=id:1&facet=true&facet.mincount=1&facet.range=date&f.date.facet.range.start=1900-01-01T00:00:00Z&f.date.facet.range.end=NOW&f.date.facet.range.gap=%2B1YEAR&shards=localhost%3A8983%2Fsolr%2Ftestshard1%2Clocalhost%3A8983%2Fsolr%2Ftestshard2&shards.info=true&wt=json"

I obtain:
[jira] [Updated] (SOLR-6187) facet.mincount ignored in range faceting using distributed search
[ https://issues.apache.org/jira/browse/SOLR-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-6187: - Attachment: SOLR-6187.patch With CHANGES.txt entries
[jira] [Created] (SOLR-6490) ValueSourceParser function max does not handle dates.
Aaron McMillin created SOLR-6490: Summary: ValueSourceParser function max does not handle dates. Key: SOLR-6490 URL: https://issues.apache.org/jira/browse/SOLR-6490 Project: Solr Issue Type: Bug Components: query parsers Reporter: Aaron McMillin Priority: Minor

As a user, when trying to use sort=max(date1_field_tdt, date2_field_tdt) I expect documents to be returned in order. Currently this is not the case. Dates are stored as longs, but max uses MaxFloatFunction, which casts them to floats, thereby losing precision.
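The precision loss is easy to demonstrate. A sketch in Python, emulating the (float) cast with a 32-bit round trip (the timestamps are illustrative, not taken from the issue):

```python
import struct

def as_float32(x):
    """Round-trip a value through an IEEE-754 32-bit float, emulating
    the (float) cast that MaxFloatFunction applies to long field values."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]

# Millisecond-epoch timestamps 20 seconds apart (illustrative values):
d1 = 1403268660000  # 2014-06-20T12:51:00Z
d2 = d1 + 20000     # 2014-06-20T12:51:20Z

# A 24-bit significand cannot resolve ~2^40 milliseconds (ULP is 2^17,
# about 131 seconds here), so nearby dates collapse to the same float
# and max() can no longer order them:
assert as_float32(d1) == as_float32(d2)
assert int(as_float32(d1)) != d1  # the value itself is also shifted
```

Comparing the raw long values (or doubles, whose 53-bit significand is exact here) avoids the collision.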
[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125754#comment-14125754 ] Mark Miller commented on SOLR-6489: --- So my initial guess is that kite-morphlines-tika-decompress depending on compress 1.4, and our move from 1.7 to 1.8, is perhaps the problem. Standard Java jar hell.
[jira] [Commented] (SOLR-6154) SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
[ https://issues.apache.org/jira/browse/SOLR-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125765#comment-14125765 ] Hoss Man commented on SOLR-6154: Erick, sorry for the late reply. I haven't looked in depth at your patch for this issue or SOLR-6187, but in response to your question on the mailing list...

bq. The problem here is that it assumes that the first list in has all the counts that ever will be reported from any shard.

You are almost certainly correct; it's very probable that the logic for distributed range faceting isn't taking into account the possibility of mincount suppressing buckets from one or more shards. The general strategy for dealing with this in field faceting & pivot faceting (which I suspect is what you're already doing in your patch) is to have the coordinator node modify the mincount params when it sends the shard requests to force mincount=0, to ensure it gets a response for every bucket from every shard, then filter the response based on the (original) combined mincount.

{panel:title=not recommended idea}
I say modify because one of the strategies taken with field/pivot faceting when using facet.sort=index is this...
{noformat}
// we're sorting by index order.
// if minCount==0, we should always be able to get accurate results w/o
// over-requesting or refining
// if minCount==1, we should be able to get accurate results w/o
// over-requesting, but we'll need to refine
// if minCount==n (n > 1), we can set the initialMincount to
// minCount/nShards, rounded up.
// ...
{noformat}
There is no sorting or top-n aspect to facet.range, so the idea of over-requesting doesn't apply -- but the minCount/nShards idea still applies: if the user requests a minCount of 10 and there are 3 shards, then you could set f.foo.facet.mincount=4 for the shard requests -- because unless at least one shard responds back with a count of at least 4, you'll never be able to satisfy the original mincount=10 ... HOWEVER: using this strategy requires refinement requests, which we currently avoid in range faceting.
{panel}

I would not advise going with the refinement approach described above (hence the panel box labeling it not-recommended) because I think the single-pass approach of range faceting right now is probably better for most common cases -- we just need to force mincount=0 on the shard requests -- but I wanted to post it for completeness in case I'm missing something and you think it's a really good idea.

SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response -- Key: SOLR-6154 URL: https://issues.apache.org/jira/browse/SOLR-6154 Project: Solr Issue Type: Bug Affects Versions: 4.5.1, 4.8.1 Environment: Solr 4.5.1 under Linux - explicit id routing, indexed 400,000+ documents, custom schema.xml; Solr 4.8.1 under Windows+Cygwin - implicit id routing, indexed 6 documents, out-of-the-box schema Reporter: Ronald Matamoros Assignee: Erick Erickson Attachments: HowToReplicate.pdf, data.xml

Attached: PDF with instructions on how to replicate, and data.xml to replicate the index. The f.field.facet.mincount option on a distributed search gives an inconsistent list of buckets on a range facet. Some buckets are ignored when using the option f.field.facet.mincount=1. The Solr logs do not indicate any error or warning during execution. The debug=true option and increasing the log levels for the FacetComponent do not provide any hints about the behaviour. Replicated the issue on both Solr 4.5.1 & 4.8.1. Example: removing the f.field.facet.mincount=1 option gives the expected list of buckets for the 6 documents matched.
<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="0.0">0</int>
      <int name="50.0">1</int>
      <int name="100.0">0</int>
      <int name="150.0">3</int>
      <int name="200.0">0</int>
      <int name="250.0">1</int>
      <int name="300.0">0</int>
      <int name="350.0">0</int>
      <int name="400.0">0</int>
      <int name="450.0">0</int>
      <int name="500.0">0</int>
      <int name="550.0">0</int>
      <int name="600.0">0</int>
      <int name="650.0">0</int>
      <int name="700.0">0</int>
      <int name="750.0">1</int>
      <int name="800.0">0</int>
      <int name="850.0">0</int>
      <int name="900.0">0</int>
      <int name="950.0">0</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">2</int>
  </lst>
</lst>

Using the f.field.facet.mincount=1 option removes the 0 count buckets but will also omit bucket <int name="250.0">1</int>:

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="50.0">1</int>
      <int name="150.0">3</int>
      <int name="750.0">1</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">4</int>
  </lst>
</lst>

Resubmitting the query renders a different bucket list (may need to resubmit a couple of times):

<lst name="facet_ranges">
  <lst name="price">
    <lst name="counts">
      <int name="150.0">3</int>
      <int name="250.0">1</int>
    </lst>
    <float name="gap">50.0</float>
    <float name="start">0.0</float>
    <float name="end">1000.0</float>
    <int name="before">0</int>
    <int name="after">0</int>
    <int name="between">2</int>
  </lst>
</lst>
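The single-pass strategy Hoss recommends above, force mincount=0 on the shard requests and apply the user's mincount only to the merged totals, can be sketched like this (Python for illustration, hypothetical bucket data; this is not the actual FacetComponent code):

```python
def merge_range_facets(shard_counts, mincount):
    """Merge per-shard range-facet buckets. Each shard must have been
    queried with facet.mincount=0 so every bucket is present in every
    response; the user's mincount is applied only to the combined totals."""
    totals = {}
    for counts in shard_counts:
        for bucket, n in counts.items():
            totals[bucket] = totals.get(bucket, 0) + n
    return {b: n for b, n in totals.items() if n >= mincount}

# Bucket 250.0 has count 1 on each shard; filtering per shard with
# mincount=2 would drop it everywhere, but the merged total (2) keeps it.
shard1 = {"200.0": 0, "250.0": 1, "300.0": 3}
shard2 = {"200.0": 0, "250.0": 1, "300.0": 0}
assert merge_range_facets([shard1, shard2], mincount=2) == {"250.0": 2, "300.0": 3}
```

Applying the mincount per shard instead of after the merge is exactly what makes buckets like 250.0 disappear in the report above.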
[jira] [Commented] (LUCENE-5925) Use rename instead of segments_N fallback / segments.gen etc
[ https://issues.apache.org/jira/browse/LUCENE-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125786#comment-14125786 ] Michael McCandless commented on LUCENE-5925: +1, I like this solution. It passed 85 iters of all lucene core + module tests...

Use rename instead of segments_N fallback / segments.gen etc Key: LUCENE-5925 URL: https://issues.apache.org/jira/browse/LUCENE-5925 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/store Reporter: Robert Muir Labels: Java7 Attachments: LUCENE-5925.patch, LUCENE-5925.patch

Our commit logic is strange: we write corrupted commit points and only in the last phase of commit do we correct them. This means the logic to get the latest commit is always scary and awkward, since it must deal with partial commits and try to determine if it should fall back to segments_N-1 or actually relay an exception. This logic is incomplete: e.g., I think we only fall back so far (at most one). If we somehow screw up in all this logic and do the wrong thing, then we lose data (e.g. LUCENE-4870 wiped an entire index because of TooManyOpenFiles). We now require Java 7; I think we should explore instead writing {{pending_segments_N}} and then, in finishCommit(), doing an atomic rename to {{segments_N}}. We could then remove all the complex fallback logic completely, since we no longer have to deal with ignoring partial commits, instead simply delivering any exception we get when trying to read the commit, and sleep better at night. In Java 7, we have the APIs for this (ATOMIC_MOVE).
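The pending-file-then-atomic-rename pattern proposed in the issue, sketched in Python for illustration (os.replace gives an atomic move on POSIX, analogous to Java 7's StandardCopyOption.ATOMIC_MOVE; the helper and file contents are hypothetical):

```python
import os
import tempfile

def publish_commit(dirname, gen, data):
    """Write pending_segments_N, then atomically rename it to segments_N,
    so readers either see the previous commit point or the complete new
    one -- never a partially written file. (A production implementation
    would also fsync the containing directory after the rename.)"""
    pending = os.path.join(dirname, "pending_segments_%d" % gen)
    final = os.path.join(dirname, "segments_%d" % gen)
    with open(pending, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())    # make the bytes durable before they become visible
    os.replace(pending, final)  # atomic rename: no fallback logic needed
    return final

d = tempfile.mkdtemp()
path = publish_commit(d, 5, b"commit-data")
assert os.path.basename(path) == "segments_5"
assert not os.path.exists(os.path.join(d, "pending_segments_5"))
```

Because the rename is all-or-nothing, a reader that finds segments_N can trust it completely, which is what lets the fallback logic be deleted.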
[jira] [Commented] (SOLR-6187) facet.mincount ignored in range faceting using distributed search
[ https://issues.apache.org/jira/browse/SOLR-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125796#comment-14125796 ] ASF subversion and git services commented on SOLR-6187: --- Commit 1623447 from [~erickoerickson] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1623447 ] SOLR-6187: facet.mincount ignored in range faceting using distributed search
[jira] [Commented] (SOLR-6154) SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
[ https://issues.apache.org/jira/browse/SOLR-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125802#comment-14125802 ] Erick Erickson commented on SOLR-6154: -- Whew! I _just_ committed this patch, and it forces mincount to 0 for the shard requests, which is in line with your comments.
[jira] [Resolved] (SOLR-6187) facet.mincount ignored in range faceting using distributed search
[ https://issues.apache.org/jira/browse/SOLR-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-6187. -- Resolution: Fixed Fix Version/s: 4.11, 5.0
[jira] [Resolved] (SOLR-6154) SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response
[ https://issues.apache.org/jira/browse/SOLR-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-6154. -- Resolution: Fixed Fix Version/s: 4.11, 5.0 Fixed with the checkin for SOLR-6187. Thanks again Ronald for a _great_ problem writeup and reproducible test case!
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125813#comment-14125813 ] Erick Erickson commented on SOLR-5871: -- While I agree this is functionality that people have requested, it's pretty clear that I'm not going to get to it in any reasonable time frame, so un-assigning it. Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-5871: - Assignee: (was: Erick Erickson) Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes| No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know what content matched the query and would be also cool to be able to show the score per field. As far as I know right now there's no way to return this information when running a query request. Debug outputs is suitable for visual review but has lots of nesting levels and is hard for understanding. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Reopened] (SOLR-6457) LBHttpSolrServer: AIOOBE risk if counter overflows
: I thought it is an extremely trivial fix. That's why I didn't add it to : changes.txt. I shall do it. To elaborate my thoughts: there are really 2 key issues why i think this deserves a CHANGES.txt entry: 1) anytime we fix a bug that affects previously released versions of Solr, there should be *something* related to it in CHANGES.txt. If a bug was found and tracked in its own Jira until fixed, then it should almost certainly be its own entry in CHANGES.txt, if for no other reason than so people who see the jira emails/records can easily find it. Frequently small bugs like this might be fixed as part of a larger bug fix or new feature that refactors a lot of code -- in which case that larger bug/feature's CHANGES.txt entry already notes the relevant change summary, so an individual entry may not be needed. It depends on how impactful/common the bug was. remember: people look at CHANGES.txt to decide if/when it's worth their effort to upgrade, if there's a bug/glitch that's affecting them, and your CHANGES.txt entry makes them think ah, i bet that's what's causing the weird behavior i'm seeing! yea it's been fixed!! that helps people a lot. 2) any time a contributor provides code, they *MUST* get credit for their contribution.
As a committer, if you are making minor tweaks or cleanup or fixing a small thing as part of a larger change, you can always choose not to bother tooting your own horn about every little change you make if you don't think it warrants special attention based on the point #1 guidelines i was mentioning above -- but when you are acting as the agent of another contributor by committing their code, you really must give them credit in CHANGES.txt : On Sep 5, 2014 7:15 PM, Hoss Man (JIRA) j...@apache.org wrote: : : : [ : https://issues.apache.org/jira/browse/SOLR-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel : ] : : Hoss Man reopened SOLR-6457: : : : This needs a CHANGES.txt entry noting the bug fix and giving longkeyy credit for : the contribution. : : LBHttpSolrServer: AIOOBE risk if counter overflows : -- : : Key: SOLR-6457 : URL: https://issues.apache.org/jira/browse/SOLR-6457 : Project: Solr : Issue Type: Bug : Components: clients - java : Affects Versions: 4.0, 4.1, 4.2, 4.2.1, 4.3, 4.3.1, 4.4, 4.5, 4.5.1, : 4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1, 4.9 : Reporter: longkeyy : Assignee: Noble Paul : Labels: patch : Attachments: SOLR-6457.patch : : : org.apache.solr.client.solrj.impl.LBHttpSolrServer : line 442 : int count = counter.incrementAndGet(); : ServerWrapper wrapper = serverList[count % serverList.length]; : when the counter overflows, the mod operation : count % serverList.length will start trying to use negative numbers as : array indexes. : suggest fixing it up, e.g.: : // keep count greater than 0 : int count = counter.incrementAndGet() & 0x7FF; : : : : -- : This message was sent by Atlassian JIRA : (v6.3.4#6332) : : - : To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org : For additional commands, e-mail: dev-h...@lucene.apache.org : : : -Hoss http://www.lucidworks.com/ - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
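The failure mode and a mask-based fix can be reproduced in isolation. A minimal sketch in plain Java (the 3-element array stands in for serverList; this is not the committed SOLR-6457 patch, just a demonstration of the arithmetic):

```java
public class OverflowDemo {
    public static void main(String[] args) {
        String[] serverList = {"a", "b", "c"};

        // Simulate the counter wrapping past Integer.MAX_VALUE.
        int count = Integer.MAX_VALUE + 1;   // silently overflows to Integer.MIN_VALUE
        int idx = count % serverList.length; // Java's % keeps the sign of the dividend
        System.out.println(idx);             // -2 -> serverList[idx] would throw
                                             //       ArrayIndexOutOfBoundsException

        // Masking off the sign bit keeps the index non-negative.
        int safe = (count & 0x7FFFFFFF) % serverList.length;
        System.out.println(safe);            // 0
    }
}
```

Any mask that clears the sign bit works for correctness; a wider mask (0x7FFFFFFF rather than 0x7FF) simply keeps the round-robin sequence longer before it repeats.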
[jira] [Updated] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets
[ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Molloy updated SOLR-3583: --- Attachment: SOLR-3583.patch Adapted patch for 4.10 tag which now includes SOLR-2894 OOB. Ran all unit tests successfully. Percentiles for facets, pivot facets, and distributed pivot facets -- Key: SOLR-3583 URL: https://issues.apache.org/jira/browse/SOLR-3583 Project: Solr Issue Type: Improvement Reporter: Chris Russell Priority: Minor Labels: newbie, patch Fix For: 4.9, 5.0 Attachments: SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets
[ https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125887#comment-14125887 ] Steve Molloy commented on SOLR-3583: [~hossman] I kind of agree with all your comments, although I still needed this functionality to be working today and see no progress on issues you pointed to. Anything we can do to help speed things up on the Stats component to support this? (specifically, we need distributed pivot faceted stats for average and sum for numeric fields). Percentiles for facets, pivot facets, and distributed pivot facets -- Key: SOLR-3583 URL: https://issues.apache.org/jira/browse/SOLR-3583 Project: Solr Issue Type: Improvement Reporter: Chris Russell Priority: Minor Labels: newbie, patch Fix For: 4.9, 5.0 Attachments: SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch Built on top of SOLR-2894, this patch adds percentiles and averages to facets, pivot facets, and distributed pivot facets by making use of range facet internals. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-5594) don't call 'svnversion' over and over in the build
[ https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-5594: - Assignee: Uwe Schindler don't call 'svnversion' over and over in the build -- Key: LUCENE-5594 URL: https://issues.apache.org/jira/browse/LUCENE-5594 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Uwe Schindler Some ant tasks (at least release packaging, i dunno what else), call svnversion over and over and over for each module in the build. can we just do this one time instead? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5594) don't call 'svnversion' over and over in the build
[ https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125897#comment-14125897 ] Uwe Schindler commented on LUCENE-5594: --- Hi, I am working on this. I will put up a patch soon. My work includes not calling the svn executable at all; instead we use svnkit to read revision numbers or check out the src folder. This speeds up the build, because we do not need to fork a process all the time, which is expensive on Windows and on the FreeBSD Jenkins (because we fork Java). Using svnkit to do this and saving the revision number in a property helps. It also makes Jenkins builds easier to configure, because you do not depend on local svn installations (e.g. Jenkins checks out with a different version than the locally installed one). This is an issue on Policeman Jenkins and FreeBSD Jenkins. don't call 'svnversion' over and over in the build -- Key: LUCENE-5594 URL: https://issues.apache.org/jira/browse/LUCENE-5594 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Some ant tasks (at least release packaging, i dunno what else), call svnversion over and over and over for each module in the build. can we just do this one time instead? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4073) Lucene puts output of svnversion into a property even if svnversion failed
[ https://issues.apache.org/jira/browse/LUCENE-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125900#comment-14125900 ] Uwe Schindler commented on LUCENE-4073: --- I will fix this soon. Lucene puts output of svnversion into a property even if svnversion failed -- Key: LUCENE-4073 URL: https://issues.apache.org/jira/browse/LUCENE-4073 Project: Lucene - Core Issue Type: Bug Components: general/build Affects Versions: 3.6 Environment: Windows 7 x64, Cygwin for tools like svn Reporter: Trejkaz We had a build issue today where Lucene was running svnversion which was failing (the reason for the failure itself is not particularly important.) As a result, the error text output of running the command ended up in the svnversion property. The build later attempted to insert this into MANIFEST.MF which resulted in an invalid manifest file, causing the build to fail. A related observation is that even if it works, the svnversion would be the version of our own repository, so the usefulness of it in the context of Lucene's version number is questionable anyway. It would be nice if the build could get the svn version number only if it was checked out from Lucene trunk. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5594) don't call 'svnversion' over and over in the build
[ https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5594: -- Fix Version/s: 4.11 5.0 don't call 'svnversion' over and over in the build -- Key: LUCENE-5594 URL: https://issues.apache.org/jira/browse/LUCENE-5594 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 5.0, 4.11 Some ant tasks (at least release packaging, i dunno what else), call svnversion over and over and over for each module in the build. can we just do this one time instead? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Claude updated SOLR-5480: - Attachment: MoreLikeThisHandlerTestST.txt Stack trace from running MoreLikeThisHandlerTest on OSX using 1.7.0_67 Java. Make MoreLikeThisHandler distributable -- Key: SOLR-5480 URL: https://issues.apache.org/jira/browse/SOLR-5480 Project: Solr Issue Type: Improvement Reporter: Steve Molloy Assignee: Noble Paul Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch The MoreLikeThis component, when used in the standard search handler supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents from say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and want to share, or want to contribute, any help will be welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125919#comment-14125919 ] Claude commented on SOLR-5480: -- [~smolloy], I've run on OSX and ubuntu 14.04 with the same JDK you are using and get a stack trace which I've attached. Make MoreLikeThisHandler distributable -- Key: SOLR-5480 URL: https://issues.apache.org/jira/browse/SOLR-5480 Project: Solr Issue Type: Improvement Reporter: Steve Molloy Assignee: Noble Paul Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch The MoreLikeThis component, when used in the standard search handler supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents from say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and want to share, or want to contribute, any help will be welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125933#comment-14125933 ] Steve Molloy commented on SOLR-5480: [~claudenm] Something is wrong with this stack trace. You actually have compilation errors pointing to the MLT handler not implementing an abstract method of RequestHandlerBase; that method is implemented in SearchHandler, which the MLT handler extends after applying the patch. Caused by: java.lang.Error: Unresolved compilation problems: The type MoreLikeThisHandler must implement the inherited abstract method RequestHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse) After applying the latest patch, is your MoreLikeThisHandler declaration like: public class MoreLikeThisHandler extends SearchHandler? I also see you are in Eclipse (from the paths) - are you running the tests from the command line or within Eclipse? (trying to see where things may differ) Make MoreLikeThisHandler distributable -- Key: SOLR-5480 URL: https://issues.apache.org/jira/browse/SOLR-5480 Project: Solr Issue Type: Improvement Reporter: Steve Molloy Assignee: Noble Paul Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch The MoreLikeThis component, when used in the standard search handler supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents from say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and want to share, or want to contribute, any help will be welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4212) Support for facet pivot query for filtered count
[ https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Molloy updated SOLR-4212: --- Attachment: SOLR-4212.patch Adapted patch for 4.10 code and validated that tests passed. Support for facet pivot query for filtered count Key: SOLR-4212 URL: https://issues.apache.org/jira/browse/SOLR-4212 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Steve Molloy Fix For: 4.9, 5.0 Attachments: SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, patch-4212.txt Facet pivot provides hierarchical support for computing data used to populate a treemap or similar visualization. TreeMaps usually offer users extra information by applying an overlay color on top of the existing square sizes based on hierarchical counts. This second count is based on user choices, representing, usually with a gradient, the proportion of the square that fits the user's choices. The proposition is to add a facet.pivot.q parameter that would allow specifying a query (per field) that would be intersected with the DocSet used to calculate the pivot count, stored in a separate q-count. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4212) Support for facet pivot query for filtered count
[ https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Molloy updated SOLR-4212: --- Attachment: (was: SOLR-4212.patch) Support for facet pivot query for filtered count Key: SOLR-4212 URL: https://issues.apache.org/jira/browse/SOLR-4212 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Steve Molloy Fix For: 4.9, 5.0 Attachments: SOLR-4212.patch, SOLR-4212.patch, patch-4212.txt Facet pivot provides hierarchical support for computing data used to populate a treemap or similar visualization. TreeMaps usually offer users extra information by applying an overlay color on top of the existing square sizes based on hierarchical counts. This second count is based on user choices, representing, usually with a gradient, the proportion of the square that fits the user's choices. The proposition is to add a facet.pivot.q parameter that would allow specifying a query (per field) that would be intersected with the DocSet used to calculate the pivot count, stored in a separate q-count. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (SOLR-4212) Support for facet pivot query for filtered count
[ https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Molloy updated SOLR-4212: --- Comment: was deleted (was: Adapted patch for 4.10 code and validated that tests passed.) Support for facet pivot query for filtered count Key: SOLR-4212 URL: https://issues.apache.org/jira/browse/SOLR-4212 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Steve Molloy Fix For: 4.9, 5.0 Attachments: SOLR-4212.patch, SOLR-4212.patch, patch-4212.txt Facet pivot provides hierarchical support for computing data used to populate a treemap or similar visualization. TreeMaps usually offer users extra information by applying an overlay color on top of the existing square sizes based on hierarchical counts. This second count is based on user choices, representing, usually with a gradient, the proportion of the square that fits the user's choices. The proposition is to add a facet.pivot.q parameter that would allow specifying a query (per field) that would be intersected with the DocSet used to calculate the pivot count, stored in a separate q-count. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4212) Support for facet pivot query for filtered count
[ https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Molloy updated SOLR-4212: --- Attachment: SOLR-4212.patch (right attachment this time) Adapted patch for 4.10 code and ensured all tests passed. Support for facet pivot query for filtered count Key: SOLR-4212 URL: https://issues.apache.org/jira/browse/SOLR-4212 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Steve Molloy Fix For: 4.9, 5.0 Attachments: SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, patch-4212.txt Facet pivot provides hierarchical support for computing data used to populate a treemap or similar visualization. TreeMaps usually offer users extra information by applying an overlay color on top of the existing square sizes based on hierarchical counts. This second count is based on user choices, representing, usually with a gradient, the proportion of the square that fits the user's choices. The proposition is to add a facet.pivot.q parameter that would allow specifying a query (per field) that would be intersected with the DocSet used to calculate the pivot count, stored in a separate q-count. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
Ryan Ernst created LUCENE-5927: -- Summary: 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2 Key: LUCENE-5927 URL: https://issues.apache.org/jira/browse/LUCENE-5927 Project: Lucene - Core Issue Type: Bug Reporter: Ryan Ernst In 4.9, this string was broken into 2 tokens by StandardTokenizer: \u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72 = \u1aa2, \u1a7f\u1a6f\u1a6f\u1a61\u1a72 However, in 4.10, that has changed so it is now a single token returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
[ https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126035#comment-14126035 ] Robert Muir commented on LUCENE-5927: - From Ryan's explanation to me, this only impacts text with the Complex_Context line-break property (it will wrongly split on a combining mark). I think we can be sensible about what we do here (I suggest: nothing), because in such a case you aren't getting useful tokens from the tokenizer anyway unless you are doing downstream processing... and if you are doing that, it's very good that this bug is fixed. 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2 -- Key: LUCENE-5927 URL: https://issues.apache.org/jira/browse/LUCENE-5927 Project: Lucene - Core Issue Type: Bug Reporter: Ryan Ernst In 4.9, this string was broken into 2 tokens by StandardTokenizer: \u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72 = \u1aa2, \u1a7f\u1a6f\u1a6f\u1a61\u1a72 However, in 4.10, that has changed so it is now a single token returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126039#comment-14126039 ] Claude commented on SOLR-5480: -- [~smolloy], Thanks. I see that the patch wasn't (and isn't) applying cleanly. I checked out a branch off of the 4.10 release. I'm trying to figure out why. Any thoughts? Make MoreLikeThisHandler distributable -- Key: SOLR-5480 URL: https://issues.apache.org/jira/browse/SOLR-5480 Project: Solr Issue Type: Improvement Reporter: Steve Molloy Assignee: Noble Paul Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch The MoreLikeThis component, when used in the standard search handler supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents from say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and want to share, or want to contribute, any help will be welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable
[ https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126057#comment-14126057 ] Steve Molloy commented on SOLR-5480: I often run into this with patches taken from Jira... The slightest change before creating the patch seems to have a significant impact on whether or not the patch can apply cleanly. I usually have to resort to applying whatever matches while excluding any conflicts, which I then apply manually. Make MoreLikeThisHandler distributable -- Key: SOLR-5480 URL: https://issues.apache.org/jira/browse/SOLR-5480 Project: Solr Issue Type: Improvement Reporter: Steve Molloy Assignee: Noble Paul Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch The MoreLikeThis component, when used in the standard search handler supports distributed searches. But the MoreLikeThisHandler itself doesn't, which prevents from say, passing in text to perform the query. I'll start looking into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone has some work done already and want to share, or want to contribute, any help will be welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
[ https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126059#comment-14126059 ] Steve Rowe commented on LUCENE-5927: These characters are in the Tai Tham block, all characters in which have the property {{LB:ComplexContext}}, sequences of which are returned as token type {{SOUTHEAST_ASIAN}}. This behavior change is caused by a grammar fix I included with LUCENE-5770 - prior to 4.10, the grammar did not include {{WB:Format}} or {{WB:Extend}} chars. Here are the relevant parts from the 4.9 grammar:
{noformat}
ComplexContextSupp = ([])  // no supplementary characters in LB:ComplexContext in Unicode 6.3
...
ComplexContext = (\p{LB:Complex_Context} | {ComplexContextSupp})
...
{ComplexContext}+ { return SOUTH_EAST_ASIAN_TYPE; }
{noformat}
and the 4.10 grammar is now (note the addition of {{WB:Format}} and {{WB:Extend}} chars):
{noformat}
ComplexContextEx = \p{LB:Complex_Context} [\p{WB:Format}\p{WB:Extend}]*
...
{ComplexContextEx}+ { return SOUTH_EAST_ASIAN_TYPE; }
{noformat}
4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2 -- Key: LUCENE-5927 URL: https://issues.apache.org/jira/browse/LUCENE-5927 Project: Lucene - Core Issue Type: Bug Reporter: Ryan Ernst In 4.9, this string was broken into 2 tokens by StandardTokenizer: \u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72 = \u1aa2, \u1a7f\u1a6f\u1a6f\u1a61\u1a72 However, in 4.10, that has changed so it is now a single token returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
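For reference, the code points in the reported string can be inspected with the plain Java standard library (the JDK does not expose the LB:Complex_Context line-break property directly, but the Unicode block lookup confirms that every character here is Tai Tham):

```java
public class TaiThamInspect {
    public static void main(String[] args) {
        // The string from the issue: a Tai Tham letter followed by
        // combining vowel/consonant signs.
        String s = "\u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72";
        s.codePoints().forEach(cp ->
            System.out.printf("U+%04X block=%s category=%d%n",
                cp,
                Character.UnicodeBlock.of(cp),   // TAI_THAM for all six
                Character.getType(cp)));         // general category constant
    }
}
```

All six code points fall in the Tai Tham block (U+1A20-U+1AAF), which is why the 4.10 grammar's {ComplexContextEx}+ rule now keeps them together as one SOUTHEAST_ASIAN token.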
[jira] [Commented] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
[ https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126064#comment-14126064 ] Steve Rowe commented on LUCENE-5927: [~rjernst] mentioned to me offline that this behavior change should have triggered a version-specific implementation, which did not happen. I agree, it should have. But now that it's been released, should we include a version-specific implementation in a bugfix 4.10.1 release? Or wait till 4.11? Or just stop doing version-specific implementations (as will be the case in 5.x)? Thoughts? 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2 -- Key: LUCENE-5927 URL: https://issues.apache.org/jira/browse/LUCENE-5927 Project: Lucene - Core Issue Type: Bug Reporter: Ryan Ernst In 4.9, this string was broken into 2 tokens by StandardTokenizer: \u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72 = \u1aa2, \u1a7f\u1a6f\u1a6f\u1a61\u1a72 However, in 4.10, that has changed so it is now a single token returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5594) don't call 'svnversion' over and over in the build
[ https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126069#comment-14126069 ] Uwe Schindler commented on LUCENE-5594: --- The big question is: Do we really need the SVN revision number in the Implementation-Version of every JAR file? I tend to remove that one. Opinions? don't call 'svnversion' over and over in the build -- Key: LUCENE-5594 URL: https://issues.apache.org/jira/browse/LUCENE-5594 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Uwe Schindler Fix For: 5.0, 4.11 Some ant tasks (at least release packaging, i dunno what else), call svnversion over and over and over for each module in the build. can we just do this one time instead? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
[ https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126081#comment-14126081 ] Steve Rowe commented on LUCENE-5927: bq. I think we can be sensible about what we do here (I suggest: nothing), because in such a case you aren't getting useful tokens from the tokenizer anyway unless you are doing downstream processing... and if you are doing that, it's very good that this bug is fixed. Version-specific behavior is important for people who don't want changes; IMHO everybody impacted by this change would want it, so I agree: we should do nothing.
[jira] [Commented] (LUCENE-5594) don't call 'svnversion' over and over in the build
[ https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126111#comment-14126111 ] Hoss Man commented on LUCENE-5594: -- bq. The big question is: Do we really need the SVN revision number in the Implementation-Version of every JAR file? Personally, I think it's really handy -- it provides a helpful sanity check(sum) to see what version someone is _actually_ running (i.e.: did they build themselves from svn source? did they have local modifications when they built?). Linking to the original issue where this was all added, for some context: LUCENE-908. And the thread where the original suggestion came from in Solr (which then got promoted up into the Lucene common build stuff later): https://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200706.mbox/%3c46633309.5070...@lapnap.net%3E
[jira] [Created] (SOLR-6491) Add preferredLeader as a ROLE and a collections API command to respect this role
Erick Erickson created SOLR-6491: Summary: Add preferredLeader as a ROLE and a collections API command to respect this role Key: SOLR-6491 URL: https://issues.apache.org/jira/browse/SOLR-6491 Project: Solr Issue Type: Improvement Affects Versions: 5.0, 4.11 Reporter: Erick Erickson Assignee: Erick Erickson Leaders can currently get out of balance due to the sequence in which nodes are brought up in a cluster. For very good reasons shard leadership cannot be permanently assigned. However, it seems reasonable that a sysadmin could optionally specify that a particular node be the _preferred_ leader for a particular collection/shard. During leader election, preference would be given to any node so marked when electing any leader. So the proposal here is to add another role for preferredLeader to the collections API, something like ADDROLE?role=preferredLeader&collection=collection_name&shard=shardId Second, it would be good to have a new collections API call like ELECTPREFERREDLEADERS?collection=collection_name (I really hate that name so far, but you see the idea). That command would (asynchronously?) make an attempt to transfer leadership for each shard in a collection to the node labeled as the preferred leader by the new ADDROLE role. I'm going to start working on this, any suggestions welcome! This will subsume several other JIRAs; I'll link them momentarily.
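The preference rule the proposal describes can be sketched in a few lines; a hedged illustration in Python (the replica dict fields and the function itself are hypothetical, and Solr's real leader election happens through ZooKeeper election nodes, not a list scan):

```python
def elect_leader(replicas):
    """Pick a shard leader from live replicas, preferring any replica
    marked preferredLeader; otherwise fall back to the first live one.
    Field names here are made up for this sketch."""
    live = [r for r in replicas if r.get("live")]
    if not live:
        return None  # no live replica, so no leader can be elected
    for r in live:
        if r.get("preferred_leader"):
            return r["name"]  # the hint wins when the node is up
    return live[0]["name"]
```

The key property is that the hint only matters when the preferred node is actually live; otherwise election proceeds exactly as before.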
[jira] [Commented] (SOLR-6491) Add preferredLeader as a ROLE and a collections API command to respect this role
[ https://issues.apache.org/jira/browse/SOLR-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126133#comment-14126133 ] Erick Erickson commented on SOLR-6491: -- I think the functionality of all three of these JIRAs will be provided by this one.
[jira] [Commented] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
[ https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126163#comment-14126163 ] Robert Muir commented on LUCENE-5927: - {quote} Or just stop doing version-specific implementations (as will be the case in 5.x)? {quote} In my opinion, that's unrelated to this issue (again, for this particular issue, I think simulating the old bug is overkill because it just will not be useful). As far as the 4.6 unicode changes go, the API complexity is out of the way in 5.x. Analyzers have getVersion/setVersion, and if we want to add a Lucene40StandardTokenizer and have them make use of this to emulate the 4.0 (as opposed to 4.6+) grammar, that's fine. With the API Ryan has, it won't cause users pain and keeps the back compat.
[jira] [Commented] (LUCENE-5925) Use rename instead of segments_N fallback / segments.gen etc
[ https://issues.apache.org/jira/browse/LUCENE-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126174#comment-14126174 ] Robert Muir commented on LUCENE-5925: - Thanks for beasting, guys! I ran 150 iterations of 'test-core' myself. I'll give it a few days. Use rename instead of segments_N fallback / segments.gen etc Key: LUCENE-5925 URL: https://issues.apache.org/jira/browse/LUCENE-5925 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/store Reporter: Robert Muir Labels: Java7 Attachments: LUCENE-5925.patch, LUCENE-5925.patch Our commit logic is strange: we write corrupted commit points and only in the last phase of commit do we correct them. This means the logic to get the latest commit is always scary and awkward, since it must deal with partial commits and try to determine whether it should fall back to segments_N-1 or actually relay an exception. This logic is incomplete/sheisty; e.g. I think we only fall back so far (at most one). If we somehow screw up in all this logic and do the wrong thing, then we lose data (e.g. LUCENE-4870 wiped an entire index because of TooManyOpenFiles). We now require Java 7; I think we should explore instead writing {{pending_segments_N}} and then, in finishCommit(), doing an atomic rename to {{segments_N}}. We could then remove all the complex fallback logic completely, since we no longer have to deal with ignoring partial commits, instead simply delivering any exception we get when trying to read the commit, and sleep better at night. In Java 7, we have the APIs for this (ATOMIC_MOVE).
[jira] [Commented] (LUCENE-5594) don't call 'svnversion' over and over in the build
[ https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126178#comment-14126178 ] Uwe Schindler commented on LUCENE-5594: --- Hi, I already have a good solution: I cache the resolved svn version in a file in the build dir. This file is used when building the JAR files. None of this is done with command-line tools any more; it now does svnversion and also the svn exports using SvnKit, grabbed from Maven Central. The only thing: the file can get stale. I tried not using a file and passing the svnversion down the build, but that's hard to do. I am still working on a better solution...
[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126205#comment-14126205 ] Mark Miller commented on SOLR-6489: --- A good chunk of the issue appears to be that the 'decompress' morphline has support for the mime type 'application/x-gzip' but not 'application/gzip', and that perhaps Tika mimetype detection is now returning the latter for these .gz files. morphlines-cell tests fail after upgrade to TIKA 1.6 Key: SOLR-6489 URL: https://issues.apache.org/jira/browse/SOLR-6489 Project: Solr Issue Type: Bug Components: Tests Affects Versions: 4.11 Reporter: Uwe Schindler Assignee: Mark Miller Fix For: 5.0, 4.11 After the upgrade to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests fail with scripting error messages. Due to missing understanding, caused by the crazy configuration file format and inability to figure out the test setup, I have to give up and hope that somebody else can take care. In addition, on my own machines, all of Hadoop does not work at all, so I cannot debug (Windows). The whole Morphlines setup is not really good, because Solr core depends on another TIKA version than the included morphlines libraries. This is not a good situation for Solr, because we should be able to upgrade to any version of our core components and not depend on external libraries that themselves depend on older versions of Solr!
[jira] [Updated] (SOLR-6482) Add an onlyIfDown flag for DELETEREPLICA collections API command
[ https://issues.apache.org/jira/browse/SOLR-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-6482: - Attachment: SOLR-6482.patch Patch with a test as well; the test shamelessly piggy-backs on an existing test. This may be ready to commit (all tests pass), but I'd like to give people a chance to comment. Add an onlyIfDown flag for DELETEREPLICA collections API command Key: SOLR-6482 URL: https://issues.apache.org/jira/browse/SOLR-6482 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 5.0, 4.11 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Attachments: SOLR-6482.patch, SOLR-6482.patch Having DELETEREPLICA delete the index is scary for some situations, especially ones in which the operations people are taking more explicit control of the topology of their cluster. As we move towards ZK being the one source of truth, deleting replicas that then come back up is even scarier. I propose to have an optional flag onlyIfDown that removes the replica from the ZK cluster state if (and only if) the node was offline. Default value: false.
[jira] [Created] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
Trey Grainger created SOLR-6492: --- Summary: Solr field type that supports multiple, dynamic analyzers Key: SOLR-6492 URL: https://issues.apache.org/jira/browse/SOLR-6492 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Trey Grainger Fix For: 4.11 A common request - particularly for multilingual search - is to be able to support one or more dynamically-selected analyzers for a field. For example, someone may have a content field and pass in a document in Greek (using an Analyzer with Tokenizer/Filters for Greek), a separate document in English (using an English Analyzer), and possibly even a field with mixed-language content in Greek and English. This latter case could pass the content separately through both an analyzer defined for Greek and another Analyzer defined for English, stacking or concatenating the token streams based upon the use-case. There are some distinct advantages in terms of index size and query performance which can be obtained by stacking terms from multiple analyzers in the same field instead of duplicating content in separate fields and searching across multiple fields. Other non-multilingual use cases may include things like switching to a different analyzer for the same field to remove a feature (i.e. turning on/off query-time synonyms against the same field on a per-query basis).
[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126233#comment-14126233 ] Mark Miller commented on SOLR-6489: --- I'll disable the gz tests and file an issue with morphlines.
[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
[ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126258#comment-14126258 ] Trey Grainger commented on SOLR-6492: - I previously implemented this field type when writing chapter 14 of _Solr in Action_, but I would like to make some improvements and then submit the code back to Solr to (hopefully) be committed. The current code from _Solr in Action_ can be found here: [https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14] To use the current version, you would do the following: 1) Add the following to schema.xml: <fieldType name="multiText" class="sia.ch14.MultiTextField" sortMissingLast="true" defaultFieldType="text_general" fieldMappings="en:text_english, es:text_spanish, fr:text_french, de:text_german"/> <field name="someMultiTextField" type="multiText" indexed="true" multiValued="true" /> *note that text_spanish, text_english, text_french, and text_german refer to field types which are defined elsewhere in the schema.xml 2) Index a document with a field containing multilingual text, using syntax like one of the following: <field name="someMultiTextField">some text</field>** <field name="someMultiTextField">en|some text</field> <field name="someMultiTextField">es|some more text</field> <field name="someMultiTextField">de,fr|some other text</field> **uses the default analyzer 3) Submit a query specifying which language(s) you want to query in: /select?q=someMultiTextField:en,de|keyword_goes_here -- Improvements to be made before the patch is finalized: 1) Make it possible to specify the field type mappings in the field name instead of the field value: <field name="someMultiTextField">de,fr|some other text</field> /select?q=a bunch of keywords here&df=someMultiTextField|en,de This makes querying easier, because the languages can be detected prior to parsing of the query, which prevents prefixes from having to be substituted on each query term (which is cost-prohibitive for most because it effectively means pre-parsing the query before it goes to Solr). 2) Enable support for switching between stacking token streams from each analyzer (a good default because it mostly respects position increments across languages and minimizes duplicate tokens in the index) and concatenating token streams. 3) Possibly add the ability to switch analyzers in the middle of input text: <field name="someMultiTextField">de,fr|some other el|text</field> 4) Extensive unit testing
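The "stacking" behavior in improvement 2 can be illustrated with a toy sketch (not the Solr implementation; it assumes the per-language token streams are already aligned position by position): tokens from the different analyzers that share a position are emitted with a position increment of 0 after the first, and exact duplicates at a position are dropped.

```python
def stack_token_streams(streams):
    """Merge position-aligned token streams from several analyzers:
    at each position, the first token gets position increment 1 and
    every distinct alternative gets increment 0, so all variants are
    searchable at the same position with no duplicate terms."""
    stacked = []
    for group in zip(*streams):      # walk the streams position by position
        increment, seen = 1, set()
        for term in group:
            if term in seen:
                continue             # minimize duplicate tokens in the index
            seen.add(term)
            stacked.append((term, increment))
            increment = 0            # stacked variants share the position
    return stacked
```

For example, stacking an English and a Spanish analysis of the same two-token field yields one token stream where both analyses occupy the same two positions, which is what keeps phrase queries working across languages.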
[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126269#comment-14126269 ] Mark Miller commented on SOLR-6489: --- The relevant change to Tika that caused this appears to be https://issues.apache.org/jira/browse/TIKA-1280: GZip now has an official mimetype, which refers to https://tools.ietf.org/html/rfc6713. A list of alternates from before the official mimetype can be found via https://issues.apache.org/jira/browse/TIKA-1282 Additional Gzip types.
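One way a pipeline sitting in front of the decompress morphline could bridge the old and new detection results is to normalize known gzip aliases to the RFC 6713 type before dispatching on mimetype. A hedged sketch (the alias set below is an illustrative subset, not the exact list from TIKA-1282, and this helper is not part of any real morphline API):

```python
CANONICAL_GZIP = "application/gzip"  # official type per RFC 6713

# Aliases seen in the wild before the official registration (illustrative
# subset; TIKA-1282 records the list Tika itself handles).
GZIP_ALIASES = {
    "application/x-gzip",
    "application/gzip-compressed",
    "application/gzipped",
    "application/x-gunzip",
}

def normalize_mime_type(mime_type):
    """Map any known gzip alias to the canonical type; pass others through."""
    return CANONICAL_GZIP if mime_type in GZIP_ALIASES else mime_type
```

With normalization in place, a handler registered only for 'application/gzip' keeps matching .gz files whether the detector reports the old alias or the new official type.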
[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6
[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126293#comment-14126293 ] Uwe Schindler commented on SOLR-6489: - bq. Uwe Schindler, you really should wait longer than a weekend to disable tests and say you don't understand what the problem is because you want to slam in a library upgrade real quick. It's a distributed community; give time for collaboration before you disable tests. I doubt it was critical to jam in Tika 1.6 over the weekend. Sorry, I was not expecting this to fail. And as I said: I was not able to test it locally.
[jira] [Comment Edited] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers
[ https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126258#comment-14126258 ] Trey Grainger edited comment on SOLR-6492 at 9/8/14 11:55 PM: -- I previously implemented this field type when writing chapter 14 of _Solr in Action_, but I would like to make some improvements and then submit the code back to Solr to (hopefully) be committed. The current code from _Solr in Action_ can be found here: [https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14] To use the current version, you would do the following: 1) Add the following to schema.xml: <fieldType name="multiText" class="sia.ch14.MultiTextField" sortMissingLast="true" defaultFieldType="text_general" fieldMappings="en:text_english, es:text_spanish, fr:text_french, de:text_german"/> <field name="someMultiTextField" type="multiText" indexed="true" multiValued="true" /> *note that text_spanish, text_english, text_french, and text_german refer to field types which are defined elsewhere in the schema.xml 2) Index a document with a field containing multilingual text, using syntax like one of the following: <field name="someMultiTextField">some text</field>** <field name="someMultiTextField">en|some text</field> <field name="someMultiTextField">es|some more text</field> <field name="someMultiTextField">de,fr|some other text</field> **uses the default analyzer 3) Submit a query specifying which language(s) you want to query in: /select?q=someMultiTextField:en,de|keyword_goes_here -- Improvements to be made before the patch is finalized: 1) Make it possible to specify the field type mappings in the field name instead of the field value: <field name="someMultiTextField|de,fr">some other text</field> /select?q=a bunch of keywords here&df=someMultiTextField|en,de This makes querying easier, because the languages can be detected prior to parsing of the query, which prevents prefixes from having to be substituted on each query term (which is cost-prohibitive for most because it effectively means pre-parsing the query before it goes to Solr). 2) Enable support for switching between stacking token streams from each analyzer (a good default because it mostly respects position increments across languages and minimizes duplicate tokens in the index) and concatenating token streams. 3) Possibly add the ability to switch analyzers in the middle of input text: <field name="someMultiTextField">de,fr|some other el|text</field> 4) Extensive unit testing
[jira] [Created] (SOLR-6493) stats on multivalued fields don't respect excluded filters
Hoss Man created SOLR-6493: -- Summary: stats on multivalued fields don't respect excluded filters Key: SOLR-6493 URL: https://issues.apache.org/jira/browse/SOLR-6493 Project: Solr Issue Type: Bug Affects Versions: 4.9, 4.8, 4.10 Reporter: Hoss Man Assignee: Hoss Man SOLR-3177 added support to StatsComponent for using the {{ex}} local param to exclude tagged filters, but these exclusions have apparently never been correct for multivalued fields
[jira] [Updated] (SOLR-6493) stats on multivalued fields don't respect excluded filters
[ https://issues.apache.org/jira/browse/SOLR-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-6493: --- Attachment: SOLR-6493.patch FWIW: I discovered this while experimenting with some refactoring of StatsComponent to make SOLR-6348's sub-tasks more viable. Attached patch showing the trivial fix and some example test cases for reproducing. The nocommits in the test are just to remind me to make it more robust before committing; hopefully I'll wrap this up tomorrow.
[jira] [Updated] (SOLR-6485) ReplicationHandler should have an option to throttle the speed of replication
[ https://issues.apache.org/jira/browse/SOLR-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-6485: Attachment: SOLR-6485.patch New patch. The previous approach was wrong. Now I am using SimpleRateLimiter. Debugging why it's not throttling correctly. ReplicationHandler should have an option to throttle the speed of replication - Key: SOLR-6485 URL: https://issues.apache.org/jira/browse/SOLR-6485 Project: Solr Issue Type: Improvement Reporter: Varun Thacker Attachments: SOLR-6485.patch, SOLR-6485.patch The ReplicationHandler should have an option to throttle the speed of replication. It is useful for people who want to bring up nodes in their SolrCloud cluster, or who have a backup/restore API, and don't want to eat up all their network bandwidth while replicating. I am writing a test case and will attach a patch shortly.
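The shape of the SimpleRateLimiter approach is: before each chunk of transferred bytes, compute how long the writer must sleep so the observed rate stays at or under the configured MB/sec. A toy pacing limiter in that spirit (class and method names here are made up for this sketch, not Lucene's actual RateLimiter API):

```python
import time

class ByteRateLimiter:
    """Toy pacing limiter: each chunk of bytes reserves a slice of time,
    and a caller that gets ahead of its reservation sleeps the difference."""

    def __init__(self, mb_per_sec):
        self.bytes_per_sec = mb_per_sec * 1024 * 1024
        self.next_free = time.monotonic()  # earliest time the next write may start

    def pause(self, num_bytes):
        """Sleep as needed so num_bytes stays under the rate cap;
        return the seconds actually waited."""
        now = time.monotonic()
        wait = max(0.0, self.next_free - now)
        # Reserve the transmission time for this chunk.
        self.next_free = max(now, self.next_free) + num_bytes / self.bytes_per_sec
        if wait > 0:
            time.sleep(wait)
        return wait
```

A replication handler would call pause(len(chunk)) before (or after) writing each chunk to the wire; a first call never waits, and sustained writes converge on the configured rate.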
Confused about writing a ZK state.
I'm just not getting it. But then again it's late and the code is unfamiliar. Anyway, I'm working on SOLR-6491, for which I want to have a preferredLeader property in ZK. I _think_ this fits best as a property in the same place as the leader prop, and it would be a boolean. I.e. the cluster state for collection1/shards/shard1/replicas/core_node_2 might have a preferred_leader attribute that could be set to true. This would be totally independent of whether or not leader was true, although they would very often be the same. The preferredLeader is really just supposed to be a hint at leader-election time. Anyway, all this seems well and good, but I don't see a convenient way to set/clear a single property on a single node in clusterstate. What I think I'm seeing is that the cluster state is only written by the Overseer, and the Overseer doesn't deal with this case yet. Things like updateState seem to have another purpose. So I'm guessing that I need to write another command for the Overseer to implement, something like setnodeprop, that takes a collection, shard, node, and one or more (property/propval) pairs. Then, to change the clusterstate, I'd put together a ZkNodeProps and put it in the queue returned from Overseer.getInQueue(zkClient). Then wait for it to be processed before declaring victory (actually I'd only have to wait in the test, I think). Mostly I'm looking for whether this is on the right track or completely off base. Also giving folks a chance to object before I invest the time and effort in something totally useless. Thanks! Erick
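As a sketch of what such an Overseer queue message might carry: 'setnodeprop' is only Erick's tentative command name, and the dict below merely illustrates the kind of key/value payload a ZkNodeProps would serialize, not any actual Solr message format.

```python
def build_set_node_prop_message(collection, shard, core_node, props):
    """Assemble the hypothetical 'setnodeprop' Overseer message described
    above, checking that the routing fields needed to locate the replica
    in clusterstate are all present."""
    for name, value in (("collection", collection), ("shard", shard),
                        ("core_node_name", core_node)):
        if not value:
            raise ValueError("missing required field: %s" % name)
    return {
        "operation": "setnodeprop",  # tentative command name from the email
        "collection": collection,
        "shard": shard,
        "core_node_name": core_node,
        "properties": dict(props),   # e.g. {"preferred_leader": "true"}
    }
```

The producing side would enqueue this dict (as ZkNodeProps) on the Overseer's in-queue and then, as the email suggests, poll cluster state until the property appears before declaring victory.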