[jira] [Commented] (PYLUCENE-31) JCC Parallel/Multiprocess Compilation + Caching

2014-09-08 Thread Andi Vajda (JIRA)

[ https://issues.apache.org/jira/browse/PYLUCENE-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126652#comment-14126652 ]

Andi Vajda commented on PYLUCENE-31:


I did indeed eventually forget about this issue. My apologies.
My reservations about maintaining more monkey patches still stand, but I need to 
review the patch to see how 'bad' it actually is.

 JCC Parallel/Multiprocess Compilation + Caching
 ---

 Key: PYLUCENE-31
 URL: https://issues.apache.org/jira/browse/PYLUCENE-31
 Project: PyLucene
  Issue Type: Improvement
 Environment: Linux 3.11.0-19-generic #33-Ubuntu SMP x86_64 GNU/Linux
Reporter: Lee Skillen
Priority: Minor
  Labels: build, cache, ccache, distutils, jcc, parallel
 Attachments: feature-parallel-build.patch


 JCC utilises distutils.Extension() in order to build JCC itself and the 
 packages that it generates for Java wrapping. Unfortunately, distutils 
 performs its build sequentially and doesn't take advantage of any additional 
 free cores for parallel building.  As discussed on the list, this is likely a 
 design decision due to potential issues that may arise when building projects 
 with awkward, cyclic or recursive dependencies.
 These issues shouldn't appear within JCC-based projects because of the 
 generative nature of the build; i.e. all dependencies are resolved and 
 generated prior to building, and the build process itself consists of 
 compiling and linking the wrapper alone, whose files form a sequence of 
 flattened compilation units.
 Enabling this requires monkey patching distutils, which was also discussed 
 on the list as a potential source of issues, although we feel that the 
 risk is likely lower than that of the setuptools patching currently used.  This 
 would be optional functionality that is only enabled if the 
 monkey-patching succeeds.  Distutils itself is part of the standard 
 library and might be less susceptible to change than setuptools, and the area 
 of code being monkey patched has barely changed since 2002 (see: 
 http://hg.python.org/cpython/file/tip/Lib/distutils/ccompiler.py).
 In addition to the distutils changes, this patch also changes the 
 wrapper class generation to make it more cache friendly, the target 
 being that no changes in the wrapped code means no changes in the wrapper 
 code.  Any change that minimally alters the wrapped code means that, with a 
 tool such as ccache, the rebuild time is significantly reduced (almost 
 to 1/n, where n is the number of files and only one has changed).
 Obviously the maintainers will have to assess this risk and decide whether 
 they would like to accept the patch or not.  The code has only been tested on 
 Linux with Python 2.7.5 but should fail gracefully and prevent 
 parallelisation if one of the requirements hasn't been met (not on Linux, no 
 multiprocessing support, or the monkey patching somehow fails).  The change to 
 caching should still benefit everyone regardless.
 Please note that an additional dependency on orderedset has been added to 
 achieve more deterministic ordering.  This may not be desirable (e.g. 
 another package might be preferred, such as ordered-set, or the code might be 
 inlined into the package instead), as per maintainer comments.
 --- [following repeated from mailing list] ---
 Performance Statistics:
 The following are some quick and dirty statistics for building pylucene 
 itself with jcc (incl. java lucene, which accounts for about 30-ish seconds 
 upfront).  The JCC files are split using --files 8, and each build is 
 preceded by a make clean:
 Serial (unpatched):
 real    5m1.502s
 user    5m22.887s
 sys     0m7.749s
 Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs):
 real    1m37.382s
 user    7m16.658s
 sys     0m8.697s
 Furthermore, some additional changes were made to the wrapped file generation 
 to make the generated code more ccache friendly (additional deterministic 
 sorting for methods and some usage of an ordered set).  With these in place, 
 the CC and CCACHE_COMPILERCHECK environment variables set to "ccache gcc" 
 and "content" respectively, and ccache installed, subsequent 
 compilation time is reduced again as follows:
 Parallel (patched, 4 physical cores, 8 hyperthreads, 8 parallel jobs, ccache 
 enabled):
 real    0m43.051s
 user    1m10.392s
 sys     0m4.547s
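 For reference, the ccache environment described above amounts to something
 like the following sketch. It assumes ccache and gcc are on the PATH, and the
 build invocation shown in the comment is illustrative only:

```python
import os
import subprocess

# Compile through ccache, and make ccache hash the compiler binary's
# *content* rather than its mtime when judging cache validity.
env = dict(os.environ)
env["CC"] = "ccache gcc"
env["CCACHE_COMPILERCHECK"] = "content"

# Illustrative invocation only -- substitute the real jcc/setup.py command:
# subprocess.check_call(["python", "setup.py", "build"], env=env)
print(env["CC"], env["CCACHE_COMPILERCHECK"])
```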
 This was a run in which nothing changed between runs, so for a realistic run 
 in which changes occur it'll be a figure between 0m43.051s and 1m37.382s, 
 depending on how drastic the change was.  If many changes are expected and you 
 want to keep it more cache friendly then using a higher --files would 
 probably work (to an extent), or ideally use --files 

RE: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_67) - Build # 11066 - Failure!

2014-09-08 Thread Uwe Schindler
Hi,

those 2 tests fail now after TIKA upgrade:
   [junit4]   - org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed
   [junit4]   - org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellDocumentTypes

I have no idea, sorry. It *could* be related to the new metadata items 
(X-Parsed-By) that are now returned by TIKA.
It does not fail on my computer, but this is because the test only works on 
Linux, and it is disabled on Java 8, too.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
 Sent: Monday, September 08, 2014 1:45 AM
 To: u...@thetaphi.de; dev@lucene.apache.org
 Subject: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_67) - Build # 11066 -
 Failure!
 
 Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/11066/
 Java: 32bit/jdk1.7.0_67 -client -XX:+UseConcMarkSweepGC
 
 2 tests failed.
 FAILED: org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed
 
 Error Message:
 Cannot execute script
 
 Stack Trace:
 org.kitesdk.morphline.api.MorphlineRuntimeException: Cannot execute script
   at __randomizedtesting.SeedInfo.seed([8A821065A1A75AD1:AE86C65E2124C91]:0)
   at org.kitesdk.morphline.stdlib.JavaBuilder$Java.doProcess(JavaBuilder.java:98)
   at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
   at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
   at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
   at org.kitesdk.morphline.stdlib.SeparateAttachmentsBuilder$SeparateAttachments.doProcess(SeparateAttachmentsBuilder.java:79)
   at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
   at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
   at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
   at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.load(AbstractSolrMorphlineTestBase.java:197)
   at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.testDocumentTypesInternal(AbstractSolrMorphlineTestBase.java:168)
   at org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed(SolrCellMorphlineTest.java:153)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
   at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
   at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
   at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
   at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
   at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
   at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
   at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
   at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
   at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
   at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
   at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
   at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
   at 

[jira] [Resolved] (SOLR-4765) The new Collections API test deleteCollectionWithDownNodes fails often with a server 500 error.

2014-09-08 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta resolved SOLR-4765.

Resolution: Fixed

I haven't seen this issue since 4.9 (considering 4.8 was officially released on 
28th Apr). Marking this as fixed.

 The new Collections API test deleteCollectionWithDownNodes fails often with a 
 server 500 error.
 ---

 Key: SOLR-4765
 URL: https://issues.apache.org/jira/browse/SOLR-4765
 Project: Solr
  Issue Type: Test
  Components: Tests
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0, 4.9






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 2093 - Failure

2014-09-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/2093/

2 tests failed.
REGRESSION:  org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed

Error Message:
Cannot execute script

Stack Trace:
org.kitesdk.morphline.api.MorphlineRuntimeException: Cannot execute script
at __randomizedtesting.SeedInfo.seed([68C26E082DEE8164:E8A812086E5B9724]:0)
at org.kitesdk.morphline.stdlib.JavaBuilder$Java.doProcess(JavaBuilder.java:98)
at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
at org.kitesdk.morphline.stdlib.SeparateAttachmentsBuilder$SeparateAttachments.doProcess(SeparateAttachmentsBuilder.java:79)
at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
at org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.load(AbstractSolrMorphlineTestBase.java:197)
at org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.testDocumentTypesInternal(AbstractSolrMorphlineTestBase.java:168)
at org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed(SolrCellMorphlineTest.java:153)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 

[jira] [Created] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread Uwe Schindler (JIRA)
Uwe Schindler created SOLR-6489:
---

 Summary: morphlines-cell tests fail after upgrade to TIKA 1.6
 Key: SOLR-6489
 URL: https://issues.apache.org/jira/browse/SOLR-6489
 Project: Solr
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.11
Reporter: Uwe Schindler
 Fix For: 5.0, 4.11


After upgrade to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests fail 
with scripting error messages.

Due to my missing understanding, caused by the crazy configuration file format 
and my inability to figure out the test setup, I have to give up and hope that 
somebody else can take care. In addition, on my own machines, all of Hadoop 
does not work at all, so I cannot debug (Windows).

The whole Morphlines setup is not really good, because Solr core depends on 
a different TIKA version than the included morphlines libraries. This is not a 
good situation for Solr, because we should be able to upgrade to any version 
of our core components and not depend on external libraries that themselves 
depend on Solr in older versions!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6488) Upgrade to TIKA 1.6

2014-09-08 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125262#comment-14125262 ]

ASF subversion and git services commented on SOLR-6488:
---

Commit 1623308 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1623308 ]

SOLR-6489: Disable Morphlines-Cell tests, because Update to Tika 1.6 
(SOLR-6488) broke them

 Upgrade to TIKA 1.6
 ---

 Key: SOLR-6488
 URL: https://issues.apache.org/jira/browse/SOLR-6488
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.11

 Attachments: SOLR-6488.patch, SOLR-6488.patch, SOLR-6488.patch


 Apache TIKA 1.6 came out yesterday; we should upgrade to it.
 The dependencies of bundled Apache POI changed (xmlbeans upgraded, already 
 done; dom4j is obsolete). We have to carefully verify the dependency tree!!!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125261#comment-14125261 ]

ASF subversion and git services commented on SOLR-6489:
---

Commit 1623308 from [~thetaphi] in branch 'dev/trunk'
[ https://svn.apache.org/r1623308 ]

SOLR-6489: Disable Morphlines-Cell tests, because Update to Tika 1.6 
(SOLR-6488) broke them

 morphlines-cell tests fail after upgrade to TIKA 1.6
 

 Key: SOLR-6489
 URL: https://issues.apache.org/jira/browse/SOLR-6489
 Project: Solr
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.11
Reporter: Uwe Schindler
 Fix For: 5.0, 4.11


 After upgrade to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests fail 
 with scripting error messages.
 Due to my missing understanding, caused by the crazy configuration file format 
 and my inability to figure out the test setup, I have to give up and hope that 
 somebody else can take care. In addition, on my own machines, all of Hadoop 
 does not work at all, so I cannot debug (Windows).
 The whole Morphlines setup is not really good, because Solr core depends on 
 a different TIKA version than the included morphlines libraries. This is not a 
 good situation for Solr, because we should be able to upgrade to any version 
 of our core components and not depend on external libraries that themselves 
 depend on Solr in older versions!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125263#comment-14125263 ]

ASF subversion and git services commented on SOLR-6489:
---

Commit 1623309 from [~thetaphi] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1623309 ]

Merged revision(s) 1623308 from lucene/dev/trunk:
SOLR-6489: Disable Morphlines-Cell tests, because Update to Tika 1.6 
(SOLR-6488) broke them

 morphlines-cell tests fail after upgrade to TIKA 1.6
 

 Key: SOLR-6489
 URL: https://issues.apache.org/jira/browse/SOLR-6489
 Project: Solr
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.11
Reporter: Uwe Schindler
 Fix For: 5.0, 4.11


 After upgrade to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests fail 
 with scripting error messages.
 Due to my missing understanding, caused by the crazy configuration file format 
 and my inability to figure out the test setup, I have to give up and hope that 
 somebody else can take care. In addition, on my own machines, all of Hadoop 
 does not work at all, so I cannot debug (Windows).
 The whole Morphlines setup is not really good, because Solr core depends on 
 a different TIKA version than the included morphlines libraries. This is not a 
 good situation for Solr, because we should be able to upgrade to any version 
 of our core components and not depend on external libraries that themselves 
 depend on Solr in older versions!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6488) Upgrade to TIKA 1.6

2014-09-08 Thread ASF subversion and git services (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-6488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125264#comment-14125264 ]

ASF subversion and git services commented on SOLR-6488:
---

Commit 1623309 from [~thetaphi] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1623309 ]

Merged revision(s) 1623308 from lucene/dev/trunk:
SOLR-6489: Disable Morphlines-Cell tests, because Update to Tika 1.6 
(SOLR-6488) broke them

 Upgrade to TIKA 1.6
 ---

 Key: SOLR-6488
 URL: https://issues.apache.org/jira/browse/SOLR-6488
 Project: Solr
  Issue Type: Improvement
  Components: contrib - Solr Cell (Tika extraction)
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 5.0, 4.11

 Attachments: SOLR-6488.patch, SOLR-6488.patch, SOLR-6488.patch


 Apache TIKA 1.6 came out yesterday; we should upgrade to it.
 The dependencies of bundled Apache POI changed (xmlbeans upgraded, already 
 done; dom4j is obsolete). We have to carefully verify the dependency tree!!!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_67) - Build # 11066 - Failure!

2014-09-08 Thread Uwe Schindler
I opened https://issues.apache.org/jira/browse/SOLR-6489 and disabled the 
Morphlines-Cell test with @AwaitsFix.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Monday, September 08, 2014 8:30 AM
 To: dev@lucene.apache.org
 Subject: RE: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_67) - Build #
 11066 - Failure!
 
 Hi,
 
 those 2 tests fail now after TIKA upgrade:
 [junit4]   - org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed
 [junit4]   - org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellDocumentTypes
 
 I have no idea, sorry. It *could* be related to the new metadata items 
 (X-Parsed-By) that are now returned by TIKA.
 It does not fail on my computer, but this is because the test only works on 
 Linux, and it is disabled on Java 8, too.
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Policeman Jenkins Server [mailto:jenk...@thetaphi.de]
  Sent: Monday, September 08, 2014 1:45 AM
  To: u...@thetaphi.de; dev@lucene.apache.org
  Subject: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_67) - Build # 11066
 -
  Failure!
 
  Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/11066/
  Java: 32bit/jdk1.7.0_67 -client -XX:+UseConcMarkSweepGC
 
  2 tests failed.
  FAILED: org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed
 

[jira] [Closed] (SOLR-6054) Log progress of transaction log replays

2014-09-08 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta closed SOLR-6054.
--
Resolution: Duplicate

 Log progress of transaction log replays
 ---

 Key: SOLR-6054
 URL: https://issues.apache.org/jira/browse/SOLR-6054
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 5.0, 4.9

 Attachments: SOLR-6054.patch


 There is zero logging of how a transaction log replay is progressing. We 
 should add some simple checkpoint based progress information. Logging the 
 size of the log file at the beginning would also be useful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6379) CloudSolrServer can query the wrong replica if a collection has a SolrCore name that matches a collection name.

2014-09-08 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated SOLR-6379:
---
Attachment: SOLR-6379.patch

A small optimization to make sure that the query string is parsed in the 
DispatchFilter only once.

I am still not sure if we have a consensus on this one though. I don't have a 
strong opinion, but I would want to have this resolution issue fixed, biased 
towards collection name resolution in SolrCloud mode while also having a 
mechanism to resolve a core if need be.

We should either resolve this or at least specify, in BOLD, that a user 
shouldn't reuse a core name to create a collection (or vice versa) when 
the core doesn't belong to that very collection.

 CloudSolrServer can query the wrong replica if a collection has a SolrCore 
 name that matches a collection name.
 ---

 Key: SOLR-6379
 URL: https://issues.apache.org/jira/browse/SOLR-6379
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Hoss Man
Assignee: Anshum Gupta
Priority: Minor
 Fix For: 5.0, 4.10

 Attachments: SOLR-6379.patch, SOLR-6379.patch, SOLR-6379.patch, 
 SOLR-6379.patch, SOLR-6379.patch, SOLR-6379.patch, SOLR-6379.patch, 
 SOLR-6379.patch, SOLR-6379.pristine_collection.test.patch


 spin off of SOLR-2894, where sarowe & miller were getting failures from 
 TestCloudPivot that seemed unrelated to any of the distrib pivot logic itself.
 in particular: adding a call to waitForThingsToLevelOut at the start of the 
 test, even before indexing any docs, seemed to work around the problem -- but 
 even if all replicas aren't yet up when the test starts, we should either get 
 a failure when adding docs (ie: no replica hosting the target shard) or 
 queries should only be routed to the replicas that are up and fully caught up 
 with the rest of the collection.
 (NOTE: we're specifically talking about a situation where the set of docs in 
 the collection is static during the query request)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6154) SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-09-08 Thread Ronald Matamoros (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125347#comment-14125347 ]

Ronald Matamoros commented on SOLR-6154:


Thanks Erick, I am sorry that I have not been able to contribute further on the 
ticket. Let me know if you want me to test anything on my side.

 SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on 
 response
 --

 Key: SOLR-6154
 URL: https://issues.apache.org/jira/browse/SOLR-6154
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5.1, 4.8.1
 Environment: Solr 4.5.1 under Linux - explicit id routing
  Indexed 400,000+ Documents
  explicit routing 
  custom schema.xml
  
 Solr 4.8.1 under Windows+Cygwin
  Indexed 6 Documents
  implicit id routing
  out of the box schema
Reporter: Ronald Matamoros
Assignee: Erick Erickson
 Attachments: HowToReplicate.pdf, data.xml


 Attached
 - PDF with instructions on how to replicate.
 - data.xml to replicate index
 The f.field.facet.mincount option on a distributed search gives an 
 inconsistent list of buckets on a range facet.
  
 Some buckets are ignored when using the option 
 f.field.facet.mincount=1.
 The Solr logs do not indicate any error or warning during execution.
 The debug=true option and increasing the log levels for the FacetComponent do 
 not provide any hints about the behaviour.
 Replicated the issue on both Solr 4.5.1 & 4.8.1.
 For example:
 Removing the f.field.facet.mincount=1 option gives the expected list of 
 buckets for the 6 documents matched.
 <lst name="facet_ranges">
   <lst name="price">
     <lst name="counts">
       <int name="0.0">0</int>
       <int name="50.0">1</int>
       <int name="100.0">0</int>
       <int name="150.0">3</int>
       <int name="200.0">0</int>
       <int name="250.0">1</int>
       <int name="300.0">0</int>
       <int name="350.0">0</int>
       <int name="400.0">0</int>
       <int name="450.0">0</int>
       <int name="500.0">0</int>
       <int name="550.0">0</int>
       <int name="600.0">0</int>
       <int name="650.0">0</int>
       <int name="700.0">0</int>
       <int name="750.0">1</int>
       <int name="800.0">0</int>
       <int name="850.0">0</int>
       <int name="900.0">0</int>
       <int name="950.0">0</int>
     </lst>
     <float name="gap">50.0</float>
     <float name="start">0.0</float>
     <float name="end">1000.0</float>
     <int name="before">0</int>
     <int name="after">0</int>
     <int name="between">2</int>
   </lst>
 </lst>
 Using the f.field.facet.mincount=1 option removes the 0-count buckets but 
 will also omit the bucket <int name="250.0">1</int>:
 <lst name="facet_ranges">
   <lst name="price">
     <lst name="counts">
       <int name="50.0">1</int>
       <int name="150.0">3</int>
       <int name="750.0">1</int>
     </lst>
     <float name="gap">50.0</float>
     <float name="start">0.0</float>
     <float name="end">1000.0</float>
     <int name="before">0</int>
     <int name="after">0</int>
     <int name="between">4</int>
   </lst>
 </lst>
 Resubmitting the query renders a different bucket list 
 (may need to resubmit a couple of times):
 <lst name="facet_ranges">
   <lst name="price">
     <lst name="counts">
       <int name="150.0">3</int>
       <int name="250.0">1</int>
     </lst>
     <float name="gap">50.0</float>
     <float name="start">0.0</float>
     <float name="end">1000.0</float>
     <int name="before">0</int>
     <int name="after">0</int>
     <int name="between">2</int>
   </lst>
 </lst>
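For illustration only (plain Java with hypothetical bucket data, not Solr's actual distributed faceting code): one way a bucket like 250.0 can vanish is if a mincount-style filter is applied per shard before the per-shard counts are merged, since a bucket may reach the threshold only in aggregate. This is a sketch of that failure mode, not a claim about where the bug lives:

```java
import java.util.*;

public class MincountMergeSketch {

    // Correct order: merge per-shard bucket counts first, then apply mincount.
    static Map<String, Integer> mergeThenFilter(List<Map<String, Integer>> shards, int mincount) {
        Map<String, Integer> merged = new TreeMap<>();
        for (Map<String, Integer> shard : shards) {
            shard.forEach((bucket, count) -> merged.merge(bucket, count, Integer::sum));
        }
        merged.values().removeIf(c -> c < mincount);
        return merged;
    }

    // Buggy order: each shard filters before merging, so a bucket that only
    // reaches mincount in aggregate is silently dropped.
    static Map<String, Integer> filterThenMerge(List<Map<String, Integer>> shards, int mincount) {
        List<Map<String, Integer>> filtered = new ArrayList<>();
        for (Map<String, Integer> shard : shards) {
            Map<String, Integer> f = new TreeMap<>(shard);
            f.values().removeIf(c -> c < mincount);
            filtered.add(f);
        }
        return mergeThenFilter(filtered, mincount);
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> shards = List.of(
                Map.of("150.0", 3, "250.0", 1),
                Map.of("150.0", 0, "250.0", 1));
        Map<String, Integer> good = mergeThenFilter(shards, 2);
        Map<String, Integer> bad = filterThenMerge(shards, 2);
        // Bucket 250.0 totals 2 across shards: it survives the post-merge
        // filter but vanishes when each shard filters first.
        if (!good.containsKey("250.0")) throw new AssertionError(good);
        if (bad.containsKey("250.0")) throw new AssertionError(bad);
        System.out.println("merge-then-filter=" + good + " filter-then-merge=" + bad);
    }
}
```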



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5922) DocValuesDocIdSet is not cacheable

2014-09-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125365#comment-14125365
 ] 

ASF subversion and git services commented on LUCENE-5922:
-

Commit 1623345 from [~jpountz] in branch 'dev/trunk'
[ https://svn.apache.org/r1623345 ]

LUCENE-5922: DocValuesDocIdSet is not cacheable.

 DocValuesDocIdSet is not cacheable
 --

 Key: LUCENE-5922
 URL: https://issues.apache.org/jira/browse/LUCENE-5922
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.10
Reporter: Adrien Grand
 Fix For: 4.11

 Attachments: LUCENE-5922.patch


 This DocIdSet claims it is cacheable although bad things could happen if it 
 was cached since it is not thread-safe and keeps handles to open files.
 The fix is simple, especially given that this doc id set is cheap to create. 
 But I'm wondering if there is a way we could protect ourselves from such bugs 
 in the future.
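As a sketch of the contract at issue (simplified stand-in classes, not Lucene's real DocIdSet): a set whose iterator depends on live, non-thread-safe resources must report itself as not cacheable so caching filters never retain it:

```java
// Simplified stand-in for the contract, not Lucene's real class.
abstract class SketchDocIdSet {
    abstract int[] docs();
    // Lucene's DocIdSet defaults to "not cacheable"; the bug was an
    // override that claimed cacheability while holding open file handles.
    boolean isCacheable() { return false; }
}

public class CacheabilitySketch {
    public static void main(String[] args) {
        SketchDocIdSet backedByOpenFiles = new SketchDocIdSet() {
            @Override int[] docs() { return new int[] {1, 2, 3}; }
        };
        // A caching filter must consult isCacheable() before retaining a set.
        if (backedByOpenFiles.isCacheable()) throw new AssertionError("must not be cached");
        System.out.println("set correctly reports isCacheable() == false");
    }
}
```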



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5922) DocValuesDocIdSet is not cacheable

2014-09-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125373#comment-14125373
 ] 

ASF subversion and git services commented on LUCENE-5922:
-

Commit 1623349 from [~jpountz] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1623349 ]

LUCENE-5922: FieldCacheDocIdSet is not cacheable.

 DocValuesDocIdSet is not cacheable
 --

 Key: LUCENE-5922
 URL: https://issues.apache.org/jira/browse/LUCENE-5922
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.10
Reporter: Adrien Grand
 Fix For: 4.11

 Attachments: LUCENE-5922.patch


 This DocIdSet claims it is cacheable although bad things could happen if it 
 was cached since it is not thread-safe and keeps handles to open files.
 The fix is simple, especially given that this doc id set is cheap to create. 
 But I'm wondering if there is a way we could protect ourselves from such bugs 
 in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #695: POMs out of sync

2014-09-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/695/

2 tests failed.
REGRESSION:  
org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellDocumentTypes

Error Message:
unexpected results in 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-4.x/maven-build/solr/contrib/morphlines-cell/target/test-classes/test-documents/cars.csv.gz
 expected:<21> but was:<16>

Stack Trace:
java.lang.AssertionError: unexpected results in 
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-4.x/maven-build/solr/contrib/morphlines-cell/target/test-classes/test-documents/cars.csv.gz
 expected:<21> but was:<16>
at 
__randomizedtesting.SeedInfo.seed([1569E599DD021BC7:8FAA4607B8D94512]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.failNotEquals(Assert.java:647)
at org.junit.Assert.assertEquals(Assert.java:128)
at org.junit.Assert.assertEquals(Assert.java:472)
at 
org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.testDocumentTypesInternal(AbstractSolrMorphlineTestBase.java:175)
at 
org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellDocumentTypes(SolrCellMorphlineTest.java:193)


REGRESSION:  
org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed

Error Message:
Cannot execute script

Stack Trace:
org.kitesdk.morphline.api.MorphlineRuntimeException: Cannot execute script
at 
org.kitesdk.morphline.stdlib.TryRulesBuilder$TryRules.doProcess(TryRulesBuilder.java:121)
at 
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
at 
org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
at 
org.kitesdk.morphline.tika.DetectMimeTypeBuilder$DetectMimeType.doProcess(DetectMimeTypeBuilder.java:166)
at 
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
at 
org.kitesdk.morphline.scriptengine.java.scripts.MyJavaClass4.eval(MyJavaClass4.java:15)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.kitesdk.morphline.scriptengine.java.FastJavaScriptEngine$JavaCompiledScript.eval(FastJavaScriptEngine.java:82)
at 
org.kitesdk.morphline.scriptengine.java.ScriptEvaluator.evaluate(ScriptEvaluator.java:117)
at 
org.kitesdk.morphline.stdlib.JavaBuilder$Java.doProcess(JavaBuilder.java:96)
at 
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
at 
org.kitesdk.morphline.scriptengine.java.scripts.MyJavaClass3.eval(MyJavaClass3.java:7)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.kitesdk.morphline.scriptengine.java.FastJavaScriptEngine$JavaCompiledScript.eval(FastJavaScriptEngine.java:82)
at 
org.kitesdk.morphline.scriptengine.java.ScriptEvaluator.evaluate(ScriptEvaluator.java:117)
at 
org.kitesdk.morphline.stdlib.JavaBuilder$Java.doProcess(JavaBuilder.java:96)
at 
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at org.kitesdk.morphline.base.Connector.process(Connector.java:64)
at 
org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
at 
org.kitesdk.morphline.stdlib.SeparateAttachmentsBuilder$SeparateAttachments.doProcess(SeparateAttachmentsBuilder.java:79)
at 
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at 
org.kitesdk.morphline.base.AbstractCommand.doProcess(AbstractCommand.java:181)
at 
org.kitesdk.morphline.base.AbstractCommand.process(AbstractCommand.java:156)
at 
org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.load(AbstractSolrMorphlineTestBase.java:197)
at 
org.apache.solr.morphlines.solr.AbstractSolrMorphlineTestBase.testDocumentTypesInternal(AbstractSolrMorphlineTestBase.java:168)
at 
org.apache.solr.morphlines.cell.SolrCellMorphlineTest.testSolrCellJPGCompressed(SolrCellMorphlineTest.java:153)




Build Log:
[...truncated 51140 lines...]
BUILD FAILED
/usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-4.x/build.xml:514: 
The following error occurred while executing this line:

[jira] [Commented] (SOLR-6441) MoreLikeThis support for stopwords as in Lucene

2014-09-08 Thread Jeroen Steggink (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125431#comment-14125431
 ] 

Jeroen Steggink commented on SOLR-6441:
---

Hi Ramana,

That's exactly what I was thinking of. Thanks!

Cheers,
Jeroen

 MoreLikeThis support for stopwords as in Lucene
 ---

 Key: SOLR-6441
 URL: https://issues.apache.org/jira/browse/SOLR-6441
 Project: Solr
  Issue Type: Improvement
  Components: MoreLikeThis
Affects Versions: 4.9
Reporter: Jeroen Steggink
Priority: Minor
  Labels: difficulty-easy, impact-low, workaround-exists
 Fix For: 4.10, 4.11


 In the Lucene implementation of MoreLikeThis, it's possible to add a list of 
 stopwords which are considered uninteresting and are ignored.
 It would be a great addition to the MoreLikeThisHandler to be able to specify 
 a list of stopwords.
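In Lucene, MoreLikeThis exposes a setStopWords(Set) hook for this; the handler-side feature being requested amounts to dropping stopwords from the candidate terms before scoring. A minimal, self-contained sketch of that filtering step (hypothetical helper method, not Solr or Lucene code):

```java
import java.util.*;
import java.util.stream.Collectors;

public class MltStopwordsSketch {
    // Hypothetical stand-in for the term-selection step: drop terms that
    // appear in a user-supplied stopword set before they are scored as
    // "interesting", mirroring what MoreLikeThis.setStopWords enables.
    static List<String> interestingTerms(List<String> candidates, Set<String> stopwords) {
        return candidates.stream()
                .filter(t -> !stopwords.contains(t))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("the", "and", "of");
        List<String> terms = interestingTerms(
                List.of("the", "lucene", "and", "faceting"), stop);
        System.out.println(terms); // [lucene, faceting]
    }
}
```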



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5922) DocValuesDocIdSet is not cacheable

2014-09-08 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-5922.
--
   Resolution: Fixed
Fix Version/s: 5.0

Thanks Ryan!

 DocValuesDocIdSet is not cacheable
 --

 Key: LUCENE-5922
 URL: https://issues.apache.org/jira/browse/LUCENE-5922
 Project: Lucene - Core
  Issue Type: Bug
Affects Versions: 4.10
Reporter: Adrien Grand
 Fix For: 5.0, 4.11

 Attachments: LUCENE-5922.patch


 This DocIdSet claims it is cacheable although bad things could happen if it 
 was cached since it is not thread-safe and keeps handles to open files.
 The fix is simple, especially given that this doc id set is cheap to create. 
 But I'm wondering if there is a way we could protect ourselves from such bugs 
 in the future.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (SOLR-6054) Log progress of transaction log replays

2014-09-08 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reopened SOLR-6054:
---
  Assignee: Mark Miller

Let's keep this open until my last comment is addressed. 

 Log progress of transaction log replays
 ---

 Key: SOLR-6054
 URL: https://issues.apache.org/jira/browse/SOLR-6054
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Shalin Shekhar Mangar
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.9, 5.0

 Attachments: SOLR-6054.patch


 There is zero logging of how a transaction log replay is progressing. We 
 should add some simple checkpoint based progress information. Logging the 
 size of the log file at the beginning would also be useful.
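A checkpoint-based approach could look roughly like this (a hypothetical sketch, not the actual UpdateLog replay code): log the total size up front, then emit a progress line about every 10% of bytes applied:

```java
public class ReplayProgressSketch {
    // Hypothetical checkpoint-style progress logging for a long replay.
    static void replay(long totalBytes, long chunkBytes) {
        System.out.println("replaying tlog, size=" + totalBytes + " bytes");
        long done = 0;
        int lastPct = 0;
        while (done < totalBytes) {
            done = Math.min(totalBytes, done + chunkBytes); // apply one batch of log entries
            int pct = (int) (100 * done / totalBytes);
            // Checkpoint: report roughly every 10%, and always at completion.
            if (pct >= lastPct + 10 || done == totalBytes) {
                System.out.println("replay progress: " + pct + "%");
                lastPct = pct - pct % 10;
            }
        }
    }

    public static void main(String[] args) {
        replay(1_000_000, 130_000);
    }
}
```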



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 614 - Still Failing

2014-09-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/614/

1 tests failed.
FAILED:  org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch

Error Message:
createcollection the collection time out:180s

Stack Trace:
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: 
createcollection the collection time out:180s
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:550)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testErrorHandling(CollectionsAPIDistributedZkTest.java:614)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:205)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:871)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 

[jira] [Updated] (SOLR-5480) Make MoreLikeThisHandler distributable

2014-09-08 Thread Steve Molloy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Molloy updated SOLR-5480:
---
Attachment: SOLR-5480.patch

Patch adapted for 4.10, unit tests pass.

 Make MoreLikeThisHandler distributable
 --

 Key: SOLR-5480
 URL: https://issues.apache.org/jira/browse/SOLR-5480
 Project: Solr
  Issue Type: Improvement
Reporter: Steve Molloy
Assignee: Noble Paul
 Attachments: SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch


 The MoreLikeThis component, when used in the standard search handler, supports 
 distributed searches. But the MoreLikeThisHandler itself doesn't, which 
 prevents, say, passing in text to perform the query. I'll start looking 
 into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone 
 has some work done already and wants to share, or wants to contribute, any help 
 will be welcomed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable

2014-09-08 Thread Steve Molloy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125647#comment-14125647
 ] 

Steve Molloy commented on SOLR-5480:


[~claudenm] I just attached a version of the patch adapted for 4.10 release. 
Tests are passing in my environment (Ubuntu 14.04, Oracle JDK 1.7.0_67), I will 
perform some more integration tests in our setup and will let you know if I 
see any issue. What were the failures you were seeing? Do you have logs/stack 
traces?

 Make MoreLikeThisHandler distributable
 --

 Key: SOLR-5480
 URL: https://issues.apache.org/jira/browse/SOLR-5480
 Project: Solr
  Issue Type: Improvement
Reporter: Steve Molloy
Assignee: Noble Paul
 Attachments: SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch


 The MoreLikeThis component, when used in the standard search handler, supports 
 distributed searches. But the MoreLikeThisHandler itself doesn't, which 
 prevents, say, passing in text to perform the query. I'll start looking 
 into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone 
 has some work done already and wants to share, or wants to contribute, any help 
 will be welcomed. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5961) Solr gets crazy on /overseer/queue state change

2014-09-08 Thread Frans Lawaetz (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125675#comment-14125675
 ] 

Frans Lawaetz commented on SOLR-5961:
-

We saw this issue as well.  Solr filled 2T worth of disk, recording hundreds 
of these messages per ms.  In our case it may have been related to the ZooKeeper 
/overseer/queue becoming so overloaded with znodes that it was unable to 
function normally.  

 Solr gets crazy on /overseer/queue state change
 ---

 Key: SOLR-5961
 URL: https://issues.apache.org/jira/browse/SOLR-5961
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7.1
 Environment: CentOS, 1 shard - 3 replicas, ZK cluster with 3 nodes 
 (separate machines)
Reporter: Maxim Novikov
Priority: Critical

 No idea how to reproduce it, but sometimes Solr starts littering the log with 
 the following messages:
 419158 [localhost-startStop-1-EventThread] INFO  
 org.apache.solr.cloud.DistributedQueue  ? LatchChildWatcher fired on path: 
 /overseer/queue state: SyncConnected type NodeChildrenChanged
 419190 [Thread-3] INFO  org.apache.solr.cloud.Overseer  ? Update state 
 numShards=1 message={
    "operation":"state",
    "state":"recovering",
    "base_url":"http://${IP_ADDRESS}/solr",
    "core":"${CORE_NAME}",
    "roles":null,
    "node_name":"${NODE_NAME}_solr",
    "shard":"shard1",
    "collection":"${COLLECTION_NAME}",
    "numShards":"1",
    "core_node_name":"core_node2"}
 It continues spamming these messages with no delay, and restarting all the 
 nodes does not help. I have even tried to stop all the nodes in the 
 cluster first, but when I start one the behavior doesn't change; it 
 goes crazy with this /overseer/queue state again.
 PS The only way to handle this was to stop everything, manually clean up all 
 the data in ZooKeeper related to Solr, and then rebuild everything from 
 scratch. As you should understand, it is kinda unbearable in the production 
 environment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5925) Use rename instead of segments_N fallback / segments.gen etc

2014-09-08 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5925:

Attachment: LUCENE-5925.patch

Updated patch. This also solves LUCENE-2585.

Today, segments.gen helps with the fact that a directory listing is not point 
in time (it's a weakly consistent iterator and can reflect changes that happen 
during iteration). But it cannot be totally relied upon due to timing (you can 
get unlucky, as in LUCENE-2585).

Instead, in FindSegmentsFile, we simply detect that contents have changed 
during the execution of listAll, by executing it again and doing a comparison. 
This way we can detect ConcurrentModificationException and just continue the 
loop.
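The "list twice and compare" idea can be sketched in plain java.nio (hypothetical helper names, not the actual FindSegmentsFile code): retry until two consecutive listings agree, and treat the agreed result as a point-in-time view.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class StableListingSketch {
    // Since a directory listing is only weakly consistent, list twice and
    // compare; if the two listings differ, contents changed mid-iteration,
    // so loop and try again.
    static List<String> stableListAll(Path dir, int maxRetries) throws IOException {
        List<String> prev = listAll(dir);
        for (int i = 0; i < maxRetries; i++) {
            List<String> next = listAll(dir);
            if (next.equals(prev)) return next; // no concurrent modification observed
            prev = next;                        // contents changed; retry
        }
        throw new IOException("directory kept changing during listing");
    }

    private static List<String> listAll(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            for (Path p : ds) names.add(p.getFileName().toString());
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("segments");
        Files.createFile(dir.resolve("segments_1"));
        System.out.println(stableListAll(dir, 5)); // [segments_1]
    }
}
```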

 Use rename instead of segments_N fallback / segments.gen etc
 

 Key: LUCENE-5925
 URL: https://issues.apache.org/jira/browse/LUCENE-5925
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/store
Reporter: Robert Muir
  Labels: Java7
 Attachments: LUCENE-5925.patch, LUCENE-5925.patch


 Our commit logic is strange, we write corrupted commit points and only on the 
 last phase of commit do we correct them.
 This means the logic to get the latest commit is always scary and awkward, 
 since it must deal with partial commits, and try to determine if it should 
 fall back to segments_N-1 or actually relay an exception. 
 This logic is incomplete/sheisty; e.g. I think we only fall back so far 
 (at most one level).
 If we somehow screw up in all this logic and do the wrong thing, then we lose 
 data (e.g. LUCENE-4870 wiped an entire index because of TooManyOpenFiles).
 We now require Java 7; I think we should explore instead writing 
 {{pending_segments_N}} and then in finishCommit() doing an atomic rename to 
 {{segments_N}}. 
 We could then remove all the complex fallback logic completely, since we no 
 longer have to deal with ignoring partial commits, instead simply 
 delivering any exception we get when trying to read the commit and sleep 
 better at night.
 In Java 7, we have the APIs for this (ATOMIC_MOVE).
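The pending_segments_N scheme maps directly onto java.nio.file's ATOMIC_MOVE. A minimal sketch, assuming the filesystem supports atomic same-directory renames (real index code would also fsync the file and the directory before the move):

```java
import java.io.IOException;
import java.nio.file.*;
import static java.nio.file.StandardCopyOption.ATOMIC_MOVE;

public class AtomicCommitSketch {
    // Sketch of the proposed scheme (names hypothetical): write the commit
    // point as pending_segments_N, then atomically rename it to segments_N
    // so readers never observe a partially written commit file.
    static void commit(Path dir, long gen, byte[] contents) throws IOException {
        Path pending = dir.resolve("pending_segments_" + gen);
        Path live = dir.resolve("segments_" + gen);
        Files.write(pending, contents);
        // In real code the pending file (and directory) would be fsynced here.
        Files.move(pending, live, ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index");
        commit(dir, 1, new byte[] {42});
        if (!Files.exists(dir.resolve("segments_1"))) throw new AssertionError();
        if (Files.exists(dir.resolve("pending_segments_1"))) throw new AssertionError();
        System.out.println("commit visible only as segments_1");
    }
}
```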



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125687#comment-14125687
 ] 

Mark Miller commented on SOLR-6489:
---

[~thetaphi], you really should wait longer than a weekend to disable tests and 
say you don't understand what the problem is because you want to slam in a 
library upgrade real quick. It's a distributed community, give time for 
collaboration before you disable tests. I doubt it was critical to jam in Tika 
1.6 over the weekend.

I'll look into the issue.

 morphlines-cell tests fail after upgrade to TIKA 1.6
 

 Key: SOLR-6489
 URL: https://issues.apache.org/jira/browse/SOLR-6489
 Project: Solr
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.11
Reporter: Uwe Schindler
 Fix For: 5.0, 4.11


 After upgrading to Apache Tika 1.6 (SOLR-6488), solr-morphlines-cell tests fail 
 with scripting error messages.
 Because I don't understand the crazy configuration file format and was unable 
 to figure out the test setup, I have to give up and hope that 
 somebody else can take care of it. In addition, on my own machines, Hadoop 
 does not work at all, so I cannot debug (Windows).
 The whole Morphlines setup is not really good, because Solr core depends on 
 a different Tika version than the included morphlines libraries. This is not a 
 good situation for Solr, because we should be able to upgrade to any version 
 of our core components and not depend on external libraries that themselves 
 depend on older versions of Solr!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.9.0-ea-b28) - Build # 11071 - Failure!

2014-09-08 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/11071/
Java: 32bit/jdk1.9.0-ea-b28 -client -XX:+UseG1GC

3 tests failed.
FAILED:  
junit.framework.TestSuite.org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest

Error Message:
5 threads leaked from SUITE scope at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest: 1) 
Thread[id=30, name=testCUSS-3-thread-5, state=TIMED_WAITING, 
group=TGRP-ConcurrentUpdateSolrServerTest] at 
sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) 
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 at 
java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:354)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$SendDocsRunnable.run(ConcurrentUpdateSolrServerTest.java:220)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745)
2) Thread[id=28, 
name=testCUSS-3-thread-3, state=TIMED_WAITING, 
group=TGRP-ConcurrentUpdateSolrServerTest] at 
sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) 
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 at 
java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:354)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$SendDocsRunnable.run(ConcurrentUpdateSolrServerTest.java:220)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745)
3) Thread[id=26, 
name=testCUSS-3-thread-1, state=TIMED_WAITING, 
group=TGRP-ConcurrentUpdateSolrServerTest] at 
sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) 
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 at 
java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:354)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$SendDocsRunnable.run(ConcurrentUpdateSolrServerTest.java:220)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745)
4) Thread[id=27, 
name=testCUSS-3-thread-2, state=TIMED_WAITING, 
group=TGRP-ConcurrentUpdateSolrServerTest] at 
sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) 
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 at 
java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:354)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$SendDocsRunnable.run(ConcurrentUpdateSolrServerTest.java:220)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
at java.lang.Thread.run(Thread.java:745)
5) Thread[id=29, 
name=testCUSS-3-thread-4, state=TIMED_WAITING, 
group=TGRP-ConcurrentUpdateSolrServerTest] at 
sun.misc.Unsafe.park(Native Method) at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) 
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
 at 
java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:354)
 at 
org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServerTest$SendDocsRunnable.run(ConcurrentUpdateSolrServerTest.java:220)
 at 

[jira] [Updated] (SOLR-6386) make secondary ordering of facet.field values (and facet.pivot?) consistently deterministic

2014-09-08 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-6386:
-
Assignee: (was: Erick Erickson)

 make secondary ordering of facet.field values (and facet.pivot?) consistently 
 deterministic
 ---

 Key: SOLR-6386
 URL: https://issues.apache.org/jira/browse/SOLR-6386
 Project: Solr
  Issue Type: Improvement
Reporter: Hoss Man

 As a fluke of how the SOLR-2894 patch evolved, it wound up adding a bit of 
 testing of distributed facet.field on date fields (see [r1617789 changes to 
 TestDistributedSearch|https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/test/org/apache/solr/TestDistributedSearch.java?r1=1617789&r2=1617788&pathrev=1617789])
  ... but this started triggering some random failures due to facet 
 constraints with identical values being sorted differently between the 
 distributed query and the single-node control query.
 We should make the facet.field (and facet.pivot) code order constraints with 
 tied counts consistently, regardless of whether it's a distributed search or not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6386) make secondary ordering of facet.field values (and facet.pivot?) consistently deterministic

2014-09-08 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125694#comment-14125694
 ] 

Erick Erickson commented on SOLR-6386:
--

[~hossman_luc...@fucit.org] Some things I found out this weekend:
[~markrmil...@gmail.com] Pinging you on this because I half suspect that 
there's something weird with the test infrastructure.

Frankly I'm at a loss, but here are the outstanding things I saw. I'm pretty sure 
the answer to my question of whether this would just get taken care of by the 
stuff I'm doing for SOLR-6187 is no, so I'm assigning it back to nobody. Adding 
the facet.limit=1 in the test makes the problem disappear just b/c all the bogus 
0 counts that get returned are removed.

 If I optimize the clients and control server in 
 BaseDistributedSearchTestCase.commit, then this test case does NOT fail. But 
 I must optimize both. If I just optimize the control, it fails. If I just 
 optimize the clients, it fails. This really weirds me out. I suspected pilot 
 error here frankly, so I just tried it again and I'm pretty sure I'm not 
 hallucinating. I'd expect optimizing the distributed case would fix this up, 
 but no. So I wonder if there's something weird here with RAMDirectory, 
 which underpins the servers. Although just for yucks I tried using a 
 disk-based directory and it still seemed to fail, although I won't swear that 
 I got it right.

 I set up IntelliJ with the seeds etc. you provided and it's not until the 
 third pass that it fails. But it fails every time on the third pass. Ditto 
 with running the test from the command shell.

 In DocValuesFacet.getCount, around line 200 or so, I'm printing out the values 
 added. This is near the bottom of the clause:
if (sort.equals(FacetParams.FACET_SORT_COUNT) || 
    sort.equals(FacetParams.FACET_SORT_COUNT_LEGACY)) {
  // ... near the end
} else ...

On the pass that fails, I get these values:
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 0
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-04-20T11:00:00Z 1
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-03T10:59:56.032Z 0
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-03T10:57:12.192Z 0
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-02T11:00:00Z 1
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-03T07:10:00.704Z 0
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-05T11:00:00Z 1
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-04-27T16:01:01.44Z 0
   [junit4]   1 QUERY DUMP 1 Adding string/count 2009-03-13T13:23:01.248Z 0
   [junit4]   1 QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
   [junit4]   1 QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
   [junit4]   1 QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0
   [junit4]   1 QUERY DUMP 1 Adding string/count 1970-01-01T00:00:00Z 0

Notice the Jan 1, 1970 dates. Sure seems like a zero snuck in there somewhere. 
If you sum up the non-zero counts, you wind up with the right facet counts.
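The "sum up the non-zero counts" observation can be sketched as a tiny merge over the dumped (value, count) pairs. This is a hypothetical illustration of the arithmetic, not Solr's actual merging code:

```java
import java.util.Map;
import java.util.TreeMap;

public class MergeFacetCounts {
    // Sum counts per bucket key and drop buckets whose total is 0,
    // mirroring the observation that summing the non-zero counts
    // recovers the control server's facet counts.
    static Map<String, Integer> merge(String[][] raw) {
        Map<String, Integer> merged = new TreeMap<>();
        for (String[] e : raw) {
            merged.merge(e[0], Integer.parseInt(e[1]), Integer::sum);
        }
        merged.values().removeIf(v -> v == 0); // discard the bogus zero buckets
        return merged;
    }

    public static void main(String[] args) {
        // A subset of the dump above: real buckets interleaved with
        // bogus 0-count entries, including duplicate keys.
        String[][] dump = {
            {"2010-05-03T11:00:00Z", "1"}, {"2010-05-03T11:00:00Z", "0"},
            {"2010-04-20T11:00:00Z", "1"}, {"2010-05-03T11:00:00Z", "1"},
            {"2010-05-02T11:00:00Z", "1"}, {"1970-01-01T00:00:00Z", "0"},
        };
        System.out.println(merge(dump));
        // {2010-04-20T11:00:00Z=1, 2010-05-02T11:00:00Z=1, 2010-05-03T11:00:00Z=2}
    }
}
```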

On the pass that's optimized, I get this on the third pass, which is consistent 
with what the control server gives back, so it passes:
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-04-20T11:00:00Z 1
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-03T11:00:00Z 1
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-02T11:00:00Z 1
   [junit4]   1 QUERY DUMP 1 Adding string/count 2010-05-05T11:00:00Z 1

Anyway, this is beyond what I want to deal with just now. Let me know if 
there's anything else I can provide. 



[jira] [Assigned] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-6489:
-

Assignee: Mark Miller

 morphlines-cell tests fail after upgrade to TIKA 1.6
 

 Key: SOLR-6489
 URL: https://issues.apache.org/jira/browse/SOLR-6489
 Project: Solr
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.11
Reporter: Uwe Schindler
Assignee: Mark Miller
 Fix For: 5.0, 4.11


 After the upgrade to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests fail 
 with scripting error messages.
 Due to my missing understanding, caused by the crazy configuration file format 
 and my inability to figure out the test setup, I have to give up and hope that 
 somebody else can take care of it. In addition, on my own machines, all of Hadoop 
 does not work at all, so I cannot debug (Windows).
 The whole Morphlines setup is not really good, because Solr core depends on 
 a different TIKA version than the included morphlines libraries. This is not a 
 good situation for Solr, because we should be able to upgrade to any version 
 of our core components and not depend on external libraries that themselves 
 depend on older versions of Solr!






[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125710#comment-14125710
 ] 

Mark Miller commented on SOLR-6489:
---

Bummer...seems like when someone re-factored the config for these tests, they 
stopped working from the IDE.





[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125723#comment-14125723
 ] 

Mark Miller commented on SOLR-6489:
---

Okay, test failures from the command line show that .gz files no longer appear 
to be getting extracted properly. If I remove those files, the rest of the 
tests pass.




[jira] [Commented] (SOLR-6187) facet.mincount ignored in range faceting using distributed search

2014-09-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125725#comment-14125725
 ] 

ASF subversion and git services commented on SOLR-6187:
---

Commit 1623429 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1623429 ]

SOLR-6187: facet.mincount ignored in range faceting using distributed search

 facet.mincount ignored in range faceting using distributed search
 -

 Key: SOLR-6187
 URL: https://issues.apache.org/jira/browse/SOLR-6187
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.8, 4.8.1
Reporter: Zaccheo Bagnati
Assignee: Erick Erickson
 Attachments: SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch, 
 SOLR-6187.patch, SOLR-6187.patch


 While I was trying to do a range faceting with gap +1YEAR using shards, I 
 noticed that the facet.mincount parameter seems to be ignored.
 The issue can be reproduced in this way:
 Create 2 cores, testshard1 and testshard2, with:
 solrconfig.xml:
 <?xml version="1.0" encoding="UTF-8" ?>
 <config>
   <luceneMatchVersion>LUCENE_41</luceneMatchVersion>
   <lib dir="/opt/solr/dist" regex="solr-cell-.*\.jar"/>
   <directoryFactory name="DirectoryFactory" 
       class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
   <updateHandler class="solr.DirectUpdateHandler2"/>
   <requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <str name="df">id</str>
     </lst>
   </requestHandler>
   <requestHandler name="/update" class="solr.UpdateRequestHandler"/>
   <requestHandler name="/admin/" 
       class="org.apache.solr.handler.admin.AdminHandlers"/>
   <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
     <lst name="invariants">
       <str name="q">solrpingquery</str>
     </lst>
     <lst name="defaults">
       <str name="echoParams">all</str>
     </lst>
   </requestHandler>
 </config>
 schema.xml:
 <?xml version="1.0" ?>
 <schema name="${solr.core.name}" version="1.5" 
     xmlns:xi="http://www.w3.org/2001/XInclude">
   <fieldType name="int" class="solr.TrieIntField" precisionStep="0" 
       positionIncrementGap="0"/>
   <fieldType name="long" class="solr.TrieLongField" precisionStep="0" 
       positionIncrementGap="0"/>
   <fieldType name="date" class="solr.TrieDateField" precisionStep="0" 
       positionIncrementGap="0"/>
   <field name="_version_" type="long" indexed="true" stored="true"/>
   <field name="id" type="int" indexed="true" stored="true" 
       multiValued="false"/>
   <field name="date" type="date" indexed="true" stored="true" 
       multiValued="false"/>
   <uniqueKey>id</uniqueKey>
   <defaultSearchField>id</defaultSearchField>
 </schema>
 Insert in testshard1:
 <add>
   <doc>
     <field name="id">1</field>
     <field name="date">2014-06-20T12:51:00Z</field>
   </doc>
 </add>
 Insert into testshard2:
 <add>
   <doc>
     <field name="id">2</field>
     <field name="date">2013-06-20T12:51:00Z</field>
   </doc>
 </add>
 Now if I execute:
 curl "http://localhost:8983/solr/testshard1/select?q=id:1&facet=true&facet.mincount=1&facet.range=date&f.date.facet.range.start=1900-01-01T00:00:00Z&f.date.facet.range.end=NOW&f.date.facet.range.gap=%2B1YEAR&shards=localhost%3A8983%2Fsolr%2Ftestshard1%2Clocalhost%3A8983%2Fsolr%2Ftestshard2&shards.info=true&wt=json"
 I obtain:
 

[jira] [Updated] (SOLR-6187) facet.mincount ignored in range faceting using distributed search

2014-09-08 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-6187:
-
Attachment: SOLR-6187.patch

With CHANGES.txt entries

 facet.mincount ignored in range faceting using distributed search
 -

 Key: SOLR-6187
 URL: https://issues.apache.org/jira/browse/SOLR-6187
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.8, 4.8.1
Reporter: Zaccheo Bagnati
Assignee: Erick Erickson
 Attachments: SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch, 
 SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch



[jira] [Created] (SOLR-6490) ValueSourceParser function max does not handle dates.

2014-09-08 Thread Aaron McMillin (JIRA)
Aaron McMillin created SOLR-6490:


 Summary: ValueSourceParser function max does not handle dates.
 Key: SOLR-6490
 URL: https://issues.apache.org/jira/browse/SOLR-6490
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Reporter: Aaron McMillin
Priority: Minor


As a user,
when trying to use sort=max(date1_field_tdt, date2_field_tdt),
I expect documents to be returned in order.

Currently this is not the case. Dates are stored as longs, but max uses 
MaxFloatFunction, which casts them to floats, thereby losing precision.
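The precision loss is easy to demonstrate: epoch milliseconds need about 41 bits, while a float mantissa holds only 24, so nearby dates collapse to the same float value. A minimal sketch (the timestamps are arbitrary examples, not from the issue):

```java
public class FloatDateLoss {
    public static void main(String[] args) {
        long t1 = 1410134400000L; // an epoch-millis timestamp in 2014
        long t2 = t1 + 30_000L;   // 30 seconds later
        // At ~1.4e12 a float's ulp is 2^17 = 131072 ms, so two dates
        // 30 seconds apart can round to the exact same float value,
        // and max()/sort can no longer tell them apart.
        System.out.println((float) t1 == (float) t2);   // true: ordering lost
        System.out.println((double) t1 == (double) t2); // false: double keeps it
    }
}
```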






[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125754#comment-14125754
 ] 

Mark Miller commented on SOLR-6489:
---

So my initial guess is that kite-morphlines-tika-decompress depending on 
compress 1.4, while we moved from 1.7 to 1.8, is perhaps the problem. Standard 
Java jar hell.
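A common first step when diagnosing this kind of jar hell is to ask the JVM where a suspect class was actually loaded from. A small sketch (the classes shown are stand-ins; substitute the suspect decompression class):

```java
import java.security.CodeSource;

public class WhichJar {
    // Reports the jar (or directory) a class was loaded from -- useful when
    // two versions of a library may both be on the classpath and you need
    // to know which one actually won.
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return (src == null || src.getLocation() == null)
                ? "(bootstrap/JDK)"
                : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Stand-in classes; in the real case you would pass the Tika or
        // compress class whose version is in doubt.
        System.out.println(locationOf(org.w3c.dom.Document.class));
        System.out.println(locationOf(WhichJar.class));
    }
}
```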





[jira] [Commented] (SOLR-6154) SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-09-08 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125765#comment-14125765
 ] 

Hoss Man commented on SOLR-6154:


Erick, sorry for the late reply.

 I haven't looked in depth at your patch for this issue or SOLR-6187, but in 
response to your question on the mailing list...

bq. The problem here is that it assumes that the first list in has all the 
counts that ever will be reported from any shard.

You are almost certainly correct; it's very probable that the logic for 
distributed range faceting isn't taking into account the possibility of 
mincount suppressing buckets from one or more shards.

The general strategy for dealing with this in field faceting & pivot faceting 
(which I suspect is what you're already doing in your patch) is to have the 
coordinator node modify the mincount params when it sends the shard requests to 
force mincount=0, to ensure it gets a response for every bucket from every 
shard, then filter the response based on the (original) combined mincount.

{panel:title=not recommended idea}
I say modify because one of the strategies taken with field/pivot faceting 
when using facet.sort=index is this...

{noformat}
// we're sorting by index order.
// if minCount==0, we should always be able to get accurate results w/o
// over-requesting or refining
// if minCount==1, we should be able to get accurate results w/o
// over-requesting, but we'll need to refine
// if minCount==n (n > 1), we can set the initialMincount to
// minCount/nShards, rounded up.
// ...
{noformat}

There is no sorting or top-n aspect to facet.range, so the idea of 
over-requesting doesn't apply -- but the minCount/nShards idea still applies. 
If the user requests a minCount of 10 and there are 3 shards, then you could 
set f.foo.facet.mincount=4 for the shard requests -- because unless at least one 
shard responds back with a count of at least 4, you'll never be able to 
satisfy the original mincount=10 ... HOWEVER: using this strategy requires 
refinement requests, which we currently avoid in range faceting.
{panel}

I would not advise going with the refinement approach described above (hence 
the panel box labeling it not-recommended) because I think the single-pass 
approach of range faceting right now is probably better for most common cases 
-- we just need to force mincount=0 on the shard requests -- but I wanted to 
post it for completeness in case I'm missing something and you think it's a 
really good idea.
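For what it's worth, the minCount/nShards arithmetic described above is just a ceiling division. A sketch of the math only, not Solr's actual code (the class and method names are made up):

```java
public class ShardMincount {
    // Smallest per-shard count worth requesting: minCount / nShards,
    // rounded up. If every shard reported less than this, the summed
    // count across shards could never reach the requested minCount.
    static int initialMincount(int minCount, int nShards) {
        return (minCount + nShards - 1) / nShards; // integer ceiling
    }

    public static void main(String[] args) {
        // The example from the comment: mincount=10 across 3 shards.
        // Three shards all reporting <= 3 could sum to at most 9 < 10,
        // so each shard request can safely use mincount=4.
        System.out.println(initialMincount(10, 3)); // 4
        System.out.println(initialMincount(1, 5));  // 1
        System.out.println(initialMincount(0, 3));  // 0
    }
}
```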


 SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on 
 response
 --

 Key: SOLR-6154
 URL: https://issues.apache.org/jira/browse/SOLR-6154
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5.1, 4.8.1
 Environment: Solr 4.5.1 under Linux  - explicit id routing
  Indexed 400,000+ Documents
  explicit routing 
  custom schema.xml
  
 Solr 4.8.1 under Windows+Cygwin
  Indexed 6 Documents
  implicit id routing
  out of the box schema
Reporter: Ronald Matamoros
Assignee: Erick Erickson
 Attachments: HowToReplicate.pdf, data.xml


 Attached:
 - PDF with instructions on how to replicate.
 - data.xml to replicate the index.
 The f.field.facet.mincount option on a distributed search gives an 
 inconsistent list of buckets on a range facet.
  
 Some buckets are ignored when using the option f.field.facet.mincount=1.
 The Solr logs do not indicate any error or warning during execution.
 The debug=true option and increasing the log levels for the FacetComponent 
 do not provide any hints about the behaviour.
 Replicated the issue on both Solr 4.5.1 & 4.8.1.
 Example: 
 Removing the f.field.facet.mincount=1 option gives the expected list of 
 buckets for the 6 documents matched:
 <lst name="facet_ranges">
   <lst name="price">
     <lst name="counts">
       <int name="0.0">0</int>
       <int name="50.0">1</int>
       <int name="100.0">0</int>
       <int name="150.0">3</int>
       <int name="200.0">0</int>
       <int name="250.0">1</int>
       <int name="300.0">0</int>
       <int name="350.0">0</int>
       <int name="400.0">0</int>
       <int name="450.0">0</int>
       <int name="500.0">0</int>
       <int name="550.0">0</int>
       <int name="600.0">0</int>
       <int name="650.0">0</int>
       <int name="700.0">0</int>
       <int name="750.0">1</int>
       <int name="800.0">0</int>
       <int name="850.0">0</int>
       <int name="900.0">0</int>
       <int name="950.0">0</int>
     </lst>
     <float name="gap">50.0</float>
     <float name="start">0.0</float>
     <float name="end">1000.0</float>
     <int name="before">0</int>
     <int name="after">0</int>
     <int name="between">2</int>
   </lst>

[jira] [Commented] (LUCENE-5925) Use rename instead of segments_N fallback / segments.gen etc

2014-09-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125786#comment-14125786
 ] 

Michael McCandless commented on LUCENE-5925:


+1, I like this solution.  It passed 85 iters of all lucene core + module 
tests...

 Use rename instead of segments_N fallback / segments.gen etc
 

 Key: LUCENE-5925
 URL: https://issues.apache.org/jira/browse/LUCENE-5925
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/store
Reporter: Robert Muir
  Labels: Java7
 Attachments: LUCENE-5925.patch, LUCENE-5925.patch


 Our commit logic is strange: we write corrupted commit points and only in the 
 last phase of commit do we correct them.
 This means the logic to get the latest commit is always scary and awkward, 
 since it must deal with partial commits, and try to determine whether it should 
 fall back to segments_N-1 or actually relay an exception. 
 This logic is incomplete/sheisty; e.g. I think we only fall back so far 
 (at most one generation).
 If we somehow screw up in all this logic and do the wrong thing, then we lose 
 data (e.g. LUCENE-4870 wiped an entire index because of TooManyOpenFiles).
 We now require Java 7; I think we should explore instead writing 
 {{pending_segments_N}} and then in finishCommit() doing an atomic rename to 
 {{segments_N}}. 
 We could then remove all the complex fallback logic completely, since we would 
 no longer have to deal with ignoring partial commits, instead simply 
 delivering any exception we get when trying to read the commit, and sleep 
 better at night.
 In Java 7, we have the APIs for this (ATOMIC_MOVE).
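The proposed pending-then-rename scheme maps directly onto the Java 7 NIO.2 API mentioned. An illustrative sketch of the idea under the naming convention described (not Lucene's actual code; file names and contents are made up):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicCommitSketch {
    // Write the new commit file under a pending name, then atomically
    // rename it into place. Readers either see the previous commit or
    // the complete new one -- never a partially written file -- so no
    // fallback/repair logic is needed on the read side.
    static Path commit(Path dir, long generation, String data) throws IOException {
        Path pending = dir.resolve("pending_segments_" + generation);
        Files.write(pending, data.getBytes(StandardCharsets.UTF_8));
        Path live = dir.resolve("segments_" + generation);
        return Files.move(pending, live, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-demo");
        Path live = commit(dir, 2, "segment metadata");
        System.out.println(live.getFileName()); // segments_2
        System.out.println(Files.exists(dir.resolve("pending_segments_2"))); // false
    }
}
```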






[jira] [Commented] (SOLR-6187) facet.mincount ignored in range faceting using distributed search

2014-09-08 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125796#comment-14125796
 ] 

ASF subversion and git services commented on SOLR-6187:
---

Commit 1623447 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1623447 ]

SOLR-6187: facet.mincount ignored in range faceting using distributed search

 facet.mincount ignored in range faceting using distributed search
 -

 Key: SOLR-6187
 URL: https://issues.apache.org/jira/browse/SOLR-6187
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.8, 4.8.1
Reporter: Zaccheo Bagnati
Assignee: Erick Erickson
 Attachments: SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch, 
 SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch



[jira] [Commented] (SOLR-6154) SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-09-08 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125802#comment-14125802
 ] 

Erick Erickson commented on SOLR-6154:
--

Whew! I _just_ committed this patch, and it forces mincount to 0 for the 
shard requests, which is in line with your comments.

 SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on 
 response
 --

 Key: SOLR-6154
 URL: https://issues.apache.org/jira/browse/SOLR-6154
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5.1, 4.8.1
 Environment: Solr 4.5.1 under Linux  - explicit id routing
  Indexed 400,000+ Documents
  explicit routing 
  custom schema.xml
  
 Solr 4.8.1 under Windows+Cygwin
  Indexed 6 Documents
  implicit id routing
  out of the box schema
Reporter: Ronald Matamoros
Assignee: Erick Erickson
 Attachments: HowToReplicate.pdf, data.xml


 Using the f.field.facet.mincount=1 option removes the 0 count buckets but 
 will also omit bucket <int name="250.0">:
 <lst name="facet_ranges">
   <lst name="price">
     <lst name="counts">
       <int name="50.0">1</int>
       <int name="150.0">3</int>
       <int name="750.0">1</int>
     </lst>
     <float name="gap">50.0</float>
     <float name="start">0.0</float>
     <float name="end">1000.0</float>
     <int name="before">0</int>
     <int name="after">0</int>
     <int name="between">4</int>
   </lst>
 </lst>
 Resubmitting the query renders a different bucket list 
 (may need to resubmit a couple of times):
 <lst name="facet_ranges">
   <lst name="price">
     <lst name="counts">
       <int name="150.0">3</int>
       <int name="250.0">1</int>
     </lst>
     <float name="gap">50.0</float>
     <float name="start">0.0</float>
     <float name="end">1000.0</float>
     <int name="before">0</int>
     <int name="after">0</int>
     <int name="between">2</int>
   </lst>
 </lst>
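One plausible way per-shard mincount filtering produces symptoms like the above: if a shard drops its zero-count buckets before responding, a coordinator that assumes every shard's bucket list is aligned will credit counts to the wrong ranges. The sketch below is a simplified model with made-up counts, not Solr's actual FacetComponent merge logic:

```java
import java.util.Arrays;
import java.util.List;

public class RangeFacetMergeSketch {
    // Merge per-shard counts, assuming entry i of every shard response
    // belongs to range bucket i (i.e. start + i * gap).
    static int[] mergePositional(List<int[]> shardCounts, int nBuckets) {
        int[] merged = new int[nBuckets];
        for (int[] counts : shardCounts)
            for (int i = 0; i < counts.length; i++)
                merged[i] += counts[i];
        return merged;
    }

    public static void main(String[] args) {
        int[] shard1 = {0, 1, 0, 3};   // raw counts for 4 range buckets
        int[] shard2 = {1, 0, 0, 0};

        // Shards answer with mincount=0: buckets stay aligned, merge is correct.
        System.out.println(Arrays.toString(
            mergePositional(List.of(shard1, shard2), 4)));

        // Broken: shard1 pre-applies mincount=1 and sends only its two
        // non-zero counts; the coordinator credits them to buckets 0 and 1.
        int[] shard1Filtered = {1, 3};
        System.out.println(Arrays.toString(
            mergePositional(List.of(shard1Filtered, shard2), 4)));
    }
}
```

Forcing mincount to 0 on the shard requests, as the committed patch does, and applying mincount only when the coordinator assembles the final response, avoids this class of problem.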



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6187) facet.mincount ignored in range faceting using distributed search

2014-09-08 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-6187.
--
   Resolution: Fixed
Fix Version/s: 4.11
   5.0

 facet.mincount ignored in range faceting using distributed search
 -

 Key: SOLR-6187
 URL: https://issues.apache.org/jira/browse/SOLR-6187
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.8, 4.8.1
Reporter: Zaccheo Bagnati
Assignee: Erick Erickson
 Fix For: 5.0, 4.11

 Attachments: SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch, 
 SOLR-6187.patch, SOLR-6187.patch, SOLR-6187.patch


 While I was trying to do range faceting with gap +1YEAR using shards, I 
 noticed that the facet.mincount parameter seems to be ignored.
 The issue can be reproduced in this way:
 Create 2 cores, testshard1 and testshard2, with:
 solrconfig.xml
 <?xml version="1.0" encoding="UTF-8" ?>
 <config>
   <luceneMatchVersion>LUCENE_41</luceneMatchVersion>
   <lib dir="/opt/solr/dist" regex="solr-cell-.*\.jar"/>
   <directoryFactory name="DirectoryFactory" 
     class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
   <updateHandler class="solr.DirectUpdateHandler2" />
   <requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <str name="df">id</str>
     </lst>
   </requestHandler>
   <requestHandler name="/update" class="solr.UpdateRequestHandler" />
   <requestHandler name="/admin/" 
     class="org.apache.solr.handler.admin.AdminHandlers" />
   <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
     <lst name="invariants">
       <str name="q">solrpingquery</str>
     </lst>
     <lst name="defaults">
       <str name="echoParams">all</str>
     </lst>
   </requestHandler>
 </config>
 schema.xml
 <?xml version="1.0" ?>
 <schema name="${solr.core.name}" version="1.5" 
   xmlns:xi="http://www.w3.org/2001/XInclude">
   <fieldType name="int" class="solr.TrieIntField" precisionStep="0" 
     positionIncrementGap="0"/>
   <fieldType name="long" class="solr.TrieLongField" precisionStep="0" 
     positionIncrementGap="0"/>
   <fieldType name="date" class="solr.TrieDateField" precisionStep="0" 
     positionIncrementGap="0"/>
   <field name="_version_" type="long" indexed="true" stored="true"/>
   <field name="id" type="int" indexed="true" stored="true" 
     multiValued="false" />
   <field name="date" type="date" indexed="true" stored="true" 
     multiValued="false" />
   <uniqueKey>id</uniqueKey>
   <defaultSearchField>id</defaultSearchField>
 </schema>
 Insert in testshard1:
 <add>
  <doc>
   <field name="id">1</field>
   <field name="date">2014-06-20T12:51:00Z</field>
  </doc>
 </add>
 Insert into testshard2:
 <add>
  <doc>
   <field name="id">2</field>
   <field name="date">2013-06-20T12:51:00Z</field>
  </doc>
 </add>
 Now if I execute:
 curl "http://localhost:8983/solr/testshard1/select?q=id:1&facet=true&facet.mincount=1&facet.range=date&f.date.facet.range.start=1900-01-01T00:00:00Z&f.date.facet.range.end=NOW&f.date.facet.range.gap=%2B1YEAR&shards=localhost%3A8983%2Fsolr%2Ftestshard1%2Clocalhost%3A8983%2Fsolr%2Ftestshard2&shards.info=true&wt=json"
 I obtain:
 

[jira] [Resolved] (SOLR-6154) SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on response

2014-09-08 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-6154.
--
   Resolution: Fixed
Fix Version/s: 4.11
   5.0

Fixed with the checkin for SOLR-6187

Thanks again Ronald for a _great_ problem writeup and reproducible test case!

 SolrCloud: facet range option f.field.facet.mincount=1 omits buckets on 
 response
 --

 Key: SOLR-6154
 URL: https://issues.apache.org/jira/browse/SOLR-6154
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.5.1, 4.8.1
 Environment: Solr 4.5.1 under Linux  - explicit id routing
  Indexed 400,000+ Documents
  explicit routing 
  custom schema.xml
  
 Solr 4.8.1 under Windows+Cygwin
  Indexed 6 Documents
  implicit id routing
  out of the box schema
Reporter: Ronald Matamoros
Assignee: Erick Erickson
 Fix For: 5.0, 4.11

 Attachments: HowToReplicate.pdf, data.xml


 Attached
 - PDF with instructions on how to replicate.
 - data.xml to replicate index
 The f.field.facet.mincount option on a distributed search gives an 
 inconsistent list of buckets on a range facet.
  
 Some buckets are ignored when using the option f.field.facet.mincount=1.
 The Solr logs do not indicate any error or warning during execution.
 The debug=true option and increasing the log levels for the FacetComponent do 
 not provide any hints about the behaviour.
 Replicated the issue on both Solr 4.5.1 & 4.8.1.
 Example: removing the f.field.facet.mincount=1 option gives the expected list 
 of buckets for the 6 documents matched.
 <lst name="facet_ranges">
   <lst name="price">
     <lst name="counts">
       <int name="0.0">0</int>
       <int name="50.0">1</int>
       <int name="100.0">0</int>
       <int name="150.0">3</int>
       <int name="200.0">0</int>
       <int name="250.0">1</int>
       <int name="300.0">0</int>
       <int name="350.0">0</int>
       <int name="400.0">0</int>
       <int name="450.0">0</int>
       <int name="500.0">0</int>
       <int name="550.0">0</int>
       <int name="600.0">0</int>
       <int name="650.0">0</int>
       <int name="700.0">0</int>
       <int name="750.0">1</int>
       <int name="800.0">0</int>
       <int name="850.0">0</int>
       <int name="900.0">0</int>
       <int name="950.0">0</int>
     </lst>
     <float name="gap">50.0</float>
     <float name="start">0.0</float>
     <float name="end">1000.0</float>
     <int name="before">0</int>
     <int name="after">0</int>
     <int name="between">2</int>
   </lst>
 </lst>
 Using the f.field.facet.mincount=1 option removes the 0 count buckets but 
 will also omit bucket <int name="250.0">:
 <lst name="facet_ranges">
   <lst name="price">
     <lst name="counts">
       <int name="50.0">1</int>
       <int name="150.0">3</int>
       <int name="750.0">1</int>
     </lst>
     <float name="gap">50.0</float>
     <float name="start">0.0</float>
     <float name="end">1000.0</float>
     <int name="before">0</int>
     <int name="after">0</int>
     <int name="between">4</int>
   </lst>
 </lst>
 Resubmitting the query renders a different bucket list 
 (may need to resubmit a couple of times):
 <lst name="facet_ranges">
   <lst name="price">
     <lst name="counts">
       <int name="150.0">3</int>
       <int name="250.0">1</int>
     </lst>
     <float name="gap">50.0</float>
     <float name="start">0.0</float>
     <float name="end">1000.0</float>
     <int name="before">0</int>
     <int name="after">0</int>
     <int name="between">2</int>
   </lst>
 </lst>






[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores

2014-09-08 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125813#comment-14125813
 ] 

Erick Erickson commented on SOLR-5871:
--

While I agree this is functionality that people have requested, it's pretty 
clear that I'm not going to get to it in any reasonable time frame, so 
un-assigning it.

 Ability to see the list of fields that matched the query with scores
 

 Key: SOLR-5871
 URL: https://issues.apache.org/jira/browse/SOLR-5871
 Project: Solr
  Issue Type: Wish
Reporter: Alexander S.

 Hello, I need the ability to tell users what content matched their query, 
 this way:
 | Name     | Twitter Profile | Topics | Site Title | Site Description | Site content |
 | John Doe | Yes             | No     | Yes        | No               | Yes          |
 | Jane Doe | No              | Yes    | No         | No               | Yes          |
 All these columns are indexed text fields and I need to know what content 
 matched the query; it would also be cool to be able to show the score per 
 field.
 As far as I know, right now there's no way to return this information when 
 running a query request. Debug output is suitable for visual review but has 
 lots of nesting levels and is hard to understand.






[jira] [Updated] (SOLR-5871) Ability to see the list of fields that matched the query with scores

2014-09-08 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5871:
-
Assignee: (was: Erick Erickson)

 Ability to see the list of fields that matched the query with scores
 

 Key: SOLR-5871
 URL: https://issues.apache.org/jira/browse/SOLR-5871
 Project: Solr
  Issue Type: Wish
Reporter: Alexander S.

 Hello, I need the ability to tell users what content matched their query, 
 this way:
 | Name     | Twitter Profile | Topics | Site Title | Site Description | Site content |
 | John Doe | Yes             | No     | Yes        | No               | Yes          |
 | Jane Doe | No              | Yes    | No         | No               | Yes          |
 All these columns are indexed text fields and I need to know what content 
 matched the query; it would also be cool to be able to show the score per 
 field.
 As far as I know, right now there's no way to return this information when 
 running a query request. Debug output is suitable for visual review but has 
 lots of nesting levels and is hard to understand.






Re: [jira] [Reopened] (SOLR-6457) LBHttpSolrServer: AIOOBE risk if counter overflows

2014-09-08 Thread Chris Hostetter

: I thought it is an extremely trivial fix. That's why I didn't add it to
: changes.txt. I shall do it

To elaborate my thoughts: there are really 2 key issues why i think this 
deserves a CHANGES.txt entry:


1) anytime we fix a bug that affects previously released versions of Solr, 
there should be *something* related to it in CHANGES.txt.  

If a bug was found & tracked in its own Jira until fixed, then it should 
almost certainly be its own entry in CHANGES.txt, if for no other reason 
than so people who see the jira emails/records can easily find it.  
Frequently small bugs like this might be fixed as part of a larger bug fix 
or new feature that refactors a lot of code -- in which case that larger 
bug/feature's CHANGES.txt entry already notes the relevant change summary, 
so an individual entry may not be needed.  It depends on how impactful/common 
the bug was.  Remember: people look at CHANGES.txt to decide if/when it's 
worth their effort to upgrade, if there's a bug/glitch that's affecting 
them, and your CHANGES.txt entry makes them think "ah, i bet that's what's 
causing the weird behavior i'm seeing! yea, it's been fixed!!" That helps 
people a lot.

2) any time a contributor provides code, they *MUST* get credit for their 
contribution.  As a committer, if you are making minor tweaks or cleanup 
or fixing a small thing as part of a larger change, you can always choose 
not to bother tooting your own horn about every little change you make if 
you don't think it warrants special attention based on the point #1 
guidelines i was mentioning above -- but when you are acting as the agent 
of another contributor by committing their code, you really must give them 
credit in CHANGES.txt






: On Sep 5, 2014 7:15 PM, Hoss Man (JIRA) j...@apache.org wrote:
: 
: 
:   [
:  
https://issues.apache.org/jira/browse/SOLR-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
:  ]
: 
:  Hoss Man reopened SOLR-6457:
:  
: 
:  This needs a CHANGES.txt noting the bug fix and giving longkeyy credit for
:  the contribution.
: 
:   LBHttpSolrServer: AIOOBE risk if counter overflows
:   --
:  
:   Key: SOLR-6457
:   URL: https://issues.apache.org/jira/browse/SOLR-6457
:   Project: Solr
:Issue Type: Bug
:Components: clients - java
:  Affects Versions: 4.0, 4.1, 4.2, 4.2.1, 4.3, 4.3.1, 4.4, 4.5, 4.5.1,
:  4.6, 4.6.1, 4.7, 4.7.1, 4.7.2, 4.8, 4.8.1, 4.9
:  Reporter: longkeyy
:  Assignee: Noble Paul
:Labels: patch
:   Attachments: SOLR-6457.patch
:  
:  
:   org.apache.solr.client.solrj.impl.LBHttpSolrServer,
:   line 442:
:     int count = counter.incrementAndGet();
:     ServerWrapper wrapper = serverList[count % serverList.length];
:   when counter overflows, the mod operation of
:   count % serverList.length will start trying to use negative numbers as
:  array indexes.
:   Suggest to fix it up, e.g.:
:   // keep count greater than or equal to 0
:   int count = counter.incrementAndGet() & 0x7FFFFFFF;
: 
: 
: 
:  --
:  This message was sent by Atlassian JIRA
:  (v6.3.4#6332)
: 
:  -
:  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
:  For additional commands, e-mail: dev-h...@lucene.apache.org
: 
: 
: 

-Hoss
http://www.lucidworks.com/
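The failure mode discussed above is easy to demonstrate outside Solr. A minimal sketch (hypothetical round-robin selection, not the actual LBHttpSolrServer code) of why Java's % can hand back a negative array index once the counter wraps, and how masking off the sign bit avoids it:

```java
public class CounterOverflowSketch {
    public static void main(String[] args) {
        String[] servers = {"s1", "s2", "s3"};

        int count = Integer.MAX_VALUE + 1;      // wraps to Integer.MIN_VALUE
        int idx = count % servers.length;       // Java % keeps the dividend's sign
        System.out.println(idx);                // negative: servers[idx] would throw
                                                // ArrayIndexOutOfBoundsException

        // Clearing the sign bit keeps the counter non-negative, so the
        // modulo result is always a valid index.
        int masked = (count & 0x7FFFFFFF) % servers.length;
        System.out.println(servers[masked]);
    }
}
```

Math.floorMod(count, servers.length) would also yield a non-negative index; the one-line mask matches the shape of the suggested patch.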




[jira] [Updated] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

2014-09-08 Thread Steve Molloy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Molloy updated SOLR-3583:
---
Attachment: SOLR-3583.patch

Adapted the patch for the 4.10 tag, which now includes SOLR-2894 out of the box. 
Ran all unit tests successfully.

 Percentiles for facets, pivot facets, and distributed pivot facets
 --

 Key: SOLR-3583
 URL: https://issues.apache.org/jira/browse/SOLR-3583
 Project: Solr
  Issue Type: Improvement
Reporter: Chris Russell
Priority: Minor
  Labels: newbie, patch
 Fix For: 4.9, 5.0

 Attachments: SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, 
 SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, 
 SOLR-3583.patch


 Built on top of SOLR-2894, this patch adds percentiles and averages to 
 facets, pivot facets, and distributed pivot facets by making use of range 
 facet internals.  






[jira] [Commented] (SOLR-3583) Percentiles for facets, pivot facets, and distributed pivot facets

2014-09-08 Thread Steve Molloy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125887#comment-14125887
 ] 

Steve Molloy commented on SOLR-3583:


[~hossman] I kind of agree with all your comments, although I still needed this 
functionality to be working today and see no progress on issues you pointed to. 
Anything we can do to help speed things up on the Stats component to support 
this? (specifically, we need distributed pivot faceted stats for average and 
sum for numeric fields).

 Percentiles for facets, pivot facets, and distributed pivot facets
 --

 Key: SOLR-3583
 URL: https://issues.apache.org/jira/browse/SOLR-3583
 Project: Solr
  Issue Type: Improvement
Reporter: Chris Russell
Priority: Minor
  Labels: newbie, patch
 Fix For: 4.9, 5.0

 Attachments: SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, 
 SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, SOLR-3583.patch, 
 SOLR-3583.patch


 Built on top of SOLR-2894, this patch adds percentiles and averages to 
 facets, pivot facets, and distributed pivot facets by making use of range 
 facet internals.  






[jira] [Assigned] (LUCENE-5594) don't call 'svnversion' over and over in the build

2014-09-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler reassigned LUCENE-5594:
-

Assignee: Uwe Schindler

 don't call 'svnversion' over and over in the build
 --

 Key: LUCENE-5594
 URL: https://issues.apache.org/jira/browse/LUCENE-5594
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Uwe Schindler

 Some ant tasks (at least release packaging, i dunno what else), call 
 svnversion over and over and over for each module in the build. can we just 
 do this one time instead?






[jira] [Commented] (LUCENE-5594) don't call 'svnversion' over and over in the build

2014-09-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125897#comment-14125897
 ] 

Uwe Schindler commented on LUCENE-5594:
---

Hi,
I am working on this. I will put up a patch soon.
My work includes not calling the svn executable at all; instead we use svnkit 
to fetch revision numbers or check out the src folder.
This speeds up the build, because we do not need to fork a process all the 
time, which is expensive on Windows and on the FreeBSD Jenkins (because we fork 
Java). Using svnkit to do this and saving the revision number in a property 
helps. It also makes Jenkins builds easier to configure, because you do not 
depend on local svn installations (e.g. Jenkins checks out with a different 
version than the locally installed one). This is an issue on Policeman Jenkins 
and FreeBSD Jenkins.

 don't call 'svnversion' over and over in the build
 --

 Key: LUCENE-5594
 URL: https://issues.apache.org/jira/browse/LUCENE-5594
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir

 Some ant tasks (at least release packaging, i dunno what else), call 
 svnversion over and over and over for each module in the build. can we just 
 do this one time instead?






[jira] [Commented] (LUCENE-4073) Lucene puts output of svnversion into a property even if svnversion failed

2014-09-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125900#comment-14125900
 ] 

Uwe Schindler commented on LUCENE-4073:
---

I will fix this soon.

 Lucene puts output of svnversion into a property even if svnversion failed
 --

 Key: LUCENE-4073
 URL: https://issues.apache.org/jira/browse/LUCENE-4073
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Affects Versions: 3.6
 Environment: Windows 7 x64, Cygwin for tools like svn
Reporter: Trejkaz

 We had a build issue today where Lucene was running svnversion which was 
 failing (the reason for the failure itself is not particularly important.)
 As a result, the error text output of running the command ended up in the 
 svnversion property. The build later attempted to insert this into 
 MANIFEST.MF which resulted in an invalid manifest file, causing the build to 
 fail.
 A related observation is that even if it works, the svnversion would be the 
 version of our own repository, so the usefulness of it in the context of 
 Lucene's version number is questionable anyway. It would be nice if the build 
 could get the svn version number only if it was checked out from Lucene trunk.






[jira] [Updated] (LUCENE-5594) don't call 'svnversion' over and over in the build

2014-09-08 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-5594:
--
Fix Version/s: 4.11
   5.0

 don't call 'svnversion' over and over in the build
 --

 Key: LUCENE-5594
 URL: https://issues.apache.org/jira/browse/LUCENE-5594
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 5.0, 4.11


 Some ant tasks (at least release packaging, i dunno what else), call 
 svnversion over and over and over for each module in the build. can we just 
 do this one time instead?






[jira] [Updated] (SOLR-5480) Make MoreLikeThisHandler distributable

2014-09-08 Thread Claude (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Claude updated SOLR-5480:
-
Attachment: MoreLikeThisHandlerTestST.txt

Stack trace from running MoreLikeThisHandlerTest on OSX using 1.7.0_67 Java.

 Make MoreLikeThisHandler distributable
 --

 Key: SOLR-5480
 URL: https://issues.apache.org/jira/browse/SOLR-5480
 Project: Solr
  Issue Type: Improvement
Reporter: Steve Molloy
Assignee: Noble Paul
 Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch


 The MoreLikeThis component, when used in the standard search handler, supports 
 distributed searches. But the MoreLikeThisHandler itself doesn't, which 
 prevents, say, passing in text to perform the query. I'll start looking 
 into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone 
 has some work done already and wants to share, or wants to contribute, any 
 help will be welcomed. 






[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable

2014-09-08 Thread Claude (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125919#comment-14125919
 ] 

Claude commented on SOLR-5480:
--

[~smolloy],

I've run it on OSX and Ubuntu 14.04 with the same JDK you are using and get a 
stack trace, which I've attached.


 Make MoreLikeThisHandler distributable
 --

 Key: SOLR-5480
 URL: https://issues.apache.org/jira/browse/SOLR-5480
 Project: Solr
  Issue Type: Improvement
Reporter: Steve Molloy
Assignee: Noble Paul
 Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch


 The MoreLikeThis component, when used in the standard search handler, supports 
 distributed searches. But the MoreLikeThisHandler itself doesn't, which 
 prevents, say, passing in text to perform the query. I'll start looking 
 into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone 
 has some work done already and wants to share, or wants to contribute, any 
 help will be welcomed. 






[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable

2014-09-08 Thread Steve Molloy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125933#comment-14125933
 ] 

Steve Molloy commented on SOLR-5480:


[~claudenm] Something is wrong with this stack trace. You actually have 
compilation errors pointing to the MLT handler not implementing an abstract 
method of RequestHandlerBase, which is implemented in SearchHandler, which the 
MLT handler extends after applying the patch:
Caused by: java.lang.Error: Unresolved compilation problems: 
  The type MoreLikeThisHandler must implement the inherited abstract method 
RequestHandlerBase.handleRequestBody(SolrQueryRequest, SolrQueryResponse)

After applying the latest patch, is your MoreLikeThisHandler declaration like:
public class MoreLikeThisHandler extends SearchHandler 

I also see you are in Eclipse (from the paths); are you running the tests from 
the command line or within Eclipse? (Trying to see where things may differ.)

 Make MoreLikeThisHandler distributable
 --

 Key: SOLR-5480
 URL: https://issues.apache.org/jira/browse/SOLR-5480
 Project: Solr
  Issue Type: Improvement
Reporter: Steve Molloy
Assignee: Noble Paul
 Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch


 The MoreLikeThis component, when used in the standard search handler, supports 
 distributed searches. But the MoreLikeThisHandler itself doesn't, which 
 prevents, say, passing in text to perform the query. I'll start looking 
 into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone 
 has some work done already and wants to share, or wants to contribute, any 
 help will be welcomed. 






[jira] [Updated] (SOLR-4212) Support for facet pivot query for filtered count

2014-09-08 Thread Steve Molloy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Molloy updated SOLR-4212:
---
Attachment: SOLR-4212.patch

Adapted patch for 4.10 code and validated that tests passed.

 Support for facet pivot query for filtered count
 

 Key: SOLR-4212
 URL: https://issues.apache.org/jira/browse/SOLR-4212
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Steve Molloy
 Fix For: 4.9, 5.0

 Attachments: SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, 
 patch-4212.txt


 Facet pivot provides hierarchical support for computing the data used to 
 populate a treemap or similar visualization. TreeMaps usually offer users 
 extra information by applying an overlay color on top of the existing square 
 sizes based on hierarchical counts. This second count is based on user 
 choices, representing, usually with a gradient, the proportion of the square 
 that fits the user's choices.
 The proposition is to add a facet.pivot.q parameter that would allow 
 specifying a query (per field) that would be intersected with the DocSet used 
 to calculate the pivot count, stored in a separate q-count.
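The proposed q-count is, in effect, an intersection cardinality computed per pivot bucket. A minimal sketch with plain java.util.BitSet standing in for Solr's DocSet (hypothetical doc ids, not the actual Solr API):

```java
import java.util.BitSet;

public class PivotQCountSketch {
    public static void main(String[] args) {
        // Documents matching the pivot bucket (e.g. category=books)
        BitSet bucketDocs = new BitSet();
        bucketDocs.set(1); bucketDocs.set(4); bucketDocs.set(7);

        // Documents matching the facet.pivot.q overlay query
        BitSet overlayDocs = new BitSet();
        overlayDocs.set(4); overlayDocs.set(7); overlayDocs.set(9);

        int count = bucketDocs.cardinality();   // normal pivot count

        BitSet both = (BitSet) bucketDocs.clone();
        both.and(overlayDocs);                  // intersect with overlay query
        int qCount = both.cardinality();        // proposed q-count

        System.out.println(count + " " + qCount);  // prints "3 2"
    }
}
```

The overlay gradient in the treemap visualization would then be drawn from the ratio qCount / count for each square.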






[jira] [Updated] (SOLR-4212) Support for facet pivot query for filtered count

2014-09-08 Thread Steve Molloy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Molloy updated SOLR-4212:
---
Attachment: (was: SOLR-4212.patch)

 Support for facet pivot query for filtered count
 

 Key: SOLR-4212
 URL: https://issues.apache.org/jira/browse/SOLR-4212
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Steve Molloy
 Fix For: 4.9, 5.0

 Attachments: SOLR-4212.patch, SOLR-4212.patch, patch-4212.txt


 Facet pivot provides hierarchical support for computing the data used to 
 populate a treemap or similar visualization. TreeMaps usually offer users 
 extra information by applying an overlay color on top of the existing square 
 sizes based on hierarchical counts. This second count is based on user 
 choices, representing, usually with a gradient, the proportion of the square 
 that fits the user's choices.
 The proposition is to add a facet.pivot.q parameter that would allow 
 specifying a query (per field) that would be intersected with the DocSet used 
 to calculate the pivot count, stored in a separate q-count.






[jira] [Issue Comment Deleted] (SOLR-4212) Support for facet pivot query for filtered count

2014-09-08 Thread Steve Molloy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Molloy updated SOLR-4212:
---
Comment: was deleted

(was: Adapted patch for 4.10 code and validated that tests passed.)

 Support for facet pivot query for filtered count
 

 Key: SOLR-4212
 URL: https://issues.apache.org/jira/browse/SOLR-4212
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Steve Molloy
 Fix For: 4.9, 5.0

 Attachments: SOLR-4212.patch, SOLR-4212.patch, patch-4212.txt


 Facet pivot provides hierarchical support for computing the data used to 
 populate a treemap or similar visualization. TreeMaps usually offer users 
 extra information by applying an overlay color on top of the existing square 
 sizes based on hierarchical counts. This second count is based on user 
 choices, representing, usually with a gradient, the proportion of the square 
 that fits the user's choices.
 The proposition is to add a facet.pivot.q parameter that would allow 
 specifying a query (per field) that would be intersected with the DocSet used 
 to calculate the pivot count, stored in a separate q-count.






[jira] [Updated] (SOLR-4212) Support for facet pivot query for filtered count

2014-09-08 Thread Steve Molloy (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Molloy updated SOLR-4212:
---
Attachment: SOLR-4212.patch

(Right attachment this time.) Adapted the patch for 4.10 code and ensured all 
tests passed.

 Support for facet pivot query for filtered count
 

 Key: SOLR-4212
 URL: https://issues.apache.org/jira/browse/SOLR-4212
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Steve Molloy
 Fix For: 4.9, 5.0

 Attachments: SOLR-4212.patch, SOLR-4212.patch, SOLR-4212.patch, 
 patch-4212.txt


 Facet pivot provides hierarchical support for computing the data used to 
 populate a treemap or similar visualization. TreeMaps usually offer users 
 extra information by applying an overlay color on top of the existing square 
 sizes based on hierarchical counts. This second count is based on user 
 choices, representing, usually with a gradient, the proportion of the square 
 that fits the user's choices.
 The proposition is to add a facet.pivot.q parameter that would allow 
 specifying a query (per field) that would be intersected with the DocSet used 
 to calculate the pivot count, stored in a separate q-count.






[jira] [Created] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2

2014-09-08 Thread Ryan Ernst (JIRA)
Ryan Ernst created LUCENE-5927:
--

 Summary: 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
 Key: LUCENE-5927
 URL: https://issues.apache.org/jira/browse/LUCENE-5927
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ryan Ernst


In 4.9, this string was broken into 2 tokens by StandardTokenizer:
\u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72 = \u1aa2,  
\u1a7f\u1a6f\u1a6f\u1a61\u1a72

However, in 4.10, this has changed: the string is now returned as a single token.






[jira] [Commented] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2

2014-09-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126035#comment-14126035
 ] 

Robert Muir commented on LUCENE-5927:
-

From Ryan's explanation to me, this only impacts text with the Complex_Context 
line-break property (the old grammar would wrongly split on a combining mark). 
I think we can be sensible about what we do here (I suggest: nothing), because 
in such a case you aren't getting useful tokens from the tokenizer anyway 
unless you are doing downstream processing... and if you are doing that, it's 
very good that this bug is fixed.

 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
 --

 Key: LUCENE-5927
 URL: https://issues.apache.org/jira/browse/LUCENE-5927
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ryan Ernst

 In 4.9, this string was broken into 2 tokens by StandardTokenizer:
 \u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72 = \u1aa2,  
 \u1a7f\u1a6f\u1a6f\u1a61\u1a72
 However, in 4.10, this has changed: the string is now returned as a single token.






[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable

2014-09-08 Thread Claude (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126039#comment-14126039
 ] 

Claude commented on SOLR-5480:
--

[~smolloy],

Thanks.  I see that the patch wasn't (and isn't) applying cleanly. I checked 
out a branch off of the 4.10 release.  I'm trying to figure out why.  Any 
thoughts? 

 Make MoreLikeThisHandler distributable
 --

 Key: SOLR-5480
 URL: https://issues.apache.org/jira/browse/SOLR-5480
 Project: Solr
  Issue Type: Improvement
Reporter: Steve Molloy
Assignee: Noble Paul
 Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch


 The MoreLikeThis component, when used in the standard search handler supports 
 distributed searches. But the MoreLikeThisHandler itself doesn't, which 
 prevents from say, passing in text to perform the query. I'll start looking 
 into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone 
 has some work done already and want to share, or want to contribute, any help 
 will be welcomed. 






[jira] [Commented] (SOLR-5480) Make MoreLikeThisHandler distributable

2014-09-08 Thread Steve Molloy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126057#comment-14126057
 ] 

Steve Molloy commented on SOLR-5480:


I often run into this with patches taken from Jira... The slightest change 
before creating the patch seems to have a significant impact on whether or not 
the patch applies cleanly. I usually have to resort to applying whatever 
matches while excluding any conflicts, which I then apply manually.

 Make MoreLikeThisHandler distributable
 --

 Key: SOLR-5480
 URL: https://issues.apache.org/jira/browse/SOLR-5480
 Project: Solr
  Issue Type: Improvement
Reporter: Steve Molloy
Assignee: Noble Paul
 Attachments: MoreLikeThisHandlerTestST.txt, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, 
 SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch, SOLR-5480.patch


 The MoreLikeThis component, when used in the standard search handler supports 
 distributed searches. But the MoreLikeThisHandler itself doesn't, which 
 prevents from say, passing in text to perform the query. I'll start looking 
 into adapting the SearchHandler logic to the MoreLikeThisHandler. If anyone 
 has some work done already and want to share, or want to contribute, any help 
 will be welcomed. 






[jira] [Commented] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2

2014-09-08 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126059#comment-14126059
 ] 

Steve Rowe commented on LUCENE-5927:


These characters are in the Tai Tham block; all characters in that block have 
the property {{LB:Complex_Context}}, and sequences of them are returned as 
token type {{SOUTHEAST_ASIAN}}. 

This behavior change is caused by a grammar fix I included with LUCENE-5770 - 
previous to 4.10, the grammar did not include {{WB:Format}} or {{WB:Extend}} 
chars - here are the relevant parts from the 4.9 grammar:

{noformat}
ComplexContextSupp = ([])  // no supplementary characters in LB:Complex_Context in Unicode 6.3
...
ComplexContext = (\p{LB:Complex_Context} | {ComplexContextSupp})
...
{ComplexContext}+ { return SOUTH_EAST_ASIAN_TYPE; }
{noformat}

and the 4.10 grammar is now (note the addition of {{WB:Format}} and 
{{WB:Extend}} chars):

{noformat}
ComplexContextEx = \p{LB:Complex_Context} [\p{WB:Format}\p{WB:Extend}]*
...
{ComplexContextEx}+ { return SOUTH_EAST_ASIAN_TYPE; }
{noformat}


 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
 --

 Key: LUCENE-5927
 URL: https://issues.apache.org/jira/browse/LUCENE-5927
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ryan Ernst

 In 4.9, this string was broken into 2 tokens by StandardTokenizer:
 \u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72 = \u1aa2,  
 \u1a7f\u1a6f\u1a6f\u1a61\u1a72
 However, in 4.10, this has changed: the string is now returned as a single token.






[jira] [Commented] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2

2014-09-08 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126064#comment-14126064
 ] 

Steve Rowe commented on LUCENE-5927:


[~rjernst] mentioned to me offline that this behavior change should have 
triggered a version-specific implementation, which did not happen.

I agree, it should have.  

But now that it's been released, should we include a version-specific 
implementation in a bugfix 4.10.1 release?  Or wait till 4.11?  Or just stop 
doing version-specific implementations (as will be the case in 5.x)?

Thoughts?

 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
 --

 Key: LUCENE-5927
 URL: https://issues.apache.org/jira/browse/LUCENE-5927
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ryan Ernst

 In 4.9, this string was broken into 2 tokens by StandardTokenizer:
 \u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72 = \u1aa2,  
 \u1a7f\u1a6f\u1a6f\u1a61\u1a72
 However, in 4.10, this has changed: the string is now returned as a single token.






[jira] [Commented] (LUCENE-5594) don't call 'svnversion' over and over in the build

2014-09-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126069#comment-14126069
 ] 

Uwe Schindler commented on LUCENE-5594:
---

The big question is: Do we really need the SVN revision number in the 
Implementation-Version of every JAR file?

I tend to remove that one. Opinions?

 don't call 'svnversion' over and over in the build
 --

 Key: LUCENE-5594
 URL: https://issues.apache.org/jira/browse/LUCENE-5594
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 5.0, 4.11


 Some ant tasks (at least release packaging, i dunno what else), call 
 svnversion over and over and over for each module in the build. can we just 
 do this one time instead?






[jira] [Commented] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2

2014-09-08 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126081#comment-14126081
 ] 

Steve Rowe commented on LUCENE-5927:


bq. I think we can be sensible about what we do here (I suggest: nothing), 
because in such a case you aren't getting useful tokens from the tokenizer 
anyway unless you are doing downstream processing... and if you are doing that, 
it's very good that this bug is fixed.

Version-specific behavior is important for people who don't want changes; IMHO 
everybody impacted by this change would want it, so I agree: we should do 
nothing.

 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
 --

 Key: LUCENE-5927
 URL: https://issues.apache.org/jira/browse/LUCENE-5927
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ryan Ernst

 In 4.9, this string was broken into 2 tokens by StandardTokenizer:
 \u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72 = \u1aa2,  
 \u1a7f\u1a6f\u1a6f\u1a61\u1a72
 However, in 4.10, this has changed: the string is now returned as a single token.






[jira] [Commented] (LUCENE-5594) don't call 'svnversion' over and over in the build

2014-09-08 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126111#comment-14126111
 ] 

Hoss Man commented on LUCENE-5594:
--

bq. The big question is: Do we really need the SVN revision number in the 
Implementation-Version of every JAR file?

Personally, I think it's really handy -- it provides a helpful sanity 
check(sum) to see what version someone is _actually_ running (i.e.: did they 
build it themselves from svn source? did they have local modifications when 
they built?)

Linking to the original issue where this all was added for some context: 
LUCENE-908

And the thread where the original suggestion came from in solr (which then got 
promoted up into the lucene common build stuff later)...
https://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200706.mbox/%3c46633309.5070...@lapnap.net%3E



 don't call 'svnversion' over and over in the build
 --

 Key: LUCENE-5594
 URL: https://issues.apache.org/jira/browse/LUCENE-5594
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 5.0, 4.11


 Some ant tasks (at least release packaging, i dunno what else), call 
 svnversion over and over and over for each module in the build. can we just 
 do this one time instead?






[jira] [Created] (SOLR-6491) Add preferredLeader as a ROLE and a collections API command to respect this role

2014-09-08 Thread Erick Erickson (JIRA)
Erick Erickson created SOLR-6491:


 Summary: Add preferredLeader as a ROLE and a collections API 
command to respect this role
 Key: SOLR-6491
 URL: https://issues.apache.org/jira/browse/SOLR-6491
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0, 4.11
Reporter: Erick Erickson
Assignee: Erick Erickson


Leaders can currently get out of balance due to the sequence of how nodes are 
brought up in a cluster. For very good reasons shard leadership cannot be 
permanently assigned.

However, it seems reasonable that a sys admin could optionally specify that a 
particular node be the _preferred_ leader for a particular collection/shard. 
During leader election, preference would be given to any node so marked when 
electing any leader.

So the proposal here is to add another role for preferredLeader to the 
collections API, something like
ADDROLE?role=preferredLeader&collection=collection_name&shard=shardId

Second, it would be good to have a new collections API call like 
ELECTPREFERREDLEADERS?collection=collection_name
(I really hate that name so far, but you see the idea). That command would 
(asynchronously?) make an attempt to transfer leadership for each shard in a 
collection to the leader labeled as the preferred leader by the new ADDROLE 
role.

I'm going to start working on this, any suggestions welcome!

This will subsume several other JIRAs, I'll link them momentarily.
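The intended election preference might be sketched like this (function and node names are hypothetical; the real logic would live in SolrCloud's leader-election code, not in a standalone function):

```python
# Hypothetical sketch: during leader election, prefer any live replica
# marked preferredLeader; otherwise fall back to the first live replica.
def elect_leader(replicas, preferred, live_nodes):
    candidates = [r for r in replicas if r in live_nodes]
    if not candidates:
        return None  # no live replica can become leader
    for r in candidates:
        if r in preferred:
            return r  # preferred leader wins if it is live
    return candidates[0]

print(elect_leader(["node1", "node2", "node3"], {"node2"}, {"node1", "node2"}))
```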






[jira] [Commented] (SOLR-6491) Add preferredLeader as a ROLE and a collections API command to respect this role

2014-09-08 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126133#comment-14126133
 ] 

Erick Erickson commented on SOLR-6491:
--

I think the functionality of all three of these JIRAs will be provided by this 
one.

 Add preferredLeader as a ROLE and a collections API command to respect this 
 role
 

 Key: SOLR-6491
 URL: https://issues.apache.org/jira/browse/SOLR-6491
 Project: Solr
  Issue Type: Improvement
Affects Versions: 5.0, 4.11
Reporter: Erick Erickson
Assignee: Erick Erickson

 Leaders can currently get out of balance due to the sequence of how nodes are 
 brought up in a cluster. For very good reasons shard leadership cannot be 
 permanently assigned.
 However, it seems reasonable that a sys admin could optionally specify that a 
 particular node be the _preferred_ leader for a particular collection/shard. 
 During leader election, preference would be given to any node so marked when 
 electing any leader.
 So the proposal here is to add another role for preferredLeader to the 
 collections API, something like
 ADDROLE?role=preferredLeader&collection=collection_name&shard=shardId
 Second, it would be good to have a new collections API call like 
 ELECTPREFERREDLEADERS?collection=collection_name
 (I really hate that name so far, but you see the idea). That command would 
 (asynchronously?) make an attempt to transfer leadership for each shard in a 
 collection to the leader labeled as the preferred leader by the new ADDROLE 
 role.
 I'm going to start working on this, any suggestions welcome!
 This will subsume several other JIRAs, I'll link them momentarily.






[jira] [Commented] (LUCENE-5927) 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2

2014-09-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126163#comment-14126163
 ] 

Robert Muir commented on LUCENE-5927:
-

{quote}
Or just stop doing version-specific implementations (as will be the case in 
5.x)?
{quote}

In my opinion, that's unrelated to this issue (again, for this particular issue, 
I think simulating the old bug is overkill because it just will not be useful). 

As far as the 4.6 Unicode changes go, the API complexity is out of the way in 
5.x. Analyzers have getVersion/setVersion, and if we want to add a 
Lucene40StandardTokenizer and have them make use of this to emulate the 4.0 (as 
opposed to 4.6+) grammar, that's fine. With the API Ryan has, it won't cause 
users pain and keeps the back compat.

 4.9 - 4.10 change in StandardTokenizer behavior on \u1aa2
 --

 Key: LUCENE-5927
 URL: https://issues.apache.org/jira/browse/LUCENE-5927
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Ryan Ernst

 In 4.9, this string was broken into 2 tokens by StandardTokenizer:
 \u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72 = \u1aa2,  
 \u1a7f\u1a6f\u1a6f\u1a61\u1a72
 However, in 4.10, this has changed: the string is now returned as a single token.






[jira] [Commented] (LUCENE-5925) Use rename instead of segments_N fallback / segments.gen etc

2014-09-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126174#comment-14126174
 ] 

Robert Muir commented on LUCENE-5925:
-

Thanks for beasting guys! I ran 150 iterations of 'test-core' myself. I'll give 
it a few days.


 Use rename instead of segments_N fallback / segments.gen etc
 

 Key: LUCENE-5925
 URL: https://issues.apache.org/jira/browse/LUCENE-5925
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index, core/store
Reporter: Robert Muir
  Labels: Java7
 Attachments: LUCENE-5925.patch, LUCENE-5925.patch


 Our commit logic is strange: we write corrupted commit points and only on the 
 last phase of commit do we correct them.
 This means the logic to get the latest commit is always scary and awkward, 
 since it must deal with partial commits, and try to determine if it should 
 fall back to segments_N-1 or actually relay an exception. 
 This logic is also incomplete, e.g. I think we only fall back so far (at most 
 one generation).
 If we somehow screw up in all this logic and do the wrong thing, we lose 
 data (e.g. LUCENE-4870 wiped an entire index because of TooManyOpenFiles).
 We now require Java 7; I think we should explore instead writing 
 {{pending_segments_N}} and then in finishCommit() doing an atomic rename to 
 {{segments_N}}. 
 We could then remove all the complex fallback logic completely, since we would 
 no longer have to deal with ignoring partial commits, instead simply 
 delivering any exception we get when trying to read the commit, and sleep 
 better at night.
 In Java 7, we have the APIs for this (ATOMIC_MOVE).
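The pending-then-rename flow can be sketched in Python, where os.replace plays the role of Java 7's Files.move with ATOMIC_MOVE (file names follow the proposal; the data payload and generation numbers are illustrative):

```python
import os
import tempfile

# Sketch of the proposed commit flow: write the new commit point under a
# pending name, fsync it, then atomically rename it into place. Readers
# either see the old complete segments_N or the new one -- never a
# partially written file.
def commit(index_dir, gen, data):
    pending = os.path.join(index_dir, "pending_segments_%d" % gen)
    final = os.path.join(index_dir, "segments_%d" % gen)
    with open(pending, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
    os.replace(pending, final)  # atomic rename (ATOMIC_MOVE in Java 7 NIO)
    return final

d = tempfile.mkdtemp()
path = commit(d, 2, b"commit data")
print(os.path.basename(path))
```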






[jira] [Commented] (LUCENE-5594) don't call 'svnversion' over and over in the build

2014-09-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126178#comment-14126178
 ] 

Uwe Schindler commented on LUCENE-5594:
---

Hi,
I already have a good solution: I cache the resolved svn version in a file in 
the build dir. This file is used when building the JAR files. Nothing is done 
with command-line tools any more; the build now does svnversion and the svn 
exports using SVNKit, grabbed from Maven Central.

The only issue: the file can get stale. I tried to avoid the file and pass the 
svnversion down the build, but that's hard to do. I am still working on a 
better solution...
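A minimal sketch of that caching approach, with a resolver callback standing in for the SVNKit call (the cache file name is illustrative, not the actual one used in the build):

```python
import os
import tempfile

# Sketch of caching an expensive version lookup in the build directory:
# resolve it once, store it in a file, and reuse the file on later build
# steps. The drawback noted above applies: the cached file can go stale.
def get_version(build_dir, resolve):
    cache = os.path.join(build_dir, "svnversion.txt")
    if os.path.exists(cache):
        with open(cache) as f:
            return f.read().strip()
    version = resolve()  # e.g. invoke SVNKit / svnversion here
    with open(cache, "w") as f:
        f.write(version)
    return version

calls = []
def fake_resolve():
    calls.append(1)
    return "1623456"

d = tempfile.mkdtemp()
print(get_version(d, fake_resolve), get_version(d, fake_resolve), len(calls))
```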

 don't call 'svnversion' over and over in the build
 --

 Key: LUCENE-5594
 URL: https://issues.apache.org/jira/browse/LUCENE-5594
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Uwe Schindler
 Fix For: 5.0, 4.11


 Some ant tasks (at least release packaging, i dunno what else), call 
 svnversion over and over and over for each module in the build. can we just 
 do this one time instead?






[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126205#comment-14126205
 ] 

Mark Miller commented on SOLR-6489:
---

A good chunk of the issue appears to be that the 'decompress' morphline 
supports the mime type 'application/x-gzip' but not 'application/gzip', and 
that Tika's mimetype detection is perhaps now returning the latter for these 
.gz files.
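A sketch of the likely fix on the consuming side would be to check the detected type against a set of known gzip aliases rather than a single string (the function name is hypothetical; RFC 6713 registered application/gzip as the official type, while application/x-gzip was the common legacy alias):

```python
# Accept both the legacy and the now-official gzip mimetypes when
# deciding whether to decompress an input stream.
GZIP_ALIASES = {"application/gzip", "application/x-gzip"}

def wants_gunzip(detected_type):
    return detected_type in GZIP_ALIASES

print(wants_gunzip("application/gzip"), wants_gunzip("application/x-gzip"))
```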

 morphlines-cell tests fail after upgrade to TIKA 1.6
 

 Key: SOLR-6489
 URL: https://issues.apache.org/jira/browse/SOLR-6489
 Project: Solr
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.11
Reporter: Uwe Schindler
Assignee: Mark Miller
 Fix For: 5.0, 4.11


 After upgrading to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests 
 fail with scripting error messages.
 Because of the crazy configuration file format and my inability to figure out 
 the test setup, I have to give up and hope that somebody else can take care 
 of it. In addition, on my own machines all of Hadoop does not work at all, so 
 I cannot debug (Windows).
 The whole Morphlines setup is not really good, because Solr core depends on a 
 different TIKA version than the included morphlines libraries. This is not a 
 good situation for Solr, because we should be able to upgrade to any version 
 of our core components and not depend on external libraries that themselves 
 depend on older versions of Solr!






[jira] [Updated] (SOLR-6482) Add an onlyIfDown flag for DELETEREPLICA collections API command

2014-09-08 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-6482:
-
Attachment: SOLR-6482.patch

Patch with a test as well; the test shamelessly piggy-backs on an existing 
test.

This may be ready to commit (all tests pass), but I'd like to give people a 
chance to comment.
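The onlyIfDown guard being added here can be illustrated with a toy sketch (the data structures and names are hypothetical, not Solr's actual cluster-state code):

```python
# Hypothetical sketch of the onlyIfDown guard: with the flag set, the
# replica is removed from cluster state only when its node is offline;
# attempting to delete a live replica is refused.
def delete_replica(cluster_state, shard, replica, live_nodes, only_if_down=False):
    node = cluster_state[shard][replica]
    if only_if_down and node in live_nodes:
        raise ValueError("replica %s is live; refusing (onlyIfDown=true)" % replica)
    del cluster_state[shard][replica]

state = {"shard1": {"core1": "node1", "core2": "node2"}}
delete_replica(state, "shard1", "core2", live_nodes={"node1"}, only_if_down=True)
print(state)
```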

 Add an onlyIfDown flag for DELETEREPLICA collections API command
 

 Key: SOLR-6482
 URL: https://issues.apache.org/jira/browse/SOLR-6482
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 5.0, 4.11
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Attachments: SOLR-6482.patch, SOLR-6482.patch


 Having the DELETEREPLICA delete the index is scary for some situations, 
 especially ones in which the operations people are taking more explicit 
 control of the topology of their cluster. As we move towards ZK being the 
 one source of truth and deleting replicas that then come back up, this is 
 even scarier.
 I propose an optional flag, onlyIfDown, that removes the replica from the ZK 
 cluster state if (and only if) the node is offline. Default value: false.






[jira] [Created] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-09-08 Thread Trey Grainger (JIRA)
Trey Grainger created SOLR-6492:
---

 Summary: Solr field type that supports multiple, dynamic analyzers
 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 4.11


A common request - particularly for multilingual search - is to be able to 
support one or more dynamically-selected analyzers for a field. For example, 
someone may have a content field and pass in a document in Greek (using an 
Analyzer with Tokenizer/Filters for Greek), a separate document in English 
(using an English Analyzer), and possibly even a field with mixed-language 
content in Greek and English. This latter case could pass the content 
separately through both an analyzer defined for Greek and another Analyzer 
defined for English, stacking or concatenating the token streams based upon the 
use-case.

There are some distinct advantages in terms of index size and query performance 
which can be obtained by stacking terms from multiple analyzers in the same 
field instead of duplicating content in separate fields and searching across 
multiple fields. 

Other non-multilingual use cases may include things like switching to a 
different analyzer for the same field to remove a feature (i.e. turning on/off 
query-time synonyms against the same field on a per-query basis).
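The "stacking" variant described above can be sketched as follows, assuming equal-length, position-aligned token streams (a simplification of real analyzer output, where streams may differ in length and position increments):

```python
# Stack token streams from two analyzers: terms from each analyzer are
# emitted at the same position, like synonyms, rather than concatenating
# one stream after the other. Assumes equal-length streams for simplicity.
def stack(streams):
    stacked = []
    for pos, terms in enumerate(zip(*streams)):
        for term in set(terms):
            stacked.append((pos, term))
    return sorted(stacked)

english = ["run", "fast"]
greek = ["trexo", "grigora"]  # illustrative transliterations
print(stack([english, greek]))
```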






[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126233#comment-14126233
 ] 

Mark Miller commented on SOLR-6489:
---

I'll disable the gz tests and file an issue with morphlines.

 morphlines-cell tests fail after upgrade to TIKA 1.6
 

 Key: SOLR-6489
 URL: https://issues.apache.org/jira/browse/SOLR-6489
 Project: Solr
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.11
Reporter: Uwe Schindler
Assignee: Mark Miller
 Fix For: 5.0, 4.11


 After upgrading to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests 
 fail with scripting error messages.
 Because of the crazy configuration file format and my inability to figure out 
 the test setup, I have to give up and hope that somebody else can take care 
 of it. In addition, on my own machines all of Hadoop does not work at all, so 
 I cannot debug (Windows).
 The whole Morphlines setup is not really good, because Solr core depends on a 
 different TIKA version than the included morphlines libraries. This is not a 
 good situation for Solr, because we should be able to upgrade to any version 
 of our core components and not depend on external libraries that themselves 
 depend on older versions of Solr!






[jira] [Commented] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-09-08 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126258#comment-14126258
 ] 

Trey Grainger commented on SOLR-6492:
-

I previously implemented this field type when writing chapter 14 of _Solr in 
Action_, but I would like to make some improvements and then submit the code 
back to Solr to (hopefully) be committed. The current code from _Solr in 
Action_ can be found here:
[https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14]

To use the current version, you would do the following:
1) Add the following to schema.xml:
  <fieldType name="multiText"
    class="sia.ch14.MultiTextField" sortMissingLast="true"
    defaultFieldType="text_general"
    fieldMappings="en:text_english,
                   es:text_spanish,
                   fr:text_french,
                   de:text_german"/> *

  <field name="someMultiTextField" type="multiText" indexed="true" 
multiValued="true" />

  *note that text_spanish, text_english, text_french, and text_german 
refer to field types which are defined elsewhere in schema.xml.

2) Index a document with a field containing multilingual text using syntax like 
one of the following:
  <field name="someMultiTextField">some text</field> **
  <field name="someMultiTextField">en|some text</field>
  <field name="someMultiTextField">es|some more text</field>
  <field name="someMultiTextField">de,fr|some other text</field>

  **uses the default analyzer

3) Submit a query specifying which language you want to query in:
  /select?q=someMultiTextField:en,de|keyword_goes_here

--

Improvements to be made before the patch is finalized:
1) Make it possible to specify the field type mappings in the field name 
instead of the field value:
  <field name="someMultiTextField">de,fr|some other text</field>
  /select?q=a bunch of keywords here&df=someMultiTextField|en,de

This makes querying easier, because the languages can be detected prior to 
parsing of the query, which prevents prefixes from having to be substituted on 
each query term (which is cost-prohibitive for most because it effectively 
means pre-parsing the query before it goes to Solr).

2) Enable support for switching between stacking token streams from each 
analyzer (good default because it mostly respects position increments across 
languages and minimizes duplicate tokens in the index) and concatenating token 
streams.

3) Possibly add the ability to switch analyzers in the middle of input text:
<field name="someMultiTextField">de,fr|some other el|text</field>

4) Extensive unit testing
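The "lang1,lang2|text" prefix convention above can be sketched like this (a simplification: the real field type also handles per-term prefixing and analyzer construction, and this toy parser would misread a prefix-free value containing a pipe):

```python
# Parse optional language codes off a MultiTextField value, mapping them
# to analyzer field types and falling back to the default when no prefix
# is present.
def parse_multitext(value, field_mappings, default_type="text_general"):
    if "|" in value:
        prefix, text = value.split("|", 1)
        types = [field_mappings[code] for code in prefix.split(",")]
    else:
        text = value
        types = [default_type]
    return types, text

mappings = {"en": "text_english", "de": "text_german", "fr": "text_french"}
print(parse_multitext("de,fr|some other text", mappings))
```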

 Solr field type that supports multiple, dynamic analyzers
 -

 Key: SOLR-6492
 URL: https://issues.apache.org/jira/browse/SOLR-6492
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Trey Grainger
 Fix For: 4.11


 A common request - particularly for multilingual search - is to be able to 
 support one or more dynamically-selected analyzers for a field. For example, 
  someone may have a content field and pass in a document in Greek (using an 
  Analyzer with Tokenizer/Filters for Greek), a separate document in English 
 (using an English Analyzer), and possibly even a field with mixed-language 
 content in Greek and English. This latter case could pass the content 
 separately through both an analyzer defined for Greek and another Analyzer 
 defined for English, stacking or concatenating the token streams based upon 
 the use-case.
 There are some distinct advantages in terms of index size and query 
 performance which can be obtained by stacking terms from multiple analyzers 
 in the same field instead of duplicating content in separate fields and 
 searching across multiple fields. 
 Other non-multilingual use cases may include things like switching to a 
 different analyzer for the same field to remove a feature (i.e. turning 
 on/off query-time synonyms against the same field on a per-query basis).






[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126269#comment-14126269
 ] 

Mark Miller commented on SOLR-6489:
---

The relevant change to Tika that caused this appears to be 
https://issues.apache.org/jira/browse/TIKA-1280 (GZip now has an official 
mimetype), which refers to https://tools.ietf.org/html/rfc6713.

A list of alternates that predate the official mimetype can be found off 
https://issues.apache.org/jira/browse/TIKA-1282 (Additional Gzip types).

 morphlines-cell tests fail after upgrade to TIKA 1.6
 

 Key: SOLR-6489
 URL: https://issues.apache.org/jira/browse/SOLR-6489
 Project: Solr
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.11
Reporter: Uwe Schindler
Assignee: Mark Miller
 Fix For: 5.0, 4.11


 After the upgrade to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests 
 fail with scripting error messages.
 Due to my lack of understanding, caused by the crazy configuration file 
 format, and my inability to figure out the test setup, I have to give up and 
 hope that somebody else can take care of it. In addition, Hadoop does not work 
 at all on my own machines (Windows), so I cannot debug.
 The whole Morphlines setup is not really good, because Solr core depends on a 
 different TIKA version than the included morphlines libraries. This is not a 
 good situation for Solr, because we should be able to upgrade any version of 
 our core components without depending on external libraries that themselves 
 depend on older versions of Solr!






[jira] [Commented] (SOLR-6489) morphlines-cell tests fail after upgrade to TIKA 1.6

2014-09-08 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126293#comment-14126293
 ] 

Uwe Schindler commented on SOLR-6489:
-

bq. Uwe Schindler, you really should wait longer than a weekend to disable 
tests and say you don't understand what the problem is because you want to slam 
in a library upgrade real quick. It's a distributed community, give time for 
collaboration before you disable tests. I doubt it was critical to jam in Tika 
1.6 over the weekend.

Sorry, I was not expecting this to fail. And as I said: I was not able to test 
it locally.

 morphlines-cell tests fail after upgrade to TIKA 1.6
 

 Key: SOLR-6489
 URL: https://issues.apache.org/jira/browse/SOLR-6489
 Project: Solr
  Issue Type: Bug
  Components: Tests
Affects Versions: 4.11
Reporter: Uwe Schindler
Assignee: Mark Miller
 Fix For: 5.0, 4.11


 After the upgrade to Apache TIKA 1.6 (SOLR-6488), solr-morphlines-cell tests 
 fail with scripting error messages.
 Due to my lack of understanding, caused by the crazy configuration file 
 format, and my inability to figure out the test setup, I have to give up and 
 hope that somebody else can take care of it. In addition, Hadoop does not work 
 at all on my own machines (Windows), so I cannot debug.
 The whole Morphlines setup is not really good, because Solr core depends on a 
 different TIKA version than the included morphlines libraries. This is not a 
 good situation for Solr, because we should be able to upgrade any version of 
 our core components without depending on external libraries that themselves 
 depend on older versions of Solr!






[jira] [Comment Edited] (SOLR-6492) Solr field type that supports multiple, dynamic analyzers

2014-09-08 Thread Trey Grainger (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126258#comment-14126258
 ] 

Trey Grainger edited comment on SOLR-6492 at 9/8/14 11:55 PM:
--

I previously implemented this field type when writing chapter 14 of _Solr in 
Action_, but I would like to make some improvements and then submit the code 
back to Solr to (hopefully) be committed. The current code from _Solr in 
Action_ can be found here:
[https://github.com/treygrainger/solr-in-action/tree/first-edition/src/main/java/sia/ch14]

To use the current version, you would do the following:
1) Add the following to schema.xml:
  <fieldType name="multiText"
    class="sia.ch14.MultiTextField" sortMissingLast="true"
    defaultFieldType="text_general"
    fieldMappings="en:text_english,
                   es:text_spanish,
                   fr:text_french,
                   de:text_german"/> *

  <field name="someMultiTextField" type="multiText" indexed="true"
    multiValued="true" />

  *note that text_english, text_spanish, text_french, and text_german 
refer to field types which are defined elsewhere in schema.xml.

2) Index a document with a field containing multilingual text using syntax like 
one of the following:
  <field name="someMultiTextField">some text</field> **
  <field name="someMultiTextField">en|some text</field>
  <field name="someMultiTextField">es|some more text</field>
  <field name="someMultiTextField">de,fr|some other text</field>

  **uses the default analyzer

3) Submit a query specifying which language(s) you want to query in:
  /select?q=someMultiTextField:en,de|keyword_goes_here
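
For readers skimming the thread, the prefix convention above can be sketched in 
a few lines. This is a hypothetical illustration of the parsing step only (not 
the actual sia.ch14.MultiTextField code, which also builds the per-language 
token streams):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Minimal sketch of the "en,de|some text" prefix convention described above.
// This is NOT the actual sia.ch14.MultiTextField code, just an illustration.
public class LangPrefix {
    public final List<String> langs; // field-type keys, e.g. ["de", "fr"]
    public final String text;        // the text to analyze

    private LangPrefix(List<String> langs, String text) {
        this.langs = langs;
        this.text = text;
    }

    public static LangPrefix parse(String raw, List<String> knownLangs) {
        int bar = raw.indexOf('|');
        if (bar > 0) {
            List<String> candidates =
                Arrays.asList(raw.substring(0, bar).split(","));
            // Only treat the prefix as language keys if every key is mapped;
            // otherwise the '|' is assumed to be part of the text itself.
            if (knownLangs.containsAll(candidates)) {
                return new LangPrefix(candidates, raw.substring(bar + 1));
            }
        }
        // No (valid) prefix: fall back to the default analyzer.
        return new LangPrefix(Collections.emptyList(), raw);
    }
}
```

Note how input with no recognized prefix (the ** case above) falls through to 
the default analyzer.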

--

Improvements to be made before the patch is finalized:
1) Make it possible to specify the field type mappings in the field name 
instead of the field value:
  <field name="someMultiTextField|de,fr">some other text</field>
  /select?q=a bunch of keywords here&df=someMultiTextField|en,de

This makes querying easier, because the languages can be detected prior to 
parsing of the query, which avoids having to substitute prefixes onto each 
query term (cost-prohibitive for most applications, because it effectively 
means pre-parsing the query before it goes to Solr).

2) Enable support for switching between stacking token streams from each 
analyzer (good default because it mostly respects position increments across 
languages and minimizes duplicate tokens in the index) and concatenating token 
streams.

3) Possibly add the ability to switch analyzers in the middle of input text:
<field name="someMultiTextField">de,fr|some other el|text</field>

4) Extensive unit testing



[jira] [Created] (SOLR-6493) stats on multivalued fields don't respect excluded filters

2014-09-08 Thread Hoss Man (JIRA)
Hoss Man created SOLR-6493:
--

 Summary: stats on multivalued fields don't respect excluded filters
 Key: SOLR-6493
 URL: https://issues.apache.org/jira/browse/SOLR-6493
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.9, 4.8, 4.10
Reporter: Hoss Man
Assignee: Hoss Man


SOLR-3177 added support to StatsComponent for using the "ex" local param to 
exclude tagged filters, but these exclusions have apparently never been correct 
for multivalued fields.
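
For context, the tag/exclude pattern being discussed looks roughly like this (a 
generic example of the syntax, not taken from the issue):

```text
/select?q=*:*
   &fq={!tag=colorFq}color:red
   &stats=true
   &stats.field={!ex=colorFq}price
```

With the ex=colorFq exclusion, the stats over price should be computed as if 
the tagged fq were not applied; per this issue, that exclusion was not honored 
when the stats field is multivalued.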






[jira] [Updated] (SOLR-6493) stats on multivalued fields don't respect excluded filters

2014-09-08 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-6493:
---
Attachment: SOLR-6493.patch

FWIW: I discovered this while experimenting with some refactoring of 
StatsComponent to make SOLR-6348's sub-tasks more viable.

The attached patch shows the trivial fix and some example test cases for 
reproducing it.

The nocommits in the test are just to remind me to make it more robust before 
committing; hopefully I'll wrap this up tomorrow.

 stats on multivalued fields don't respect excluded filters
 --

 Key: SOLR-6493
 URL: https://issues.apache.org/jira/browse/SOLR-6493
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.8, 4.9, 4.10
Reporter: Hoss Man
Assignee: Hoss Man
 Attachments: SOLR-6493.patch


 SOLR-3177 added support to StatsComponent for using the "ex" local param to 
 exclude tagged filters, but these exclusions have apparently never been 
 correct for multivalued fields.






[jira] [Updated] (SOLR-6485) ReplicationHandler should have an option to throttle the speed of replication

2014-09-08 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-6485:

Attachment: SOLR-6485.patch

New patch. The previous approach was wrong; now I am using SimpleRateLimiter.

Still debugging why it's not throttling correctly.
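
For reference, the arithmetic behind a simple rate limiter (in the spirit of 
Lucene's RateLimiter.SimpleRateLimiter, though this sketch is an illustration 
and not the Lucene code) is just mapping bytes transferred to the minimum time 
they should take at the configured MB/sec:

```java
// Minimal sketch of byte-rate throttling arithmetic, in the spirit of Lucene's
// RateLimiter.SimpleRateLimiter (an illustration, not the Lucene code).
public class ThrottleSketch {
    // Nanoseconds that transferring `bytes` should take at `mbPerSec` MB/sec.
    public static long nanosFor(long bytes, double mbPerSec) {
        double bytesPerSec = mbPerSec * 1024 * 1024;
        return (long) (bytes / bytesPerSec * 1_000_000_000L);
    }

    // A replication thread would sleep for the remainder after each chunk:
    // target time for the chunk minus the time the transfer actually took.
    public static long pauseNanos(long chunkBytes, double mbPerSec,
                                  long elapsedNanos) {
        long target = nanosFor(chunkBytes, mbPerSec);
        return Math.max(0L, target - elapsedNanos);
    }
}
```

The subtle part in practice (and a likely source of "not throttling correctly") 
is accounting for time already spent transferring before deciding how long to 
pause.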

 ReplicationHandler should have an option to throttle the speed of replication
 -

 Key: SOLR-6485
 URL: https://issues.apache.org/jira/browse/SOLR-6485
 Project: Solr
  Issue Type: Improvement
Reporter: Varun Thacker
 Attachments: SOLR-6485.patch, SOLR-6485.patch


 The ReplicationHandler should have an option to throttle the speed of 
 replication.
 It is useful for people who want to bring up nodes in their SolrCloud cluster, 
 or who have a backup/restore API, without eating up all their network 
 bandwidth while replicating.
 I am writing a test case and will attach a patch shortly.






Confused about writing a ZK state.

2014-09-08 Thread Erick Erickson
I'm just not getting it. But then again, it's late and the code is unfamiliar.

Anyway, I'm working on SOLR-6491 for which I want to have a
preferredLeader property in ZK.

I _think_ this fits best as a property in the same place as the
leader prop and it would be a boolean. I.e. the cluster state for
collection1/shards/shard1/replicas/core_node_2 might have a
preferred_leader attribute that could be set to true. This would
be totally independent of whether or not leader was true, although
they would very often be the same. The preferredLeader is really
just supposed to be a hint at leader-election time.

Anyway, all this seems well and good but I don't see a convenient way
to set/clear a single property in a single node in clusterstate. What
I think I'm seeing is that the cluster state is only written by the
Overseer and the Overseer doesn't deal with this case yet. Things like
updateState seem like they have another purpose.

So I'm guessing that I need to write another command for Overseer to
implement, something like setnodeprop that takes a collection, shard,
node, and one or more (property/propval) pairs. Then, to change the
clusterstate I'd put together a ZkNodeProps and put it in the queue
returned from Overseer.getInQueue(zkClient). Then wait for it to be
processed before declaring victory (actually I'd only have to wait in
the test I think).
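
A minimal sketch of the kind of message such a setnodeprop command might 
enqueue (the operation name and keys here are assumptions taken from this 
email's proposal, not an existing Solr API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the message a proposed "setnodeprop" Overseer command
// might carry for SOLR-6491. The operation name and key names are assumptions
// from this email, not Solr API; the real message would be a ZkNodeProps
// serialized onto the queue from Overseer.getInQueue(zkClient).
public class SetNodePropMessage {
    public static Map<String, Object> build(String collection, String shard,
                                            String coreNode, String prop,
                                            String value) {
        Map<String, Object> msg = new LinkedHashMap<>();
        msg.put("operation", "setnodeprop"); // dispatched on by the Overseer loop
        msg.put("collection", collection);
        msg.put("shard", shard);
        msg.put("core_node_name", coreNode);
        msg.put("property", prop);           // e.g. "preferredLeader"
        msg.put("property.value", value);    // e.g. "true"
        return msg;
    }
}
```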

Mostly I'm looking for whether this is on the right track or completely off 
base. I'm also giving folks a chance to object before I invest the time and 
effort in something totally useless.

Thanks!
Erick
