[jira] [Commented] (LUCENE-5525) Implement MultiFacets.getAllDims

2014-03-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936049#comment-13936049
 ] 

Shai Erera commented on LUCENE-5525:


Looks good, +1!

 Implement MultiFacets.getAllDims
 

 Key: LUCENE-5525
 URL: https://issues.apache.org/jira/browse/LUCENE-5525
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.7
Reporter: Jose Peleteiro
Assignee: Michael McCandless
 Attachments: LUCENE-5525.patch


 DrillSideways.DrillSidewaysResult uses Facets when the query does not filter 
 by a facet, but it uses MultiFacets when it does, and the MultiFacets 
 implementation is not complete.
 See: 
 https://github.com/apache/lucene-solr/blob/0b0bc89932622f5bc2c4d74f978178b9ae15c700/lucene/facet/src/java/org/apache/lucene/facet/MultiFacets.java#L67
 See http://pastebin.com/5eDbTM2v 
 This code works when DrillDownQuery.add is not called (when no facets are 
 selected), but otherwise it fails with an UnsupportedOperationException.
 Perhaps I'm not using Facets correctly, but I'm trying to figure it out to 
 upgrade from 4.6.1 by myself, as I could not find any documentation other 
 than the javadocs for facets.
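For readers hitting the same UnsupportedOperationException, the delegation pattern at issue can be sketched in plain Java. This is a toy model only: the class name, the SimpleFacets interface, and topChildren are illustrative stand-ins for Lucene's Facets API, not its actual signatures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of the MultiFacets delegation pattern: a per-dimension
// map of Facets-like delegates plus a default, where getAllDims() merges
// results from every registered delegate instead of throwing
// UnsupportedOperationException.
public class MultiFacetsSketch {

    // Stand-in for Lucene's Facets: counts of child labels for a dimension.
    public interface SimpleFacets {
        Map<String, Integer> topChildren(String dim);
    }

    private final Map<String, SimpleFacets> dimToFacets;
    private final SimpleFacets defaultFacets;

    public MultiFacetsSketch(Map<String, SimpleFacets> dimToFacets,
                             SimpleFacets defaultFacets) {
        this.dimToFacets = dimToFacets;
        this.defaultFacets = defaultFacets;
    }

    // Dispatch to the per-dimension delegate, falling back to the default.
    public Map<String, Integer> getTopChildren(String dim) {
        return dimToFacets.getOrDefault(dim, defaultFacets).topChildren(dim);
    }

    // The missing piece: collect each registered dimension's counts.
    public Map<String, Map<String, Integer>> getAllDims() {
        Map<String, Map<String, Integer>> all = new LinkedHashMap<>();
        for (Map.Entry<String, SimpleFacets> e : dimToFacets.entrySet()) {
            all.put(e.getKey(), e.getValue().topChildren(e.getKey()));
        }
        return all;
    }
}
```

The real fix in the attached patch works against Lucene's Facets class; the sketch only shows why getAllDims can be implemented as a straightforward merge over the delegates.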



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5525) Implement MultiFacets.getAllDims

2014-03-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936050#comment-13936050
 ] 

Shai Erera commented on LUCENE-5525:


I reviewed MultiCategoryListsFacetsExample.java under lucene/demo -- do you 
think it should use MultiFacets? And also exercise getAllDims()?




[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-03-15 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936061#comment-13936061
 ] 

Anshum Gupta commented on SOLR-5477:


Thanks for pointing that out Steve.
This must have gotten in when I started working on this one, i.e. before 
SOLR-3854 went in, and it just stayed as a result of a bad merge.

I'll fix this up.

 Async execution of OverseerCollectionProcessor tasks
 

 Key: SOLR-5477
 URL: https://issues.apache.org/jira/browse/SOLR-5477
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Anshum Gupta
 Attachments: SOLR-5477-CoreAdminStatus.patch, 
 SOLR-5477-updated.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch


 Typical collection admin commands are long-running, and it is very common 
 for the requests to time out. This is more of a problem if the cluster is 
 very large. Add an option to run these commands asynchronously:
 - add an extra param async=true for all collection commands
 - the task is written to ZK and the caller is returned a task id
 - a separate collection admin command will be added to poll the status of 
 the task: command=status&id=7657668909
 - if no id is passed, all running async tasks should be listed
 A separate queue is created to store in-process tasks. After a task is 
 completed, its queue entry is removed. OverseerCollectionProcessor will 
 perform these tasks in multiple threads.
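The flow described above (submit returns an id immediately, a worker pool executes the task, a status call polls by id) can be sketched as an in-memory toy. In the real patch the pending/completed state lives in ZooKeeper; the class and method names here (AsyncTaskSketch, submit, status) are illustrative, not Solr's API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// In-memory sketch of the async-command flow: enqueue work, hand back an id,
// run tasks on a small worker pool, and let callers poll status by id.
public class AsyncTaskSketch {
    public enum Status { RUNNING, COMPLETED }

    private final ExecutorService workers = Executors.newFixedThreadPool(4);
    private final Map<String, Status> tasks = new ConcurrentHashMap<>();

    // Equivalent of issuing a collection command with async=true:
    // record the task, schedule it, and return the id right away.
    public String submit(Runnable command) {
        String id = UUID.randomUUID().toString();
        tasks.put(id, Status.RUNNING);
        workers.submit(() -> {
            command.run();
            // In the described design the ZK queue entry would be removed
            // on completion; here we just flip the status.
            tasks.put(id, Status.COMPLETED);
        });
        return id;
    }

    // Equivalent of the status command: poll by id; null means unknown id.
    public Status status(String id) {
        return tasks.get(id);
    }

    public void shutdown() {
        workers.shutdown();
        try {
            workers.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The key design point mirrored here is that the caller never blocks on the long-running work: it only holds an id it can poll later.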






[jira] [Updated] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-03-15 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated SOLR-5477:
---

Attachment: SOLR-5477.urlschemefix.patch

Fix for not modifying url scheme.




[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936085#comment-13936085
 ] 

ASF subversion and git services commented on SOLR-5477:
---

Commit 1577801 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1577801 ]

SOLR-5477: Fix URL scheme modification from an earlier commit for SOLR-5477.




[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_51) - Build # 9800 - Failure!

2014-03-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9800/
Java: 32bit/jdk1.7.0_51 -client -XX:+UseSerialGC

1 tests failed.
REGRESSION:  
org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch

Error Message:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
127.0.0.1:44565 within 45000 ms

Stack Trace:
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: 
Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms
at 
__randomizedtesting.SeedInfo.seed([D09CC97019C4AF45:517A47686E9BCF79]:0)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:150)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:101)
at 
org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:91)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201)
at 
org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 

[jira] [Assigned] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2014-03-15 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned LUCENE-1486:
--

Assignee: Erick Erickson

 Wildcards, ORs etc inside Phrase queries
 

 Key: LUCENE-1486
 URL: https://issues.apache.org/jira/browse/LUCENE-1486
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Affects Versions: 2.4
Reporter: Mark Harwood
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.7

 Attachments: ComplexPhraseQueryParser.java, LUCENE-1486.patch, 
 LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 LUCENE-1486.patch, LUCENE-1486.patch, Lucene-1486 non default field.patch, 
 TestComplexPhraseQuery.java, junit_complex_phrase_qp_07_21_2009.patch, 
 junit_complex_phrase_qp_07_22_2009.patch


 An extension to the default QueryParser that overrides the parsing of 
 PhraseQueries to allow more complex syntax, e.g. wildcards in phrase queries.
 The implementation feels a little hacky - this is arguably better handled in 
 QueryParser itself. This works as a proof of concept for much of the query 
 parser syntax. Examples from the JUnit test include:
   checkMatches("\"j* smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
   checkMatches("\"(jo* -john) smith\"", "2"); // boolean logic works
   checkMatches("\"jo* smith\"~2", "1,2,3"); // position logic works
   checkBadQuery("\"jo* id:1 smith\""); // mixing fields in a phrase is bad
   checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad
   checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases not supported
 Code plus JUnit test to follow...






[jira] [Assigned] (LUCENE-3758) Allow the ComplexPhraseQueryParser to search order or un-order proximity queries.

2014-03-15 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned LUCENE-3758:
--

Assignee: Erick Erickson

 Allow the ComplexPhraseQueryParser to search order or un-order proximity 
 queries.
 -

 Key: LUCENE-3758
 URL: https://issues.apache.org/jira/browse/LUCENE-3758
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Affects Versions: 4.0-ALPHA
Reporter: Tomás Fernández Löbbe
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.7

 Attachments: LUCENE-3758.patch


 The ComplexPhraseQueryParser uses SpanNearQuery, but always sets the inOrder 
 value hardcoded to true. This could be configurable.
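The inOrder distinction at stake here can be illustrated with a toy positional check over a token list. This is not Lucene's span machinery, and the slop semantics below (number of tokens strictly between the two terms) are a simplification of SpanNearQuery's edit-distance-style slop.

```java
import java.util.List;

// Toy model of ordered vs. unordered proximity matching:
// inOrder=true requires term a to occur before term b (SpanNearQuery's
// current hardcoded behavior); inOrder=false accepts either order.
public class ProximitySketch {
    public static boolean near(List<String> tokens, String a, String b,
                               int slop, boolean inOrder) {
        for (int i = 0; i < tokens.size(); i++) {
            for (int j = 0; j < tokens.size(); j++) {
                if (i == j) continue;
                if (!tokens.get(i).equals(a) || !tokens.get(j).equals(b)) continue;
                int gap = Math.abs(j - i) - 1;  // tokens strictly between the two
                boolean ordered = j > i;        // a appears before b
                if (gap <= slop && (ordered || !inOrder)) return true;
            }
        }
        return false;
    }
}
```

With the flag configurable, a query like "smith john"~1 could be made to match the text "john ... smith" when unordered matching is requested, which is exactly what this issue asks for.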






[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2014-03-15 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936189#comment-13936189
 ] 

Erick Erickson commented on SOLR-1604:
--

OK, I was looking around the patch and think I understand at least some of 
what's going on. To drive this forward, I need a couple of things:

1. Vitaliy and Ahmet to resolve the two patches and let me know which is the 
right one to use. BTW, Vitaliy, please use svn diff or the equivalent Git 
command to create patches. Zipped-up sources are much harder to work with.

2. Some idea of a roadmap from here. Straw-man proposal:
2a. Close LUCENE-1486 and, if necessary, open a new JIRA for a fix. It looks 
to me like this patch can be committed without 1486, and we'll generate a 
separate fix.
2b. Commit LUCENE-3758, remove inOrder from this patch, then commit this patch.
2c. I've assigned these to myself so I don't lose track of them. I'll look 
desperately for cycles to work on them :). But I have a couple of long plane 
flights in my future...

3. Of course we need to document the syntax and behavior here; [~ctargett] can 
probably point us in the right direction for doing this right by putting it in 
the new documentation!

4. I'm also curious what we know now in terms of performance, resource 
requirements, that kind of stuff.

5. I notice there's a patch labeled as having to do with license stuff. What's 
up there? Is it just putting the headers in the source files?

6. Anything else? Does anyone out there object to moving forward with this?

 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: query parsers, search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Assignee: Erick Erickson
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, 
 ComplexPhrase-4.2.1.zip, ComplexPhrase-4.7.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, 
 SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch, 
 SOLR-1604.patch, SOLR-1604.patch


 Solr plugin for ComplexPhraseQueryParser (LUCENE-1486), which supports 
 wildcards, ORs, ranges, and fuzzies inside phrase queries.






Analytics test errors

2014-03-15 Thread Erick Erickson
I was all excited by the lack of errors coming from these tests until
I noticed they were BadApples.

So I took the ExpressionTest BadApple designation out and ran the test
20K times without error (it used to fail on my Mac).

I'm going to pull the other BadApple designations out now that I'm stealing 
some cycles to work on this: run all the tests a bunch of times on my laptop 
and, if I can't repro the problem, un-bad-apple them and commit to trunk 
unless there are lots of objections. Otherwise I don't see how to make 
forward progress on these.

Apologies for the long period when they generated test noise; I've been 
unable to devote any time to it for far too long.

Erick




[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2014-03-15 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936212#comment-13936212
 ] 

Ahmet Arslan commented on SOLR-1604:


Here is some clarification regarding the zipped attachments:

The zipped attachments are not meant for source-code inclusion but are to be 
consumed as a Solr plugin. They will never be committed, mainly because the 
zipped version(s) include code duplicated from the Lucene code base. The 
duplicated class is 
org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser. The 
duplication was done for two reasons:
* To enable fielded queries. The duplicate code changes the package name to 
org.apache.lucene.queryparser.classic.ComplexPhraseQueryParser. Somehow this 
feature was accidentally left out of LUCENE-1486 when 
lucene.ComplexPhraseQueryParser was committed; after that commit, the package 
name changed from classic to complexPhrase. The fix needs to access a field 
from the superclass, and after realizing this, changing that field's 
visibility to protected was accepted by lazy consensus. This is the 
[patch|https://issues.apache.org/jira/secure/attachment/12513804/LUCENE-1486.patch]
 for it.
* To enable the ability to change the inOrder parameter. In the original 
Lucene code the inOrder parameter is hardcoded to true in the SpanNearQuery 
classes. The separate JIRA for this is LUCENE-3758.

By the way, why LUCENE-1486 was re-opened is a mystery. It was not re-opened 
because of the forgotten non-default-field patch.




[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2014-03-15 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936221#comment-13936221
 ] 

Ahmet Arslan commented on SOLR-1604:


bq. Vitaliy and Ahmet to resolve the two patches and let me know what the right 
one to use is.
None of them, actually. They include source code 
(ComplexPhraseQueryParser.java) duplicated from Lucene. I will attach a patch, 
created against trunk, that consumes Lucene's ComplexPhraseQueryParser.

bq. Close 1486 and open a new JIRA if there's a fix for that if necessary.
+1. Yes, this patch can be committed without LUCENE-1486. +1 for closing 
LUCENE-1486, given that it was re-opened mysteriously. +1 for creating a 
separate JIRA for 
[this|https://issues.apache.org/jira/secure/attachment/12513804/LUCENE-1486.patch]
 functionality, just because it is less confusing. 

bq. commit 3758, and remove inOrder from this patch, then commit this patch.
The request for the ability to change the inOrder parameter came from a user. 
Robert had 
[this|https://issues.apache.org/jira/browse/LUCENE-3758?focusedCommentId=13206996&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13206996]
 comment about it.

bq. I notice there's a patch labeled as having to do with license stuff.
That attachment is old. I accidentally forgot to select the 'ASF inclusion' 
radio box back then, so JIRA wasn't displaying the feather icon for it. After 
that incident, JIRA removed that radio-button option; attachments are 
ASF-granted by default now. That file was renamed automatically by infra.





[jira] [Assigned] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-03-15 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-2878:
---

Assignee: Robert Muir  (was: Simon Willnauer)

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries: those that can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do, and at the 
 end of the day they duplicate a lot of code all over Lucene. Span*Queries 
 are also limited to other Span*Query instances, such that you cannot use a 
 TermQuery or a BooleanQuery with SpanNear or anything like that. 
 Besides the Span*Query limitation, other queries lack a quite interesting 
 feature: they cannot score based on term proximity, since scorers don't 
 expose any positional information. All those problems bugged me for a while, 
 so I started working on this using the bulkpostings API. I would have done 
 the first cut on trunk, but TermScorer works on a BlockReader that does not 
 expose positions, while the one in this branch does. I started adding a new 
 Positions class which users can pull from a scorer; to prevent unnecessary 
 positions enums I added ScorerContext#needsPositions and eventually 
 Scorer#needsPayloads to create the corresponding enum on demand. Yet, 
 currently only TermQuery / TermScorer implements this API; the others 
 simply return null instead. 
 To show that the API really works, and that our BulkPostings work fine with 
 positions too, I cut TermSpanQuery over to use a TermScorer under the hood 
 and nuked TermSpans entirely. A nice side effect was that the Position 
 BulkReading implementation got some exercise and now works with positions 
 :), while payloads for bulk reading are kind of experimental in the patch 
 and only work with the Standard codec. 
 So all spans now work on top of TermScorer (I truly hate spans as of today), 
 including the ones that need payloads (StandardCodec ONLY)!! I didn't bother 
 to implement the other codecs yet, since I want feedback on the API and on 
 this first cut before I go on with it. I will upload the corresponding 
 patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk 
 first, but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails, but I 
 didn't look into the MemoryIndex BulkPostings API yet).






[jira] [Commented] (SOLR-5865) Provide a MiniSolrCloudCluster to enable easier testing

2014-03-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936239#comment-13936239
 ] 

Mark Miller commented on SOLR-5865:
---

Looks great!

+  // We could upload the minimum set of files rather than the directory, but 
+  // that requires keeping the list up to date
+  ZkController.uploadToZK(zkClient, new File(configDir), 
+  ZkController.CONFIGS_ZKNODE + "/" + configName);

The main reason most of the cloud tests have gone with specifying which config 
files to put in zk was that uploading the entire directory of test configs was 
damn slow and then repeated for all cloud tests.

A better solution at some point would be a new test config folder just for 
solrcloud. We already have a lot of configs, but we could probably merge some 
things into this - like the common solrconfig and schema that almost all cloud 
tests use anyway. If we kept it to one set, I think it would be an improvement 
for cloud tests.

 Provide a MiniSolrCloudCluster to enable easier testing
 ---

 Key: SOLR-5865
 URL: https://issues.apache.org/jira/browse/SOLR-5865
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.7, 5.0
Reporter: Gregory Chanan
 Attachments: SOLR-5865.patch


 Today, the SolrCloud tests are based on the LuceneTestCase class hierarchy, 
 which has a couple of issues around support for downstream projects:
 - It's difficult to test SolrCloud support in a downstream project that may 
 have its own test framework.  For example, some projects have support for 
 different storage backends (e.g. Solr/ElasticSearch/HBase) and want tests 
 against each of the different backends.  This is difficult to do cleanly, 
 because the Solr tests require derivation from LuceneTestCase, while the 
 others don't.
 - The LuceneTestCase class hierarchy is really designed for internal solr 
 tests (e.g. it randomizes a lot of parameters to get test coverage, but a 
 downstream project probably doesn't care about that).  It's also quite 
 complicated and dense, much more so than a downstream project would want.
 Given these reasons, it would be nice to provide a simple 
 MiniSolrCloudCluster, similar to how HDFS provides a MiniHdfsCluster or 
 HBase provides a MiniHBaseCluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-03-15 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936240#comment-13936240
 ] 

Simon Willnauer commented on LUCENE-2878:
-

Now we are talking

Sent from my iPhone



 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch


 Currently we have two somewhat separate types of queries: the ones which can 
 make use of positions (mainly spans) and payloads (spans). Yet Span*Query 
 doesn't really do scoring comparable to what other queries do, and at the end 
 of the day they duplicate a lot of code all over Lucene. Span*Queries are 
 also limited to other Span*Query instances, such that you cannot use a 
 TermQuery or a BooleanQuery with SpanNear or anything like that. 
 Besides the Span*Query limitation, other queries lack a quite interesting 
 feature: they cannot score based on term proximity, since scorers don't 
 expose any positional information. All those problems bugged me for a while, 
 so I started working on that using the bulkpostings API. I would have done 
 that first cut on trunk, but TermScorer there works on BlockReaders that do 
 not expose positions, while the one in this branch does. I started adding a 
 new Positions class which users can pull from a scorer; to prevent 
 unnecessary positions enums I added ScorerContext#needsPositions and 
 eventually Scorer#needsPayloads to create the corresponding enum on demand. 
 Yet, currently only TermQuery / TermScorer implements this API and others 
 simply return null instead. 
 To show that the API really works and our BulkPostings work fine with 
 positions too, I cut over TermSpanQuery to use a TermScorer under the hood 
 and nuked TermSpans entirely. A nice side effect of this was that the 
 Position BulkReading implementation got some exercise, which now all works 
 with positions :), while Payloads for bulk reading are kind of experimental 
 in the patch and only work with the Standard codec. 
 So all spans now work on top of TermScorer (I truly hate spans as of today), 
 including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother 
 to implement the other codecs yet, since I want to get feedback on the API 
 and on this first cut before I go on with it. I will upload the 
 corresponding patch in a minute. 
 I also had to cut over SpanQuery.getSpans(IR) to 
 SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk 
 first, but after that pain today I need a break first :).
 The patch passes all core tests 
 (org.apache.lucene.search.highlight.HighlighterTest still fails, but I 
 haven't looked into the MemoryIndex BulkPostings API yet).






[jira] [Updated] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries

2014-03-15 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated SOLR-1604:
---

Attachment: SOLR-1604.patch

This is a Solr-only patch (solr/core/src/) and does not touch the Lucene code 
base. It adds two new Java classes (ComplexPhraseQParserPlugin and 
TestComplexPhraseQParserPlugin) and consumes 
o.a.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser.
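For anyone wanting to try the patch, registration would presumably follow the standard QParserPlugin pattern in solrconfig.xml. The parser name and request syntax below are assumptions for illustration, not taken from the patch itself:

```xml
<!-- register the parser plugin in solrconfig.xml (the name is an assumption) -->
<queryParser name="complexphrase"
             class="org.apache.solr.search.ComplexPhraseQParserPlugin"/>
```

A request could then select it with local-params syntax, e.g. q={!complexphrase}name:"jo* smyth~", to get wildcards and fuzzies inside the quoted phrase.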

 Wildcards, ORs etc inside Phrase Queries
 

 Key: SOLR-1604
 URL: https://issues.apache.org/jira/browse/SOLR-1604
 Project: Solr
  Issue Type: Improvement
  Components: query parsers, search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Assignee: Erick Erickson
Priority: Minor
 Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, 
 ComplexPhrase-4.2.1.zip, ComplexPhrase-4.7.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, 
 ComplexPhrase.zip, ComplexPhraseQueryParser.java, ComplexPhrase_solr_3.4.zip, 
 SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch, 
 SOLR-1604.patch, SOLR-1604.patch, SOLR-1604.patch


 Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports 
 wildcards, ORs, ranges, fuzzies inside phrase queries.






[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2014-03-15 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936265#comment-13936265
 ] 

Mikhail Khludnev commented on LUCENE-5189:
--

Just want to leave one caveat for the record. When you call 
{code}IW.updateNumericDocValue(Term, String, Long){code}, make sure the term is 
deeply cloned beforehand. Otherwise, if you modify the term or its bytes later, 
the modified version is what will be applied. That might be a problem.
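The aliasing hazard behind this caveat can be illustrated without Lucene at all. In this self-contained sketch (hypothetical names: TermAliasingDemo, bufferUpdateUnsafe/Safe), a plain Queue stands in for IndexWriter's internal buffer of pending updates:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Why a term's bytes should be deep-cloned before being handed to a
// buffering API: the writer applies the update *later*, so a caller that
// mutates the shared byte[] in the meantime changes what gets applied.
public class TermAliasingDemo {
    // Stands in for IndexWriter's buffer of pending DV updates.
    static final Queue<byte[]> pendingUpdates = new ArrayDeque<>();

    static void bufferUpdateUnsafe(byte[] termBytes) {
        pendingUpdates.add(termBytes);         // aliases the caller's array
    }

    static void bufferUpdateSafe(byte[] termBytes) {
        pendingUpdates.add(termBytes.clone()); // deep copy: immune to later mutation
    }

    public static void main(String[] args) {
        byte[] term = "doc-1".getBytes();
        bufferUpdateUnsafe(term);
        term[4] = '2';                         // caller reuses the array
        System.out.println(new String(pendingUpdates.poll())); // doc-2 -- wrong term applied

        byte[] term2 = "doc-1".getBytes();
        bufferUpdateSafe(term2);
        term2[4] = '2';
        System.out.println(new String(pendingUpdates.poll())); // doc-1 -- the clone preserved it
    }
}
```

The safe caller-side pattern is simply to never reuse a Term (or its underlying bytes) after passing it to a buffering IndexWriter method.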

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
 LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, 
 LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, 
 LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
 LUCENE-5189_process_events.patch


 In LUCENE-4258 we started to work on incremental field updates, however the 
 amount of changes are immense and hard to follow/consume. The reason is that 
 we targeted postings, stored fields, DV etc., all from the get go.
 I'd like to start afresh here, with numeric-dv-field updates only. There are 
 a couple of reasons to that:
 * NumericDV fields should be easier to update, if e.g. we write all the 
 values of all the documents in a segment for the updated field (similar to 
 how livedocs work, and previously norms).
 * It's a fairly contained issue, attempting to handle just one data type to 
 update, yet requires many changes to core code which will also be useful for 
 updating other data types.
 * It has value in and on itself, and we don't need to allow updating all the 
 data types in Lucene at once ... we can do that gradually.
 I have some working patch already which I'll upload next, explaining the 
 changes.






[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2014-03-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936279#comment-13936279
 ] 

Shai Erera commented on LUCENE-5189:


I checked the code and it behaves the same as e.g. deleteDocuments(Term) - the 
Term isn't cloned internally there either. So your caveat applies to other IW 
methods as well.

 Numeric DocValues Updates
 -

 Key: LUCENE-5189
 URL: https://issues.apache.org/jira/browse/LUCENE-5189
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.6, 5.0

 Attachments: LUCENE-5189-4x.patch, LUCENE-5189-4x.patch, 
 LUCENE-5189-no-lost-updates.patch, LUCENE-5189-renames.patch, 
 LUCENE-5189-segdv.patch, LUCENE-5189-updates-order.patch, 
 LUCENE-5189-updates-order.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
 LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
 LUCENE-5189_process_events.patch








[jira] [Commented] (SOLR-5770) All attempts to match a SolrCore with it's state in clusterstate.json should be done with the NodeName rather than the baseUrl.

2014-03-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936290#comment-13936290
 ] 

Mark Miller commented on SOLR-5770:
---

Awesome, thanks Steve - had not had a chance to look further at this yet. I'll 
try your patch this weekend.

 All attempts to match a SolrCore with it's state in clusterstate.json should 
 be done with the NodeName rather than the baseUrl.
 ---

 Key: SOLR-5770
 URL: https://issues.apache.org/jira/browse/SOLR-5770
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.8, 5.0

 Attachments: SOLR-5770.patch, SOLR-5770.patch









[jira] [Updated] (SOLR-5488) Fix up test failures for Analytics Component

2014-03-15 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-5488:
-

Attachment: SOLR-5488.patch

Takes off @Ignore and @BadApple.

See comments here:
https://issues.apache.org/jira/browse/SOLR-5685

This fix suddenly caused FieldFacetTest to start failing. It fails the first 
time, every time. Interestingly, when it does fail it's because 
MinMaxStatsCollection.getStat is looking for the stat "min", but this.min is 
null. Seems like it _may_ be related to the mysterious failures we were seeing, 
but I'm grasping at straws. 

I'll be trying ExpressionTest repeatedly to see if we're back now...

 Fix up test failures for Analytics Component
 

 Key: SOLR-5488
 URL: https://issues.apache.org/jira/browse/SOLR-5488
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, 
 SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors


 The analytics component has a few test failures, perhaps 
 environment-dependent. This is just to collect the test fixes in one place 
 for convenience when we merge back into 4.x






Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_51) - Build # 9800 - Failure!

2014-03-15 Thread Mark Miller
Hmm…only interesting logging I see is this:

57473 T32 oazsp.FileTxnLog.commit WARN fsync-ing the write ahead log in 
SyncThread:0 took 50531ms which will adversely effect operation latency. See 
the ZooKeeper troubleshooting guide
I wonder if that means that if I boost the connect timeout from 45 to 60 
seconds, it will pass.
Perhaps this machine has some IO issues?

-- 
Mark Miller
about.me/markrmiller

On March 15, 2014 at 9:23:25 AM, Policeman Jenkins Server (jenk...@thetaphi.de) 
wrote:

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9800/  
Java: 32bit/jdk1.7.0_51 -client -XX:+UseSerialGC  

1 tests failed.  
REGRESSION: 
org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch  

Error Message:  
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
127.0.0.1:44565 within 45000 ms  

Stack Trace:  
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: 
Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms  
at __randomizedtesting.SeedInfo.seed([D09CC97019C4AF45:517A47686E9BCF79]:0)  
at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:150)  
at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:101)  
at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:91)  
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
  
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
  
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
  
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201)
  
at 
org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78)
  
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  
at java.lang.reflect.Method.invoke(Method.java:606)  
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
  
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860)
  
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
  
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
  
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
  
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
  
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
  
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
  
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
  
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
  
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
  
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
  
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
  
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
  
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
  
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
  
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
  
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  
at 

[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans

2014-03-15 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936304#comment-13936304
 ] 

Alan Woodward commented on LUCENE-2878:
---

Ooh, hello.

So the LUCENE-2878 branch is a bit of a mess, in that it has two semi-working 
versions of this code: Simon's initial IntervalIterator API, in the 
o.a.l.search.intervals package, and my DocsEnum.nextPosition() API in 
o.a.l.search.positions.  Simon's code is much more complete, and I've been 
using a separately maintained version of that in production code for various 
clients, which you can see at 
https://github.com/flaxsearch/lucene-solr-intervals.  I think the 
nextPosition() API is nicer, but the IntervalIterator API has the advantage of 
actually working.

The github repository has some other stuff on it too, around making the 
intervals code work across different fields.  The API that I've come up with 
there is not very nice, though.

It would be ace to get this moving again!

 Allow Scorer to expose positions and payloads aka. nuke spans 
 --

 Key: LUCENE-2878
 URL: https://issues.apache.org/jira/browse/LUCENE-2878
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: Positions Branch
Reporter: Simon Willnauer
Assignee: Robert Muir
  Labels: gsoc2014
 Fix For: Positions Branch

 Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, 
 LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, 
 PosHighlighter.patch








[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.8.0-fcs-b132) - Build # 9804 - Still Failing!

2014-03-15 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9804/
Java: 32bit/jdk1.8.0-fcs-b132 -client -XX:+UseSerialGC

1 tests failed.
FAILED:  org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch

Error Message:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
127.0.0.1:58601 within 45000 ms

Stack Trace:
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: 
Could not connect to ZooKeeper 127.0.0.1:58601 within 45000 ms
at 
__randomizedtesting.SeedInfo.seed([2C01501183016211:ADE7DE09F45E022D]:0)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:150)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:101)
at 
org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:91)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201)
at 
org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 

[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component

2014-03-15 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936313#comment-13936313
 ] 

Erick Erickson commented on SOLR-5488:
--

OK, maybe we're on to something, ExpressionTest (run with a bunch of 
iterations) failed with a very similar message to FieldFacetTest.

FWIW

 Fix up test failures for Analytics Component
 

 Key: SOLR-5488
 URL: https://issues.apache.org/jira/browse/SOLR-5488
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7, 5.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, 
 SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors








Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_51) - Build # 9800 - Failure!

2014-03-15 Thread Uwe Schindler
No IO issues and it runs on SSD. Machine is also stable and has no SATA 
timeouts or similar stuff.

It is just a 3-year-old server CPU, and it's running a VBox VM in parallel.

Uwe

On 15. März 2014 21:31:10 MEZ, Mark Miller markrmil...@gmail.com wrote:
Hmm…only interesting logging I see is this:

57473 T32 oazsp.FileTxnLog.commit WARN fsync-ing the write ahead log in
SyncThread:0 took 50531ms which will adversely effect operation
latency. See the ZooKeeper troubleshooting guide
I wonder if that means that if i boost the connect timeout from 45 to
60 seconds, it will pass.
Perhaps this machine has some IO issues?

-- 
Mark Miller
about.me/markrmiller

On March 15, 2014 at 9:23:25 AM, Policeman Jenkins Server
(jenk...@thetaphi.de) wrote:

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9800/  
Java: 32bit/jdk1.7.0_51 -client -XX:+UseSerialGC  

1 tests failed.  
REGRESSION:  org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch

Error Message:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms

Stack Trace:
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms
	at __randomizedtesting.SeedInfo.seed([D09CC97019C4AF45:517A47686E9BCF79]:0)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:150)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:101)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:91)
	at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
	at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
	at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
	at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201)
	at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at

[jira] [Updated] (LUCENE-3758) Allow the ComplexPhraseQueryParser to search order or un-order proximity queries.

2014-03-15 Thread Ahmet Arslan (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmet Arslan updated LUCENE-3758:
-

Attachment: LUCENE-3758.patch

patch for trunk (revision 1577942)

 Allow the ComplexPhraseQueryParser to search order or un-order proximity 
 queries.
 -

 Key: LUCENE-3758
 URL: https://issues.apache.org/jira/browse/LUCENE-3758
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Affects Versions: 4.0-ALPHA
Reporter: Tomás Fernández Löbbe
Assignee: Erick Erickson
Priority: Minor
 Fix For: 4.7

 Attachments: LUCENE-3758.patch, LUCENE-3758.patch


 The ComplexPhraseQueryParser uses SpanNearQuery, but always sets the inOrder 
 value hardcoded to true. This could be configurable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5866) UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties

2014-03-15 Thread Steve Davids (JIRA)
Steve Davids created SOLR-5866:
--

 Summary: UpdateShardHandler needs to use the system default scheme 
registry to properly handle https via javax.net.ssl.* properties
 Key: SOLR-5866
 URL: https://issues.apache.org/jira/browse/SOLR-5866
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Davids
 Fix For: 4.8


The UpdateShardHandler configures its own PoolingClientConnectionManager, which 
*doesn't* use the system default scheme registry factory that interrogates the 
javax.net.ssl.* system properties to wire up the https scheme into HttpClient. 
To ease configuration, the system default registry should be used.






[jira] [Updated] (SOLR-5866) UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties

2014-03-15 Thread Steve Davids (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Davids updated SOLR-5866:
---

Attachment: SOLR-5866.patch

Attached the trivial patch.

 UpdateShardHandler needs to use the system default scheme registry to 
 properly handle https via javax.net.ssl.* properties
 --

 Key: SOLR-5866
 URL: https://issues.apache.org/jira/browse/SOLR-5866
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Davids
 Fix For: 4.8

 Attachments: SOLR-5866.patch


 The UpdateShardHandler configures its own PoolingClientConnectionManager, 
 which *doesn't* use the system default scheme registry factory that 
 interrogates the javax.net.ssl.* system properties to wire up the https 
 scheme into HttpClient. To ease configuration, the system default registry 
 should be used.






[jira] [Created] (SOLR-5867) OverseerCollectionProcessor isn't properly generating https urls in some cases

2014-03-15 Thread Steve Davids (JIRA)
Steve Davids created SOLR-5867:
--

 Summary: OverseerCollectionProcessor isn't properly generating 
https urls in some cases
 Key: SOLR-5867
 URL: https://issues.apache.org/jira/browse/SOLR-5867
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Davids
 Fix For: 4.8


All URLs should be generated using a call out to the ZK state reader:
{code}
zkStateReader.getBaseUrlForNodeName(nodeName);
{code}

This is because the url scheme is stored in the clusterprops.json file and is 
necessary to properly build the correct URL to propagate the request. Please 
note that if the base_url is available, it should be used instead, since it 
already has the properly schemed url without the need to check ZK.
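The preference described above — use the replica's already-schemed base_url when present, and only derive the URL from the node name plus the scheme stored in cluster properties when it is not — can be sketched roughly as follows. This is an illustrative stand-alone sketch, not Solr's actual code: the helper names, the "host:port_context" node-name layout, and the plain Map standing in for the replica's properties are all assumptions.

```java
import java.util.Map;

public class BaseUrlSketch {

    // Derive a base URL from a node name, the way a ZK state reader would
    // conceptually do it: the scheme comes from clusterprops.json.
    // The "host:port_context" node-name layout is a simplifying assumption.
    static String getBaseUrlForNodeName(String nodeName, String urlScheme) {
        int sep = nodeName.indexOf('_');
        String hostAndPort = nodeName.substring(0, sep);
        String context = nodeName.substring(sep + 1);
        return urlScheme + "://" + hostAndPort + "/" + context;
    }

    // Prefer the replica's already-schemed base_url property when present;
    // only fall back to deriving it, which requires the scheme from ZK.
    static String baseUrlFor(Map<String, String> replicaProps,
                             String nodeName, String urlScheme) {
        String baseUrl = replicaProps.get("base_url");
        return baseUrl != null ? baseUrl
                               : getBaseUrlForNodeName(nodeName, urlScheme);
    }

    public static void main(String[] args) {
        // With base_url present, no derivation (and no extra ZK read) is needed.
        System.out.println(baseUrlFor(
                Map.of("base_url", "https://10.0.0.1:8983/solr"),
                "10.0.0.1:8983_solr", "https"));
        // Without it, fall back to deriving from node name + scheme.
        System.out.println(baseUrlFor(Map.of(), "10.0.0.1:8983_solr", "https"));
    }
}
```

The point of the fallback order is simply to avoid an extra ZooKeeper read when the schemed URL is already known.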






[jira] [Updated] (SOLR-5867) OverseerCollectionProcessor isn't properly generating https urls in some cases

2014-03-15 Thread Steve Davids (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Davids updated SOLR-5867:
---

Attachment: SOLR-5867.patch

Attached patch.

 OverseerCollectionProcessor isn't properly generating https urls in some cases
 --

 Key: SOLR-5867
 URL: https://issues.apache.org/jira/browse/SOLR-5867
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Davids
 Fix For: 4.8

 Attachments: SOLR-5867.patch


 All URLs should be generated using a call out to the ZK state reader:
 {code}
 zkStateReader.getBaseUrlForNodeName(nodeName);
 {code}
 This is because the url scheme is stored in the clusterprops.json file and is 
 necessary to properly build the correct URL to propagate the request. Please 
 note that if the base_url is available, it should be used instead, since it 
 already has the properly schemed url without the need to check ZK.






[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-03-15 Thread Steve Davids (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936377#comment-13936377
 ] 

Steve Davids commented on SOLR-5477:


You should drop the unnecessary assignment:
{code}
String replica = zkStateReader.getBaseUrlForNodeName(nodeName);
{code}

on line 1829, which makes an unnecessary call out to ZK for a value that isn't 
being used.

 Async execution of OverseerCollectionProcessor tasks
 

 Key: SOLR-5477
 URL: https://issues.apache.org/jira/browse/SOLR-5477
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Anshum Gupta
 Attachments: SOLR-5477-CoreAdminStatus.patch, 
 SOLR-5477-updated.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.urlschemefix.patch


 Typical collection admin commands are long running and it is very common to 
 have the requests get timed out. It is more of a problem if the cluster is 
 very large. Add an option to run these commands asynchronously:
 * add an extra param async=true to all collection commands
 * the task is written to ZK and the caller is returned a task id
 * a separate collection admin command will be added to poll the status of 
 the task: command=status&id=7657668909
 * if an id is not passed, all running async tasks should be listed
 A separate queue is created to store in-process tasks. After the tasks are 
 completed, the queue entry is removed. OverseerCollectionProcessor will 
 perform these tasks in multiple threads.






[jira] [Created] (SOLR-5868) HttpClient should be configured to use ALLOW_ALL_HOSTNAME hostname verifier to simplify SSL setup

2014-03-15 Thread Steve Davids (JIRA)
Steve Davids created SOLR-5868:
--

 Summary: HttpClient should be configured to use ALLOW_ALL_HOSTNAME 
hostname verifier to simplify SSL setup
 Key: SOLR-5868
 URL: https://issues.apache.org/jira/browse/SOLR-5868
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.7
Reporter: Steve Davids
 Fix For: 4.8


The default HttpClient hostname verifier is the 
BROWSER_COMPATIBLE_HOSTNAME_VERIFIER, which verifies that the hostname being 
connected to matches the hostname presented within the certificate. This is 
meant to protect clients that are making external requests out across the 
internet, but requests within the Solr cluster should be trusted, so 
verification can be relaxed to simplify the SSL/certificate setup process.






[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks

2014-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936380#comment-13936380
 ] 

ASF subversion and git services commented on SOLR-5477:
---

Commit 1577965 from [~anshumg] in branch 'dev/trunk'
[ https://svn.apache.org/r1577965 ]

SOLR-5477: Removing an unwanted call to zk

 Async execution of OverseerCollectionProcessor tasks
 

 Key: SOLR-5477
 URL: https://issues.apache.org/jira/browse/SOLR-5477
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Noble Paul
Assignee: Anshum Gupta
 Attachments: SOLR-5477-CoreAdminStatus.patch, 
 SOLR-5477-updated.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, 
 SOLR-5477.urlschemefix.patch


 Typical collection admin commands are long running and it is very common to 
 have the requests get timed out. It is more of a problem if the cluster is 
 very large. Add an option to run these commands asynchronously:
 * add an extra param async=true to all collection commands
 * the task is written to ZK and the caller is returned a task id
 * a separate collection admin command will be added to poll the status of 
 the task: command=status&id=7657668909
 * if an id is not passed, all running async tasks should be listed
 A separate queue is created to store in-process tasks. After the tasks are 
 completed, the queue entry is removed. OverseerCollectionProcessor will 
 perform these tasks in multiple threads.






[jira] [Commented] (SOLR-5868) HttpClient should be configured to use ALLOW_ALL_HOSTNAME hostname verifier to simplify SSL setup

2014-03-15 Thread Steve Davids (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936381#comment-13936381
 ] 

Steve Davids commented on SOLR-5868:


In the current HttpClientUtil paradigm this can be achieved by retrieving the 
url scheme and setting the hostname verifier on the SSLSocketFactory: 
https://gist.github.com/sdavids13/9577027

If the HttpClientBuilder approach is introduced (SOLR-5604) then it can simply 
be done via:
{code}
HttpClientBuilder.create().useSystemProperties().setHostnameVerifier(new AllowAllHostnameVerifier())...;
{code}


 HttpClient should be configured to use ALLOW_ALL_HOSTNAME hostname verifier 
 to simplify SSL setup
 -

 Key: SOLR-5868
 URL: https://issues.apache.org/jira/browse/SOLR-5868
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.7
Reporter: Steve Davids
 Fix For: 4.8


 The default HttpClient hostname verifier is the 
 BROWSER_COMPATIBLE_HOSTNAME_VERIFIER, which verifies that the hostname 
 being connected to matches the hostname presented within the certificate. 
 This is meant to protect clients that are making external requests out across 
 the internet, but requests within the Solr cluster should be trusted, so 
 verification can be relaxed to simplify the SSL/certificate setup process.






[jira] [Commented] (SOLR-5867) OverseerCollectionProcessor isn't properly generating https urls in some cases

2014-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936392#comment-13936392
 ] 

ASF subversion and git services commented on SOLR-5867:
---

Commit 1577968 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1577968 ]

SOLR-5867: OverseerCollectionProcessor isn't properly generating https urls in 
some cases

 OverseerCollectionProcessor isn't properly generating https urls in some cases
 --

 Key: SOLR-5867
 URL: https://issues.apache.org/jira/browse/SOLR-5867
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Davids
 Fix For: 4.8

 Attachments: SOLR-5867.patch


 All URLs should be generated using a call out to the ZK state reader:
 {code}
 zkStateReader.getBaseUrlForNodeName(nodeName);
 {code}
 This is because the url scheme is stored in the clusterprops.json file and is 
 necessary to properly build the correct URL to propagate the request. Please 
 note that if the base_url is available, it should be used instead, since it 
 already has the properly schemed url without the need to check ZK.






[jira] [Commented] (SOLR-5867) OverseerCollectionProcessor isn't properly generating https urls in some cases

2014-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936394#comment-13936394
 ] 

ASF subversion and git services commented on SOLR-5867:
---

Commit 1577969 from sha...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1577969 ]

SOLR-5867: OverseerCollectionProcessor isn't properly generating https urls in 
some cases

 OverseerCollectionProcessor isn't properly generating https urls in some cases
 --

 Key: SOLR-5867
 URL: https://issues.apache.org/jira/browse/SOLR-5867
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Davids
 Fix For: 4.8, 5.0

 Attachments: SOLR-5867.patch


 All URLs should be generated using a call out to the ZK state reader:
 {code}
 zkStateReader.getBaseUrlForNodeName(nodeName);
 {code}
 This is because the url scheme is stored in the clusterprops.json file and is 
 necessary to properly build the correct URL to propagate the request. Please 
 note that if the base_url is available, it should be used instead, since it 
 already has the properly schemed url without the need to check ZK.






[jira] [Resolved] (SOLR-5867) OverseerCollectionProcessor isn't properly generating https urls in some cases

2014-03-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-5867.
-

   Resolution: Fixed
Fix Version/s: 5.0
 Assignee: Shalin Shekhar Mangar

Thanks Steve!

 OverseerCollectionProcessor isn't properly generating https urls in some cases
 --

 Key: SOLR-5867
 URL: https://issues.apache.org/jira/browse/SOLR-5867
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Davids
Assignee: Shalin Shekhar Mangar
 Fix For: 4.8, 5.0

 Attachments: SOLR-5867.patch


 All URLs should be generated using a call out to the ZK state reader:
 {code}
 zkStateReader.getBaseUrlForNodeName(nodeName);
 {code}
 This is because the url scheme is stored in the clusterprops.json file and is 
 necessary to properly build the correct URL to propagate the request. Please 
 note that if the base_url is available, it should be used instead, since it 
 already has the properly schemed url without the need to check ZK.






[jira] [Commented] (SOLR-5866) UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties

2014-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936398#comment-13936398
 ] 

ASF subversion and git services commented on SOLR-5866:
---

Commit 1577971 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1577971 ]

SOLR-5866: UpdateShardHandler needs to use the system default scheme registry 
to properly handle https via javax.net.ssl.* properties

 UpdateShardHandler needs to use the system default scheme registry to 
 properly handle https via javax.net.ssl.* properties
 --

 Key: SOLR-5866
 URL: https://issues.apache.org/jira/browse/SOLR-5866
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Davids
 Fix For: 4.8

 Attachments: SOLR-5866.patch


 The UpdateShardHandler configures its own PoolingClientConnectionManager, 
 which *doesn't* use the system default scheme registry factory that 
 interrogates the javax.net.ssl.* system properties to wire up the https 
 scheme into HttpClient. To ease configuration, the system default registry 
 should be used.






[jira] [Commented] (SOLR-5866) UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties

2014-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936399#comment-13936399
 ] 

ASF subversion and git services commented on SOLR-5866:
---

Commit 1577972 from sha...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1577972 ]

SOLR-5866: UpdateShardHandler needs to use the system default scheme registry 
to properly handle https via javax.net.ssl.* properties

 UpdateShardHandler needs to use the system default scheme registry to 
 properly handle https via javax.net.ssl.* properties
 --

 Key: SOLR-5866
 URL: https://issues.apache.org/jira/browse/SOLR-5866
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Davids
 Fix For: 4.8, 5.0

 Attachments: SOLR-5866.patch


 The UpdateShardHandler configures its own PoolingClientConnectionManager, 
 which *doesn't* use the system default scheme registry factory that 
 interrogates the javax.net.ssl.* system properties to wire up the https 
 scheme into HttpClient. To ease configuration, the system default registry 
 should be used.






[jira] [Resolved] (SOLR-5866) UpdateShardHandler needs to use the system default scheme registry to properly handle https via javax.net.ssl.* properties

2014-03-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-5866.
-

   Resolution: Fixed
Fix Version/s: 5.0
 Assignee: Shalin Shekhar Mangar

Thanks Steve!

 UpdateShardHandler needs to use the system default scheme registry to 
 properly handle https via javax.net.ssl.* properties
 --

 Key: SOLR-5866
 URL: https://issues.apache.org/jira/browse/SOLR-5866
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.7
Reporter: Steve Davids
Assignee: Shalin Shekhar Mangar
 Fix For: 4.8, 5.0

 Attachments: SOLR-5866.patch


 The UpdateShardHandler configures its own PoolingClientConnectionManager, 
 which *doesn't* use the system default scheme registry factory that 
 interrogates the javax.net.ssl.* system properties to wire up the https 
 scheme into HttpClient. To ease configuration, the system default registry 
 should be used.






[jira] [Updated] (LUCENE-4978) Spatial search with point query won't find identical indexed point

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4978:
-

Attachment: LUCENE-4978_fix_small_grid_false_negatives.patch

This patch addresses the issue simply by removing the optimization. I did some 
performance tests with rects and circles and the impact was very minor, although 
I didn't test polygons, which should show a greater effect.

While I was at it, I beefed up the tests further in ways that would have 
previously failed due to the false-negative.  I removed an older test: 
RecursivePrefixTreeTest.geohashRecursiveRandom() which is hard to maintain and 
is now obsoleted by SpatialOpRecursivePrefixTreeTest which now uses geohashes.

I'll commit this Monday.

 Spatial search with point query won't find identical indexed point
 --

 Key: LUCENE-4978
 URL: https://issues.apache.org/jira/browse/LUCENE-4978
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Affects Versions: 4.1
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor
 Fix For: 4.7

 Attachments: LUCENE-4978_fix_small_grid_false_negatives.patch


 Given a document with indexed POINT (10 20), when a search for 
 INTERSECTS(POINT (10 20)) is issued, no results are returned.
 The work-around is to not search with a point shape but to use a very 
 small-radius circle or rectangle instead. (I'm marking this issue as minor 
 because it's easy to do this.)
 An unstated objective of the PrefixTree/grid approximation is that no matter 
 what precision you use, an intersects query will find all true-positives. 
 Due to approximations, it may also find some close false-positives. But in 
 the case above, that unstated promise is violated. It can also happen for 
 query shapes other than points that barely enclose the point given at index 
 time: the indexed point is in effect shifted to the center point of a cell, 
 which could be outside the query shape, ultimately leading to a 
 false-negative.
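The cell-center shift described above can be illustrated with a tiny stand-alone sketch. This is not Lucene's actual spatial code; the uniform grid, the snapping helper, and the degenerate point-as-range query are hypothetical stand-ins chosen only to show why an indexed point can fall outside a query shape that encloses it:

```java
public class GridSnapSketch {

    // Quantize a coordinate to the center of its grid cell, mimicking how a
    // prefix-tree index in effect represents an indexed point at maximum
    // precision. The uniform cell size is an illustrative assumption.
    static double snapToCellCenter(double v, double cellSize) {
        return (Math.floor(v / cellSize) + 0.5) * cellSize;
    }

    public static void main(String[] args) {
        double cell = 1.0;          // deliberately coarse grid
        double indexedX = 10.0;     // indexed coordinate on a cell boundary
        double snappedX = snapToCellCenter(indexedX, cell); // shifted to 10.5

        // A degenerate query range that barely encloses the original point...
        boolean matchesOriginal = indexedX >= 10.0 && indexedX <= 10.0;
        // ...misses the cell-center representation: the false negative.
        boolean matchesSnapped = snappedX >= 10.0 && snappedX <= 10.0;

        System.out.println("original matched: " + matchesOriginal
                + ", snapped matched: " + matchesSnapped);
    }
}
```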






[jira] [Updated] (LUCENE-4978) Spatial search with point query won't find identical indexed point

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4978:
-

Fix Version/s: (was: 4.7)
   4.8

 Spatial search with point query won't find identical indexed point
 --

 Key: LUCENE-4978
 URL: https://issues.apache.org/jira/browse/LUCENE-4978
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Affects Versions: 4.1
Reporter: David Smiley
Assignee: David Smiley
Priority: Minor
 Fix For: 4.8

 Attachments: LUCENE-4978_fix_small_grid_false_negatives.patch


 Given a document with indexed POINT (10 20), when a search for 
 INTERSECTS(POINT (10 20)) is issued, no results are returned.
 The work-around is to not search with a point shape but to use a very 
 small-radius circle or rectangle instead. (I'm marking this issue as minor 
 because it's easy to do this.)
 An unstated objective of the PrefixTree/grid approximation is that no matter 
 what precision you use, an intersects query will find all true-positives. 
 Due to approximations, it may also find some close false-positives. But in 
 the case above, that unstated promise is violated. It can also happen for 
 query shapes other than points that barely enclose the point given at index 
 time: the indexed point is in effect shifted to the center point of a cell, 
 which could be outside the query shape, ultimately leading to a 
 false-negative.






[jira] [Commented] (SOLR-3177) Excluding tagged filter in StatsComponent

2014-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936415#comment-13936415
 ] 

ASF subversion and git services commented on SOLR-3177:
---

Commit 1577976 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1577976 ]

SOLR-3177: Enable tagging and excluding filters in StatsComponent via the 
localParams syntax

 Excluding tagged filter in StatsComponent
 -

 Key: SOLR-3177
 URL: https://issues.apache.org/jira/browse/SOLR-3177
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5, 3.6, 4.0-ALPHA, 4.1
Reporter: Mathias H.
Assignee: Shalin Shekhar Mangar
Priority: Minor
  Labels: localparams, stats, statscomponent
 Attachments: SOLR-3177.patch, SOLR-3177.patch, SOLR-3177.patch


 It would be useful to exclude the effects of some fq params from the set of 
 documents used to compute stats -- similar to how you can exclude tagged 
 filters when generating facet counts... 
 https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
 So that it's possible to do something like this... 
 http://localhost:8983/solr/select?fq={!tag=priceFilter}price:[1 TO 20]&q=*:*&stats=true&stats.field={!ex=priceFilter}price
 If you want to create a price slider this is very useful because then you can 
 filter the price ([1 TO 20]) and nevertheless get the lower and upper bound of 
 the unfiltered price (min=0, max=100):
 {noformat}
 |-[---]--|
 $0  $1  $20  $100
 {noformat}






[jira] [Commented] (SOLR-3177) Excluding tagged filter in StatsComponent

2014-03-15 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936416#comment-13936416
 ] 

ASF subversion and git services commented on SOLR-3177:
---

Commit 1577977 from sha...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1577977 ]

SOLR-3177: Enable tagging and excluding filters in StatsComponent via the 
localParams syntax

 Excluding tagged filter in StatsComponent
 -

 Key: SOLR-3177
 URL: https://issues.apache.org/jira/browse/SOLR-3177
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5, 3.6, 4.0-ALPHA, 4.1
Reporter: Mathias H.
Assignee: Shalin Shekhar Mangar
Priority: Minor
  Labels: localparams, stats, statscomponent
 Attachments: SOLR-3177.patch, SOLR-3177.patch, SOLR-3177.patch


 It would be useful to exclude the effects of some fq params from the set of 
 documents used to compute stats -- similar to how you can exclude tagged 
 filters when generating facet counts... 
 https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
 So that it's possible to do something like this... 
 http://localhost:8983/solr/select?fq={!tag=priceFilter}price:[1 TO 20]&q=*:*&stats=true&stats.field={!ex=priceFilter}price
 If you want to create a price slider this is very useful because then you can 
 filter the price ([1 TO 20]) and nevertheless get the lower and upper bound of 
 the unfiltered price (min=0, max=100):
 {noformat}
 |-[---]--|
 $0  $1  $20  $100
 {noformat}






[jira] [Resolved] (SOLR-3177) Excluding tagged filter in StatsComponent

2014-03-15 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-3177.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.8

This will be released with Solr 4.8

Thank you all for the comments and upvotes and sorry that this took so much 
time. Thanks Nikolai and Vitaliy for the patches!

 Excluding tagged filter in StatsComponent
 -

 Key: SOLR-3177
 URL: https://issues.apache.org/jira/browse/SOLR-3177
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5, 3.6, 4.0-ALPHA, 4.1
Reporter: Mathias H.
Assignee: Shalin Shekhar Mangar
Priority: Minor
  Labels: localparams, stats, statscomponent
 Fix For: 4.8, 5.0

 Attachments: SOLR-3177.patch, SOLR-3177.patch, SOLR-3177.patch








Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_51) - Build # 9800 - Failure!

2014-03-15 Thread Mark Miller
Hmm…I’ll check with Patrick Hunt and see if he has any thoughts on that logging 
warning.
-- 
Mark Miller
about.me/markrmiller

On March 15, 2014 at 5:38:51 PM, Uwe Schindler (u...@thetaphi.de) wrote:

No IO issues and it runs on SSD. Machine is also stable and has no SATA 
timeouts or similar stuff.

It is just a 3 year old server CPU and its running a Vbox VM in parallel.

Uwe

On 15. März 2014 21:31:10 MEZ, Mark Miller markrmil...@gmail.com wrote:
Hmm…only interesting logging I see is this:

57473 T32 oazsp.FileTxnLog.commit WARN fsync-ing the write ahead log in 
SyncThread:0 took 50531ms which will adversely effect operation latency. See 
the ZooKeeper troubleshooting guide
I wonder if that means that if I boost the connect timeout from 45 to 60 
seconds, it will pass.
Perhaps this machine has some IO issues?

-- 
Mark Miller
about.me/markrmiller

On March 15, 2014 at 9:23:25 AM, Policeman Jenkins Server (jenk...@thetaphi.de) 
wrote:

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9800/
Java: 32bit/jdk1.7.0_51 -client -XX:+UseSerialGC

1 tests failed.
REGRESSION: 
org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch

Error Message:
java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 
127.0.0.1:44565 within 45000 ms

Stack Trace:
org.apache.solr.common.SolrException: java.util.concurrent.TimeoutException: 
Could not connect to ZooKeeper 127.0.0.1:44565 within 45000 ms
at __randomizedtesting.SeedInfo.seed([D09CC97019C4AF45:517A47686E9BCF79]:0)
at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:150)
at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:101)
at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:91)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89)
at 
org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83)
at 
org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:201)
at 
org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:860)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:771)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:782)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 

[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-03-15 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936426#comment-13936426
 ] 

David Smiley commented on LUCENE-5527:
--

+1 I like it!

 Make the Collector API work per-segment
 ---

 Key: LUCENE-5527
 URL: https://issues.apache.org/jira/browse/LUCENE-5527
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Priority: Minor

 Spin-off of LUCENE-5299.
 LUCENE-5299 proposes different changes, some of them being controversial, but 
 there is one of them that I really really like that consists in refactoring 
 the {{Collector}} API in order to have a different Collector per segment.
 The idea is, instead of having a single Collector object that needs to be 
 able to take care of all segments, to have a top-level Collector:
 {code}
 public interface Collector {
   AtomicCollector setNextReader(AtomicReaderContext context) throws 
 IOException;
   
 }
 {code}
 and a per-AtomicReaderContext collector:
 {code}
 public interface AtomicCollector {
   void setScorer(Scorer scorer) throws IOException;
   void collect(int doc) throws IOException;
   boolean acceptsDocsOutOfOrder();
 }
 {code}
 I think it makes the API clearer since it is now obvious {{setScorer}} and 
 {{acceptsDocsOutOfOrder}} need to be called after {{setNextReader}}, which is 
 otherwise unclear.
 It also makes things more flexible. For example, a collector could much more 
 easily decide to use different strategies on different segments. In 
 particular, it makes the early-termination collector much cleaner since it 
 can return different atomic collectors implementations depending on whether 
 the current segment is sorted or not.
 Even if we have lots of collectors all over the place, we could make it 
 easier to migrate by having a Collector that would implement both Collector 
 and AtomicCollector, return {{this}} in setNextReader and make current 
 concrete Collector implementations extend this class instead of directly 
 extending Collector.
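The migration idea in the last paragraph can be sketched as one class implementing both interfaces and returning itself from setNextReader. The interfaces below are simplified stand-ins (no Lucene types, no checked exceptions) just to show the shape:

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorAdapterSketch {
    // Simplified stand-ins for the proposed top-level and per-segment APIs.
    interface Collector {
        AtomicCollector setNextReader(int docBase);
    }
    interface AtomicCollector {
        void collect(int doc);
        boolean acceptsDocsOutOfOrder();
    }

    // An existing single-object collector extends a base like this and keeps
    // its current per-segment state handling unchanged.
    static class LegacyStyleCollector implements Collector, AtomicCollector {
        int docBase;
        final List<Integer> hits = new ArrayList<>();

        @Override public AtomicCollector setNextReader(int docBase) {
            this.docBase = docBase;   // remember the segment's doc-id offset
            return this;              // the same object collects every segment
        }
        @Override public void collect(int doc) { hits.add(docBase + doc); }
        @Override public boolean acceptsDocsOutOfOrder() { return true; }
    }

    public static List<Integer> demo() {
        LegacyStyleCollector c = new LegacyStyleCollector();
        c.setNextReader(0).collect(3);    // first segment, local doc 3
        c.setNextReader(10).collect(1);   // second segment, local doc 1
        return c.hits;                    // global doc ids
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A new-style collector is free to return a different AtomicCollector per segment instead, which is what makes the early-termination case cleaner.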






[jira] [Commented] (LUCENE-5527) Make the Collector API work per-segment

2014-03-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936430#comment-13936430
 ] 

Shai Erera commented on LUCENE-5527:


Maybe we can get rid of setScorer, passing Scorer to 
{{setNextReader(AtomicReaderContext,Scorer)}}?

 Make the Collector API work per-segment
 ---

 Key: LUCENE-5527
 URL: https://issues.apache.org/jira/browse/LUCENE-5527
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Priority: Minor







[jira] [Updated] (LUCENE-5430) Suggesters should verify its index before loading it from disk

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5430:
-

Fix Version/s: (was: 4.7)
   4.8

 Suggesters should verify its index before loading it from disk
 --

 Key: LUCENE-5430
 URL: https://issues.apache.org/jira/browse/LUCENE-5430
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Affects Versions: 4.7, 5.0
Reporter: Areek Zillur
Assignee: Areek Zillur
 Fix For: 4.8, 5.0


 The issue was pointed out by Michael in the discussion on LUCENE-5404.
 The idea is to make all the suggesters use CodecUtils.writeHeader when they 
 are about to store their index on file and subsequently perform a check when 
 they are loaded.
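The write-header / check-on-load idea can be illustrated with plain JDK streams; the real patch would use Lucene's CodecUtil, whose actual wire format differs from this simplified magic+version layout:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

public class SuggesterHeaderSketch {
    static final int MAGIC = 0x53554747;  // illustrative magic number ("SUGG")
    static final int VERSION = 1;

    public static void writeHeader(DataOutputStream out) throws IOException {
        out.writeInt(MAGIC);
        out.writeInt(VERSION);
    }

    // Returns true only if the stream starts with the expected header.
    public static boolean checkHeader(DataInputStream in) throws IOException {
        try {
            return in.readInt() == MAGIC && in.readInt() == VERSION;
        } catch (EOFException e) {
            return false;  // truncated or corrupt index file
        }
    }

    public static boolean roundTrip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeHeader(new DataOutputStream(bytes));
            return checkHeader(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```

On load, a suggester would fail fast with a clear error instead of misreading stale or foreign data.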






[jira] [Updated] (LUCENE-5438) add near-real-time replication

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5438:
-

Fix Version/s: (was: 4.7)
   4.8

 add near-real-time replication
 --

 Key: LUCENE-5438
 URL: https://issues.apache.org/jira/browse/LUCENE-5438
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/replicator
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5438.patch, LUCENE-5438.patch


 Lucene's replication module makes it easy to incrementally sync index
 changes from a master index to any number of replicas, and it
 handles/abstracts all the underlying complexity of holding a
 time-expiring snapshot, finding which files need copying, syncing more
 than one index (e.g., taxo + index), etc.
 But today you must first commit on the master, and then again the
 replica's copied files are fsync'd, because the code operates on
 commit points.  But this isn't technically necessary, and it mixes
 up durability and fast turnaround time.
 Long ago we added near-real-time readers to Lucene, for the same
 reason: you shouldn't have to commit just to see the new index
 changes.
 I think we should do the same for replication: allow the new segments
 to be copied out to replica(s), and new NRT readers to be opened, to
 fully decouple committing from visibility.  This way apps can then
 separately choose when to replicate (for freshness), and when to
 commit (for durability).
 I think for some apps this could be a compelling alternative to the
 re-index all documents on each shard approach that Solr Cloud /
 ElasticSearch implement today, and it may also mean that the
 transaction log can remain external to / above the cluster.






[jira] [Updated] (LUCENE-5411) Upgrade to released JFlex 1.5.0

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5411:
-

Fix Version/s: (was: 4.7)
   4.8

 Upgrade to released JFlex 1.5.0
 ---

 Key: LUCENE-5411
 URL: https://issues.apache.org/jira/browse/LUCENE-5411
 Project: Lucene - Core
  Issue Type: Improvement
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5411.patch


 The JFlex 1.5.0 release will be officially announced shortly.  The jar is 
 already on Maven Central.






[jira] [Updated] (LUCENE-5406) ShingleAnalyzerWrapper should expose the delegated analyzer as a public final

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5406:
-

Fix Version/s: (was: 4.7)
   4.8

 ShingleAnalyzerWrapper should expose the delegated analyzer as a public final
 -

 Key: LUCENE-5406
 URL: https://issues.apache.org/jira/browse/LUCENE-5406
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
 Fix For: 4.8, 5.0


 I'm sometimes given a ShingleAnalyzerWrapper whose shingle size I would like 
 to change, so I need to create a new instance.  However, I don't always 
 know what the underlying analyzer is, and I can't access it b/c it is a 
 protected method on a final class.  
 The solution here is to make the getAnalyzer method public final on 
 ShingleAnalyzerWrapper.






[jira] [Updated] (LUCENE-5402) Add support for index-time pruning in Document*Dictionary (Suggester)

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5402:
-

Fix Version/s: (was: 4.7)
   4.8

 Add support for index-time pruning in Document*Dictionary (Suggester)
 -

 Key: LUCENE-5402
 URL: https://issues.apache.org/jira/browse/LUCENE-5402
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Areek Zillur
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5402.patch, LUCENE-5402.patch


 It would be nice to be able to prune out entries that the suggester consumes 
 by some query.






[jira] [Updated] (LUCENE-5417) Solr function query supports reading multiple values from a field.

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5417:
-

Fix Version/s: (was: 4.7)
   4.8

 Solr function query supports reading multiple values from a field.
 --

 Key: LUCENE-5417
 URL: https://issues.apache.org/jira/browse/LUCENE-5417
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/query/scoring
Affects Versions: 4.6
 Environment: N/A
Reporter: Peng Cheng
Priority: Minor
 Fix For: 4.8

 Attachments: MultiFieldCacheValueSources.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Solr function query is a powerful tool to customize search criterion and 
 ranking function (http://wiki.apache.org/solr/FunctionQuery). However, it 
 cannot effectively benefit from field values from multi-valued field, namely, 
 the field(...) function can only read one value and discard the others.
 This limitation has been associated with FieldCacheSource, and the fact that 
 FieldCache cannot fetch multiple values from a field, but such constraint has 
 been largely lifted by LUCENE-3354, which allows multiple values to be 
 extracted from one field. Those values in turn can be used as parameters of 
 other functions to yield a single score.
 I personally find this limitation very unhandy when building a 
 learning-to-rank system that uses many cues and string features. Therefore I 
 would like to post this feature request and (hopefully) work on it myself.






[jira] [Updated] (LUCENE-5356) more generic lucene-morfologik integration

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5356:
-

Fix Version/s: (was: 4.7)
   4.8

 more generic lucene-morfologik integration
 --

 Key: LUCENE-5356
 URL: https://issues.apache.org/jira/browse/LUCENE-5356
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.6
Reporter: Michal Hlavac
Assignee: Dawid Weiss
Priority: Minor
  Labels: newbie, patch
 Fix For: 4.8, 5.0

 Attachments:  LUCENE-5356.patch, LUCENE-5356.patch, LUCENE-5356.patch


 I have a small proposal for the morfologik lucene module. The current module is 
 tightly coupled with the Polish DICTIONARY enumeration.
 But other people (like me) can build their own dictionaries to FSA and use them 
 with lucene. 
 You can find the proposal in the attachment, and also an example usage in an 
 analyzer (SlovakLemmaAnalyzer).
 It uses the dictionary property as a String resource from the classpath, not an 
 enumeration.
 One change is that the dictionary variable must be set in MorfologikFilterFactory 
 (no default value).






[jira] [Updated] (LUCENE-5351) DirectoryReader#close can throw AlreadyClosedException if it's an NRT reader

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5351:
-

Fix Version/s: (was: 4.7)
   4.8

 DirectoryReader#close can throw AlreadyClosedException if it's an NRT reader
 -

 Key: LUCENE-5351
 URL: https://issues.apache.org/jira/browse/LUCENE-5351
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Affects Versions: 4.6
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5351.patch


 in StandardDirectoryReader#doClose we do this:
 {noformat}
if (writer != null) {
   // Since we just closed, writer may now be able to
   // delete unused files:
   writer.deletePendingFiles();
 }
 {noformat}
 which can throw AlreadyClosedException from the directory if the Directory 
 has already been closed. To me this looks like a bug and we should catch this 
 exception from the directory.






[jira] [Updated] (LUCENE-5381) Lucene highlighter doesn't honor hl.fragsize; it appends all text for last fragment

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5381:
-

Fix Version/s: (was: 4.7)
   4.8

 Lucene highlighter doesn't honor hl.fragsize; it appends all text for last 
 fragment
 ---

 Key: LUCENE-5381
 URL: https://issues.apache.org/jira/browse/LUCENE-5381
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0, 4.6
Reporter: yuanyun.cn
Priority: Minor
  Labels: highlighter, lucene
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5381.patch

   Original Estimate: 4h
  Remaining Estimate: 4h

 Recently, we hit a problem related to the highlighter: I set hl.fragsize=300, 
 but the highlight section for one document outputs more than 2000 characters.
 Looking into the code, in 
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(TokenStream,
  String, boolean, int), after the for loop, it appends the whole remaining text 
 to the last fragment.
 if (
   // if there is text beyond the last token considered..
   (lastEndOffset < text.length())
   &&
   // and that text is not too large...
   (text.length() <= maxDocCharsToAnalyze)
   )
 {
   //append it to the last fragment
   newText.append(encoder.encodeText(text.substring(lastEndOffset)));
 }
 currentFrag.textEndPos = newText.length();
 This code is problematic, as in some cases the last fragment is the most 
 relevant section and will be selected to return to the client.
 I made some changes to the code like below.  Now it works.
 //Test what remains of the original text beyond the point where we stopped 
 //analyzing
 if (lastEndOffset < text.length())
 {
   if (textFragmenter instanceof SimpleFragmenter)
   {
     SimpleFragmenter simpleFragmenter = (SimpleFragmenter) textFragmenter;
     int remain = simpleFragmenter.getFragmentSize()
         - (newText.length() - currentFrag.textStartPos);
     if (remain > 0)
     {
       int endIndex = lastEndOffset + remain;
       if (endIndex > text.length()) {
         endIndex = text.length();
       }
       newText.append(encoder.encodeText(text.substring(lastEndOffset, endIndex)));
     }
   }
   else
   {
     newText.append(encoder.encodeText(text.substring(lastEndOffset)));
   }
 }
 currentFrag.textEndPos = newText.length();
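 The trimming arithmetic in that patch can be isolated into a stand-alone 
 sketch (pure arithmetic, no Lucene types; names are illustrative): given the 
 fragment-size budget and what has been emitted so far, compute the end offset 
 of the trailing text to append.

```java
public class FragmentBudget {
    // Returns the exclusive end offset of the tail text to append so the
    // fragment stays within fragmentSize characters.
    public static int endIndex(int fragmentSize, int newTextLength,
                               int fragStartPos, int lastEndOffset,
                               int textLength) {
        int remain = fragmentSize - (newTextLength - fragStartPos);
        if (remain <= 0) {
            return lastEndOffset;  // budget exhausted: append nothing
        }
        return Math.min(lastEndOffset + remain, textLength);
    }

    public static void main(String[] args) {
        // 300-char budget, 250 already used: only 50 more chars are appended.
        System.out.println(endIndex(300, 250, 0, 1000, 2500));
    }
}
```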






[jira] [Updated] (LUCENE-5350) Add Context Aware Suggester

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5350:
-

Fix Version/s: (was: 4.7)
   4.8

 Add Context Aware Suggester
 ---

 Key: LUCENE-5350
 URL: https://issues.apache.org/jira/browse/LUCENE-5350
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/search
Reporter: Areek Zillur
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5350-benchmark.patch, 
 LUCENE-5350-benchmark.patch, LUCENE-5350.patch, LUCENE-5350.patch


 It would be nice to have a Context Aware Suggester (i.e. a suggester that 
 could return suggestions depending on some specified context(s)).
 Use-cases: 
   - location-based suggestions:
   -- returns suggestions which 'match' the context of a particular area
   --- suggest restaurant names which are in Palo Alto (context - Palo Alto)
   - category-based suggestions:
   -- returns suggestions for items that are only in certain categories/genres (contexts)
   --- suggest movies that are of the genres sci-fi and adventure (context - [sci-fi, adventure])
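 The category use-case above amounts to filtering candidate suggestions by 
 context overlap. A toy sketch (data model and names are made up; the real 
 suggester would do this against its index):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ContextSuggestSketch {
    // Hypothetical suggestion -> contexts mapping, insertion-ordered.
    static final Map<String, Set<String>> SUGGESTIONS = new LinkedHashMap<>();
    static {
        SUGGESTIONS.put("Star Trek", Set.of("sci-fi", "adventure"));
        SUGGESTIONS.put("The Notebook", Set.of("romance"));
        SUGGESTIONS.put("Interstellar", Set.of("sci-fi"));
    }

    // Keep a suggestion only if it carries at least one requested context.
    public static List<String> suggest(Set<String> contexts) {
        return SUGGESTIONS.entrySet().stream()
            .filter(e -> !Collections.disjoint(e.getValue(), contexts))
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(suggest(Set.of("sci-fi", "adventure")));
    }
}
```

 An index-backed implementation would encode contexts alongside each entry 
 rather than filtering a materialized list.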






[jira] [Updated] (LUCENE-5056) Indexing non-point shapes close to the poles doesn't scale

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5056:
-

Fix Version/s: (was: 4.7)
   4.8

 Indexing non-point shapes close to the poles doesn't scale
 --

 Key: LUCENE-5056
 URL: https://issues.apache.org/jira/browse/LUCENE-5056
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spatial
Affects Versions: 4.3
Reporter: Hal Deadman
Assignee: David Smiley
 Fix For: 4.8

 Attachments: indexed circle close to the pole.png


 From: [~hdeadman]
 We are seeing an issue where certain shapes are causing Solr to use up all 
 available heap space when a record with one of those shapes is indexed. We 
 were indexing polygons where we had the points going clockwise instead of 
 counter-clockwise and the shape would be so large that we would run out of 
 memory. We fixed those shapes but we are seeing this circle eat up about 
 700MB of memory before we get an OutOfMemory error (heap space) with a 1GB 
 JVM heap.
 Circle(3.0 90 d=0.0499542757922153)
 Google Earth can't plot that circle either, maybe it is invalid or too close 
 to the north pole due to the latitude of 90, but it would be nice if there 
 was a way for shapes to be validated before they cause an OOM error.
 The objects (4.5 million) are all GeohashPrefixTree$GhCell objects in an 
 ArrayList owned by PrefixTreeStrategy$CellTokenStream.
 Is there any way to have a max number of cells in a shape, beyond which it is 
 considered too large and is not indexed? Is there a geo library that could 
 validate the shape as being reasonably sized and bounded before it is 
 processed?
 We are currently using Solr 4.1.
 <fieldType name="location_rpt"
  class="solr.SpatialRecursivePrefixTreeFieldType"
  spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
  geo="true" distErrPct="0.025" maxDistErr="0.09" units="degrees" />






[jira] [Updated] (LUCENE-4872) BooleanWeight should decide how to execute minNrShouldMatch

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4872:
-

Fix Version/s: (was: 4.7)
   4.8

 BooleanWeight should decide how to execute minNrShouldMatch
 ---

 Key: LUCENE-4872
 URL: https://issues.apache.org/jira/browse/LUCENE-4872
 Project: Lucene - Core
  Issue Type: Sub-task
  Components: core/search
Reporter: Robert Muir
 Fix For: 4.8

 Attachments: crazyMinShouldMatch.tasks


 LUCENE-4571 adds a dedicated document-at-time scorer for minNrShouldMatch 
 which can use advance() behind the scenes. 
 In cases where you have some really common terms and some rare ones this can 
 be a huge performance improvement.
 On the other hand BooleanScorer might still be faster in some cases.
 We should think about what the logic should be here: one simple thing to do 
 is to always use the new scorer when minShouldMatch is set: thats where i'm 
 leaning. 
 But maybe we could have a smarter heuristic too, perhaps based on cost()
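 A cost()-based heuristic could look something like the following purely 
 hypothetical sketch (names and the threshold are invented for illustration): 
 prefer the advance()-driven document-at-a-time scorer when satisfying 
 minShouldMatch can be led by clauses much cheaper than the whole disjunction.

```java
import java.util.Arrays;

public class MinShouldMatchHeuristic {
    // clauseCosts approximates Scorer.cost() per optional clause.
    public static boolean useAdvanceBasedScorer(long[] clauseCosts,
                                                int minShouldMatch) {
        long[] sorted = clauseCosts.clone();
        Arrays.sort(sorted);
        long total = 0, cheapest = 0;
        for (long c : sorted) total += c;
        for (int i = 0; i < minShouldMatch; i++) cheapest += sorted[i];
        // If the cheapest minShouldMatch clauses are an order of magnitude
        // cheaper than the whole disjunction, advance()-based skipping on the
        // remaining clauses is likely to pay off.
        return cheapest * 10 < total;
    }

    public static void main(String[] args) {
        // two rare terms plus one very common term, minShouldMatch=2
        System.out.println(
            useAdvanceBasedScorer(new long[]{100, 200, 1_000_000}, 2));
    }
}
```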






[jira] [Updated] (LUCENE-5199) Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual DocValuesFormat used per-field

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5199:
-

Fix Version/s: (was: 4.7)
   4.8

 Improve LuceneTestCase.defaultCodecSupportsDocsWithField to check the actual 
 DocValuesFormat used per-field
 ---

 Key: LUCENE-5199
 URL: https://issues.apache.org/jira/browse/LUCENE-5199
 Project: Lucene - Core
  Issue Type: Improvement
  Components: general/test
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 4.8

 Attachments: LUCENE-5199.patch, LUENE-5199.patch


 On LUCENE-5178 Han reported the following test failure:
 {noformat}
 [junit4] FAILURE 0.27s | TestRangeAccumulator.testMissingValues 
[junit4] Throwable #1: org.junit.ComparisonFailure: expected:...(0)
[junit4]   less than 10 ([8)
[junit4]   less than or equal to 10 (]8)
[junit4]   over 90 (8)
[junit4]   9... but was:...(0)
[junit4]   less than 10 ([28)
[junit4]   less than or equal to 10 (2]8)
[junit4]   over 90 (8)
[junit4]   9...
[junit4]  at 
 __randomizedtesting.SeedInfo.seed([815B6AA86D05329C:EBC638EE498F066D]:0)
[junit4]  at 
 org.apache.lucene.facet.range.TestRangeAccumulator.testMissingValues(TestRangeAccumulator.java:670)
[junit4]  at java.lang.Thread.run(Thread.java:722)
 {noformat}
 which can be reproduced with
 {noformat}
 ant test -Dtestcase=TestRangeAccumulator -Dtests.method=testMissingValues 
 -Dtests.seed=815B6AA86D05329C -Dtests.slow=true 
 -Dtests.postingsformat=Lucene41 -Dtests.locale=ca 
 -Dtests.timezone=Australia/Currie -Dtests.file.encoding=UTF-8
 {noformat}
 It seems that the Codec that is picked is a Lucene45Codec with 
 Lucene42DVFormat, which does not support docsWithFields for numericDV. We 
 should improve LTC.defaultCodecSupportsDocsWithField to take a list of fields 
 and check that the actual DVF used for each field supports it.






[jira] [Updated] (LUCENE-4960) Require minimum ivy version

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4960:
-

Fix Version/s: (was: 4.7)
   4.8

 Require minimum ivy version
 ---

 Key: LUCENE-4960
 URL: https://issues.apache.org/jira/browse/LUCENE-4960
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/build
Affects Versions: 4.2.1
Reporter: Shawn Heisey
Priority: Minor
 Fix For: 4.8


 Someone on solr-user ran into a problem while trying to run 'ant idea' so 
 they could work on Solr in their IDE.  [~steve_rowe] indicated that this is 
 probably due to IVY-1194, requiring an ivy jar upgrade.
 The build system should check for a minimum ivy version, just like it does 
 with ant.  The absolute minimum we require appears to be 2.2.0, but do we 
 want to make it 2.3.0 due to IVY-1388?
 I'm not sure how to go about checking the ivy version.  Checking the ant 
 version is easy because it's ant itself that does the checking.
 There might be other component versions that should be checked too.






[jira] [Updated] (LUCENE-4950) AssertingIndexSearcher isn't wrapping the Collector to AssertingCollector

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4950:
-

Fix Version/s: (was: 4.7)
   4.8

 AssertingIndexSearcher isn't wrapping the Collector to AssertingCollector
 -

 Key: LUCENE-4950
 URL: https://issues.apache.org/jira/browse/LUCENE-4950
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 4.8

 Attachments: LUCENE-4950.patch









[jira] [Updated] (LUCENE-5317) [PATCH] Concordance capability

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5317:
-

Fix Version/s: (was: 4.7)
   4.8

 [PATCH] Concordance capability
 --

 Key: LUCENE-5317
 URL: https://issues.apache.org/jira/browse/LUCENE-5317
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/search
Affects Versions: 4.5
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.8

 Attachments: concordance_v1.patch.gz


 This patch enables a Lucene-powered concordance search capability.
 Concordances are extremely useful for linguists, lawyers and other analysts 
 performing analytic search vs. traditional snippeting/document retrieval 
 tasks.  By analytic search, I mean that the user wants to browse every time 
 a term appears (or at least the top n) in a subset of documents and see the 
 words before and after.  
 Concordance technology is far simpler and less interesting than IR relevance 
 models/methods, but it can be extremely useful for some use cases.
 Traditional concordance sort orders are available (sort on words before the 
 target, words after, target then words before and target then words after).
 Under the hood, this is running SpanQuery's getSpans() and reanalyzing to 
 obtain character offsets.  There is plenty of room for optimizations and 
 refactoring.
 Many thanks to my colleague, Jason Robinson, for input on the design of this 
 patch.






[jira] [Updated] (LUCENE-4524) Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4524:
-

Fix Version/s: (was: 4.7)
   4.8

 Merge DocsEnum and DocsAndPositionsEnum into PostingsEnum
 -

 Key: LUCENE-4524
 URL: https://issues.apache.org/jira/browse/LUCENE-4524
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs, core/index, core/search
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.8

 Attachments: LUCENE-4524.patch, LUCENE-4524.patch


 spinoff from http://www.gossamer-threads.com/lists/lucene/java-dev/172261
 {noformat}
 hey folks, 
 I have spend a hell lot of time on the positions branch to make 
 positions and offsets working on all queries if needed. The one thing 
 that bugged me the most is the distinction between DocsEnum and 
 DocsAndPositionsEnum. Really when you look at it closer DocsEnum is a 
 DocsAndFreqsEnum and if we omit Freqs we should return a DocIdSetIter. 
 Same is true for 
 DocsAndPostionsAndPayloadsAndOffsets*YourFancyFeatureHere*Enum. I 
 don't really see the benefits from this. We should rather make the 
 interface simple and call it something like PostingsEnum where you 
 have to specify flags on the TermsIterator and if we can't provide the 
 sufficient enum we throw an exception? 
 I just want to bring up the idea here since it might simplify a lot 
 for users as well for us when improving our positions / offset etc. 
 support. 
 thoughts? Ideas? 
 simon 
 {noformat}
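
The flag-based API floated in the quoted thread can be sketched in plain Java. Note this is only an illustration of the proposal, not Lucene's actual API: the class name, the FLAG_* constants, and postings() are hypothetical stand-ins.

```java
// Hypothetical sketch of a unified, flag-based postings API: the caller asks
// for the features it needs, and gets an exception if the index can't supply
// them. All names here are illustrative, not Lucene's real classes.
public class PostingsFlagsSketch {
    static final int FLAG_NONE = 0;
    static final int FLAG_FREQS = 1;
    static final int FLAG_POSITIONS = 1 << 1;
    static final int FLAG_OFFSETS = 1 << 2;
    static final int FLAG_PAYLOADS = 1 << 3;

    // What this (imaginary) field's postings can supply.
    static int supportedFlags = FLAG_FREQS | FLAG_POSITIONS;

    // Mimics the proposal: return an enum if the flags are supported, else throw.
    static String postings(int requestedFlags) {
        int missing = requestedFlags & ~supportedFlags;
        if (missing != 0) {
            throw new UnsupportedOperationException(
                "postings cannot supply flags " + missing);
        }
        return "PostingsEnum(flags=" + requestedFlags + ")";
    }

    public static void main(String[] args) {
        System.out.println(postings(FLAG_FREQS));        // supported
        try {
            postings(FLAG_FREQS | FLAG_OFFSETS);         // offsets not indexed
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```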






[jira] [Updated] (LUCENE-4943) remove 'Changes to Backwards Compatibility Policy' from lucene/CHANGES.txt

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4943:
-

Fix Version/s: (was: 4.7)
   4.8

 remove 'Changes to Backwards Compatibility Policy' from lucene/CHANGES.txt
 --

 Key: LUCENE-4943
 URL: https://issues.apache.org/jira/browse/LUCENE-4943
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Fix For: 4.8


 CHANGES.txt is useful to summarize the changes in a release. 
 However it's expected that a lot of changes will impact the APIs; this 
 currently hurts the quality of CHANGES.txt because it leads to a significant 
 portion of changes (whether they be bugs, features, whatever) being grouped 
 under this one title.
 It also leads to descriptions of CHANGES being unnecessarily verbose.
 I think it makes CHANGES confusing and overwhelming, and it would be better 
 to have a simpler 'upgrading' section with practical information on what you 
 actually need to do (like Solr's CHANGES.txt).






[jira] [Updated] (LUCENE-5288) Add ProxBooleanTermQuery, like BooleanQuery but boosting when term occur close together (in proximity) in each document

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5288:
-

Fix Version/s: (was: 4.7)
   4.8

 Add ProxBooleanTermQuery, like BooleanQuery but boosting when terms occur 
 close together (in proximity) in each document
 -

 Key: LUCENE-5288
 URL: https://issues.apache.org/jira/browse/LUCENE-5288
 Project: Lucene - Core
  Issue Type: New Feature
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5288.patch, LUCENE-5288.patch, LUCENE-5288.patch, 
 LUCENE-5288.patch


 This is very much a work in progress, tons of nocommits...  It adds two 
 classes:
   * ProxBooleanTermQuery: like BooleanQuery (currently, all clauses
 must be TermQuery, and only Occur.SHOULD is supported), which is
 essentially a BooleanQuery (same matching/scoring) except for each
 matching doc the positions are merge-sorted and scored to boost
 the document's score
   * QueryRescorer: simple API to re-score top hits using a different
 query.  Because ProxBooleanTermQuery is so costly, apps would
 normally run an ordinary BooleanQuery across the full index, to
 get the top few hundred hits, and then rescore using the more
 costly ProxBooleanTermQuery (or other costly queries).
 I'm not sure how to actually compute the appropriate prox boost (this
 is the hard part!!) and I've completely punted on that in the current
 patch (it's just a hack now), but the patch does all the mechanics
 to merge/visit all the positions in order per hit.
 Maybe we could do scoring similar to what SpanNearQuery or sloppy 
 PhraseQuery would do, or maybe use this paper:
   http://plg.uwaterloo.ca/~claclark/sigir2006_term_proximity.pdf
 which Rob also used in LUCENE-4909 to add proximity scoring to
 PostingsHighlighter.  Maybe we need to make it (how the prox boost is
 computed/folded in) somehow pluggable ...
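
The two-pass shape described above — a cheap query supplies the top hits, then only those few are re-scored with the costly proximity query — can be sketched without any Lucene dependency. Everything here (Hit, expensiveScore(), the fake boost) is an illustrative stand-in for what QueryRescorer/ProxBooleanTermQuery would do:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of two-pass rescoring: re-score and re-sort only the
// already-collected top hits with a more expensive scoring function.
public class RescoreSketch {
    record Hit(int doc, float score) {}

    // Expensive second-pass score; a real rescorer would run the proximity
    // query against just these docs. The boost here is a placeholder.
    static float expensiveScore(Hit h) {
        return h.score() + (h.doc() % 3);
    }

    static List<Hit> rescore(List<Hit> topHits) {
        List<Hit> rescored = new ArrayList<>();
        for (Hit h : topHits) rescored.add(new Hit(h.doc(), expensiveScore(h)));
        rescored.sort((a, b) -> Float.compare(b.score(), a.score()));
        return rescored;
    }

    public static void main(String[] args) {
        List<Hit> top = List.of(new Hit(10, 2.0f), new Hit(11, 1.9f), new Hit(12, 1.8f));
        // doc 11 gets the biggest fake boost and moves to the front
        System.out.println(rescore(top));
    }
}
```

The point of the design is that the expensive scorer touches a few hundred docs instead of the whole index.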






[jira] [Updated] (LUCENE-5024) Can we reliably detect an incomplete first commit vs index corruption?

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5024:
-

Fix Version/s: (was: 4.7)
   4.8

 Can we reliably detect an incomplete first commit vs index corruption?
 --

 Key: LUCENE-5024
 URL: https://issues.apache.org/jira/browse/LUCENE-5024
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
 Fix For: 4.8


 Normally, if something bad happens (OS, JVM, hardware crashes) while
 IndexWriter is committing, we will just fallback to the prior commit
 and no intervention necessary from the app.
 But if that commit is the first commit, then on restart IndexWriter
 will now throw CorruptIndexException, as of LUCENE-4738.
 Prior to LUCENE-4738, in LUCENE-2812, we used to try to detect the
 corrupt first commit, but that logic was dangerous and could result in
 falsely believing no index is present when one is, e.g. when transient
 IOExceptions are thrown due to file descriptor exhaustion.
 But now two users have hit this change ... see "CorruptIndexException 
 when opening Index during first commit" and "Calling 
 IndexWriter.commit() immediately after creating the writer", both on 
 java-user.
 It would be nice to get back to not marking an incomplete first commit
 as corruption ... but we have to proceed carefully.






[jira] [Updated] (LUCENE-5093) nightly-smoke should run some fail fast checks before doing the full smoke tester

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5093:
-

Fix Version/s: (was: 4.7)
   4.8

 nightly-smoke should run some fail fast checks before doing the full smoke 
 tester
 -

 Key: LUCENE-5093
 URL: https://issues.apache.org/jira/browse/LUCENE-5093
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 4.8

 Attachments: LUCENE-5093.patch


 If something like the NOTICES fails the smoke tester, it currently takes 22 
 minutes to find out on my pretty fast machine. That means testing a fix also 
 takes 22 minutes.
 It would be nice if some of these types of checks happened right away on the 
 src tree - we should also check the actual artifacts with the same check 
 later - but also have this fail fast path.






[jira] [Updated] (LUCENE-4813) Allow DirectSpellchecker to use totalTermFrequency rather than docFrequency

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4813:
-

Fix Version/s: (was: 4.7)
   4.8

 Allow DirectSpellchecker to use totalTermFrequency rather than docFrequency
 ---

 Key: LUCENE-4813
 URL: https://issues.apache.org/jira/browse/LUCENE-4813
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/spellchecker
Affects Versions: 4.1
Reporter: Simon Willnauer
 Fix For: 4.8

 Attachments: LUCENE-4813.patch, LUCENE-4813.patch


 We have a bunch of new statistics in our term dictionaries that we should 
 make use of where it makes sense. For DirectSpellChecker, totalTermFreq and 
 sumTotalTermFreq might be better suited for spell correction on top of a 
 full-text index than docFreq and maxDoc.






[jira] [Updated] (LUCENE-4491) Make analyzing suggester more flexible

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4491:
-

Fix Version/s: (was: 4.7)
   4.8

 Make analyzing suggester more flexible
 --

 Key: LUCENE-4491
 URL: https://issues.apache.org/jira/browse/LUCENE-4491
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/other
Affects Versions: 4.1
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.8

 Attachments: LUCENE-4491.patch, LUCENE-4491.patch


 Today we have an analyzing suggester that is bound to a single key. Yet, if 
 you want to have a totally different surface form compared to the key used to 
 find the suggestion you either have to copy the code or play some super ugly 
 analyzer tricks. For example, I want to suggest "Barbra Streisand" if somebody 
 types "strei"; in that case the surface form is totally different from the 
 analyzed form. 
 Even one step further I want to embed some meta-data in the suggested key 
 like a user id or some type. My surface form could look like "Barbra 
 Streisand|15". Ideally I want to encode this as binary and that might not be 
 a valid UTF-8 byte sequence.
 I'm actually doing this in production and my only option was to copy the 
 analyzing suggester and some of its related classes.






[jira] [Updated] (LUCENE-5318) Co-occurrence counts from Concordance

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5318:
-

Fix Version/s: (was: 4.7)
   4.8

 Co-occurrence counts from Concordance
 -

 Key: LUCENE-5318
 URL: https://issues.apache.org/jira/browse/LUCENE-5318
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/search
Affects Versions: 4.5
Reporter: Tim Allison
  Labels: patch
 Fix For: 4.8

 Attachments: cooccur_v1.patch.gz


 This patch calculates co-occurrence statistics on search terms within a 
 window of x tokens.  This can help in synonym discovery and anywhere else 
 co-occurrence stats have been used.
 The attached patch depends on LUCENE-5317.
 Again, many thanks to my colleague, Jason Robinson, for advice in developing 
 this code and for his modifications to this code to make it more 
 Solr-friendly.
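
Windowed co-occurrence counting as described above can be sketched in a few lines of plain Java. This operates on a pre-tokenized array rather than Lucene spans, and all names are illustrative, not the patch's actual classes:

```java
import java.util.Map;
import java.util.TreeMap;

// Count which terms appear within `window` tokens of each occurrence of a
// target term; a TreeMap keeps the output deterministic for display.
public class CooccurSketch {
    static Map<String, Integer> cooccur(String[] tokens, String target, int window) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(target)) continue;
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.length - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j != i) counts.merge(tokens[j], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tokens = {"fast", "search", "engine", "fast", "index"};
        System.out.println(cooccur(tokens, "fast", 1));
        // prints: {engine=1, index=1, search=1}
    }
}
```

The real patch feeds this kind of counter from the concordance windows of LUCENE-5317, which is why it depends on that issue.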






[jira] [Updated] (LUCENE-4734) FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4734:
-

Fix Version/s: (was: 4.7)
   4.8

 FastVectorHighlighter Overlapping Proximity Queries Do Not Highlight
 

 Key: LUCENE-4734
 URL: https://issues.apache.org/jira/browse/LUCENE-4734
 Project: Lucene - Core
  Issue Type: Bug
  Components: modules/highlighter
Affects Versions: 4.0, 4.1, 5.0
Reporter: Ryan Lauck
  Labels: fastvectorhighlighter, highlighter
 Fix For: 4.8

 Attachments: LUCENE-4734-2.patch, LUCENE-4734.patch, lucene-4734.patch


 If a proximity phrase query overlaps with any other query term it will not be 
 highlighted.
 Example Text:  A B C D E F G
 Example Queries: 
 "B E"~10 D
 (D will be highlighted instead of B C D E)
 "B E"~10 "C F"~10
 (nothing will be highlighted)
 This can be traced to the FieldPhraseList constructor's inner while loop. 
 From the first example query, the first TermInfo popped off the stack will be 
 "B". The second TermInfo will be "D", which will not be found in the submap 
 for "B E"~10 and will trigger a failed match.






[jira] [Updated] (LUCENE-4746) Create a move method in Directory.

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4746:
-

Fix Version/s: (was: 4.7)
   4.8

 Create a move method in Directory.
 --

 Key: LUCENE-4746
 URL: https://issues.apache.org/jira/browse/LUCENE-4746
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.8

 Attachments: LUCENE-4746.patch


 I'd like to make a move method for Directory.
 We already have a move for Solr in DirectoryFactory, but it seems it belongs 
 at the directory level really.
 The default impl can do a copy and delete, but most implementations will be 
 able to optimize to a rename.
 Besides the move we do for Solr (to move a replicated index into place), it 
 would also be useful for another feature I'd like to add - the ability to 
 merge an index with moves rather than copies. In some cases, you don't 
 need/want to copy all the files and could just rename/move them. 
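
The proposed default — optimize to a rename where the filesystem allows it, otherwise copy and delete — maps directly onto java.nio. This is a sketch of that fallback shape only; the method and class names are illustrative, not the actual Directory API:

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of a move with a rename fast path and a copy+delete fallback.
public class MoveSketch {
    static void move(Path src, Path dest) throws IOException {
        try {
            // fast path: an atomic rename, cheap on most filesystems
            Files.move(src, dest, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // fallback: copy the bytes, then remove the original
            Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);
            Files.delete(src);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("move-demo");
        Path src = Files.writeString(dir.resolve("a.txt"), "segment data");
        move(src, dir.resolve("b.txt"));
        System.out.println(Files.exists(src) + " " + Files.readString(dir.resolve("b.txt")));
        // prints: false segment data
    }
}
```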






[jira] [Updated] (LUCENE-4281) Delegate to default thread factory in NamedThreadFactory

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4281:
-

Fix Version/s: (was: 4.7)
   4.8

 Delegate to default thread factory in NamedThreadFactory
 

 Key: LUCENE-4281
 URL: https://issues.apache.org/jira/browse/LUCENE-4281
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 3.6.1, 4.0-BETA, 5.0
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 4.8

 Attachments: LUCENE-4281.patch


 currently we state that we yield the same behavior as 
 Executors#defaultThreadFactory() but this behavior could change over time 
 even if it is compatible. We should just delegate to the default thread 
 factory instead of creating the threads ourselves.
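
The delegation suggested above is straightforward with plain JDK APIs: let Executors.defaultThreadFactory() build the thread (inheriting whatever defaults the JDK chooses), then only override the name. The class name here is illustrative:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// A named thread factory that delegates creation to the JDK's default
// factory and only customizes the thread name afterwards.
public class DelegatingNamedThreadFactory implements ThreadFactory {
    private final ThreadFactory delegate = Executors.defaultThreadFactory();
    private final AtomicInteger counter = new AtomicInteger(1);
    private final String prefix;

    public DelegatingNamedThreadFactory(String prefix) { this.prefix = prefix; }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = delegate.newThread(r); // inherit group, priority, daemon flag
        t.setName(prefix + "-" + counter.getAndIncrement());
        return t;
    }

    public static void main(String[] args) {
        Thread t = new DelegatingNamedThreadFactory("lucene").newThread(() -> {});
        System.out.println(t.getName()); // prints: lucene-1
    }
}
```

This way any future change to the default factory's behavior is picked up automatically instead of drifting out of sync.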






[jira] [Updated] (LUCENE-4823) Add a separate registration singleton for Lucene's SPI, so there is only one central instance to request rescanning of classpath (e.g. from Solr's ResourceLoader)

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4823:
-

Fix Version/s: (was: 4.7)
   4.8

 Add a separate registration singleton for Lucene's SPI, so there is only 
 one central instance to request rescanning of classpath (e.g. from Solr's 
 ResourceLoader)
 

 Key: LUCENE-4823
 URL: https://issues.apache.org/jira/browse/LUCENE-4823
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.8


 Currently there is no easy way to do a global rescan/reload of all of 
 Lucene's SPIs in the right order. In Solr there is a long list of reload 
 instructions in the ResourceLoader. If somebody adds a new SPI type, you have 
 to add it there.
 It would be good to have a central instance in oal.util that keeps track of 
 all NamedSPILoaders and AnalysisSPILoaders (in the order they were 
 instantiated), so you have one central entry point to trigger a reload.
 This issue will introduce:
 - A singleton that makes reloading possible. The singleton keeps weak refs to 
 all loaders (of any kind) in the order they were created.
 - NamedSPILoader and AnalysisSPILoader cleanup (unfortunately we need both, 
 as they differ in their internals: one keeps classes, the other instances). 
 Both should implement a reloadable interface.
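
The registry shape described — loaders register on construction, are held only weakly, and are reloaded in creation order — can be sketched with plain JDK types. The names (SpiReloadRegistry, Reloadable) are illustrative, not the actual classes this issue would add:

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Central reload registry: weak refs so the registry never pins a loader in
// memory, insertion order so dependent loaders reload after their prerequisites.
public class SpiReloadRegistry {
    interface Reloadable { void reload(); }

    private static final List<WeakReference<Reloadable>> LOADERS = new ArrayList<>();

    static synchronized void register(Reloadable loader) {
        LOADERS.add(new WeakReference<>(loader));
    }

    static synchronized void reloadAll() {
        Iterator<WeakReference<Reloadable>> it = LOADERS.iterator();
        while (it.hasNext()) {
            Reloadable r = it.next().get();
            if (r == null) it.remove(); // loader was garbage collected
            else r.reload();
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        Reloadable codecs = () -> log.add("codecs");
        Reloadable analysis = () -> log.add("analysis");
        register(codecs);
        register(analysis);
        reloadAll();
        System.out.println(log); // prints: [codecs, analysis]
    }
}
```

With such an entry point, Solr's ResourceLoader would call one reloadAll() instead of maintaining its per-SPI list.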






[jira] [Updated] (LUCENE-5310) Merge Threads unnecessarily block on SerialMergeScheduler

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5310:
-

Fix Version/s: (was: 4.7)
   4.8

 Merge Threads unnecessarily block on SerialMergeScheduler
 -

 Key: LUCENE-5310
 URL: https://issues.apache.org/jira/browse/LUCENE-5310
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.5, 5.0
Reporter: Simon Willnauer
Priority: Minor
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5310.patch, LUCENE-5310.patch


 I have been working on a high level merge multiplexer that shares threads 
 across different IW instances and I came across the fact that 
 SerialMergeScheduler actually blocks incoming threads if a merge is going on. 
 Yet this blocks threads unnecessarily, since we pull the merges in a loop 
 anyway. Should we use a tryLock operation instead of syncing the entire 
 method?
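
The tryLock idea can be sketched with a ReentrantLock: a thread that finds a merge already in progress returns immediately, trusting the lock holder's loop to drain whatever is pending. This is a self-contained illustration, not the actual scheduler code:

```java
import java.util.concurrent.locks.ReentrantLock;

// Non-blocking merge entry point: only one thread merges at a time, and the
// others return instead of queueing up on a synchronized method.
public class TryLockMergeSketch {
    private final ReentrantLock mergeLock = new ReentrantLock();
    private int pendingMerges = 3;
    int mergesRun = 0;

    void merge() {
        if (!mergeLock.tryLock()) {
            return; // another thread is merging; its loop will pick up our work
        }
        try {
            while (pendingMerges > 0) { // drain everything queued so far
                pendingMerges--;
                mergesRun++;
            }
        } finally {
            mergeLock.unlock();
        }
    }

    public static void main(String[] args) {
        TryLockMergeSketch s = new TryLockMergeSketch();
        s.merge();
        System.out.println(s.mergesRun); // prints: 3
    }
}
```

The serial guarantee is preserved (at most one merging thread) without parking every caller.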






[jira] [Updated] (LUCENE-4803) DrillDownQuery should rewrite to FilteredQuery?

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4803:
-

Fix Version/s: (was: 4.7)
   4.8

 DrillDownQuery should rewrite to FilteredQuery?
 ---

 Key: LUCENE-4803
 URL: https://issues.apache.org/jira/browse/LUCENE-4803
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 4.8


 Today we rewrite to a query like +baseQuery +ConstantScoreQuery(boost=0.0 
 TermQuery(drillDownTerm)), but I'm not certain 0.0 boost is safe / doesn't 
 actually change scores.
 We should also add a test to assert that scores are not changed by drill down.






[jira] [Updated] (LUCENE-4630) add a system property to allow testing of suspicious stuff

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4630:
-

Fix Version/s: (was: 4.7)
   4.8

 add a system property to allow testing of suspicious stuff
 --

 Key: LUCENE-4630
 URL: https://issues.apache.org/jira/browse/LUCENE-4630
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Hoss Man
 Fix For: 4.8


 there are times when people want to add assumptions in test to prevent 
 confusing/false failures in certain situations (eg: known bugs in JVM X, 
 known incompatibilities between lucene feature Z and filesystem Y, etc...)
 By default we want these situations to be skipped in tests, with clear 
 messages, so that it's clear to end users trying out releases that these tests 
 can't be run in specific situations.
 But at the same time we need a way for developers to be able to try running 
 these tests anyway so we know if/when the underlying problem is resolved.
 I propose we add a tests.suspicious.shit system property, which defaults to 
 false in the java code, but can be set at runtime to true.
 Assumptions about things like incompatibilities with OSs, JVM vendors, JVM 
 versions, filesystems, etc. can all be dependent on this system property.
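
Gating an assumption on a system property is a one-liner with Boolean.getBoolean. A sketch, using "tests.suspicious" as a cleaned-up stand-in for the property name floated above and a plain boolean in place of the test framework's assume mechanism:

```java
// Skip known-bad environments by default, but let a developer opt in with
// -Dtests.suspicious=true to see whether the underlying problem still exists.
public class SuspiciousGate {
    static final boolean RUN_SUSPICIOUS = Boolean.getBoolean("tests.suspicious");

    static boolean shouldRun(boolean knownBadEnvironment) {
        // run if the environment is healthy, or if the developer opted in
        return RUN_SUSPICIOUS || !knownBadEnvironment;
    }

    public static void main(String[] args) {
        System.out.println(shouldRun(true));  // known-bad env, no opt-in: false
        System.out.println(shouldRun(false)); // healthy env: true
    }
}
```

In a real test this boolean would feed assumeTrue(), so skipped runs show up as assumptions rather than failures.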






[jira] [Updated] (LUCENE-4526) Allow runtime settings on Codecs

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4526:
-

Fix Version/s: (was: 4.7)
   4.8

 Allow runtime settings on Codecs
 

 Key: LUCENE-4526
 URL: https://issues.apache.org/jira/browse/LUCENE-4526
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/codecs
Affects Versions: 4.0
Reporter: Simon Willnauer
 Fix For: 4.8

 Attachments: LUCENE-4526.patch


 Today we expose termIndexInterval and termIndexDivisor via several APIs and 
 they are deprecated. Those settings are 1. codec / postingformat specific and 
 2. not extendable. We should provide a more flexible way to pass information 
 down to our codecs.






[jira] [Updated] (LUCENE-5326) Add enum facet method to Lucene facet module

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5326:
-

Fix Version/s: (was: 4.7)
   4.8

 Add enum facet method to Lucene facet module
 

 Key: LUCENE-5326
 URL: https://issues.apache.org/jira/browse/LUCENE-5326
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.8, 5.0

 Attachments: LUCENE-5326.patch


 I've been testing Solr facet performance, and the enum method works
 very well for low cardinality (not many unique values) fields.  So I
 think we should fold a similar option into Lucene's facet module.






[jira] [Updated] (LUCENE-4835) Raise maxClauseCount in BooleanQuery to Integer.MAX_VALUE

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4835:
-

Fix Version/s: (was: 4.7)
   4.8

 Raise maxClauseCount in BooleanQuery to Integer.MAX_VALUE
 -

 Key: LUCENE-4835
 URL: https://issues.apache.org/jira/browse/LUCENE-4835
 Project: Lucene - Core
  Issue Type: Improvement
Affects Versions: 4.2
Reporter: Shawn Heisey
 Fix For: 4.8


 Discussion on SOLR-4586 raised the idea of raising the limit on boolean 
 clauses from 1024 to Integer.MAX_VALUE.  This should be a safe change.  It 
 will change the nature of help requests from "Why can't I do 2000 clauses?" 
 to "Why is my 5000-clause query slow?"






[jira] [Updated] (LUCENE-3997) join module should not depend on grouping module

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-3997:
-

Fix Version/s: (was: 4.7)
   4.8

 join module should not depend on grouping module
 

 Key: LUCENE-3997
 URL: https://issues.apache.org/jira/browse/LUCENE-3997
 Project: Lucene - Core
  Issue Type: Task
Affects Versions: 4.0-ALPHA
Reporter: Robert Muir
 Fix For: 4.8

 Attachments: LUCENE-3997.patch, LUCENE-3997.patch


 I think TopGroups/GroupDocs should simply be in core? 
 Both grouping and join modules use these trivial classes, but join depends on 
 grouping just for them.
 I think it's better that we try to minimize these inter-module dependencies.
 Of course, another option is to combine grouping and join into one module, but
 last time I brought that up nobody could agree on a name. 
 Anyway I think the change is pretty clean: it's similar to having basic stuff 
 like Analyzer.java in core,
 so other things can work with Analyzer without depending on any specific 
 implementing modules.






[jira] [Updated] (LUCENE-4954) LuceneTestFramework fails to catch temporary FieldCache insanity

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4954:
-

Fix Version/s: (was: 4.7)
   4.8

 LuceneTestFramework fails to catch temporary FieldCache insanity
 

 Key: LUCENE-4954
 URL: https://issues.apache.org/jira/browse/LUCENE-4954
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 4.8


 Ever since we added readerClosedListeners to evict FieldCache entries, LTC 
 will no longer detect insanity as long as the test closes all readers leading 
 to insanity ...
 So this has weakened our testing of catching accidental insanity producing 
 code.
 To fix this I think we could tap into FieldCacheImpl.setInfoStream ... and 
 ensure the test didn't print anything to it.
 This was a spinoff from LUCENE-4953, where that test 
 (AllGroupHeadsCollectorTest) is always producing insanity, but then because 
 of a bug the FC eviction wasn't working right, and LTC then detected the 
 insanity.






[jira] [Updated] (LUCENE-5130) fail the build on compilation warnings

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5130:
-

Fix Version/s: (was: 4.7)
   4.8

 fail the build on compilation warnings
 --

 Key: LUCENE-5130
 URL: https://issues.apache.org/jira/browse/LUCENE-5130
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.8

 Attachments: LUCENE-5130.patch, LUCENE-5130.patch


 Many modules compile w/o warnings ... we should lock this in and fail the 
 build if warnings are ever added, and try to fix the warnings in existing 
 modules.






[jira] [Updated] (LUCENE-4121) Standardize ramBytesUsed/sizeInBytes/memSize

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4121:
-

Fix Version/s: (was: 4.7)
   4.8

 Standardize ramBytesUsed/sizeInBytes/memSize
 

 Key: LUCENE-4121
 URL: https://issues.apache.org/jira/browse/LUCENE-4121
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
 Fix For: 4.8

 Attachments: LUCENE-4121.patch


 We should standardize the names of the methods we use to estimate the sizes 
 of objects in memory and on disk. (cf. discussion on dev@lucene 
 http://search-lucene.com/m/VbXSx1BP60G).






[jira] [Updated] (LUCENE-3843) implement PositionLengthAttribute for all tokenstreams where its appropriate

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-3843:
-

Fix Version/s: (was: 4.7)
   4.8

 implement PositionLengthAttribute for all tokenstreams where its appropriate
 

 Key: LUCENE-3843
 URL: https://issues.apache.org/jira/browse/LUCENE-3843
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 4.8


 LUCENE-3767 introduces PositionLengthAttribute, which extends the tokenstream 
 API
 from a sausage to a real graph. 
 Currently tokenstreams such as WordDelimiterFilter and SynonymsFilter 
 theoretically
 work at a graph level, but then serialize themselves to a sausage, for 
 example:
 wi-fi with WDF creates:
 wi(posinc=1), fi(posinc=1), wifi(posinc=0)
 So the lossiness is that 'wifi' is simply stacked on top of 'fi'
 PositionLengthAttribute fixes this by allowing a token to declare how far it 
 spans,
 so we don't lose any information.
 While the indexer currently can only support sausages anyway (and for 
 performance reasons,
 this is probably just fine!), other tokenstream consumers such as 
 queryparsers and suggesters
 such as LUCENE-3842 can actually make use of this information for better 
 behavior.
 So I think it's ideal if the TokenStream API doesn't reflect the lossiness of 
 the index format, but instead keeps all information; after LUCENE-3767 is 
 committed we should fix tokenstreams to preserve this information for 
 consumers that can use it.
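To make the sausage-vs-graph distinction concrete, here is a self-contained sketch (a simplified model, not Lucene's actual attribute classes) that derives each token's start and end position from its (positionIncrement, positionLength) pair; with a position length, 'wifi' can span both 'wi' and 'fi' instead of being stacked on one of them.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenGraph {
    /** Returns "term[start-end]" spans computed from posInc/posLen pairs. */
    public static List<String> spans(String[] terms, int[] posInc, int[] posLen) {
        List<String> out = new ArrayList<>();
        int pos = -1;
        for (int i = 0; i < terms.length; i++) {
            pos += posInc[i];  // increment 0 stacks the token on the previous position
            out.add(terms[i] + "[" + pos + "-" + (pos + posLen[i]) + "]");
        }
        return out;
    }
}
```

For "wi-fi" emitted as wi(inc=1,len=1), wifi(inc=0,len=2), fi(inc=1,len=1), this yields wi[0-1], wifi[0-2], fi[1-2]: 'wifi' spans the whole graph rather than sitting on top of 'fi'.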






[jira] [Updated] (LUCENE-3451) Remove special handling of pure negative Filters in BooleanFilter, disallow pure negative queries in BooleanQuery

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-3451:
-

Fix Version/s: (was: 4.7)
   4.8

 Remove special handling of pure negative Filters in BooleanFilter, disallow 
 pure negative queries in BooleanQuery
 -

 Key: LUCENE-3451
 URL: https://issues.apache.org/jira/browse/LUCENE-3451
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.8

 Attachments: LUCENE-3451.patch, LUCENE-3451.patch, LUCENE-3451.patch, 
 LUCENE-3451.patch, LUCENE-3451.patch


 We should at least in Lucene 4.0 remove the hack in BooleanFilter that allows 
 pure negative Filter clauses. This is not supported by BooleanQuery and 
 confuses users (I think that's the problem in LUCENE-3450).
 The hack is buggy, as it does not respect deleted documents and returns them 
 in its DocIdSet.
 Also we should think about disallowing pure-negative Queries entirely and 
 throwing a UOE.






[jira] [Updated] (LUCENE-4545) Better error reporting StemmerOverrideFilterFactory

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4545:
-

Fix Version/s: (was: 4.7)
   4.8

 Better error reporting StemmerOverrideFilterFactory
 ---

 Key: LUCENE-4545
 URL: https://issues.apache.org/jira/browse/LUCENE-4545
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Markus Jelsma
Priority: Trivial
 Fix For: 4.8

 Attachments: LUCENE-4545-trunk-1.patch


 If the dictionary contains an error, such as a space instead of a tab, it is 
 hard to find in a long dictionary. This patch includes the file name and line 
 number in the exception, helping to debug it quickly.
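A hedged sketch of the reporting (class and message names here are illustrative, not the actual patch): when a line lacks the tab separator, raise an exception that carries the file name and the 1-based line number.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StemOverrideParser {
    /** Parses "surface\tstem" lines; reports file and line number on malformed input. */
    public static Map<String, String> parse(String fileName, List<String> lines) {
        Map<String, String> dict = new HashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int tab = line.indexOf('\t');
            if (tab < 0) {
                // A space instead of a tab lands here, pointing at the exact line.
                throw new IllegalArgumentException(
                    "Invalid entry in " + fileName + ", line " + (i + 1)
                        + ": missing tab separator");
            }
            dict.put(line.substring(0, tab), line.substring(tab + 1));
        }
        return dict;
    }
}
```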






[jira] [Updated] (LUCENE-4246) Fix IndexWriter.close() to not commit or wait for pending merges

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4246:
-

Fix Version/s: (was: 4.7)
   4.8

 Fix IndexWriter.close() to not commit or wait for pending merges
 

 Key: LUCENE-4246
 URL: https://issues.apache.org/jira/browse/LUCENE-4246
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.8









[jira] [Updated] (LUCENE-4382) Unicode escape no longer works for non-suffix-only wildcard terms

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4382:
-

Fix Version/s: (was: 4.7)
   4.8

 Unicode escape no longer works for non-suffix-only wildcard terms
 -

 Key: LUCENE-4382
 URL: https://issues.apache.org/jira/browse/LUCENE-4382
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/queryparser
Affects Versions: 4.0-BETA
Reporter: Jack Krupansky
 Fix For: 4.8


 LUCENE-588 added support for escaping of wildcard characters, but when the 
 de-escaping logic was pushed down from the query parser (QueryParserBase) 
 into WildcardQuery, support for Unicode escaping (backslash, u, and the 
 four-digit hex Unicode code) was not included.
 Two solutions:
 1. Do the Unicode de-escaping in the query parser before calling 
 getWildcardQuery.
 2. Support Unicode de-escaping in WildcardQuery.
 A suffix-only wildcard does not exhibit this problem because full de-escaping 
 is performed in the query parser before calling getPrefixQuery.
 My test case, added at the beginning of 
 TestExtendedDismaxParser.testFocusQueryParser:
 {code}
 assertQ("expected doc is missing (using escaped edismax w/field)",
 req("q", "t_special:literal\\:\\u0063olo*n", 
 "defType", "edismax"),
 "//doc[1]/str[@name='id'][.='46']"); 
 {code}
 Note: That test case was only used to debug into WildcardQuery to see that 
 the Unicode escape was not processed correctly. It fails in all cases, but 
 that's because of how the field type is analyzed.
 Here is a Lucene-level test case that can also be debugged to see that 
 WildcardQuery is not processing the Unicode escape properly. I added it at 
 the start of TestMultiAnalyzer.testMultiAnalyzer:
 {code}
 assertEquals("literal\\:\\u0063olo*n", 
 qp.parse("literal\\:\\u0063olo*n").toString());
 {code}
 Note: This case will always run correctly since it is only checking the input 
 pattern string for WildcardQuery and not how the de-escaping was performed 
 within WildcardQuery.






[jira] [Updated] (LUCENE-4159) Code review before 4.0 release

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4159:
-

Fix Version/s: (was: 4.7)
   4.8

 Code review before 4.0 release
 --

 Key: LUCENE-4159
 URL: https://issues.apache.org/jira/browse/LUCENE-4159
 Project: Lucene - Core
  Issue Type: Task
Reporter: Tommaso Teofili
Priority: Minor
 Fix For: 4.8


 Before the 4.0 release I think it makes sense to plan for a (Lucene and Solr) 
 comprehensive code review in order to improve APIs, performance and code 
 style.






[jira] [Updated] (LUCENE-3978) redo how our download redirect pages work

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-3978:
-

Fix Version/s: (was: 4.7)
   4.8

 redo how our download redirect pages work
 -

 Key: LUCENE-3978
 URL: https://issues.apache.org/jira/browse/LUCENE-3978
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Hoss Man
 Fix For: 4.8


 the "download latest" redirect pages are kind of a pain to change when we 
 release a new version...
 http://lucene.apache.org/core/mirrors-core-latest-redir.html
 http://lucene.apache.org/solr/mirrors-solr-latest-redir.html






[jira] [Updated] (LUCENE-4688) Reuse TermsEnum in BlockTreeTermsReader

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4688:
-

Fix Version/s: (was: 4.7)
   4.8

 Reuse TermsEnum in BlockTreeTermsReader
 ---

 Key: LUCENE-4688
 URL: https://issues.apache.org/jira/browse/LUCENE-4688
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Affects Versions: 4.0, 4.1
Reporter: Simon Willnauer
 Fix For: 4.8

 Attachments: LUCENE-4688.patch


 Opening a TermsEnum comes with a significant cost at this point if done 
 frequently, as in primary-key lookups, or if many segments are present. 
 Currently we don't reuse it at all and create a lot of objects even if the 
 enum is just used for a single seekExact (i.e. TermQuery). Stressing the 
 Terms#iterator(reuse) call shows significant gains with reuse...
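The reuse pattern being stressed can be sketched generically (this mirrors the shape of Terms#iterator(TermsEnum reuse) in 4.x, not its real implementation): pass the previous enum back in, so its internal buffers are recycled instead of reallocated on every lookup.

```java
public class ReusableEnum {
    // Stands in for the per-instance state that makes allocation expensive.
    private final byte[] scratch = new byte[128];
    private static long allocations = 0;

    private ReusableEnum() { allocations++; }

    /** Returns reuse when available, allocating a new instance only when necessary. */
    public static ReusableEnum iterator(ReusableEnum reuse) {
        return reuse != null ? reuse : new ReusableEnum();
    }

    public static long allocationCount() { return allocations; }
}
```

In a primary-key lookup loop, `e = ReusableEnum.iterator(e)` keeps the allocation count at one instead of one per lookup.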






[jira] [Updated] (LUCENE-3610) Revamp spatial APIs that use primitives (or arrays of primitives) in their args/results so that they use strongly typed objects

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-3610:
-

Fix Version/s: (was: 4.7)
   4.8

 Revamp spatial APIs that use primitives (or arrays of primitives) in their 
 args/results so that they use strongly typed objects
 ---

 Key: LUCENE-3610
 URL: https://issues.apache.org/jira/browse/LUCENE-3610
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial
Reporter: Hoss Man
 Fix For: 4.8


 My spatial awareness is pretty meek, but LUCENE-3599 seems like a prime 
 example of the types of mistakes that are probably really easy to make with 
 all of the Spatial related APIs that deal with arrays (or sequences) of 
 doubles where specific indexes of those arrays (or sequences) have 
 significant meaning: mainly latitude vs longitude.
 We should probably reconsider any method that takes in double[] or multiple 
 doubles to express latlon pairs and rewrite them to use the existing LatLng 
 class -- or if people think that class is too heavyweight, then add a new 
 lightweight class to handle the strong typing of a basic latlon point instead 
 of just passing around a double[2] or two doubles called x and y ...
 {code}
 public static final class SimpleLatLonPointInRadians {
   public double latitude;
   public double longitude;
 }
 {code}
 ...then all those various methods that expect lat+lon pairs in radians (like 
 DistanceUtils.haversine, DistanceUtils.normLat, DistanceUtils.normLng, 
 DistanceUtils.pointOnBearing, DistanceUtils.latLonCorner, etc...) can start 
 having APIs that don't make your eyes bleed when you start trying to 
 understand what order the args go in.
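As a sketch of the payoff, here is a self-contained haversine using a typed point in radians (SimpleLatLonPointInRadians as proposed above; the distance code is textbook haversine, not DistanceUtils itself): the call site can no longer swap latitude and longitude silently.

```java
public class Haversine {
    public static final class SimpleLatLonPointInRadians {
        public final double latitude, longitude;
        public SimpleLatLonPointInRadians(double latitude, double longitude) {
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /** Great-circle distance between two points, in units of the given sphere radius. */
    public static double distance(SimpleLatLonPointInRadians a,
                                  SimpleLatLonPointInRadians b, double radius) {
        double dLat = b.latitude - a.latitude;
        double dLon = b.longitude - a.longitude;
        double h = Math.pow(Math.sin(dLat / 2), 2)
                 + Math.cos(a.latitude) * Math.cos(b.latitude)
                 * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * radius * Math.asin(Math.sqrt(h));
    }
}
```

Compared with `haversine(double, double, double, double)`, the argument order is checked by the type system rather than by the reader's eyes.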






[jira] [Updated] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-3888:
-

Fix Version/s: (was: 4.7)
   4.8

 split off the spell check word and surface form in spell check dictionary
 -

 Key: LUCENE-3888
 URL: https://issues.apache.org/jira/browse/LUCENE-3888
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 4.8

 Attachments: LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, 
 LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch


 The "did you mean?" feature using Lucene's spell checker unfortunately cannot 
 work well in a Japanese environment; this is a longstanding problem, because 
 the logic needs comparatively long text to check spelling, but in some 
 languages (e.g. Japanese) most words are too short for the spell checker.
 I think, at least for Japanese, things can be improved if we split off the 
 spell check word and the surface form in the spell check dictionary. Then we 
 can use ReadingAttribute for spell checking but CharTermAttribute for 
 suggesting, for example.






[jira] [Updated] (LUCENE-3912) Improved the checked-in tiny line file docs

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-3912:
-

Fix Version/s: (was: 4.7)
   4.8

 Improved the checked-in tiny line file docs
 ---

 Key: LUCENE-3912
 URL: https://issues.apache.org/jira/browse/LUCENE-3912
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.8


 I think it may not have any surrogate pairs (it was derived from 
 Europarl).






[jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4556:
-

Fix Version/s: (was: 4.7)
   4.8

 FuzzyTermsEnum creates tons of objects
 --

 Key: LUCENE-4556
 URL: https://issues.apache.org/jira/browse/LUCENE-4556
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search, modules/spellchecker
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Critical
 Fix For: 4.8

 Attachments: LUCENE-4556.patch, LUCENE-4556.patch


 I ran into this problem in production using the DirectSpellchecker. The 
 number of objects created by the spellchecker shoots through the roof very 
 quickly. We ran about 130 queries and ended up with 2M transitions / 
 states. We spent 50% of the time in GC just because of transitions. Other 
 parts of the system behave just fine here.
 I talked quickly to Robert and gave a POC a shot, providing a 
 LevenshteinAutomaton#toRunAutomaton(prefix, n) method to optimize this case 
 and build an array-based structure converted into UTF-8 directly instead of 
 going through the object-based APIs. This involved quite a bit of changes, but 
 they are all package-private at this point. I have a patch that still has a 
 fair set of nocommits, but it shows that it's possible and IMO worth the 
 trouble to make this really usable in production. All tests pass with the 
 patch - it's a start






[jira] [Updated] (LUCENE-4731) New ReplicatingDirectory mirrors index files to HDFS

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4731:
-

Fix Version/s: (was: 4.7)
   4.8

 New ReplicatingDirectory mirrors index files to HDFS
 

 Key: LUCENE-4731
 URL: https://issues.apache.org/jira/browse/LUCENE-4731
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/store
Reporter: David Arthur
 Fix For: 4.8

 Attachments: ReplicatingDirectory.java


 I've been working on a Directory implementation that mirrors the index files 
 to HDFS (or another Hadoop-supported FileSystem).
 A ReplicatingDirectory delegates all calls to an underlying Directory 
 (supplied in the constructor). The only hooks are the deleteFile and sync 
 calls. We submit deletes and replications to a single scheduler thread to 
 keep things serialized. During a sync call, if segments.gen is seen in the 
 list of files, we know a commit is finishing. After calling the delegate's 
 sync method, we initialize an asynchronous replication as follows.
 * Read segments.gen (before leaving ReplicatingDirectory#sync) and save the 
 values for later
 * Get a list of local files from ReplicatingDirectory#listAll before leaving 
 ReplicatingDirectory#sync
 * Submit a replication task (DirectoryReplicator) to the scheduler thread
 * Compare local files to remote files, determining which remote files get 
 deleted and which need to get copied
 * Submit a thread to copy each file (one thread per file)
 * Submit a thread to delete each file (one thread per file)
 * Submit a finalizer thread. This thread waits on the previous two batches 
 of threads to finish. Once finished, it generates a new 
 segments.gen remotely (using the version and generation number previously 
 read in).
 I have no idea where this would belong in the Lucene project, so I'll just 
 attach the standalone class instead of a patch. It introduces dependencies on 
 Hadoop core (and all the deps that brings with it).
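The compare step above can be sketched as a pure diff over the two file listings (names here are hypothetical; the attached ReplicatingDirectory.java is the real thing): remote files absent locally are deleted, local files absent remotely are copied, and each entry would then go to its own worker thread.

```java
import java.util.HashSet;
import java.util.Set;

public class ReplicationDiff {
    public final Set<String> toCopy = new HashSet<>();
    public final Set<String> toDelete = new HashSet<>();

    /** Local files missing remotely get copied; remote files missing locally get deleted. */
    public static ReplicationDiff compute(Set<String> local, Set<String> remote) {
        ReplicationDiff diff = new ReplicationDiff();
        for (String f : local) if (!remote.contains(f)) diff.toCopy.add(f);
        for (String f : remote) if (!local.contains(f)) diff.toDelete.add(f);
        return diff;
    }
}
```

The finalizer thread would then await both batches before regenerating segments.gen remotely.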






[jira] [Updated] (LUCENE-3797) 3xCodec should throw UOE if a DocValuesConsumer is pulled

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-3797:
-

Fix Version/s: (was: 4.7)
   4.8

 3xCodec should throw UOE if a DocValuesConsumer is pulled 
 --

 Key: LUCENE-3797
 URL: https://issues.apache.org/jira/browse/LUCENE-3797
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs, core/index
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.8

 Attachments: LUCENE-3797.patch, LUCENE-3797.patch


 currently we just return null if a DVConsumer is pulled from 3.x, which is 
 trappy since it causes an NPE in DocFieldProcessor. We should rather throw a 
 UOE.





