[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4597 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4597/

All tests passed

Build Log:
[...truncated 57494 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/build.xml:453: The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/build.xml:392: The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/extra-targets.xml:87: The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/extra-targets.xml:185: The following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/core/src/test/org/apache/lucene/index/TestDocInverterPerFieldErrorInfo.java

Total time: 110 minutes 42 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5421) MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.
Grzegorz Sobczyk created LUCENE-5421:
------------------------------------

Summary: MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.
Key: LUCENE-5421
URL: https://issues.apache.org/jira/browse/LUCENE-5421
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.6.1
Reporter: Grzegorz Sobczyk

The Morfologik filter looks up stems only for the input token or its lowercased form, in org.apache.lucene.analysis.morfologik.MorfologikFilter.incrementToken():

{code}
if (!keywordAttr.isKeyword() && (lookupSurfaceForm(termAtt) || lookupSurfaceForm(toLowercase(termAtt)))) {
{code}

As a result, the input token *sienkiewicza* is not stemmed, while *Sienkiewicza* stems to *Sienkiewicz* (for comparison: *pRoDuKtY* stems to *produkt*). The filter should also try the input token with its first letter capitalized.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (LUCENE-5421) MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grzegorz Sobczyk updated LUCENE-5421:
-------------------------------------
Description:

The Morfologik filter looks up stems only for the input token or its lowercased form, in org.apache.lucene.analysis.morfologik.MorfologikFilter.incrementToken():

{code}
if (!keywordAttr.isKeyword() && (lookupSurfaceForm(termAtt) || lookupSurfaceForm(toLowercase(termAtt)))) {
  [...]
}
{code}

As a result, the input token *sienkiewicza* is not stemmed, while *Sienkiewicza* stems to *Sienkiewicz* (for comparison: *pRoDuKtY* stems to *produkt*). The filter should also try the input token with its first letter capitalized.
[jira] [Assigned] (LUCENE-5421) MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned LUCENE-5421:
-----------------------------------
Assignee: Dawid Weiss
[jira] [Updated] (LUCENE-5421) MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-5421:
--------------------------------
Summary: MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.) (was: MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.)
[jira] [Updated] (LUCENE-5421) MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-5421:
--------------------------------
Priority: Minor (was: Major)
[jira] [Commented] (LUCENE-5421) MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886434#comment-13886434 ]

Dawid Weiss commented on LUCENE-5421:
-------------------------------------
Yeah, do you care to provide a patch or a GitHub pull request? It'd speed up the process. Please include a test case for this as well, thanks.
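The fix the reporter suggests — also trying the token with its first letter capitalized — might look roughly like the sketch below. This is not the actual MorfologikFilter code; the dictionary is simulated with a plain Set, and the lookup helper is hypothetical, to show only the lookup order.

```java
import java.util.Locale;
import java.util.Set;

public class TitleCaseLookupSketch {
    // Stand-in for the Morfologik dictionary: the surface forms it knows.
    static final Set<String> DICT = Set.of("Sienkiewicza", "produkty");

    // Hypothetical lookup order: the token as-is, then lowercased,
    // then with the first letter capitalized (the variant the report asks to add).
    static boolean lookup(String token) {
        if (DICT.contains(token)) return true;
        if (DICT.contains(token.toLowerCase(Locale.ROOT))) return true;
        String titled = Character.toUpperCase(token.charAt(0))
                + token.substring(1).toLowerCase(Locale.ROOT);
        return DICT.contains(titled);
    }

    public static void main(String[] args) {
        // "sienkiewicza" is only found via the title-cased variant;
        // "pRoDuKtY" is found via the lowercased variant.
        System.out.println(lookup("sienkiewicza"));
        System.out.println(lookup("pRoDuKtY"));
    }
}
```

In the real filter the third probe would be another lookupSurfaceForm call on a title-cased copy of the term attribute, guarded the same way as the existing lowercase probe.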
[jira] [Created] (LUCENE-5422) Postings lists deduplication
Dmitry Kan created LUCENE-5422:
-------------------------------

Summary: Postings lists deduplication
Key: LUCENE-5422
URL: https://issues.apache.org/jira/browse/LUCENE-5422
Project: Lucene - Core
Issue Type: Improvement
Components: core/codecs, core/index
Reporter: Dmitry Kan

The context: http://markmail.org/thread/tywtrjjcfdbzww6f

Robert Muir and I discussed what Robert eventually named "postings lists deduplication" at the Berlin Buzzwords 2013 conference. The idea is to allow multiple terms to point to the same postings list to save space. This could be achieved by a new index codec implementation, but this jira is open to other ideas as well.

This would benefit synonyms, exact/inexact terms, leading-wildcard support via storing reversed terms, etc. For example, at the moment, when supporting exact (unstemmed) and inexact (stemmed) searches, we store both the unstemmed and the stemmed variant of a word form, and that bloats the index. For the same index-size reason we had to remove leading-wildcard support via reversing a token at index and query time.

Comment from Mike McCandless:

Neat idea! Would this idea allow a single term to point to (the union of) N other postings lists? It seems like that's necessary e.g. to handle the exact/inexact case. And then, to produce the DocsAndPositionsEnum you'd need to do a merge sort across those N postings lists? Such a thing might also be doable as a runtime-only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem -> all of its surface forms).

Comment from Robert Muir:

I think the exact/inexact case is trickier (detecting it would be the hard part), and you are right, another solution might work better. But for the reverse-wildcard and synonyms situations, it seems we could even detect it on write if we created some hash of the previous term's postings: if the hash matches for the current term, we know it might be a duplicate and would have to actually do the costly check that they are the same. Maybe there are better ways to do it, but it might be a fun postings-format experiment to try.
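Robert's write-time detection idea can be sketched as follows. This is a toy illustration, not a Lucene PostingsFormat: postings lists are plain int arrays, and the "costly check" is a full array comparison performed only when a cheap hash matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PostingsDedupSketch {
    // Cheap hash of a postings list -> previously written lists with that
    // hash. On a hash match we still do the full equality check before
    // letting a new term share an earlier list.
    private final Map<Integer, List<int[]>> byHash = new HashMap<>();
    private final List<int[]> written = new ArrayList<>();

    /** Returns the index of the stored postings list for this term,
     *  reusing an identical earlier list when one exists. */
    public int writePostings(int[] postings) {
        int h = Arrays.hashCode(postings);
        for (int[] prev : byHash.getOrDefault(h, List.of())) {
            if (Arrays.equals(prev, postings)) {
                // Duplicate detected: point this term at the shared list.
                return written.indexOf(prev);
            }
        }
        byHash.computeIfAbsent(h, k -> new ArrayList<>()).add(postings);
        written.add(postings);
        return written.size() - 1;
    }
}
```

In a real codec the returned index would be a file pointer into the postings file, and the hash would be computed incrementally while the postings are serialized, so duplicates cost one extra comparison rather than a second write.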
[jira] [Commented] (SOLR-5675) cloud-scripts/zkcli.bat: quote option -Dlog4j...
[ https://issues.apache.org/jira/browse/SOLR-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886444#comment-13886444 ]

Stefan Matheis (steffkes) commented on SOLR-5675:
-------------------------------------------------
Thanks for reporting this Günter! Just to make sure I get it right .. you suggest quoting the whole thing, instead of the argument's value only? I'd expect something like {{-Dlog4j.configuration="file:%SDIR%\log4j.properties"}} instead of {{"-Dlog4j.configuration=file:%SDIR%\log4j.properties"}}. If you are sure that quoting the whole argument does the trick .. we can go ahead with that :)

cloud-scripts/zkcli.bat: quote option -Dlog4j...
------------------------------------------------
Key: SOLR-5675
URL: https://issues.apache.org/jira/browse/SOLR-5675
Project: Solr
Issue Type: Bug
Components: scripts and tools
Affects Versions: 4.6
Environment: Windows 7 64 bit
Reporter: Günther Ruck
Priority: Minor

In the script zkcli.bat this java command line is built:

%JVM% -Dlog4j.configuration=file:%SDIR%\log4j.properties

The command fails if %SDIR% contains spaces (C:\Program Files\...). Quoting the whole option, "-Dlog4j.configuration=file:%SDIR%\log4j.properties", solved the issue.
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_60-ea-b03) - Build # 9292 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9292/
Java: 32bit/jdk1.7.0_60-ea-b03 -client -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 50357 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:453: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:392: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:87: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:185: The following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/core/src/test/org/apache/lucene/index/TestDocInverterPerFieldErrorInfo.java

Total time: 60 minutes 57 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 32bit/jdk1.7.0_60-ea-b03 -client -XX:+UseSerialGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_60-ea-b03) - Build # 3726 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3726/
Java: 64bit/jdk1.7.0_60-ea-b03 -XX:+UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 57363 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:453: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:392: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\extra-targets.xml:87: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\extra-targets.xml:185: The following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/core/src/test/org/apache/lucene/index/TestDocInverterPerFieldErrorInfo.java

Total time: 108 minutes 35 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.7.0_60-ea-b03 -XX:+UseCompressedOops -XX:+UseParallelGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
[jira] [Commented] (SOLR-5675) cloud-scripts/zkcli.bat: quote option -Dlog4j...
[ https://issues.apache.org/jira/browse/SOLR-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886462#comment-13886462 ]

Günther Ruck commented on SOLR-5675:
------------------------------------
Hallo Stefan,

I've tried your suggestion and it works. It is sufficient to quote the value of the option: {{-Dlog4j.configuration="file:%SDIR%\log4j.properties"}}
[jira] [Commented] (LUCENE-5405) Exception strategy for analysis improved
[ https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886465#comment-13886465 ]

Michael McCandless commented on LUCENE-5405:
--------------------------------------------
Shouldn't we port this to 4.x as well?

Exception strategy for analysis improved
----------------------------------------
Key: LUCENE-5405
URL: https://issues.apache.org/jira/browse/LUCENE-5405
Project: Lucene - Core
Issue Type: Improvement
Reporter: Benson Margulies
Assignee: Benson Margulies
Fix For: 5.0

SOLR-5623 included some conversation about the dilemmas of exception management and reporting in the analysis chain. I've belatedly become educated about the infostream, and this situation is a job for it. The DocInverterPerField can note exceptions in the analysis chain, log them to the infostream, and then rethrow them as before. No wrapping, no muss, no fuss. There are comments on this JIRA from a more complex prior idea that readers might want to ignore.
[jira] [Created] (SOLR-5678) When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException
Karl Wright created SOLR-5678:
------------------------------

Summary: When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException
Key: SOLR-5678
URL: https://issues.apache.org/jira/browse/SOLR-5678
Project: Solr
Issue Type: Bug
Affects Versions: 4.6
Reporter: Karl Wright

This class of exception should not be used for run-of-the-mill networking issues. SolrServerException or some variety of IOException should be thrown instead. Here's the trace:

{code}
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:130)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:88)
	at org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:148)
	at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:147)
	at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:173)
	at org.apache.manifoldcf.agents.output.solr.HttpPoster$SolrPing.process(HttpPoster.java:1315)
	at org.apache.manifoldcf.agents.output.solr.HttpPoster$StatusThread.run(HttpPoster.java:1208)
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms
	at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:173)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:127)
	... 6 more
{code}
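The change the issue asks for has roughly this shape: catch the unchecked wrapper at the connect boundary and surface a checked exception instead, so callers are forced to handle ordinary networking failures. The class and method names below are hypothetical stand-ins, not the actual SolrJ API.

```java
import java.util.concurrent.TimeoutException;

// Checked exception standing in for SolrServerException / an IOException subtype.
class SolrConnectException extends Exception {
    SolrConnectException(String msg, Throwable cause) { super(msg, cause); }
}

public class CheckedConnectSketch {
    // Stand-in for the ZooKeeper connect step that today surfaces a
    // RuntimeException wrapping a TimeoutException.
    static void rawConnect(boolean reachable) {
        if (!reachable) {
            throw new RuntimeException(
                new TimeoutException("Could not connect to ZooKeeper"));
        }
    }

    // Boundary method: translate the unchecked wrapper into a checked
    // exception; anything else is a genuine bug and is rethrown as-is.
    public static void connect(boolean reachable) throws SolrConnectException {
        try {
            rawConnect(reachable);
        } catch (RuntimeException e) {
            if (e.getCause() instanceof TimeoutException) {
                throw new SolrConnectException("ZooKeeper connect timed out", e.getCause());
            }
            throw e;
        }
    }
}
```

A caller then gets a compile-time obligation to handle the timeout path, instead of discovering the RuntimeException in production logs.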
Re: [JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_60-ea-b03) - Build # 3726 - Still Failing!
I committed a fix (svn propset svn:eol-style native ...).

Benson, you may want to add this to your ~/.subversion/config: http://www.apache.org/dev/svn-eol-style.txt

It automatically sets the eol-style when you add files with those extensions...

Mike McCandless
http://blog.mikemccandless.com

On Thu, Jan 30, 2014 at 5:05 AM, Policeman Jenkins Server <jenk...@thetaphi.de> wrote:
> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3726/
> Java: 64bit/jdk1.7.0_60-ea-b03 -XX:+UseCompressedOops -XX:+UseParallelGC
> [...quoted build log truncated...]
Re: [JENKINS] Lucene-4x-Linux-Java6-64-test-only - Build # 11503 - Failure!
I'll fix.

Mike McCandless
http://blog.mikemccandless.com

On Thu, Jan 30, 2014 at 1:35 AM, <buil...@flonkings.com> wrote:

Build: builds.flonkings.com/job/Lucene-4x-Linux-Java6-64-test-only/11503/

1 tests failed.
REGRESSION: org.apache.lucene.index.TestIndexWriterReader.testNoTermsIndex

Error Message:
should have failed to seek since terms index was not loaded.

Stack Trace:
java.lang.AssertionError: should have failed to seek since terms index was not loaded.
	at __randomizedtesting.SeedInfo.seed([9CE6A84339D41DE3:107F46CCDEE31B19]:0)
	at org.junit.Assert.fail(Assert.java:93)
	at org.apache.lucene.index.TestIndexWriterReader.testNoTermsIndex(TestIndexWriterReader.java:1043)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
	at java.lang.Thread.run(Thread.java:662)

Build Log:
[...truncated 641 lines...]
[junit4] Suite: org.apache.lucene.index.TestIndexWriterReader
[junit4]   2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterReader -Dtests.method=testNoTermsIndex -Dtests.seed=9CE6A84339D41DE3 -Dtests.slow=true -Dtests.locale=es -Dtests.timezone=Etc/GMT+11
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886478#comment-13886478 ]

ASF subversion and git services commented on LUCENE-3069:
---------------------------------------------------------
Commit 1562771 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1562771 ]

LUCENE-3069: also exclude MockRandom from this test case

Lucene should have an entirely memory resident term dictionary
--------------------------------------------------------------
Key: LUCENE-3069
URL: https://issues.apache.org/jira/browse/LUCENE-3069
Project: Lucene - Core
Issue Type: Improvement
Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
Labels: gsoc2014
Fix For: 4.7
Attachments: LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, df-ttf-estimate.txt, example.png

The FST based TermDictionary has been a great improvement, yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds an FST from the entire term, not just the delta.
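The idea of a fully memory-resident term dictionary can be illustrated with a toy prefix trie: every term's metadata (here, just a hypothetical postings pointer) is reachable purely in memory, with common prefixes shared. This is only an illustration of the concept; Lucene's actual implementation builds an FST with a custom fst.Output, which is far more compact than a pointer-heavy trie.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory term dictionary: shares prefixes like an FST shares
// arcs, and stores per-term metadata at the terminal node, so lookups
// never touch a delta-coded on-disk block.
public class InMemoryTermDict {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        long postingsPointer = -1; // -1 means "no term ends here"
    }

    private final Node root = new Node();

    public void add(String term, long postingsPointer) {
        Node n = root;
        for (char c : term.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.postingsPointer = postingsPointer;
    }

    /** Returns the postings pointer for the term, or -1 if absent —
     *  a pure in-memory walk with no disk seek. */
    public long lookup(String term) {
        Node n = root;
        for (char c : term.toCharArray()) {
            n = n.children.get(c);
            if (n == null) return -1;
        }
        return n.postingsPointer;
    }
}
```

The FST version achieves the same "encode everything for each term" property while compressing shared prefixes and suffixes, which is what makes keeping the whole dictionary in RAM practical.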
[jira] [Commented] (SOLR-5514) atomic update throws exception if the schema contains uuid fields: Invalid UUID String: 'java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670'
[ https://issues.apache.org/jira/browse/SOLR-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886482#comment-13886482 ] Arun Kumar commented on SOLR-5514: -- I can reproduce this by only providing the wrong value in the xml while updating, like this adddoc field name=id1 update='set'b95639d6-b579-41dc-b9f0-d17634937528/field field name=MyUUIDField update='set'java.util.UUID:b95639d6-b579-41dc-b9f0-d17634937529/field Can you confirm the value used while updating index is not corrupted as mentioned above? if not, can you share the doc xml and the schema xml? that would help to reproduce and fix this issue. atomic update throws exception if the schema contains uuid fields: Invalid UUID String: 'java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670' - Key: SOLR-5514 URL: https://issues.apache.org/jira/browse/SOLR-5514 Project: Solr Issue Type: Bug Affects Versions: 4.5.1 Environment: unix and windows Reporter: Dirk Reuss Assignee: Shalin Shekhar Mangar I am updating an exiting document with the statement adddocfield name='name' update='set'newvalue/field All fields are stored and I have several UUID fields. About 10-20% of the update commands will fail with the message: (example) Invalid UUID String: 'java.util.UUID:532c9353-d391-4a04-8618-dc2fa1ef8b35' the point is that java.util.UUID seems to be prepended to the original uuid stored in the field and when the value is written this error occours. I tried to check if this specific uuid field was the problem and added the uuid field in the update xml with(field name='id1' update='set'...). But the error simply moved to an other uuid field. 
here is the original exception: lst name=responseHeaderint name=status500/intint name=QTime34/int/lstlst name=errorstr name=msgError while creating field 'MyUUIDField{type=uuid,properties=indexed,stored,omitTermFreqAndPositions,required, required=true}' from value 'java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670'/strstr name=traceorg.apache.solr.common.SolrException: Error while creating field 'MyUUIDField{type=uuid,properties=indexed,stored,omitTermFreqAndPositions,required, required=true}' from value 'java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670' at org.apache.solr.schema.FieldType.createField(FieldType.java:259) at org.apache.solr.schema.StrField.createFields(StrField.java:56) at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:47) at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:118) at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:77) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:215) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:556) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:692) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at
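The "Invalid UUID String" at the root of this trace is exactly what java.util.UUID.fromString produces when a value comes back with its Java class name prepended. A minimal standalone sketch (plain JDK, not Solr code; the class and method names below are invented for illustration):

```java
import java.util.UUID;

// Reproduces why the reported value fails: parsing the stored string with
// java.util.UUID.fromString rejects a value that has "java.util.UUID:"
// prepended to it (fromString throws IllegalArgumentException).
public class UuidPrefixDemo {
    // Returns true if the string parses as a UUID.
    static boolean isValidUuid(String s) {
        try {
            UUID.fromString(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String plain = "e26c4d56-e98d-41de-9b7f-f63192089670";
        String prefixed = "java.util.UUID:" + plain;
        System.out.println(isValidUuid(plain));    // true
        System.out.println(isValidUuid(prefixed)); // false: the failure in the report
    }
}
```

This suggests the stored field object's toString() (rather than its raw value) ends up being re-parsed somewhere along the atomic-update path.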
[jira] [Resolved] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang resolved LUCENE-3069. --- Resolution: Fixed Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2014 Fix For: 4.7 Attachments: LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, df-ttf-estimate.txt, example.png FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
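As a conceptual sketch of the proposal (a TreeMap stands in for the FST, and TermMeta is a hypothetical stand-in for a custom fst.Output): keying the structure by the entire term, with all per-term metadata held in memory as the value, removes the delta-decoded block scan at lookup time.

```java
import java.util.TreeMap;

// Conceptual sketch only -- not Lucene API. A TreeMap stands in for the FST.
public class MemoryTermDictSketch {
    // Hypothetical per-term output, analogous to a custom fst.Output:
    // everything needed to read this term's postings, held in memory.
    static final class TermMeta {
        final int docFreq;
        final long postingsOffset;
        TermMeta(int docFreq, long postingsOffset) {
            this.docFreq = docFreq;
            this.postingsOffset = postingsOffset;
        }
    }

    private final TreeMap<String, TermMeta> dict = new TreeMap<>();

    // Keyed by the *entire* term, not a delta against the previous term.
    void add(String term, int docFreq, long postingsOffset) {
        dict.put(term, new TermMeta(docFreq, postingsOffset));
    }

    // Lookup is a single in-memory probe; no on-disk block scan.
    TermMeta lookup(String term) {
        return dict.get(term);
    }

    public static void main(String[] args) {
        MemoryTermDictSketch d = new MemoryTermDictSketch();
        d.add("lucene", 42, 1024L);
        d.add("solr", 7, 2048L);
        System.out.println(d.lookup("lucene").docFreq); // 42
    }
}
```

The real win of the FST over a TreeMap is shared prefixes/suffixes keeping the memory footprint small, which is what makes the fully resident variant practical.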
lucene-solr pull request: LUCENE-5421: MorfologicFilter doesn't stem legiti...
GitHub user gsobczyk opened a pull request: https://github.com/apache/lucene-solr/pull/25 LUCENE-5421: MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.) You can merge this pull request into a Git repository by running: $ git pull https://github.com/gsobczyk/lucene-solr branch_4x Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/25.patch commit 41e399c27241ab8b0185c23c8035e632046b36e0 Author: Grzegorz Sobczyk grzegorz.sobc...@unity.pl Date: 2014-01-30T10:56:53Z LUCENE-5421: MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.)
[jira] [Commented] (LUCENE-5421) MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886492#comment-13886492 ] Grzegorz Sobczyk commented on LUCENE-5421: -- https://github.com/apache/lucene-solr/pull/25 but see TestMorfologikAnalyzer:128 - I didn't find the reason why the assertion *aarona* -> *aarona* is present there. Another thing: *poznania* (TestMorfologikAnalyzer:125) should return *poznanie*, *poznać*, *Poznań*, but I don't know how to do this. If this were possible, then we could configure MorfologikFilter. MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.) --- Key: LUCENE-5421 URL: https://issues.apache.org/jira/browse/LUCENE-5421 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.6.1 Reporter: Grzegorz Sobczyk Assignee: Dawid Weiss Priority: Minor The Morfologik filter looks up stems only for the input form or its all-lowercase form: org.apache.lucene.analysis.morfologik.MorfologikFilter.incrementToken() {code} if (!keywordAttr.isKeyword() && (lookupSurfaceForm(termAtt) || lookupSurfaceForm(toLowercase(termAtt)))) { [...] } {code} In this situation, if the input token is *sienkiewicza* it isn't stemmed, but *Sienkiewicza* -> *Sienkiewicz*; for comparison: *pRoDuKtY* -> *produkt*. It should also stem input tokens with a capitalized first letter.
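The proposed behaviour can be sketched with a plain Map standing in for the Morfologik dictionary (illustration only, not the actual filter code): after the surface-form and lowercase lookups fail, a third lookup with the first letter uppercased lets a lowercase token reach a capitalized dictionary entry.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a HashMap stands in for the Morfologik dictionary.
// The current filter tries the surface form and the all-lowercase form; the
// proposed fix adds a third attempt with the first letter uppercased, so a
// lowercase token like "sienkiewicza" can still reach the dictionary entry
// "Sienkiewicza" -> "Sienkiewicz".
public class CaseFallbackLookupSketch {
    static final Map<String, String> DICT = new HashMap<>();
    static {
        DICT.put("Sienkiewicza", "Sienkiewicz"); // proper noun, capitalized in the dictionary
        DICT.put("produkty", "produkt");         // common noun, lowercase in the dictionary
    }

    static String stem(String token) {
        String s = DICT.get(token);                       // 1. surface form
        if (s == null) s = DICT.get(token.toLowerCase()); // 2. lowercase (current behaviour ends here)
        if (s == null && !token.isEmpty()) {              // 3. proposed: capitalized first letter
            String cap = Character.toUpperCase(token.charAt(0)) + token.substring(1).toLowerCase();
            s = DICT.get(cap);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stem("pRoDuKtY"));     // produkt (found via lowercase form)
        System.out.println(stem("sienkiewicza")); // Sienkiewicz (found only via the new step)
    }
}
```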
lucene auto complete index
Can anybody provide Lucene autocomplete code for indexing and searching based on Lucene 4.4? Do we need a separate index for autocomplete, or can the normal index be used? I need to get suggestions only from the documents I have access to. How can it be implemented? -- View this message in context: http://lucene.472066.n3.nabble.com/lucene-auto-complete-index-tp4114409.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
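Not an authoritative answer, but the mechanics of the per-user approach can be sketched without any Lucene classes: build a sorted structure from the terms of only the documents the user may access, then scan it by prefix. With Lucene itself one would typically either use the suggest module (which maintains its own suggester structure) or run a prefix search against the normal index; the TreeSet below only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch only: a TreeSet stands in for a suggester built from the terms of
// the accessible documents. Prefix scan over a sorted set is the same basic
// mechanism a dedicated suggester index optimizes.
public class AutocompleteSketch {
    private final TreeSet<String> terms = new TreeSet<>();

    // Feed in terms from documents the current user is allowed to see.
    void add(String term) { terms.add(term.toLowerCase()); }

    // All stored terms starting with the typed prefix, in sorted order.
    List<String> suggest(String prefix, int max) {
        String p = prefix.toLowerCase();
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(p)) {       // jump to the first candidate
            if (!t.startsWith(p) || out.size() >= max) break; // sorted: stop at first miss
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        AutocompleteSketch ac = new AutocompleteSketch();
        ac.add("lucene"); ac.add("lucidworks"); ac.add("solr");
        System.out.println(ac.suggest("luc", 10)); // [lucene, lucidworks]
    }
}
```

Whether a separate index is needed mostly depends on scale: a small, per-user vocabulary fits in memory like this; a large shared one is what the suggest module's dedicated structures are for.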
[jira] [Updated] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5320: --- Attachment: LUCENE-5320.patch Patch adds a ctor to SearcherTaxoManager and tests. Create SearcherTaxonomyManager over Directory - Key: LUCENE-5320 URL: https://issues.apache.org/jira/browse/LUCENE-5320 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Attachments: LUCENE-5320.patch SearcherTaxonomyManager now only allows working in NRT mode. It could be useful to have an STM which allows reopening a SearcherAndTaxonomy pair over Directories, e.g. for replication. The problem is that if the thread that calls maybeRefresh() is not the one that does the commit(), it could lead to a pair that is not synchronized. Perhaps at first we could have a simple version that works under some assumptions, i.e. that the app does the commit + reopen in the same thread in that order, so that it can be used by such apps + when replicating the indexes, and later we can figure out how to generalize it to work even if commit + reopen are done by separate threads/JVMs. I'll see if SearcherTaxonomyManager can be extended to support it, or a new STM is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
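The issue's simplifying assumption (commit and reopen done together, in one thread, in that order) can be sketched with toy classes; nothing below is Lucene API, and Snapshot is just a stand-in for the SearcherAndTaxonomy pair.

```java
// Toy sketch of the synchronization concern: the searcher and the taxonomy
// reader must be published as one matched pair, or a searcher could see
// facet ordinals the taxonomy snapshot does not know about yet.
public class PairedRefreshSketch {
    static class Snapshot {
        final long indexGen, taxoGen;
        Snapshot(long i, long t) { indexGen = i; taxoGen = t; }
    }

    private long committedIndexGen = 0, committedTaxoGen = 0;
    private volatile Snapshot current = new Snapshot(0, 0);

    // Commit both parts and publish the matching pair atomically, from one
    // thread, in a fixed order -- the simplifying assumption in the issue.
    synchronized void commitAndRefresh() {
        committedTaxoGen++;   // taxonomy committed first...
        committedIndexGen++;  // ...then the index that refers to it
        current = new Snapshot(committedIndexGen, committedTaxoGen);
    }

    // Readers always observe a consistent (index, taxonomy) pair.
    Snapshot acquire() { return current; }

    public static void main(String[] args) {
        PairedRefreshSketch m = new PairedRefreshSketch();
        m.commitAndRefresh();
        Snapshot s = m.acquire();
        System.out.println(s.indexGen == s.taxoGen); // true: always a matched pair
    }
}
```

Generalizing beyond this, as the issue notes, means coping with commit and reopen happening in separate threads or even separate JVMs.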
[jira] [Created] (LUCENE-5423) OpenBitSet hashCode and equals incongruent
Jakob Zwiener created LUCENE-5423: - Summary: OpenBitSet hashCode and equals incongruent Key: LUCENE-5423 URL: https://issues.apache.org/jira/browse/LUCENE-5423 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 4.6.1 Reporter: Jakob Zwiener In org.apache.lucene.util.OpenBitSet the hashCode method might return different hash codes for equal bitsets. This happens when there are bits set in words right of wlen. This might happen through a getAndSet call (the documentation states that getAndSet may only be called on positions that are smaller than the size - which is the length of the array not wlen - this might be another issue).
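The contract violation can be reproduced with a minimal stand-in class (not the real OpenBitSet, though the hash mixing loosely mirrors its style): equals() compares only the first wlen words, while a hashCode() that iterates the whole backing array also mixes in stray bits to the right of wlen.

```java
// Minimal stand-in reproducing the reported incongruence between
// equals() (words up to wlen) and a hashCode() over the whole array.
public class MiniBitSet {
    final long[] bits;
    final int wlen; // number of words logically in use

    MiniBitSet(long[] bits, int wlen) { this.bits = bits; this.wlen = wlen; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof MiniBitSet)) return false;
        MiniBitSet other = (MiniBitSet) o;
        if (wlen != other.wlen) return false;
        for (int i = 0; i < wlen; i++)
            if (bits[i] != other.bits[i]) return false;
        return true;
    }

    // Buggy: iterates the whole backing array, including words beyond wlen.
    int buggyHashCode() {
        long h = 0;
        for (int i = bits.length; --i >= 0;) { h ^= bits[i]; h = (h << 1) | (h >>> 63); }
        return (int) ((h >> 32) ^ h) + 0x98761234;
    }

    // Fixed: only iterate the valid words, i.e. up to wlen.
    int fixedHashCode() {
        long h = 0;
        for (int i = wlen; --i >= 0;) { h ^= bits[i]; h = (h << 1) | (h >>> 63); }
        return (int) ((h >> 32) ^ h) + 0x98761234;
    }

    public static void main(String[] args) {
        MiniBitSet a = new MiniBitSet(new long[]{1L, 0L}, 1);
        MiniBitSet b = new MiniBitSet(new long[]{1L, 4L}, 1); // stray bit right of wlen
        System.out.println(a.equals(b));                            // true
        System.out.println(a.buggyHashCode() == b.buggyHashCode()); // false: contract violated
        System.out.println(a.fixedHashCode() == b.fixedHashCode()); // true
    }
}
```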
[jira] [Commented] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886521#comment-13886521 ] Michael McCandless commented on LUCENE-5320: +1, thanks Shai! Create SearcherTaxonomyManager over Directory - Key: LUCENE-5320 URL: https://issues.apache.org/jira/browse/LUCENE-5320 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Attachments: LUCENE-5320.patch SearcherTaxonomyManager now only allows working in NRT mode. It could be useful to have an STM which allows reopening a SearcherAndTaxonomy pair over Directories, e.g. for replication. The problem is that if the thread that calls maybeRefresh() is not the one that does the commit(), it could lead to a pair that is not synchronized. Perhaps at first we could have a simple version that works under some assumptions, i.e. that the app does the commit + reopen in the same thread in that order, so that it can be used by such apps + when replicating the indexes, and later we can figure out how to generalize it to work even if commit + reopen are done by separate threads/JVMs. I'll see if SearcherTaxonomyManager can be extended to support it, or a new STM is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5678) When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException
[ https://issues.apache.org/jira/browse/SOLR-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886531#comment-13886531 ] Karl Wright commented on SOLR-5678: --- Throwing SolrException (which is derived from RuntimeException) would be fine too. When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException -- Key: SOLR-5678 URL: https://issues.apache.org/jira/browse/SOLR-5678 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Karl Wright This class of exception should not be used for run-of-the-mill networking kinds of issues. SolrServerException or some variety of IOException should be thrown instead. Here's the trace: {code} java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:130) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:88) at org.apache.solr.common.cloud.ZkStateReader.init(ZkStateReader.java:148) at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:147) at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:173) at org.apache.manifoldcf.agents.output.solr.HttpPoster$SolrPing.process(HttpPoster.java:1315) at org.apache.manifoldcf.agents.output.solr.HttpPoster$StatusThread.run(HttpPoster.java:1208) Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:173) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:127) ... 6 more {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
lucene-solr pull request: Reproduces and fixes issue LUCENE-5423
GitHub user jakob-zwiener opened a pull request: https://github.com/apache/lucene-solr/pull/26 Reproduces and fixes issue LUCENE-5423 First commit adds a test that shows the unexpected behaviour. Even if the documentation of getAndSet were updated to recommend not setting bits right of wlen, the unexpected hashCode behaviour could still be triggered by manually setting wlen. Although bitsets having bits set right of wlen are somewhat invalid in themselves, fixing the hashCode method seems advisable. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jakob-zwiener/lucene-solr trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/26.patch commit 1802f1bc128f85af604aa75d40629eef3ac9515c Author: Jakob Zwiener jakob-zwie...@gmx.de Date: 2014-01-30T12:16:22Z LUCENE-5423 added failing test reproducing the problem commit d41d17deccc383523cdc24cecb81f269a9643c55 Author: Jakob Zwiener jakob-zwie...@gmx.de Date: 2014-01-30T12:20:18Z LUCENE-5423 fixing hashCode method to only iterate valid bits commit ab4290071d9a53716d93c625779e796600eefeab Author: Jakob Zwiener jakob-zwie...@gmx.de Date: 2014-01-30T12:23:07Z LUCENE-5423 fixed getAndSet documentation bits right of wlen should not be set
[jira] [Commented] (LUCENE-5423) OpenBitSet hashCode and equals incongruent
[ https://issues.apache.org/jira/browse/LUCENE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886535#comment-13886535 ] Jakob Zwiener commented on LUCENE-5423: --- A pull request with a test and a possible fix has been sent on github. OpenBitSet hashCode and equals incongruent -- Key: LUCENE-5423 URL: https://issues.apache.org/jira/browse/LUCENE-5423 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 4.6.1 Reporter: Jakob Zwiener Original Estimate: 0.5h Remaining Estimate: 0.5h In org.apache.lucene.util.OpenBitSet the hashCode method might return different hash codes for equal bitsets. This happens when there are bits set in words right of wlen. This might happen through a getAndSet call (the documentation states that getAndSet may only be called on positions that are smaller than the size - which is the length of the array not wlen - this might be another issue).
[jira] [Commented] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886550#comment-13886550 ] ASF subversion and git services commented on LUCENE-5320: - Commit 1562806 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1562806 ] LUCENE-5320: Add SearcherTaxonomyManager over Directory Create SearcherTaxonomyManager over Directory - Key: LUCENE-5320 URL: https://issues.apache.org/jira/browse/LUCENE-5320 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Attachments: LUCENE-5320.patch SearcherTaxonomyManager now only allows working in NRT mode. It could be useful to have an STM which allows reopening a SearcherAndTaxonomy pair over Directories, e.g. for replication. The problem is that if the thread that calls maybeRefresh() is not the one that does the commit(), it could lead to a pair that is not synchronized. Perhaps at first we could have a simple version that works under some assumptions, i.e. that the app does the commit + reopen in the same thread in that order, so that it can be used by such apps + when replicating the indexes, and later we can figure out how to generalize it to work even if commit + reopen are done by separate threads/JVMs. I'll see if SearcherTaxonomyManager can be extended to support it, or a new STM is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886555#comment-13886555 ] ASF subversion and git services commented on LUCENE-5320: - Commit 1562808 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1562808 ] LUCENE-5320: Add SearcherTaxonomyManager over Directory Create SearcherTaxonomyManager over Directory - Key: LUCENE-5320 URL: https://issues.apache.org/jira/browse/LUCENE-5320 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Fix For: 5.0, 4.7 Attachments: LUCENE-5320.patch SearcherTaxonomyManager now only allows working in NRT mode. It could be useful to have an STM which allows reopening a SearcherAndTaxonomy pair over Directories, e.g. for replication. The problem is that if the thread that calls maybeRefresh() is not the one that does the commit(), it could lead to a pair that is not synchronized. Perhaps at first we could have a simple version that works under some assumptions, i.e. that the app does the commit + reopen in the same thread in that order, so that it can be used by such apps + when replicating the indexes, and later we can figure out how to generalize it to work even if commit + reopen are done by separate threads/JVMs. I'll see if SearcherTaxonomyManager can be extended to support it, or a new STM is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5320. Resolution: Fixed Fix Version/s: 4.7 5.0 Assignee: Shai Erera Lucene Fields: New,Patch Available (was: New) Committed to trunk and 4x. Create SearcherTaxonomyManager over Directory - Key: LUCENE-5320 URL: https://issues.apache.org/jira/browse/LUCENE-5320 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.7 Attachments: LUCENE-5320.patch SearcherTaxonomyManager now only allows working in NRT mode. It could be useful to have an STM which allows reopening a SearcherAndTaxonomy pair over Directories, e.g. for replication. The problem is that if the thread that calls maybeRefresh() is not the one that does the commit(), it could lead to a pair that is not synchronized. Perhaps at first we could have a simple version that works under some assumptions, i.e. that the app does the commit + reopen in the same thread in that order, so that it can be used by such apps + when replicating the indexes, and later we can figure out how to generalize it to work even if commit + reopen are done by separate threads/JVMs. I'll see if SearcherTaxonomyManager can be extended to support it, or a new STM is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5654) Create a synonym filter factory that is (re)configurable, and capable of reporting its configuration, via REST API
[ https://issues.apache.org/jira/browse/SOLR-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886574#comment-13886574 ] Jack Krupansky commented on SOLR-5654: -- Two reasonable and reliable use cases I have encountered: 1. Update or replace query-time synonyms - no risk for existing indexed data. 2. Add new index-time synonyms that will apply to new indexed documents - again, no expectation that they would apply to existing documents, but reindexing would of course apply them anyway. Create a synonym filter factory that is (re)configurable, and capable of reporting its configuration, via REST API -- Key: SOLR-5654 URL: https://issues.apache.org/jira/browse/SOLR-5654 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Steve Rowe A synonym filter factory could be (re)configurable via REST API by registering with the RESTManager described in SOLR-5653, and then responding to REST API calls to modify its init params and its synonyms resource file. Read-only (GET) REST API calls should also be provided, both for init params and the synonyms resource file. It should be possible to add/remove/modify one or more entries in the synonyms resource file. We should probably use JSON for the REST request body, as is done in the Schema REST API methods. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated LUCENE-5422: --- Labels: gsoc2014 (was: ) Postings lists deduplication Key: LUCENE-5422 URL: https://issues.apache.org/jira/browse/LUCENE-5422 Project: Lucene - Core Issue Type: Improvement Components: core/codecs, core/index Reporter: Dmitry Kan Labels: gsoc2014 The context: http://markmail.org/thread/tywtrjjcfdbzww6f Robert Muir and I have discussed what Robert eventually named postings lists deduplication at Berlin Buzzwords 2013 conference. The idea is to allow multiple terms to point to the same postings list to save space. This can be achieved by new index codec implementation, but this jira is open to other ideas as well. The application / impact of this is positive for synonyms, exact / inexact terms, leading wildcard support via storing reversed term etc. For example, at the moment, when supporting exact (unstemmed) and inexact (stemmed) searches, we store both unstemmed and stemmed variant of a word form and that leads to index bloating. That is why we had to remove the leading wildcard support via reversing a token on index and query time because of the same index size considerations. Comment from Mike McCandless: Neat idea! Would this idea allow a single term to point to (the union of) N other posting lists? It seems like that's necessary e.g. to handle the exact/inexact case. And then, to produce the Docs/AndPositionsEnum you'd need to do the merge sort across those N posting lists? Such a thing might also be do-able as runtime only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem - all of its surface forms). Comment from Robert Muir: I think the exact/inexact is trickier (detecting it would be the hard part), and you are right, another solution might work better. 
but for the reverse wildcard and synonyms situation, it seems we could even detect it on write if we created some hash of the previous terms postings. if the hash matches for the current term, we know it might be a duplicate and would have to actually do the costly check they are the same. maybe there are better ways to do it, but it might be a fun postingformat experiment to try. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
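The hash-then-verify write path described in the comments can be sketched as follows (plain int arrays stand in for encoded postings; nothing here is Lucene codec API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Write-time sketch of the dedup idea: keep a hash of each already-written
// postings list; when a new term's postings hash to the same value, do the
// costly full comparison, and on a true match point the new term at the
// existing list instead of writing it again.
public class PostingsDedupSketch {
    private final List<int[]> written = new ArrayList<>();              // shared postings storage
    private final Map<Integer, List<Integer>> byHash = new HashMap<>(); // hash -> candidate slots
    final Map<String, Integer> termToSlot = new HashMap<>();            // term -> postings slot

    void write(String term, int[] postings) {
        int h = Arrays.hashCode(postings);
        for (int slot : byHash.getOrDefault(h, Collections.emptyList())) {
            if (Arrays.equals(written.get(slot), postings)) { // costly verify on hash match
                termToSlot.put(term, slot);                   // dedup: reuse the existing list
                return;
            }
        }
        written.add(postings);                                // genuinely new postings list
        int slot = written.size() - 1;
        byHash.computeIfAbsent(h, k -> new ArrayList<>()).add(slot);
        termToSlot.put(term, slot);
    }

    int storedLists() { return written.size(); }

    public static void main(String[] args) {
        PostingsDedupSketch w = new PostingsDedupSketch();
        w.write("dog", new int[]{1, 5, 9});
        w.write("dogs", new int[]{1, 5, 9}); // synonym-like duplicate shares storage
        w.write("cat", new int[]{2, 3});
        System.out.println(w.storedLists()); // 2, not 3
    }
}
```

This matches the synonym and reversed-term cases from the thread, where the duplicate postings are exact; the exact/inexact case would need the union/merge-sort machinery Mike describes instead.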
[jira] [Created] (SOLR-5679) Shard splitting fails to split old clusterstate.json with router as a string
Shalin Shekhar Mangar created SOLR-5679: --- Summary: Shard splitting fails to split old clusterstate.json with router as a string Key: SOLR-5679 URL: https://issues.apache.org/jira/browse/SOLR-5679 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.7 SOLR-5246 added support for splitting collections configured with a router.field but the fix was not back-compatible. After upgrading an old SolrCloud cluster to 4.6 or 4.6.1, shard splitting can fail with the following message: {quote} ERROR o.a.s.handler.admin.CoreAdminHandler - ERROR executing split: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:285) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:193) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] {quote} This happens because the cluster state still contains the router as a string. The clusterstate.json is supposed to auto-upgrade if cluster state is upgraded but according to the user report that did not happen. In any case, we need to fix the core admin split. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
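The shape of a back-compatible read (hypothetical helper, not the actual patch) is to accept the router entry as either the old plain string or the newer map before asking for its name, instead of casting straight to Map:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the back-compat idea: old clusterstate.json stores router as a
// plain string, newer ones as a map like {"name":"compositeId"}; reading
// code must handle both, or the cast to Map throws ClassCastException.
public class RouterBackCompatSketch {
    static String routerName(Object routerFromClusterState) {
        if (routerFromClusterState instanceof String) {
            return (String) routerFromClusterState;  // old format
        }
        if (routerFromClusterState instanceof Map) { // new format
            Object name = ((Map<?, ?>) routerFromClusterState).get("name");
            return name == null ? null : name.toString();
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> newStyle = new HashMap<>();
        newStyle.put("name", "compositeId");
        System.out.println(routerName("compositeId")); // old clusterstate.json
        System.out.println(routerName(newStyle));      // upgraded clusterstate.json
    }
}
```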
[jira] [Updated] (SOLR-5679) Shard splitting fails with ClassCastException on clusterstate.json with router as string
[ https://issues.apache.org/jira/browse/SOLR-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5679: Summary: Shard splitting fails with ClassCastException on clusterstate.json with router as string (was: Shard splitting fails to split old clusterstate.json with router as a string) Shard splitting fails with ClassCastException on clusterstate.json with router as string Key: SOLR-5679 URL: https://issues.apache.org/jira/browse/SOLR-5679 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.7 SOLR-5246 added support for splitting collections configured with a router.field but the fix was not back-compatible. After upgrading an old SolrCloud cluster to 4.6 or 4.6.1, shard splitting can fail with the following message: {quote} ERROR o.a.s.handler.admin.CoreAdminHandler - ERROR executing split: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:285) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:193) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] {quote} This happens because the cluster state still contains the router as a string. The clusterstate.json is supposed to auto-upgrade if cluster state is upgraded but according to the user report that did not happen. In any case, we need to fix the core admin split. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5658) commitWithin does not reflect the new documents added
[ https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886604#comment-13886604 ] ASF subversion and git services commented on SOLR-5658: --- Commit 1562836 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1562836 ] SOLR-5658: Removing System.out.println in JavaBinUpdatedRequestCodec added for debugging commitWithin does not reflect the new documents added - Key: SOLR-5658 URL: https://issues.apache.org/jira/browse/SOLR-5658 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0 Reporter: Varun Thacker Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5658.patch, SOLR-5658.patch I start 4 nodes using the setup mentioned on - https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I added a document using - curl http://localhost:8983/solr/update?commitWithin=1 -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit with openSearcher=false. In Solr 4.6.x there is only one hard commit with openSearcher=false. So even after 10 seconds, queries on none of the shards reflect the added document. 
This was also reported on the solr-user list ( http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html ) Here are the relevant logs Logs from Solr 4.5.1 Node 1: {code} 420021 [qtp619011445-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45 {code} Node 2: {code} 119896 [qtp1608701025-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:8983/solr/collection1/update.distrib=TOLEADERwt=javabinversion=2} {add=[testdoc (1458003295513608192)]} 0 348 129648 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 129679 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@e174f70 main 129680 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done. 
129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 134648 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3} commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 4 134660 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush {code} Node 3: Node 4: {code} 374545 [qtp1608701025-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:7574/solr/collection1/update.distrib=FROMLEADERwt=javabinversion=2} {add=[testdoc (1458002133233172480)]} 0 20 384545 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 384552 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@36137e08 main 384553
[jira] [Commented] (SOLR-5658) commitWithin does not reflect the new documents added
[ https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886608#comment-13886608 ] Shalin Shekhar Mangar commented on SOLR-5658: - Perhaps I should remove this println under another issue, because this one has already been released? commitWithin does not reflect the new documents added - Key: SOLR-5658 URL: https://issues.apache.org/jira/browse/SOLR-5658 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0 Reporter: Varun Thacker Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5658.patch, SOLR-5658.patch I start 4 nodes using the setup mentioned on - https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I added a document using - curl http://localhost:8983/solr/update?commitWithin=1 -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit with openSearcher=false. In Solr 4.6.x there is only one hard commit with openSearcher=false. So even after 10 seconds, queries on none of the shards reflect the added document. 
This was also reported on the solr-user list ( http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html ) Here are the relevant logs Logs from Solr 4.5.1 Node 1: {code} 420021 [qtp619011445-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45 {code} Node 2: {code} 119896 [qtp1608701025-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:8983/solr/collection1/update.distrib=TOLEADERwt=javabinversion=2} {add=[testdoc (1458003295513608192)]} 0 348 129648 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 129679 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@e174f70 main 129680 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done. 
129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 134648 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3} commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 4 134660 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush {code} Node 3: Node 4: {code} 374545 [qtp1608701025-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:7574/solr/collection1/update.distrib=FROMLEADERwt=javabinversion=2} {add=[testdoc (1458002133233172480)]} 0 20 384545 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 384552 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@36137e08 main 384553 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 384553
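The contrast between the two log sets above comes down to whether the scheduled commit opens a new searcher. A toy model of that visibility rule (class and method names invented for illustration; this is not Solr's actual UpdateHandler):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: an added document becomes searchable only after a commit that
// opens a new searcher. A commit with openSearcher=false (all that the
// 4.6.x logs show) flushes segments but leaves the current searcher, and
// therefore query results, unchanged.
public class SearcherVisibility {
    private final List<String> indexed = new ArrayList<>();
    private List<String> searcherView = new ArrayList<>();

    public void add(String doc) {
        indexed.add(doc);
    }

    public void commit(boolean openSearcher) {
        if (openSearcher) {
            // a new searcher sees everything indexed so far
            searcherView = new ArrayList<>(indexed);
        }
        // openSearcher=false: data is durable but queries still use the old view
    }

    public boolean isVisible(String doc) {
        return searcherView.contains(doc);
    }

    public static void main(String[] args) {
        SearcherVisibility node = new SearcherVisibility();
        node.add("testdoc");
        node.commit(false); // like the 4.6.x hard commit: still invisible
        System.out.println("after openSearcher=false: " + node.isVisible("testdoc"));
        node.commit(true);  // like the 4.5.1 soft commit: now visible
        System.out.println("after openSearcher=true: " + node.isVisible("testdoc"));
    }
}
```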
[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 1248 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/1248/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseParallelGC All tests passed Build Log: [...truncated 10556 lines...] [junit4] JVM J0: stderr was not empty, see: /Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140130_143859_837.syserr [junit4] JVM J0: stderr (verbatim) [junit4] java(749,0x13c179000) malloc: *** error for object 0x113c1683a0: pointer being freed was not allocated [junit4] *** set a breakpoint in malloc_error_break to debug [junit4] JVM J0: EOF [...truncated 1 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/jre/bin/java -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=FFC506EA979174D6 -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.7 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=4.7-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Djdk.map.althashing.threshold=0 -Dtests.disableHdfs=true -Dfile.encoding=US-ASCII -classpath
[jira] [Commented] (SOLR-5658) commitWithin does not reflect the new documents added
[ https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886676#comment-13886676 ] ASF subversion and git services commented on SOLR-5658: --- Commit 1562860 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1562860 ] SOLR-5658: Removing System.out.println in JavaBinUpdatedRequestCodec added for debugging commitWithin does not reflect the new documents added - Key: SOLR-5658 URL: https://issues.apache.org/jira/browse/SOLR-5658 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0 Reporter: Varun Thacker Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5658.patch, SOLR-5658.patch I start 4 nodes using the setup mentioned on - https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I added a document using - curl 'http://localhost:8983/solr/update?commitWithin=1' -H 'Content-Type: text/xml' --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit with openSearcher=false. In Solr 4.6.x there is only one hard commit, with openSearcher=false. So even after 10 seconds, queries on none of the shards reflect the added document.
This was also reported on the solr-user list ( http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html ) Here are the relevant logs Logs from Solr 4.5.1 Node 1: {code} 420021 [qtp619011445-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45 {code} Node 2: {code} 119896 [qtp1608701025-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:8983/solr/collection1/update.distrib=TOLEADERwt=javabinversion=2} {add=[testdoc (1458003295513608192)]} 0 348 129648 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 129679 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@e174f70 main 129680 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done. 
129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 134648 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3} commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 4 134660 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush {code} Node 3: Node 4: {code} 374545 [qtp1608701025-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:7574/solr/collection1/update.distrib=FROMLEADERwt=javabinversion=2} {add=[testdoc (1458002133233172480)]} 0 20 384545 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 384552 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@36137e08
[jira] [Updated] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Wesemann updated SOLR-5673: - Attachment: SOLR-5673.diff This diff adds a test and a patch for HTTPSolrServer. HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Frank Wesemann Priority: Minor Attachments: SOLR-5673.diff In setFollowRedirects(boolean newValue), HTTPSolrServer always sets its internal property followRedirects to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
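The report does not quote the offending line, but the behavior it describes matches a classic copy-paste setter bug. A minimal reproduction of the pattern (the class is a stand-in, not the actual HttpSolrServer source):

```java
// Minimal sketch of the bug pattern described in SOLR-5673: a setter that
// assigns a constant instead of its parameter. Names are illustrative.
public class RedirectConfig {
    private boolean followRedirects = false;

    // Buggy shape: the parameter is ignored, the field is always set to true.
    public void setFollowRedirectsBuggy(boolean followRedirects) {
        this.followRedirects = true; // bug: constant instead of parameter
    }

    // Fixed shape: store the caller-supplied value.
    public void setFollowRedirects(boolean followRedirects) {
        this.followRedirects = followRedirects;
    }

    public boolean isFollowRedirects() {
        return followRedirects;
    }

    public static void main(String[] args) {
        RedirectConfig c = new RedirectConfig();
        c.setFollowRedirectsBuggy(false);
        System.out.println("buggy setter with false -> " + c.isFollowRedirects());
        c.setFollowRedirects(false);
        System.out.println("fixed setter with false -> " + c.isFollowRedirects());
    }
}
```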
Re: Patches in Jira or pull requests on github?
Opening a pull request will send a notification to the mailing list, so that should get noticed the same as opening a JIRA. Excellent, thanks! Will also have a closer look at the developer tips. - Bram - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5679) Shard splitting fails with ClassCastException on clusterstate.json with router as string
[ https://issues.apache.org/jira/browse/SOLR-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886703#comment-13886703 ] ASF subversion and git services commented on SOLR-5679: --- Commit 1562872 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1562872 ] SOLR-5679: SOLR-5679: Shard splitting fails with ClassCastException on collections upgraded from 4.5 and earlier versions Shard splitting fails with ClassCastException on clusterstate.json with router as string Key: SOLR-5679 URL: https://issues.apache.org/jira/browse/SOLR-5679 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.7 SOLR-5246 added support for splitting collections configured with a router.field but the fix was not back-compatible. After upgrading an old SolrCloud cluster to 4.6 or 4.6.1, shard splitting can fail with the following message: {quote} ERROR o.a.s.handler.admin.CoreAdminHandler - ERROR executing split: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:285) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:193) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] {quote} This happens because the cluster state still contains the router as a string. The clusterstate.json is supposed to auto-upgrade if cluster state is upgraded but according to the user report that did not happen. In any case, we need to fix the core admin split. 
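The fix needs a back-compat check along these lines: cluster states written by Solr 4.5 and earlier store the router as a plain String, newer ones as a Map such as {"name":"compositeId"}, and casting the value straight to Map is what throws the ClassCastException above. A sketch of branching on the runtime type (names are illustrative, not Solr's actual code):

```java
import java.util.Collections;
import java.util.Map;

// Handle both router representations: a bare String (pre-4.6 cluster
// state) and a Map with a "name" key (4.6+). Blindly casting to Map is
// exactly the ClassCastException reported in SOLR-5679.
public class RouterCompat {
    public static String routerName(Object routerSpec) {
        if (routerSpec instanceof String) {
            return (String) routerSpec; // legacy pre-4.6 format
        }
        if (routerSpec instanceof Map) {
            Object name = ((Map<?, ?>) routerSpec).get("name");
            return name == null ? null : name.toString();
        }
        return null; // absent or unrecognized
    }

    public static void main(String[] args) {
        System.out.println(routerName("compositeId"));
        System.out.println(routerName(Collections.singletonMap("name", "compositeId")));
    }
}
```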
[jira] [Commented] (SOLR-5679) Shard splitting fails with ClassCastException on clusterstate.json with router as string
[ https://issues.apache.org/jira/browse/SOLR-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886707#comment-13886707 ] ASF subversion and git services commented on SOLR-5679: --- Commit 1562873 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1562873 ] SOLR-5679: Shard splitting fails with ClassCastException on collections upgraded from 4.5 and earlier versions Shard splitting fails with ClassCastException on clusterstate.json with router as string Key: SOLR-5679 URL: https://issues.apache.org/jira/browse/SOLR-5679 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.7 SOLR-5246 added support for splitting collections configured with a router.field but the fix was not back-compatible. After upgrading an old SolrCloud cluster to 4.6 or 4.6.1, shard splitting can fail with the following message: {quote} ERROR o.a.s.handler.admin.CoreAdminHandler - ERROR executing split: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:285) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:193) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] {quote} This happens because the cluster state still contains the router as a string. The clusterstate.json is supposed to auto-upgrade if cluster state is upgraded but according to the user report that did not happen. In any case, we need to fix the core admin split. 
[jira] [Resolved] (SOLR-5679) Shard splitting fails with ClassCastException on clusterstate.json with router as string
[ https://issues.apache.org/jira/browse/SOLR-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-5679. - Resolution: Fixed This is fixed. I'll investigate the auto-upgrade of clusterstate.json separately. Shard splitting fails with ClassCastException on clusterstate.json with router as string Key: SOLR-5679 URL: https://issues.apache.org/jira/browse/SOLR-5679 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.7 SOLR-5246 added support for splitting collections configured with a router.field but the fix was not back-compatible. After upgrading an old SolrCloud cluster to 4.6 or 4.6.1, shard splitting can fail with the following message: {quote} ERROR o.a.s.handler.admin.CoreAdminHandler - ERROR executing split: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:285) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:193) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] {quote} This happens because the cluster state still contains the router as a string. The clusterstate.json is supposed to auto-upgrade if cluster state is upgraded but according to the user report that did not happen. In any case, we need to fix the core admin split. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Wesemann updated SOLR-5673: - Affects Version/s: (was: 4.6) HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Reporter: Frank Wesemann Priority: Minor Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Wesemann updated SOLR-5673: - Affects Version/s: 4.7 5.0 4.6 HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0, 4.7 Reporter: Frank Wesemann Priority: Minor Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5418) Don't use .advance on costly (e.g. distance range facets) filters
[ https://issues.apache.org/jira/browse/LUCENE-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886760#comment-13886760 ] David Smiley commented on LUCENE-5418: -- How ironic that I was contemplating this very same issue yesterday (shared on IRC #lucene-dev) as I work on LUCENE-5408, and now I see you guys were just thinking about it. Rob's right; the problem isn't just advance(), it's next() too. There may be a place to share some code that Mike is committing here in his facet module with a static utility class I coded yesterday in LUCENE-5408 (not yet posted). It's a BitsDocIdSet and it's roughly similar to Mike's SlowBitsDocIdSetIterator:
{code:java}
/** Utility class that wraps a {@link Bits} with a {@link DocIdSet}. */
private static class BitsDocIdSet extends DocIdSet {
  final Bits bits; // not null

  public BitsDocIdSet(Bits bits) {
    if (bits == null)
      throw new NullPointerException("bits arg should be non-null");
    this.bits = bits;
  }

  @Override
  public DocIdSetIterator iterator() throws IOException {
    return new DocIdSetIterator() {
      final Bits bits = BitsDocIdSet.this.bits; // copy reference to reduce outer class access
      int docId = -1;

      @Override
      public int docID() {
        return docId;
      }

      @Override
      public int nextDoc() throws IOException {
        return advance(docId + 1);
      }

      @Override
      public int advance(int target) throws IOException {
        for (docId = target; docId < bits.length(); docId++) {
          if (bits.get(docId)) {
            return docId;
          }
        }
        return NO_MORE_DOCS;
      }

      @Override
      public long cost() {
        return bits.length();
      }
    };
  }

  @Override
  public Bits bits() throws IOException {
    return bits; // won't be null
  }

  // we don't override isCacheable because we want the default of false
} // class BitsDocIdSet
{code}
So Mike; you've got just the DISI portion, and you're also incorporating acceptDocs. For me I elected to have acceptDocs be pre-incorporated into the Bits I pass through. I'll post my intermediate progress on LUCENE-5408.
So anyway, how about we have something in the utils package to share? Don't use .advance on costly (e.g. distance range facets) filters - Key: LUCENE-5418 URL: https://issues.apache.org/jira/browse/LUCENE-5418 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.7 Attachments: LUCENE-5418.patch If you use a distance filter today (see http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html ), then drill down on one of those ranges, under the hood Lucene is using .advance on the Filter, which is very costly because we end up computing distance on (possibly many) hits that don't match the query. It's better for performance to find the hits matching the Query first, and then check the filter. FilteredQuery can already do this today, when you use its QUERY_FIRST_FILTER_STRATEGY. This essentially accomplishes the same thing as Solr's post filters (I think?) but with a far simpler/better/less code approach. E.g., I believe ElasticSearch uses this API when it applies costly filters. Longish term, I think Query/Filter ought to know itself that it's expensive, and cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g. ConstantScoreQuery), or the Filter is a clause in BooleanFilter, or it's passed to IndexSearcher.search, we should also be smart here and not call .advance on such clauses. But that'd be a biggish change ... so for today the workaround is that the user must carefully construct the FilteredQuery themselves. In the meantime, as another workaround, I want to fix DrillSideways so that when you drill down on such filters it doesn't use .advance; this should give a good speedup for the normal path API usage with a costly filter. I'm iterating on the lucene server branch (LUCENE-5376) but once it's working I plan to merge this back to trunk / 4.7.
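The "query first" idea discussed in this thread can be sketched in a few lines: drive iteration from the cheap query's hits and consult the costly filter only as a per-document yes/no check, never via advance()/nextDoc(). Here the int[] of doc ids and the IntPredicate stand in for Lucene's Scorer and Bits; this is an illustrative sketch, not Lucene source:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Sketch of QUERY_FIRST_FILTER_STRATEGY's idea: the costly filter (e.g. a
// distance computation) runs only on documents that already matched the
// query, instead of being iterated/advanced over all candidate docs.
public class QueryFirst {
    public static List<Integer> filteredHits(int[] queryHits, IntPredicate costlyFilter) {
        List<Integer> out = new ArrayList<>();
        for (int doc : queryHits) {
            if (costlyFilter.test(doc)) { // costly work runs once per query hit only
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // e.g. a costly "distance" check, applied only to the three query hits
        System.out.println(filteredHits(new int[]{1, 4, 7}, d -> d % 2 == 1));
    }
}
```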
[jira] [Updated] (LUCENE-5408) GeometryStrategy -- match geometries in DocValues
[ https://issues.apache.org/jira/browse/LUCENE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5408: - Attachment: LUCENE-5408_GeometryStrategy.patch This is intermediate progress; it needs to be tested. And I hope to possibly share a Bits-based DocIdSet with [~mikemccand] in LUCENE-5418. The sentiment in that issue about how to handle super-slow Filters is a problem here too. I had an epiphany last night that the current Spatial RPT grid and algorithm doesn't need to be modified to be able to differentiate the matching docs into confirmed and un-confirmed matches for common scenarios. As such, to prevent mis-use of the expensive Filter returned from this GeometryStrategy, I might force it to be paired with RecursivePrefixTreeStrategy. And then leave an expert method exposed to grab Bits or a Filter purely based on the Geometry DocValues check. ElasticSearch and Solr wouldn't use that, but someone coding directly to Lucene would have the ability to wire things together in ways more flexible than are possible in ES or Solr. The most ideal way is to compute a fast pre-filter bitset separate from the slow post-filter, with user keyword queries and other filters in the middle. But the slow post-filter, to operate best, needs a side-artifact bitset computed when the pre-filter bitset is generated. I'll eventually be more clear in javadocs. GeometryStrategy -- match geometries in DocValues - Key: LUCENE-5408 URL: https://issues.apache.org/jira/browse/LUCENE-5408 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 4.7 Attachments: LUCENE-5408_GeometryStrategy.patch I've started work on a new SpatialStrategy implementation I'm tentatively calling GeometryStrategy.
It's similar to the [JtsGeoStrategy in Spatial-Solr-Sandbox|https://github.com/ryantxu/spatial-solr-sandbox/tree/master/LSE/src/main/java/org/apache/lucene/spatial/pending/jts] but a little different in the details -- certainly faster. Using Spatial4j 0.4's BinaryCodec, it'll serialize the shape to bytes (for polygons this is internally WKB format) and the strategy will put it in a BinaryDocValuesField. In practice the shape is likely a polygon but it needn't be. Then I'll implement a Filter that returns a DocIdSetIterator that evaluates a given document passed via advance(docid) to see if the query shape matches a shape in DocValues. It's improper usage for it to be used in a situation where it will evaluate every document id via nextDoc(). And in practice the DocValues format chosen should be a disk-resident one since each value tends to be kind of big. This spatial strategy in and of itself has no _index_; it's O(N) where N is the number of documents that get passed thru it. So it should be placed last in the query/filter tree so that the other queries limit the documents it needs to see. At a minimum, another query/filter to use in conjunction is another SpatialStrategy like RecursivePrefixTreeStrategy. Eventually, once the PrefixTree grid encoding has a little bit more metadata, it will be possible to further combine the grid with this strategy in such a way that many documents won't need to be checked against the serialized geometry. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5424) FilteredQuery useRandomAccess() should use cost()
David Smiley created LUCENE-5424: Summary: FilteredQuery useRandomAccess() should use cost() Key: LUCENE-5424 URL: https://issues.apache.org/jira/browse/LUCENE-5424 Project: Lucene - Core Issue Type: Improvement Components: core/query/scoring Reporter: David Smiley Now that Lucene's DISI has a cost() method, it's possible for FilteredQuery's RANDOM_ACCESS_FILTER_STRATEGY to use a smarter algorithm in its useRandomAccess() method. In particular, it might examine filterIter.cost() to see if it is greater than the cost returned by weight.scorer().cost() of the query. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
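The heuristic the issue proposes, comparing filterIter.cost() against the query scorer's cost(), can be sketched as a simple predicate. The method name and comparison are illustrative, not FilteredQuery's actual code:

```java
// Sketch of the proposed cost comparison: cost() is (roughly) an upper
// bound on how many docs an iterator matches. When the filter matches far
// more docs than the query, leapfrogging the filter wastes work, so
// checking the filter per query hit (random access) is the cheaper plan.
// Note the caveat from the thread: this only pays off when the filter's
// per-document check is itself cheap (e.g. a FixedBitSet lookup).
public class FilterStrategyChooser {
    public static boolean useRandomAccess(long filterCost, long queryCost) {
        // dense filter + selective query -> consult the filter per query hit
        return filterCost > queryCost;
    }

    public static void main(String[] args) {
        System.out.println(useRandomAccess(1_000_000L, 100L)); // dense filter
        System.out.println(useRandomAccess(10L, 100L));        // sparse filter
    }
}
```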
[jira] [Assigned] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-5673: --- Assignee: Shalin Shekhar Mangar HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0, 4.7 Reporter: Frank Wesemann Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886798#comment-13886798 ] ASF subversion and git services commented on SOLR-5673: --- Commit 1562898 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1562898 ] SOLR-5673: HttpSolrServer doesn't set own property correctly in setFollowRedirects HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0, 4.7 Reporter: Frank Wesemann Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886800#comment-13886800 ] ASF subversion and git services commented on SOLR-5673: --- Commit 1562899 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1562899 ] SOLR-5673: HttpSolrServer doesn't set own property correctly in setFollowRedirects HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0, 4.7 Reporter: Frank Wesemann Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5673: Fix Version/s: 5.0 HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0, 4.7 Reporter: Frank Wesemann Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 5.0, 4.7 Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5424) FilteredQuery useRandomAccess() should use cost()
[ https://issues.apache.org/jira/browse/LUCENE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886924#comment-13886924 ] Michael McCandless commented on LUCENE-5424: +1, there is a TODO about this in the code. But I'm not sure how to translate cost to the right filter strategy; maybe we need a hasAdvance() as Rob suggested on LUCENE-5418? Also, useRandomAccess should not be used for costly filters: it should be used for cheap filters (e.g. a FixedBitSet), because this is passed down as the acceptDocs to possibly many, many postings iterators. E.g., if you run a BQ with 10 terms and pass a Filter to IS when doing the search ... if useRandomAccess is true, that filter is checked in all 10 of those DocsEnums, quite possibly many times per document. FilteredQuery useRandomAccess() should use cost() - Key: LUCENE-5424 URL: https://issues.apache.org/jira/browse/LUCENE-5424 Project: Lucene - Core Issue Type: Improvement Components: core/query/scoring Reporter: David Smiley Now that Lucene's DISI has a cost() method, it's possible for FilteredQuery's RANDOM_ACCESS_FILTER_STRATEGY to use a smarter algorithm in its useRandomAccess() method. In particular, it might examine filterIter.cost() to see if it is greater than the cost returned by weight.scorer().cost() of the query. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
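The heuristic the issue proposes can be sketched in isolation. This is a hypothetical shape, not Lucene's actual code: if the filter's estimated match count is far larger than the query's, advancing the filter is wasteful, so consult it by random access instead. The ratio parameter is an invented tuning knob whose right value, as noted above, would need benchmarking.

```java
// Hypothetical sketch of a cost()-driven useRandomAccess decision, following
// the idea in the issue: compare filterIter.cost() against the query scorer's
// cost(). 'ratio' is an invented tuning parameter, not anything in Lucene.
public class RandomAccessHeuristic {
    public static boolean useRandomAccess(long filterCost, long queryCost, double ratio) {
        // cost() approximates the number of matching docs; a filter much
        // denser than the query favors checking the filter per query hit
        // rather than driving iteration off the filter.
        return filterCost > queryCost * ratio;
    }
}
```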
[jira] [Commented] (LUCENE-5418) Don't use .advance on costly (e.g. distance range facets) filters
[ https://issues.apache.org/jira/browse/LUCENE-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886925#comment-13886925 ] Michael McCandless commented on LUCENE-5418: Actually, I've removed the SlowBitsDocIdSetIterator (need to post a patch again soon...), because it's too trappy. I think it's better if the user gets an exception here than just silently run super slowly, and at least for this issue there are always ways to run the Filter quickly (use DrillSideways or DrillDownQuery, or create FilteredQuery directly). Don't use .advance on costly (e.g. distance range facets) filters - Key: LUCENE-5418 URL: https://issues.apache.org/jira/browse/LUCENE-5418 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.7 Attachments: LUCENE-5418.patch If you use a distance filter today (see http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html ), then drill down on one of those ranges, under the hood Lucene is using .advance on the Filter, which is very costly because we end up computing distance on (possibly many) hits that don't match the query. It's better performance to find the hits matching the Query first, and then check the filter. FilteredQuery can already do this today, when you use its QUERY_FIRST_FILTER_STRATEGY. This essentially accomplishes the same thing as Solr's post filters (I think?) but with a far simpler/better/less code approach. E.g., I believe ElasticSearch uses this API when it applies costly filters. Longish term, I think Query/Filter ought to know itself that it's expensive, and cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g. ConstantScoreQuery), or the Filter is a clause in BooleanFilter, or it's passed to IndexSearcher.search, we should also be smart here and not call .advance on such clauses. But that'd be a biggish change ... 
so for today the workaround is the user must carefully construct the FilteredQuery themselves. In the mean time, as another workaround, I want to fix DrillSideways so that when you drill down on such filters it doesn't use .advance; this should give a good speedup for the normal path API usage with a costly filter. I'm iterating on the lucene server branch (LUCENE-5376) but once it's working I plan to merge this back to trunk / 4.7. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5424) FilteredQuery useRandomAccess() should use cost()
[ https://issues.apache.org/jira/browse/LUCENE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886963#comment-13886963 ] David Smiley commented on LUCENE-5424: -- I know I commented on LUCENE-5418 and then immediately created this issue but these are not particularly related. I totally recognize that RANDOM_ACCESS_FILTER_STRATEGY is for the typical case of fast filters. And indeed I observed the TODO comment and thought, _hey_, DISI *does* have a {{cost()}} now -- let's do this! Now there's this JIRA issue :-) Not sure how to arrive at the right tuning ratio between the cost() of both DISIs. Maybe use the benchmark module and try various filters that match 1%, 2%, etc. up to 99% of the documents, against some simple query that always matches the same 50% of the total docs? And then test this method given configurable threshold ratios of query_cost/filter_cost of 10%, 20%, ... etc. and see where the inflection point is. That's complicated, yeah. FilteredQuery useRandomAccess() should use cost() - Key: LUCENE-5424 URL: https://issues.apache.org/jira/browse/LUCENE-5424 Project: Lucene - Core Issue Type: Improvement Components: core/query/scoring Reporter: David Smiley Now that Lucene's DISI has a cost() method, it's possible for FilteredQuery's RANDOM_ACCESS_FILTER_STRATEGY to use a smarter algorithm in its useRandomAccess() method. In particular, it might examine filterIter.cost() to see if it is greater than the cost returned by weight.scorer().cost() of the query. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5680) ConcurrentUpdateSolrServer ignores HttpClient parameter
Edgar Espina created SOLR-5680: -- Summary: ConcurrentUpdateSolrServer ignores HttpClient parameter Key: SOLR-5680 URL: https://issues.apache.org/jira/browse/SOLR-5680 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 4.6 Reporter: Edgar Espina Priority: Minor Since 4.6.1 ConcurrentUpdateSolrServer ignores the HttpClient parameter. Here is the source code: {code} public ConcurrentUpdateSolrServer(String solrServerUrl, HttpClient client, int queueSize, int threadCount) { this(solrServerUrl, null, queueSize, threadCount, Executors.newCachedThreadPool(new SolrjNamedThreadFactory("concurrentUpdateScheduler"))); shutdownExecutor = true; } {code} It calls this(...) with null as the 2nd parameter, dropping the client argument. Thanks -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
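The regression reduces to a constructor-delegation slip. A stand-in sketch (simplified placeholder types, not the actual SolrJ source) of the delegation with the one-token fix applied:

```java
// Stand-in sketch of SOLR-5680: the delegating constructor dropped its
// HttpClient argument by passing null through. Types are simplified
// placeholders, not SolrJ classes.
public class ConcurrentUpdateSketch {
    static class HttpClient {}

    final HttpClient client;

    // Buggy version delegated with: this(url, null, queueSize, threads, ...)
    // The fix is simply to pass 'client' through.
    ConcurrentUpdateSketch(String url, HttpClient client, int queueSize, int threads) {
        this(url, client, queueSize, threads, true); // fixed: was null
    }

    ConcurrentUpdateSketch(String url, HttpClient client, int queueSize, int threads,
                           boolean shutdownExecutor) {
        this.client = client;
    }
}
```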
[jira] [Updated] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-5681: --- Description: Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. 
At the end, the cleanup is done from the workQueue. was: Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. With OCP tasks becoming async, it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. 
Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. *
[jira] [Created] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
Anshum Gupta created SOLR-5681: -- Summary: Make the OverseerCollectionProcessor multi-threaded Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. With OCP tasks becoming async, it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). 
The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
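The peekAfter(last) primitive proposed in the issue does not exist in ZooKeeper's distributed queue; as an assumption about its intended semantics, it can be sketched over an ordinary in-memory queue:

```java
import java.util.Deque;

// Hypothetical sketch of the proposed peekAfter(last): return the element
// that follows 'last' without removing anything, so a parent dispatcher can
// look ahead while completed tasks are still awaiting cleanup. The method
// name comes from the issue; this implementation is an assumption.
public class PeekAfterSketch {
    public static <T> T peekAfter(Deque<T> queue, T last) {
        boolean seen = (last == null); // null means "peek at the head"
        for (T item : queue) {
            if (seen) {
                return item;
            }
            if (item.equals(last)) {
                seen = true;
            }
        }
        return null; // nothing after 'last'
    }
}
```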
[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887175#comment-13887175 ] Anshum Gupta commented on SOLR-5477: Making OverseerCollectionProcessor multi-threaded. Async execution of OverseerCollectionProcessor tasks Key: SOLR-5477 URL: https://issues.apache.org/jira/browse/SOLR-5477 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Anshum Gupta Attachments: SOLR-5477-CoreAdminStatus.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch Typical collection admin commands are long running and it is very common to have the requests get timed out. It is more of a problem if the cluster is very large. Add an option to run these commands asynchronously: add an extra param async=true for all collection commands. The task is written to ZK and the caller is returned a task id. A separate collection admin command will be added to poll the status of the task: command=status&id=7657668909. If id is not passed, all running async tasks should be listed. A separate queue is created to store in-process tasks. After the tasks are completed the queue entry is removed. OverseerCollectionProcessor will perform these tasks in multiple threads -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887178#comment-13887178 ] Anshum Gupta commented on SOLR-5681: Async Collection API calls. Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. 
* Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5678) When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException
[ https://issues.apache.org/jira/browse/SOLR-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-5678: --- Attachment: SOLR-5678.patch Changed the exception to a SolrException and added a test to confirm that the correct exception is thrown. When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException -- Key: SOLR-5678 URL: https://issues.apache.org/jira/browse/SOLR-5678 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Karl Wright Attachments: SOLR-5678.patch This class of exception should not be used for run-of-the-mill networking kinds of issues. SolrServerException or some variety of IOException should be thrown instead. Here's the trace: {code} java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:130) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:88) at org.apache.solr.common.cloud.ZkStateReader.init(ZkStateReader.java:148) at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:147) at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:173) at org.apache.manifoldcf.agents.output.solr.HttpPoster$SolrPing.process(HttpPoster.java:1315) at org.apache.manifoldcf.agents.output.solr.HttpPoster$StatusThread.run(HttpPoster.java:1208) Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:173) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:127) ... 6 more {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
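The direction of the fix can be sketched without Solr on the classpath. This is a stand-in illustration of wrapping the checked connect timeout in a domain exception rather than a bare RuntimeException; `SolrConnectException` is a placeholder name, not the class the actual patch uses:

```java
import java.util.concurrent.TimeoutException;

// Stand-in sketch of the SOLR-5678 fix direction: surface the ZooKeeper
// connect timeout as a catchable, Solr-specific exception type instead of a
// bare RuntimeException. 'SolrConnectException' is a placeholder name.
public class ZkConnectSketch {
    static class SolrConnectException extends RuntimeException {
        SolrConnectException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    static void connect(boolean timesOut) {
        try {
            if (timesOut) {
                throw new TimeoutException("Could not connect to ZooKeeper");
            }
        } catch (TimeoutException e) {
            // Callers can now catch a meaningful type and still reach the
            // original cause, instead of matching on RuntimeException.
            throw new SolrConnectException("ZooKeeper connection failed", e);
        }
    }
}
```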
[jira] [Created] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
John Wang created LUCENE-5425: - Summary: Make creation of FixedBitSet in FacetsCollector overridable Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and generates a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-5425: -- Attachment: facetscollector.patch Make creation of FixedBitSet in FacetsCollector overridable --- Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: facetscollector.patch In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and generates a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
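One plausible shape for such a customization point is a protected factory method, sketched here with a plain long[] word array standing in for Lucene's FixedBitSet so the example is self-contained. The method and class names are assumptions, not necessarily what the attached patch does:

```java
// Hypothetical sketch of making per-query bitset allocation overridable.
// A long[] word array (one bit per doc, 64 docs per word) stands in for
// FixedBitSet; 'createBits' and the pooling subclass are invented names.
public class BitsFactorySketch {
    // Default behavior: a fresh allocation per query, as FacetsCollector does today.
    protected long[] createBits(int maxDoc) {
        return new long[(maxDoc + 63) >>> 6];
    }
}

// A subclass could reuse a cached array across queries to cut allocation
// and GC pressure on large indexes.
class PooledBitsFactory extends BitsFactorySketch {
    private long[] cached;

    @Override
    protected long[] createBits(int maxDoc) {
        int words = (maxDoc + 63) >>> 6;
        if (cached == null || cached.length < words) {
            cached = new long[words];
        } else {
            java.util.Arrays.fill(cached, 0, words, 0L); // clear before reuse
        }
        return cached;
    }
}
```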
[jira] [Updated] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-5426: -- Attachment: sortedsetreaderstate.patch Make SortedSetDocValuesReaderState customizable --- Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch We have a reader that has a different (in-memory) data structure, where the cost of computing ordinals per reader open is too expensive in the realtime setting. We are maintaining an in-memory data structure that supports all this functionality and would like to leverage SortedSetDocValuesAccumulator. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
John Wang created LUCENE-5426: - Summary: Make SortedSetDocValuesReaderState customizable Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch We have a reader that has a different (in-memory) data structure, where the cost of computing ordinals per reader open is too expensive in the realtime setting. We are maintaining an in-memory data structure that supports all this functionality and would like to leverage SortedSetDocValuesAccumulator. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_51) - Build # 9300 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9300/ Java: 64bit/jdk1.7.0_51 -XX:+UseCompressedOops -XX:+UseParallelGC 1 tests failed. REGRESSION: org.apache.solr.hadoop.MapReduceIndexerToolArgumentParserTest.testArgsParserHelp Error Message: Conversion = '१' Stack Trace: java.util.UnknownFormatConversionException: Conversion = '१' at __randomizedtesting.SeedInfo.seed([7EDF355AC943FBE2:E20D9F2D8E2D97AF]:0) at java.util.Formatter.checkText(Formatter.java:2547) at java.util.Formatter.parse(Formatter.java:2523) at java.util.Formatter.format(Formatter.java:2469) at java.io.PrintWriter.format(PrintWriter.java:905) at net.sourceforge.argparse4j.helper.TextHelper.printHelp(TextHelper.java:206) at net.sourceforge.argparse4j.internal.ArgumentImpl.printHelp(ArgumentImpl.java:247) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.printArgumentHelp(ArgumentParserImpl.java:253) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.printHelp(ArgumentParserImpl.java:279) at org.apache.solr.hadoop.MapReduceIndexerTool$MyArgumentParser$1.run(MapReduceIndexerTool.java:187) at net.sourceforge.argparse4j.internal.ArgumentImpl.run(ArgumentImpl.java:425) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.processArg(ArgumentParserImpl.java:913) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.parseArgs(ArgumentParserImpl.java:810) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.parseArgs(ArgumentParserImpl.java:683) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.parseArgs(ArgumentParserImpl.java:580) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.parseArgs(ArgumentParserImpl.java:573) at org.apache.solr.hadoop.MapReduceIndexerTool$MyArgumentParser.parseArgs(MapReduceIndexerTool.java:505) at org.apache.solr.hadoop.MapReduceIndexerToolArgumentParserTest.testArgsParserHelp(MapReduceIndexerToolArgumentParserTest.java:194) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
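The Jenkins failure above is reproducible without Solr: java.util.Formatter's specifier grammar only admits ASCII, so a '%' followed by a non-ASCII digit such as '१' (Devanagari one, which can leak into a generated format string under certain locales) is rejected exactly as the stack trace's checkText frame shows:

```java
import java.util.UnknownFormatConversionException;

// Minimal reproduction of the exception in the stack trace above:
// Formatter treats "%१" as an unknown conversion because its format-specifier
// grammar only accepts ASCII digits and conversion characters.
public class FormatterRepro {
    public static boolean throwsUnknownConversion(String fmt) {
        try {
            String.format(fmt, 1);
            return false;
        } catch (UnknownFormatConversionException e) {
            return true; // "Conversion = '१'" style failure
        }
    }
}
```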
[jira] [Commented] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887337#comment-13887337 ] Lei Wang commented on LUCENE-5426: -- Looks like DefaultSortedSetDocsValuesReaderState.java is missing from the patch. Forgot to attach it? Make SortedSetDocValuesReaderState customizable --- Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch We have a reader that has a different (in-memory) data structure, where the cost of computing ordinals per reader open is too expensive in the realtime setting. We are maintaining an in-memory data structure that supports all this functionality and would like to leverage SortedSetDocValuesAccumulator. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887340#comment-13887340 ] Lei Wang commented on LUCENE-5425: -- Better not to depend on the bitset; depend on a more general interface instead. In the user's application, if they can get rid of the memset part of that data structure (as we do at Twitter), they can get a 4x+ performance improvement over simply caching the bitset. Make creation of FixedBitSet in FacetsCollector overridable --- Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: facetscollector.patch In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and generates a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5682) Make the admin InfoHandler more pluggable / derivable
Gregory Chanan created SOLR-5682: Summary: Make the admin InfoHandler more pluggable / derivable Key: SOLR-5682 URL: https://issues.apache.org/jira/browse/SOLR-5682 Project: Solr Issue Type: Improvement Reporter: Gregory Chanan Priority: Minor As of SOLR-5556 a user can specify the class of the admin InfoHandler, but can't easily override the individual handlers that it provides (the PropertiesRequestHandler, LoggingHandler, ThreadDumpHandler, SystemInfoHandler). Contrast this with say, the AdminHandlers, where a user can provide his/her own implementations of the underlying request handlers easily. I've run into this limitation in the following setup: I use derived versions of the various AdminHandlers, and would like to use the same implementations for the InfoHandler. I can do this by deriving from InfoHandler, but then I'd need to duplicate the handleRequestBody dispatching code. That's doable, but not as nice as what the AdminHandlers provides. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5682) Make the admin InfoHandler more pluggable / derivable
[ https://issues.apache.org/jira/browse/SOLR-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated SOLR-5682: - Attachment: SOLR-5682.patch Here's a patch that provides this functionality along with a unit test. Make the admin InfoHandler more pluggable / derivable - Key: SOLR-5682 URL: https://issues.apache.org/jira/browse/SOLR-5682 Project: Solr Issue Type: Improvement Reporter: Gregory Chanan Priority: Minor Attachments: SOLR-5682.patch As of SOLR-5556 a user can specify the class of the admin InfoHandler, but can't easily override the individual handlers that it provides (the PropertiesRequestHandler, LoggingHandler, ThreadDumpHandler, SystemInfoHandler). Contrast this with, say, the AdminHandlers, where a user can easily provide his/her own implementations of the underlying request handlers. I've run into this limitation in the following setup: I use derived versions of the various AdminHandlers and would like to use the same implementations for the InfoHandler. I can do this by deriving from InfoHandler, but then I'd need to duplicate the handleRequestBody dispatching code. That's doable, but not as nice as what the AdminHandlers provide.
[jira] [Assigned] (SOLR-5683) Documentation of Suggester V2
[ https://issues.apache.org/jira/browse/SOLR-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur reassigned SOLR-5683: -- Assignee: Areek Zillur Documentation of Suggester V2 - Key: SOLR-5683 URL: https://issues.apache.org/jira/browse/SOLR-5683 Project: Solr Issue Type: Task Components: SearchComponents - other Reporter: Areek Zillur Assignee: Areek Zillur Fix For: 5.0, 4.7 Placeholder for documentation that will eventually end up in the Solr Ref guide.
[jira] [Created] (SOLR-5683) Documentation of Suggester V2
Areek Zillur created SOLR-5683: -- Summary: Documentation of Suggester V2 Key: SOLR-5683 URL: https://issues.apache.org/jira/browse/SOLR-5683 Project: Solr Issue Type: Task Components: SearchComponents - other Reporter: Areek Zillur Fix For: 5.0, 4.7 Placeholder for documentation that will eventually end up in the Solr Ref guide.
[jira] [Updated] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-5426: -- Attachment: sortedsetreaderstate.patch Make SortedSetDocValuesReaderState customizable --- Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch, sortedsetreaderstate.patch We have a reader that has a different in-memory data structure, where the cost of computing ordinals per reader open is too expensive in a realtime setting. We maintain an in-memory data structure that supports all the functionality and would like to leverage SortedSetDocValuesAccumulator.
[jira] [Commented] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887438#comment-13887438 ] John Wang commented on LUCENE-5426: --- You are right. Re-attached. Make SortedSetDocValuesReaderState customizable --- Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch, sortedsetreaderstate.patch We have a reader that has a different in-memory data structure, where the cost of computing ordinals per reader open is too expensive in a realtime setting. We maintain an in-memory data structure that supports all the functionality and would like to leverage SortedSetDocValuesAccumulator.
[jira] [Updated] (SOLR-5683) Documentation of Suggester V2
[ https://issues.apache.org/jira/browse/SOLR-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated SOLR-5683: --- Description: Placeholder for documentation that will eventually end up in the Solr Ref guide. The new Suggester Component allows Solr to fully utilize the Lucene suggesters. The main features are: - lookup pluggability (TODO: add description): -- AnalyzingInfixLookupFactory -- AnalyzingLookupFactory -- FuzzyLookupFactory -- FreeTextLookupFactory -- FSTLookupFactory -- WFSTLookupFactory -- TSTLookupFactory -- JaspellLookupFactory - Dictionary pluggability (give users the option to choose the dictionary implementation their suggesters consume) -- Input from search index --- DocumentDictionaryFactory – user can specify a suggestion field along with optional weight and payload fields from their search index. --- DocumentExpressionDictionaryFactory – same as DocumentDictionaryFactory but allows users to specify an arbitrary expression over existing numeric fields. --- HighFrequencyDictionaryFactory – user can specify a suggestion field and a threshold to prune out less frequent terms. -- Input from external files --- FileDictionaryFactory – user can specify a file which contains suggest entries, along with optional weights and payloads. Config (index time) options: - name - name of the suggester - sourceLocation - external file location (for file-based suggesters) - lookupImpl - type of lookup to use [default JaspellLookupFactory] - dictionaryImpl - type of dictionary to use (lookup input) [default (sourceLocation == null ? HighFrequencyDictionaryFactory : FileDictionaryFactory)] - storeDir - location to persist the in-memory data structure on disk - buildOnCommit - whether to build the suggester on every commit - buildOnOptimize - whether to build the suggester on every optimize Query time options: - suggest.dictionary - name of the suggester to use (can occur multiple times to batch suggester requests) - suggest.count - number of suggestions to return - suggest.q - query to use for lookup - suggest.build - command to build the suggester - suggest.reload - command to reload the suggester - buildAll – command to build all suggesters in the component - reloadAll – command to reload all suggesters in the component Example query:
{code}
http://localhost:8983/solr/suggest?suggest.dictionary=suggester1&suggest=true&suggest.build=true&suggest.q=elec
{code}
Distributed query:
{code}
http://localhost:7574/solr/suggest?suggest.dictionary=suggester2&suggest=true&suggest.build=true&suggest.q=elec&shards=localhost:8983/solr,localhost:7574/solr&shards.qt=/suggest
{code}
Response Format: The response format can be either XML or JSON. The typical response structure is as follows:
{code}
{ suggest: { suggester_name: { suggest_query: {
    numFound: ..,
    suggestions: [ {term: .., weight: .., payload: ..}, .. ]
} } } }
{code}
Example Response:
{code}
{ responseHeader: { status: 0, QTime: 3 },
  suggest: {
    suggester1: { e: { numFound: 1, suggestions: [ { term: "electronics and computer1", weight: 100, payload: "" } ] } },
    suggester2: { e: { numFound: 1, suggestions: [ { term: "electronics and computer1", weight: 10, payload: "" } ] } }
} }
{code}
Example solrconfig snippet with multiple suggester configuration:
{code}
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggester1</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
  <lst name="suggester">
    <str name="name">suggester2</str>
    <str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="field">product_name</str>
    <str name="weightExpression">((price * 2) + ln(popularity))</str>
    <str name="sortField">weight</str>
    <str name="sortField">price</str>
    <str name="storeDir">suggest_fuzzy_doc_expr_dict</str>
    <str name="suggestAnalyzerFieldType">text</str>
  </lst>
</searchComponent>
{code}
was: Place holder for documentation that will eventually end up in the Solr Ref guide. Documentation of Suggester V2 - Key: SOLR-5683 URL:
[jira] [Commented] (SOLR-5683) Documentation of Suggester V2
[ https://issues.apache.org/jira/browse/SOLR-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887442#comment-13887442 ] Areek Zillur commented on SOLR-5683: The documentation still has a lot of TODOs, but should be a good start. Documentation of Suggester V2 - Key: SOLR-5683 URL: https://issues.apache.org/jira/browse/SOLR-5683 Project: Solr Issue Type: Task Components: SearchComponents - other Reporter: Areek Zillur Assignee: Areek Zillur Fix For: 5.0, 4.7 Placeholder for documentation that will eventually end up in the Solr Ref guide.
[jira] [Commented] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887465#comment-13887465 ] Shai Erera commented on LUCENE-5426: I've got a few questions: * Why is this code now in the accumulator:
{code}
+if (dv.getValueCount() > Integer.MAX_VALUE) {
+  throw new IllegalArgumentException("can only handle valueCount < Integer.MAX_VALUE; got " + dv.getValueCount());
+}
{code}
I see that it's still in DefaultSSDVReaderState, i.e. you cannot construct it if the DV count is more than Integer.MAX_VALUE. It also looks odd in the accumulator - it only uses it if the given FacetArrays are {{null}}? * Can you please make sure all the added getters, such as state.getIndexReader/separatorRegex, are not called from inside loops? * Perhaps you should pull getSize() up to SSDVReaderState as well and use it instead of getDV().valueCount()? Just in case you can compute the size without obtaining the DV (i.e. lazily). Currently you're forced to pull a DV from the reader. If you do that, then please fix the Accumulator to use it too. Otherwise this looks good. The gist of this patch is that you made SSDVReaderState abstract (i.e. it could have been an interface) and DefaultSSDVReaderState is the current concrete implementation, right? Make SortedSetDocValuesReaderState customizable --- Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch, sortedsetreaderstate.patch We have a reader that has a different in-memory data structure, where the cost of computing ordinals per reader open is too expensive in a realtime setting. We maintain an in-memory data structure that supports all the functionality and would like to leverage SortedSetDocValuesAccumulator.
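Shai's reading of the patch (an abstract SSDVReaderState plus a default concrete class, with getSize() pulled up so it can be computed lazily) can be sketched as follows; names and shapes are illustrative only, not the actual patch:

```java
// Illustrative sketch: an abstract reader state (could equally be an
// interface) exposing getSize(), with the current behavior and the
// valueCount guard living in a default implementation.
abstract class SSDVReaderState {
    // Lazy-friendly: implementations may compute this without pulling doc values.
    abstract long getSize();
}

class DefaultSSDVReaderState extends SSDVReaderState {
    private final long valueCount;

    DefaultSSDVReaderState(long valueCount) {
        // The guard quoted above: ordinals must fit in an int.
        if (valueCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "can only handle valueCount < Integer.MAX_VALUE; got " + valueCount);
        }
        this.valueCount = valueCount;
    }

    @Override
    long getSize() { return valueCount; }
}
```

Custom implementations (e.g. one backed by an in-memory structure) would then subclass the abstract state and compute getSize() however is cheapest.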
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887469#comment-13887469 ] Shai Erera commented on LUCENE-5425: The current patch's intention seems to be to allow the app to cache the FixedBitSet, such that it can clear() it before each search, to save the long[] allocation? This looks fine to me. Can you please add javadocs to FC.createBitSet? bq. Better not to depend on the bitset, it's better to depend on a more general interface. We need to be careful here. Mike and I experimented with cutting the API over to a general DocIdSet, but it hurt the performance of faceted search, even with smarter bitsets. When we move to a DocIdSet, we add the DocIdSetIterator layer while iterating over the bits, which slows down the search ... at least per luceneutil benchmarks. So if we want to do that, we should do it only after proving, by means of benchmarking, that it doesn't hurt performance severely. It would actually be good to show that more compressed bitsets can even improve performance, but I would go with the change if the performance loss is marginal. And we should do it under a separate issue, so as not to block this one. Make creation of FixedBitSet in FacetsCollector overridable --- Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: facetscollector.patch In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and would create a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior.
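The clear()-before-reuse idea reads roughly like this; a self-contained sketch using java.util.BitSet as a stand-in for Lucene's FixedBitSet, with a hypothetical createBitSet hook in the collector:

```java
import java.util.BitSet;

// Stand-in for FacetsCollector: subclasses override createBitSet to
// control how the per-query bit set is allocated.
class Collector {
    protected BitSet createBitSet(int maxDoc) {
        return new BitSet(maxDoc); // default: a fresh allocation per query
    }
}

// App-side override: keep one bit set and clear() it before each search,
// saving the long[] allocation (and the garbage) on every query.
class CachingCollector extends Collector {
    private BitSet cached;

    @Override
    protected BitSet createBitSet(int maxDoc) {
        if (cached == null || cached.size() < maxDoc) {
            cached = new BitSet(maxDoc);
        } else {
            cached.clear(); // reuse the existing backing array
        }
        return cached;
    }
}
```

Note that this only saves the allocation, not the zeroing: clear() still does a memset-style pass, which is the 80% of the cost the thread discusses.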
[jira] [Updated] (SOLR-5621) Let Solr use Lucene's SeacherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-5621: Attachment: SOLR-5621.patch I'm uploading a patch with some changes. I added a test for SolrSearcherFactory. There are still some use cases missing: newReaderCreator: My understanding is that, during a core reload, Solr uses the reader from the old core to warm the first searcher of the new core. Right now I'm not doing that after these changes. searchersOnDeck/maxWarmingSearchers is also not implemented. Right now there can be 0 or 1 searchers warming, but no more than that. All searchers except for the first one created are warmed inside the SolrSearcherFactory and registered immediately after that; the SearcherManager won't try to create more than one searcher at a time. IndexReaderFactory is not implemented. There are still some tests failing intermittently; I'm looking into that. I think I should be able to do a much better job managing the realtime searcher vs. the regular searcher, but I haven't focused on that yet. I'm using this github repository: https://github.com/tflobbe/lucene-solr Let Solr use Lucene's SeacherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5621.patch, SOLR-5621.patch It would be nice if Solr could take advantage of Lucene's SearcherManager and get rid of most of the logic related to managing Searchers in SolrCore. I've been taking a look at how possible it is to achieve this, and even though I haven't finished with the changes (there are some use cases that are still not working exactly the same) it looks like it is possible to do.
Some things could still use a lot of improvement (like the realtime searcher management) and some others are not yet implemented, like searchers on deck and IndexReaderFactory. I'm attaching an initial patch (many TODOs yet).
Submission to ApacheCon on Tika
Hey Guys, I submitted the talk below on Apache Tika, Nutch and Solr to ApacheCon NA 2014: Real Data Science: Exploring the FBI's Vault dataset with Apache Tika, Nutch and Solr Event ApacheCon North America Submission Type Lightning Talk Category Developer Biography Chris Mattmann has a wealth of experience in software design and in the construction of large-scale data-intensive systems. His work has infected a broad set of communities, ranging from helping NASA unlock data from its next generation of earth science system satellites, to assisting graduate students at the University of Southern California (his alma mater) in the study of software architecture, all the way to helping industry and open source as a member of the Apache Software Foundation. When he's not busy being busy, he's spending time with his lovely wife and son braving the mean streets of Southern California. Abstract Apache Tika is a content detection and analysis toolkit allowing automated MIME type identification and rapid parsing of text and metadata from over 1200 types of files, including all major file types from the Internet Assigned Numbers Authority's MIME database. In this talk I'll show you how to practically use Apache Tika to explore the FBI's vault of declassified PDF documents, how to use Apache Nutch to pull down the dataset, and how to use Solr to ingest and geoclassify the documents so that you can build a map of FBI PDF documents corresponding to your favorite conspiracies throughout the USA. I've taught this material in my CSCI 572 Search Engines class at USC and it's a big hit. These are normally three assignments, so I will do my best to boil down their essence into a 45-60 minute talk replete with danger and excitement. Audience Developers interested in using Tika, Nutch and Solr. Folks interested in the FBI vault dataset. GIS wonks. The like.
Experience Level Intermediate Benefits to the Ecosystem The core of the talk will be Tika, but there will be some Nutch magic, and some Solr magic at very basic levels. The benefit to the ecosystem will be a real display of data science on a real dataset. Technical Requirements I need an internet connection, and a projector. Status New Cheers, Chris
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887506#comment-13887506 ] Lei Wang commented on LUCENE-5425: -- Agree it can be done in a separate issue. I didn't know a wrapper would affect performance that much; it's just an additional method call to me. The default OpenBitSetIterator impl differs from nextSetBit, maybe that's the reason? Anyway, starting with this change won't hurt anything, and caching the bitset should cut about 20% of the overhead of allocating a new bitset each time (the other 80% is from the memset). After getting this in, I can do a separate test on the DocIdSet and see if we can get acceptable performance for the default behavior without reusing the memory. Make creation of FixedBitSet in FacetsCollector overridable --- Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: facetscollector.patch In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and would create a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior.
[jira] [Updated] (SOLR-5684) Shutdown SolrServer clients created in BasicDistributedZk2Test and BasicDistributedZkTest
[ https://issues.apache.org/jira/browse/SOLR-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-5684: Attachment: SOLR-5684.patch Shutdown SolrServer clients created in BasicDistributedZk2Test and BasicDistributedZkTest - Key: SOLR-5684 URL: https://issues.apache.org/jira/browse/SOLR-5684 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Priority: Minor Fix For: 5.0 Attachments: SOLR-5684.patch I found that the tests BasicDistributedZk2Test and BasicDistributedZkTest create multiple HttpSolrServer objects without calling their shutdown method after using them.
[jira] [Created] (SOLR-5684) Shutdown SolrServer clients created in BasicDistributedZk2Test and BasicDistributedZkTest
Tomás Fernández Löbbe created SOLR-5684: --- Summary: Shutdown SolrServer clients created in BasicDistributedZk2Test and BasicDistributedZkTest Key: SOLR-5684 URL: https://issues.apache.org/jira/browse/SOLR-5684 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Priority: Minor Fix For: 5.0 Attachments: SOLR-5684.patch I found that the tests BasicDistributedZk2Test and BasicDistributedZkTest create multiple HttpSolrServer objects without calling their shutdown method after using them.
[jira] [Commented] (SOLR-5302) Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887541#comment-13887541 ] Shawn Heisey commented on SOLR-5302: bq. This is link I meant in the pdf: https://cms.prod.bloomberg.com/team/display/fdns/Search+Analytics+Component If I had to guess, I would say that is an internal website for Bloomberg, something that only employees can get to. If they intend it for public consumption, they'll need to publish the data on a public website and fix the links in the PDF. bq. Any idea when the future 5x could be released? Quick answer: 5.0 is many months away. It's impossible to give any kind of release date prediction. Hopefully this particular feature will end up in a 4.x release, once Erick (or another committer) has the time to devote to giving the code a thorough review. Longer answer: At this time, nobody has come up with a timeframe for Solr 5.0. Once somebody decides we're going to begin the process and agrees to be the release manager, a LOT has to happen, and there's really no way to make it happen quickly. Even if we began the 5.0 release process tomorrow and everything were to be extremely smooth, I don't think you'd even see a 5.0-ALPHA release for a few months. We can't begin the release process that soon, so it's going to be even longer. One of the big items still left to do is to embed the HTTP server layer and make Solr into a standalone application. I wasn't involved with the development when 4.0 was released, so I don't know how much time passed between the beginning of the 4.0 release process and 4.0-ALPHA, but I can tell you that there were three months between 4.0-ALPHA and 4.0-FINAL.
Analytics Component --- Key: SOLR-5302 URL: https://issues.apache.org/jira/browse/SOLR-5302 Project: Solr Issue Type: New Feature Reporter: Steven Bower Assignee: Erick Erickson Attachments: SOLR-5302.patch, SOLR-5302.patch, SOLR-5302.patch, SOLR-5302.patch, Search Analytics Component.pdf, Statistical Expressions.pdf, solr_analytics-2013.10.04-2.patch This ticket is to track a replacement for the StatsComponent. The AnalyticsComponent supports the following features: * All functionality of StatsComponent (SOLR-4499) * Field Faceting (SOLR-3435) ** Support for limit ** Sorting (bucket name or any stat in the bucket) ** Support for offset * Range Faceting ** Supports all options of standard range faceting * Query Faceting (SOLR-2925) * Ability to use overall/field facet statistics as input to range/query faceting (i.e. calc min/max date and then facet over that range) * Support for more complex aggregate/mapping operations (SOLR-1622) ** Aggregations: min, max, sum, sum-of-squares, count, missing, stddev, mean, median, percentiles ** Operations: negation, abs, add, multiply, divide, power, log, date math, string reversal, string concat ** Easily pluggable framework to add additional operations * New / cleaner output format Outstanding Issues: * Multi-value field support for stats (supported for faceting) * Multi-shard support (may not be possible for some operations, e.g. median)
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887548#comment-13887548 ] Shai Erera commented on LUCENE-5425: bq. didn't know a wrapper will affect performance that much. Me neither! We were very surprised to see some of the performance implications of saving a method call or changing the order in which we iterate on results + facet requests. But it's hard to argue with consistent numbers, and IIRC they weren't in the 3-5% range, but 10+%. It could be, though, that we measured several changes at once ... so I think it's worthwhile to benchmark the move to DocIdSet. And yes, it could be OpenBitSetIterator; I don't rule it out. +1 to allow caching in this issue; we can investigate generalizing the APIs in a separate issue. Make creation of FixedBitSet in FacetsCollector overridable --- Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: facetscollector.patch In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and would create a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior.
[jira] [Created] (SOLR-5685) Some tests are not really committing when they intend to
Tomás Fernández Löbbe created SOLR-5685: --- Summary: Some tests are not really committing when they intend to Key: SOLR-5685 URL: https://issues.apache.org/jira/browse/SOLR-5685 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 There are some tests that call SolrTestCaseJ4.commit() but never submit the resulting commit command (via assertU()), so the commit never actually runs.
[jira] [Updated] (SOLR-5685) Some tests are not really committing when they intend to
[ https://issues.apache.org/jira/browse/SOLR-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-5685: Attachment: SOLR-5685.patch Adding assertU in the cases I found. This was causing the test org.apache.solr.analytics.facet.FieldFacetTest to fail. Some tests are not really committing when they intend to Key: SOLR-5685 URL: https://issues.apache.org/jira/browse/SOLR-5685 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5685.patch There are some tests that call SolrTestCaseJ4.commit() but never submit the resulting commit command (via assertU()), so the commit never actually runs.
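The bug class is easy to model: commit() merely builds an update command, and nothing reaches the server until that command is passed to assertU(). A toy, self-contained model of the idiom (not the real Solr test API; names only mimic it):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the SolrTestCaseJ4 idiom: commit() only builds the XML
// command; assertU(...) is what actually submits it.
class ToySolrTest {
    final List<String> submitted = new ArrayList<>();

    String commit() {
        return "<commit/>"; // building the command has no side effect
    }

    void assertU(String update) {
        submitted.add(update); // submission is the only thing that "runs" it
    }
}
```

So a bare `commit();` is a silent no-op, while `assertU(commit());` actually issues the commit, which is exactly the one-line class of fix the patch applies.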