[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4597 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4597/

All tests passed

Build Log:
[...truncated 57494 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/build.xml:453: The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/build.xml:392: The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/extra-targets.xml:87: The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/extra-targets.xml:185: The following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/core/src/test/org/apache/lucene/index/TestDocInverterPerFieldErrorInfo.java

Total time: 110 minutes 42 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5421) MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.
Grzegorz Sobczyk created LUCENE-5421:
------------------------------------

Summary: MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.
Key: LUCENE-5421
URL: https://issues.apache.org/jira/browse/LUCENE-5421
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Affects Versions: 4.6.1
Reporter: Grzegorz Sobczyk

The Morfologik filter looks up stems only for the input token or its lowercased form, in org.apache.lucene.analysis.morfologik.MorfologikFilter.incrementToken():

{code}
if (!keywordAttr.isKeyword() && (lookupSurfaceForm(termAtt) || lookupSurfaceForm(toLowercase(termAtt)))) {
{code}

As a result, the input token *sienkiewicza* is not stemmed, while *Sienkiewicza* stems to *Sienkiewicz* (for comparison: *pRoDuKtY* stems to *produkt*). The filter should also try the input token with its first letter capitalized.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (LUCENE-5421) MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grzegorz Sobczyk updated LUCENE-5421:
-------------------------------------
Description:

The Morfologik filter looks up stems only for the input token or its lowercased form, in org.apache.lucene.analysis.morfologik.MorfologikFilter.incrementToken():

{code}
if (!keywordAttr.isKeyword() && (lookupSurfaceForm(termAtt) || lookupSurfaceForm(toLowercase(termAtt)))) {
  [...]
}
{code}

As a result, the input token *sienkiewicza* is not stemmed, while *Sienkiewicza* stems to *Sienkiewicz* (for comparison: *pRoDuKtY* stems to *produkt*). The filter should also try the input token with its first letter capitalized.
[jira] [Assigned] (LUCENE-5421) MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned LUCENE-5421:
-----------------------------------
Assignee: Dawid Weiss
[jira] [Updated] (LUCENE-5421) MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-5421:
--------------------------------
Summary: MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.) (was: MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.)
[jira] [Updated] (LUCENE-5421) MorfologicFilter doesn't stem properly invented names: surnames, rivers, etc.
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss updated LUCENE-5421:
--------------------------------
Priority: Minor (was: Major)
[jira] [Commented] (LUCENE-5421) MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886434#comment-13886434 ]

Dawid Weiss commented on LUCENE-5421:
-------------------------------------
Yeah, do you care to provide a patch or a GitHub pull request? It'd speed up the process. Please include a test case for this as well, thanks.
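The fix the reporter suggests — also trying the token with its first letter capitalized — might look roughly like the sketch below. This is not the actual MorfologikFilter code; the dictionary is simulated with a plain Set, and the lookup helper is hypothetical, to show only the lookup order.

```java
import java.util.Locale;
import java.util.Set;

public class TitleCaseLookupSketch {
    // Stand-in for the Morfologik dictionary: the surface forms it knows.
    static final Set<String> DICT = Set.of("Sienkiewicza", "produkty");

    // Hypothetical lookup order: the token as-is, then lowercased,
    // then with the first letter capitalized (the variant the report asks to add).
    static boolean lookup(String token) {
        if (DICT.contains(token)) return true;
        if (DICT.contains(token.toLowerCase(Locale.ROOT))) return true;
        String titled = Character.toUpperCase(token.charAt(0))
                + token.substring(1).toLowerCase(Locale.ROOT);
        return DICT.contains(titled);
    }

    public static void main(String[] args) {
        // "sienkiewicza" is only found via the title-cased variant;
        // "pRoDuKtY" is found via the lowercased variant.
        System.out.println(lookup("sienkiewicza"));
        System.out.println(lookup("pRoDuKtY"));
    }
}
```

In the real filter the third probe would be another lookupSurfaceForm call on a title-cased copy of the term attribute, guarded the same way as the existing lowercase probe.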
[jira] [Created] (LUCENE-5422) Postings lists deduplication
Dmitry Kan created LUCENE-5422:
-------------------------------

Summary: Postings lists deduplication
Key: LUCENE-5422
URL: https://issues.apache.org/jira/browse/LUCENE-5422
Project: Lucene - Core
Issue Type: Improvement
Components: core/codecs, core/index
Reporter: Dmitry Kan

The context: http://markmail.org/thread/tywtrjjcfdbzww6f

Robert Muir and I discussed what Robert eventually named "postings lists deduplication" at the Berlin Buzzwords 2013 conference. The idea is to allow multiple terms to point to the same postings list to save space. This could be achieved by a new index codec implementation, but this jira is open to other ideas as well.

This would benefit synonyms, exact/inexact terms, leading-wildcard support via storing reversed terms, etc. For example, at the moment, when supporting exact (unstemmed) and inexact (stemmed) searches, we store both the unstemmed and the stemmed variant of a word form, and that bloats the index. For the same index-size reason we had to remove leading-wildcard support via reversing a token at index and query time.

Comment from Mike McCandless:

Neat idea! Would this idea allow a single term to point to (the union of) N other postings lists? It seems like that's necessary e.g. to handle the exact/inexact case. And then, to produce the DocsAndPositionsEnum you'd need to do a merge sort across those N postings lists? Such a thing might also be doable as a runtime-only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem -> all of its surface forms).

Comment from Robert Muir:

I think the exact/inexact case is trickier (detecting it would be the hard part), and you are right, another solution might work better. But for the reverse-wildcard and synonyms situations, it seems we could even detect it on write if we created some hash of the previous term's postings: if the hash matches for the current term, we know it might be a duplicate and would have to actually do the costly check that they are the same. Maybe there are better ways to do it, but it might be a fun postings-format experiment to try.
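Robert's write-time detection idea can be sketched as follows. This is a toy illustration, not a Lucene PostingsFormat: postings lists are plain int arrays, and the "costly check" is a full array comparison performed only when a cheap hash matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PostingsDedupSketch {
    // Cheap hash of a postings list -> previously written lists with that
    // hash. On a hash match we still do the full equality check before
    // letting a new term share an earlier list.
    private final Map<Integer, List<int[]>> byHash = new HashMap<>();
    private final List<int[]> written = new ArrayList<>();

    /** Returns the index of the stored postings list for this term,
     *  reusing an identical earlier list when one exists. */
    public int writePostings(int[] postings) {
        int h = Arrays.hashCode(postings);
        for (int[] prev : byHash.getOrDefault(h, List.of())) {
            if (Arrays.equals(prev, postings)) {
                // Duplicate detected: point this term at the shared list.
                return written.indexOf(prev);
            }
        }
        byHash.computeIfAbsent(h, k -> new ArrayList<>()).add(postings);
        written.add(postings);
        return written.size() - 1;
    }
}
```

In a real codec the returned index would be a file pointer into the postings file, and the hash would be computed incrementally while the postings are serialized, so duplicates cost one extra comparison rather than a second write.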
[jira] [Commented] (SOLR-5675) cloud-scripts/zkcli.bat: quote option -Dlog4j...
[ https://issues.apache.org/jira/browse/SOLR-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886444#comment-13886444 ]

Stefan Matheis (steffkes) commented on SOLR-5675:
-------------------------------------------------
Thanks for reporting this Günter! Just to make sure I get it right .. you suggest quoting the whole thing, instead of the argument's value only? I'd expect something like {{-Dlog4j.configuration="file:%SDIR%\log4j.properties"}} instead of {{"-Dlog4j.configuration=file:%SDIR%\log4j.properties"}}. If you are sure that quoting the whole argument does the trick .. we can go ahead with that :)

cloud-scripts/zkcli.bat: quote option -Dlog4j...
------------------------------------------------
Key: SOLR-5675
URL: https://issues.apache.org/jira/browse/SOLR-5675
Project: Solr
Issue Type: Bug
Components: scripts and tools
Affects Versions: 4.6
Environment: Windows 7 64 bit
Reporter: Günther Ruck
Priority: Minor

In the script zkcli.bat this java command line is built:

%JVM% -Dlog4j.configuration=file:%SDIR%\log4j.properties

The command fails if %SDIR% contains spaces (C:\Program Files\...). Quoting the whole option, "-Dlog4j.configuration=file:%SDIR%\log4j.properties", solved the issue.
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_60-ea-b03) - Build # 9292 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9292/
Java: 32bit/jdk1.7.0_60-ea-b03 -client -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 50357 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:453: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:392: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:87: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/extra-targets.xml:185: The following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/core/src/test/org/apache/lucene/index/TestDocInverterPerFieldErrorInfo.java

Total time: 60 minutes 57 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 32bit/jdk1.7.0_60-ea-b03 -client -XX:+UseSerialGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_60-ea-b03) - Build # 3726 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3726/
Java: 64bit/jdk1.7.0_60-ea-b03 -XX:+UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 57363 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:453: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:392: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\extra-targets.xml:87: The following error occurred while executing this line:
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\extra-targets.xml:185: The following files are missing svn:eol-style (or binary svn:mime-type):
* ./lucene/core/src/test/org/apache/lucene/index/TestDocInverterPerFieldErrorInfo.java

Total time: 108 minutes 35 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.7.0_60-ea-b03 -XX:+UseCompressedOops -XX:+UseParallelGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
[jira] [Commented] (SOLR-5675) cloud-scripts/zkcli.bat: quote option -Dlog4j...
[ https://issues.apache.org/jira/browse/SOLR-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886462#comment-13886462 ]

Günther Ruck commented on SOLR-5675:
------------------------------------
Hallo Stefan,

I've tried your suggestion and it works. It is sufficient to quote the value of the option: {{-Dlog4j.configuration="file:%SDIR%\log4j.properties"}}
[jira] [Commented] (LUCENE-5405) Exception strategy for analysis improved
[ https://issues.apache.org/jira/browse/LUCENE-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886465#comment-13886465 ]

Michael McCandless commented on LUCENE-5405:
--------------------------------------------
Shouldn't we port this to 4.x as well?

Exception strategy for analysis improved
----------------------------------------
Key: LUCENE-5405
URL: https://issues.apache.org/jira/browse/LUCENE-5405
Project: Lucene - Core
Issue Type: Improvement
Reporter: Benson Margulies
Assignee: Benson Margulies
Fix For: 5.0

SOLR-5623 included some conversation about the dilemmas of exception management and reporting in the analysis chain. I've belatedly become educated about the infostream, and this situation is a job for it. The DocInverterPerField can note exceptions in the analysis chain, log them to the infostream, and then rethrow them as before. No wrapping, no muss, no fuss. There are comments on this JIRA from a more complex prior idea that readers might want to ignore.
[jira] [Created] (SOLR-5678) When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException
Karl Wright created SOLR-5678:
------------------------------

Summary: When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException
Key: SOLR-5678
URL: https://issues.apache.org/jira/browse/SOLR-5678
Project: Solr
Issue Type: Bug
Affects Versions: 4.6
Reporter: Karl Wright

This class of exception should not be used for run-of-the-mill networking issues. SolrServerException or some variety of IOException should be thrown instead. Here's the trace:

{code}
java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:130)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:88)
	at org.apache.solr.common.cloud.ZkStateReader.<init>(ZkStateReader.java:148)
	at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:147)
	at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:173)
	at org.apache.manifoldcf.agents.output.solr.HttpPoster$SolrPing.process(HttpPoster.java:1315)
	at org.apache.manifoldcf.agents.output.solr.HttpPoster$StatusThread.run(HttpPoster.java:1208)
Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms
	at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:173)
	at org.apache.solr.common.cloud.SolrZkClient.<init>(SolrZkClient.java:127)
	... 6 more
{code}
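The change the issue asks for has roughly this shape: catch the unchecked wrapper at the connect boundary and surface a checked exception instead, so callers are forced to handle ordinary networking failures. The class and method names below are hypothetical stand-ins, not the actual SolrJ API.

```java
import java.util.concurrent.TimeoutException;

// Checked exception standing in for SolrServerException / an IOException subtype.
class SolrConnectException extends Exception {
    SolrConnectException(String msg, Throwable cause) { super(msg, cause); }
}

public class CheckedConnectSketch {
    // Stand-in for the ZooKeeper connect step that today surfaces a
    // RuntimeException wrapping a TimeoutException.
    static void rawConnect(boolean reachable) {
        if (!reachable) {
            throw new RuntimeException(
                new TimeoutException("Could not connect to ZooKeeper"));
        }
    }

    // Boundary method: translate the unchecked wrapper into a checked
    // exception; anything else is a genuine bug and is rethrown as-is.
    public static void connect(boolean reachable) throws SolrConnectException {
        try {
            rawConnect(reachable);
        } catch (RuntimeException e) {
            if (e.getCause() instanceof TimeoutException) {
                throw new SolrConnectException("ZooKeeper connect timed out", e.getCause());
            }
            throw e;
        }
    }
}
```

A caller then gets a compile-time obligation to handle the timeout path, instead of discovering the RuntimeException in production logs.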
Re: [JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_60-ea-b03) - Build # 3726 - Still Failing!
I committed a fix (svn propset svn:eol-style native ...).

Benson, you may want to add this to your ~/.subversion/config: http://www.apache.org/dev/svn-eol-style.txt

It automatically sets the eol-style when you add files with those extensions...

Mike McCandless
http://blog.mikemccandless.com

On Thu, Jan 30, 2014 at 5:05 AM, Policeman Jenkins Server <jenk...@thetaphi.de> wrote:
> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3726/
> Java: 64bit/jdk1.7.0_60-ea-b03 -XX:+UseCompressedOops -XX:+UseParallelGC
> [...quoted build log truncated...]
Re: [JENKINS] Lucene-4x-Linux-Java6-64-test-only - Build # 11503 - Failure!
I'll fix.

Mike McCandless
http://blog.mikemccandless.com

On Thu, Jan 30, 2014 at 1:35 AM, <buil...@flonkings.com> wrote:

Build: builds.flonkings.com/job/Lucene-4x-Linux-Java6-64-test-only/11503/

1 tests failed.
REGRESSION: org.apache.lucene.index.TestIndexWriterReader.testNoTermsIndex

Error Message:
should have failed to seek since terms index was not loaded.

Stack Trace:
java.lang.AssertionError: should have failed to seek since terms index was not loaded.
	at __randomizedtesting.SeedInfo.seed([9CE6A84339D41DE3:107F46CCDEE31B19]:0)
	at org.junit.Assert.fail(Assert.java:93)
	at org.apache.lucene.index.TestIndexWriterReader.testNoTermsIndex(TestIndexWriterReader.java:1043)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
	at java.lang.Thread.run(Thread.java:662)

Build Log:
[...truncated 641 lines...]
[junit4] Suite: org.apache.lucene.index.TestIndexWriterReader
[junit4]   2> NOTE: reproduce with: ant test -Dtestcase=TestIndexWriterReader -Dtests.method=testNoTermsIndex -Dtests.seed=9CE6A84339D41DE3 -Dtests.slow=true -Dtests.locale=es -Dtests.timezone=Etc/GMT+11
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886478#comment-13886478 ]

ASF subversion and git services commented on LUCENE-3069:
---------------------------------------------------------
Commit 1562771 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1562771 ]

LUCENE-3069: also exclude MockRandom from this test case

Lucene should have an entirely memory resident term dictionary
--------------------------------------------------------------
Key: LUCENE-3069
URL: https://issues.apache.org/jira/browse/LUCENE-3069
Project: Lucene - Core
Issue Type: Improvement
Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
Labels: gsoc2014
Fix For: 4.7
Attachments: LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, df-ttf-estimate.txt, example.png

The FST based TermDictionary has been a great improvement, yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds an FST from the entire term, not just the delta.
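The idea of a fully memory-resident term dictionary can be illustrated with a toy prefix trie: every term's metadata (here, just a hypothetical postings pointer) is reachable purely in memory, with common prefixes shared. This is only an illustration of the concept; Lucene's actual implementation builds an FST with a custom fst.Output, which is far more compact than a pointer-heavy trie.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory term dictionary: shares prefixes like an FST shares
// arcs, and stores per-term metadata at the terminal node, so lookups
// never touch a delta-coded on-disk block.
public class InMemoryTermDict {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        long postingsPointer = -1; // -1 means "no term ends here"
    }

    private final Node root = new Node();

    public void add(String term, long postingsPointer) {
        Node n = root;
        for (char c : term.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.postingsPointer = postingsPointer;
    }

    /** Returns the postings pointer for the term, or -1 if absent —
     *  a pure in-memory walk with no disk seek. */
    public long lookup(String term) {
        Node n = root;
        for (char c : term.toCharArray()) {
            n = n.children.get(c);
            if (n == null) return -1;
        }
        return n.postingsPointer;
    }
}
```

The FST version achieves the same "encode everything for each term" property while compressing shared prefixes and suffixes, which is what makes keeping the whole dictionary in RAM practical.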
[jira] [Commented] (SOLR-5514) atomic update throws exception if the schema contains uuid fields: Invalid UUID String: 'java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670'
[ https://issues.apache.org/jira/browse/SOLR-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886482#comment-13886482 ] Arun Kumar commented on SOLR-5514: -- I can reproduce this by only providing the wrong value in the xml while updating, like this adddoc field name=id1 update='set'b95639d6-b579-41dc-b9f0-d17634937528/field field name=MyUUIDField update='set'java.util.UUID:b95639d6-b579-41dc-b9f0-d17634937529/field Can you confirm the value used while updating index is not corrupted as mentioned above? if not, can you share the doc xml and the schema xml? that would help to reproduce and fix this issue. atomic update throws exception if the schema contains uuid fields: Invalid UUID String: 'java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670' - Key: SOLR-5514 URL: https://issues.apache.org/jira/browse/SOLR-5514 Project: Solr Issue Type: Bug Affects Versions: 4.5.1 Environment: unix and windows Reporter: Dirk Reuss Assignee: Shalin Shekhar Mangar I am updating an exiting document with the statement adddocfield name='name' update='set'newvalue/field All fields are stored and I have several UUID fields. About 10-20% of the update commands will fail with the message: (example) Invalid UUID String: 'java.util.UUID:532c9353-d391-4a04-8618-dc2fa1ef8b35' the point is that java.util.UUID seems to be prepended to the original uuid stored in the field and when the value is written this error occours. I tried to check if this specific uuid field was the problem and added the uuid field in the update xml with(field name='id1' update='set'...). But the error simply moved to an other uuid field. 
here is the original exception: lst name=responseHeaderint name=status500/intint name=QTime34/int/lstlst name=errorstr name=msgError while creating field 'MyUUIDField{type=uuid,properties=indexed,stored,omitTermFreqAndPositions,required, required=true}' from value 'java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670'/strstr name=traceorg.apache.solr.common.SolrException: Error while creating field 'MyUUIDField{type=uuid,properties=indexed,stored,omitTermFreqAndPositions,required, required=true}' from value 'java.util.UUID:e26c4d56-e98d-41de-9b7f-f63192089670' at org.apache.solr.schema.FieldType.createField(FieldType.java:259) at org.apache.solr.schema.StrField.createFields(StrField.java:56) at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:47) at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:118) at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:77) at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:215) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69) at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51) at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:556) at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:692) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at
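The "Invalid UUID String" at the root of this trace is exactly what java.util.UUID.fromString produces when a value comes back with its Java class name prepended. A minimal standalone sketch (plain JDK, not Solr code; the class and method names below are invented for illustration):

```java
import java.util.UUID;

// Reproduces why the reported value fails: parsing the stored string with
// java.util.UUID.fromString rejects a value that has "java.util.UUID:"
// prepended to it (fromString throws IllegalArgumentException).
public class UuidPrefixDemo {
    // Returns true if the string parses as a UUID.
    static boolean isValidUuid(String s) {
        try {
            UUID.fromString(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String plain = "e26c4d56-e98d-41de-9b7f-f63192089670";
        String prefixed = "java.util.UUID:" + plain;
        System.out.println(isValidUuid(plain));    // true
        System.out.println(isValidUuid(prefixed)); // false: the failure in the report
    }
}
```

This suggests the stored field object's toString() (rather than its raw value) ends up being re-parsed somewhere along the atomic-update path.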
[jira] [Resolved] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang resolved LUCENE-3069. --- Resolution: Fixed Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2014 Fix For: 4.7 Attachments: LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, df-ttf-estimate.txt, example.png FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
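As a conceptual sketch of the proposal (a TreeMap stands in for the FST, and TermMeta is a hypothetical stand-in for a custom fst.Output): keying the structure by the entire term, with all per-term metadata held in memory as the value, removes the delta-decoded block scan at lookup time.

```java
import java.util.TreeMap;

// Conceptual sketch only -- not Lucene API. A TreeMap stands in for the FST.
public class MemoryTermDictSketch {
    // Hypothetical per-term output, analogous to a custom fst.Output:
    // everything needed to read this term's postings, held in memory.
    static final class TermMeta {
        final int docFreq;
        final long postingsOffset;
        TermMeta(int docFreq, long postingsOffset) {
            this.docFreq = docFreq;
            this.postingsOffset = postingsOffset;
        }
    }

    private final TreeMap<String, TermMeta> dict = new TreeMap<>();

    // Keyed by the *entire* term, not a delta against the previous term.
    void add(String term, int docFreq, long postingsOffset) {
        dict.put(term, new TermMeta(docFreq, postingsOffset));
    }

    // Lookup is a single in-memory probe; no on-disk block scan.
    TermMeta lookup(String term) {
        return dict.get(term);
    }

    public static void main(String[] args) {
        MemoryTermDictSketch d = new MemoryTermDictSketch();
        d.add("lucene", 42, 1024L);
        d.add("solr", 7, 2048L);
        System.out.println(d.lookup("lucene").docFreq); // 42
    }
}
```

The real win of the FST over a TreeMap is shared prefixes/suffixes keeping the memory footprint small, which is what makes the fully resident variant practical.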
lucene-solr pull request: LUCENE-5421: MorfologicFilter doesn't stem legiti...
GitHub user gsobczyk opened a pull request: https://github.com/apache/lucene-solr/pull/25 LUCENE-5421: MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.) You can merge this pull request into a Git repository by running: $ git pull https://github.com/gsobczyk/lucene-solr branch_4x Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/25.patch commit 41e399c27241ab8b0185c23c8035e632046b36e0 Author: Grzegorz Sobczyk grzegorz.sobc...@unity.pl Date: 2014-01-30T10:56:53Z LUCENE-5421: MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.)
[jira] [Commented] (LUCENE-5421) MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.)
[ https://issues.apache.org/jira/browse/LUCENE-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886492#comment-13886492 ] Grzegorz Sobczyk commented on LUCENE-5421: -- https://github.com/apache/lucene-solr/pull/25 but see TestMorfologikAnalyzer:128 - I didn't find the reason why the assertion *aarona* -> *aarona* is present there. Another thing: *poznania* (TestMorfologikAnalyzer:125) should return *poznanie*, *poznać*, *Poznań*, but I don't know how to do this. If this were possible, then we could configure MorfologikFilter. MorfologicFilter doesn't stem legitimate uppercase terms (surnames, proper nouns, etc.) --- Key: LUCENE-5421 URL: https://issues.apache.org/jira/browse/LUCENE-5421 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Affects Versions: 4.6.1 Reporter: Grzegorz Sobczyk Assignee: Dawid Weiss Priority: Minor The Morfologik filter looks up stems only for the input form or its all-lowercase form: org.apache.lucene.analysis.morfologik.MorfologikFilter.incrementToken() {code} if (!keywordAttr.isKeyword() && (lookupSurfaceForm(termAtt) || lookupSurfaceForm(toLowercase(termAtt)))) { [...] } {code} In this situation, if the input token is *sienkiewicza* it isn't stemmed, but *Sienkiewicza* -> *Sienkiewicz*; for comparison: *pRoDuKtY* -> *produkt*. It should also stem input tokens with a capitalized first letter.
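The proposed behaviour can be sketched with a plain Map standing in for the Morfologik dictionary (illustration only, not the actual filter code): after the surface-form and lowercase lookups fail, a third lookup with the first letter uppercased lets a lowercase token reach a capitalized dictionary entry.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a HashMap stands in for the Morfologik dictionary.
// The current filter tries the surface form and the all-lowercase form; the
// proposed fix adds a third attempt with the first letter uppercased, so a
// lowercase token like "sienkiewicza" can still reach the dictionary entry
// "Sienkiewicza" -> "Sienkiewicz".
public class CaseFallbackLookupSketch {
    static final Map<String, String> DICT = new HashMap<>();
    static {
        DICT.put("Sienkiewicza", "Sienkiewicz"); // proper noun, capitalized in the dictionary
        DICT.put("produkty", "produkt");         // common noun, lowercase in the dictionary
    }

    static String stem(String token) {
        String s = DICT.get(token);                       // 1. surface form
        if (s == null) s = DICT.get(token.toLowerCase()); // 2. lowercase (current behaviour ends here)
        if (s == null && !token.isEmpty()) {              // 3. proposed: capitalized first letter
            String cap = Character.toUpperCase(token.charAt(0)) + token.substring(1).toLowerCase();
            s = DICT.get(cap);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(stem("pRoDuKtY"));     // produkt (found via lowercase form)
        System.out.println(stem("sienkiewicza")); // Sienkiewicz (found only via the new step)
    }
}
```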
lucene auto complete index
Can anybody provide Lucene autocomplete code for indexing and searching based on Lucene 4.4? Do we need a separate index for autocomplete, or can the normal index be used? I need to get suggestions only from the documents I have access to. How can it be implemented? -- View this message in context: http://lucene.472066.n3.nabble.com/lucene-auto-complete-index-tp4114409.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
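Not an authoritative answer, but the mechanics of the per-user approach can be sketched without any Lucene classes: build a sorted structure from the terms of only the documents the user may access, then scan it by prefix. With Lucene itself one would typically either use the suggest module (which maintains its own suggester structure) or run a prefix search against the normal index; the TreeSet below only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch only: a TreeSet stands in for a suggester built from the terms of
// the accessible documents. Prefix scan over a sorted set is the same basic
// mechanism a dedicated suggester index optimizes.
public class AutocompleteSketch {
    private final TreeSet<String> terms = new TreeSet<>();

    // Feed in terms from documents the current user is allowed to see.
    void add(String term) { terms.add(term.toLowerCase()); }

    // All stored terms starting with the typed prefix, in sorted order.
    List<String> suggest(String prefix, int max) {
        String p = prefix.toLowerCase();
        List<String> out = new ArrayList<>();
        for (String t : terms.tailSet(p)) {       // jump to the first candidate
            if (!t.startsWith(p) || out.size() >= max) break; // sorted: stop at first miss
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        AutocompleteSketch ac = new AutocompleteSketch();
        ac.add("lucene"); ac.add("lucidworks"); ac.add("solr");
        System.out.println(ac.suggest("luc", 10)); // [lucene, lucidworks]
    }
}
```

Whether a separate index is needed mostly depends on scale: a small, per-user vocabulary fits in memory like this; a large shared one is what the suggest module's dedicated structures are for.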
[jira] [Updated] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5320: --- Attachment: LUCENE-5320.patch Patch adds a ctor to SearcherTaxoManager and tests. Create SearcherTaxonomyManager over Directory - Key: LUCENE-5320 URL: https://issues.apache.org/jira/browse/LUCENE-5320 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Attachments: LUCENE-5320.patch SearcherTaxonomyManager now only allows working in NRT mode. It could be useful to have an STM which allows reopening a SearcherAndTaxonomy pair over Directories, e.g. for replication. The problem is that if the thread that calls maybeRefresh() is not the one that does the commit(), it could lead to a pair that is not synchronized. Perhaps at first we could have a simple version that works under some assumptions, i.e. that the app does the commit + reopen in the same thread in that order, so that it can be used by such apps + when replicating the indexes, and later we can figure out how to generalize it to work even if commit + reopen are done by separate threads/JVMs. I'll see if SearcherTaxonomyManager can be extended to support it, or a new STM is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
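The issue's simplifying assumption (commit and reopen done together, in one thread, in that order) can be sketched with toy classes; nothing below is Lucene API, and Snapshot is just a stand-in for the SearcherAndTaxonomy pair.

```java
// Toy sketch of the synchronization concern: the searcher and the taxonomy
// reader must be published as one matched pair, or a searcher could see
// facet ordinals the taxonomy snapshot does not know about yet.
public class PairedRefreshSketch {
    static class Snapshot {
        final long indexGen, taxoGen;
        Snapshot(long i, long t) { indexGen = i; taxoGen = t; }
    }

    private long committedIndexGen = 0, committedTaxoGen = 0;
    private volatile Snapshot current = new Snapshot(0, 0);

    // Commit both parts and publish the matching pair atomically, from one
    // thread, in a fixed order -- the simplifying assumption in the issue.
    synchronized void commitAndRefresh() {
        committedTaxoGen++;   // taxonomy committed first...
        committedIndexGen++;  // ...then the index that refers to it
        current = new Snapshot(committedIndexGen, committedTaxoGen);
    }

    // Readers always observe a consistent (index, taxonomy) pair.
    Snapshot acquire() { return current; }

    public static void main(String[] args) {
        PairedRefreshSketch m = new PairedRefreshSketch();
        m.commitAndRefresh();
        Snapshot s = m.acquire();
        System.out.println(s.indexGen == s.taxoGen); // true: always a matched pair
    }
}
```

Generalizing beyond this, as the issue notes, means coping with commit and reopen happening in separate threads or even separate JVMs.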
[jira] [Created] (LUCENE-5423) OpenBitSet hashCode and equals incongruent
Jakob Zwiener created LUCENE-5423: - Summary: OpenBitSet hashCode and equals incongruent Key: LUCENE-5423 URL: https://issues.apache.org/jira/browse/LUCENE-5423 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 4.6.1 Reporter: Jakob Zwiener In org.apache.lucene.util.OpenBitSet the hashCode method might return different hash codes for equal bitsets. This happens when there are bits set in words right of wlen. This might happen through a getAndSet call (the documentation states that getAndSet may only be called on positions that are smaller than the size - which is the length of the array not wlen - this might be another issue).
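The contract violation can be reproduced with a minimal stand-in class (not the real OpenBitSet, though the hash mixing loosely mirrors its style): equals() compares only the first wlen words, while a hashCode() that iterates the whole backing array also mixes in stray bits to the right of wlen.

```java
// Minimal stand-in reproducing the reported incongruence between
// equals() (words up to wlen) and a hashCode() over the whole array.
public class MiniBitSet {
    final long[] bits;
    final int wlen; // number of words logically in use

    MiniBitSet(long[] bits, int wlen) { this.bits = bits; this.wlen = wlen; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof MiniBitSet)) return false;
        MiniBitSet other = (MiniBitSet) o;
        if (wlen != other.wlen) return false;
        for (int i = 0; i < wlen; i++)
            if (bits[i] != other.bits[i]) return false;
        return true;
    }

    // Buggy: iterates the whole backing array, including words beyond wlen.
    int buggyHashCode() {
        long h = 0;
        for (int i = bits.length; --i >= 0;) { h ^= bits[i]; h = (h << 1) | (h >>> 63); }
        return (int) ((h >> 32) ^ h) + 0x98761234;
    }

    // Fixed: only iterate the valid words, i.e. up to wlen.
    int fixedHashCode() {
        long h = 0;
        for (int i = wlen; --i >= 0;) { h ^= bits[i]; h = (h << 1) | (h >>> 63); }
        return (int) ((h >> 32) ^ h) + 0x98761234;
    }

    public static void main(String[] args) {
        MiniBitSet a = new MiniBitSet(new long[]{1L, 0L}, 1);
        MiniBitSet b = new MiniBitSet(new long[]{1L, 4L}, 1); // stray bit right of wlen
        System.out.println(a.equals(b));                            // true
        System.out.println(a.buggyHashCode() == b.buggyHashCode()); // false: contract violated
        System.out.println(a.fixedHashCode() == b.fixedHashCode()); // true
    }
}
```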
[jira] [Commented] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886521#comment-13886521 ] Michael McCandless commented on LUCENE-5320: +1, thanks Shai! Create SearcherTaxonomyManager over Directory - Key: LUCENE-5320 URL: https://issues.apache.org/jira/browse/LUCENE-5320 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Attachments: LUCENE-5320.patch SearcherTaxonomyManager now only allows working in NRT mode. It could be useful to have an STM which allows reopening a SearcherAndTaxonomy pair over Directories, e.g. for replication. The problem is that if the thread that calls maybeRefresh() is not the one that does the commit(), it could lead to a pair that is not synchronized. Perhaps at first we could have a simple version that works under some assumptions, i.e. that the app does the commit + reopen in the same thread in that order, so that it can be used by such apps + when replicating the indexes, and later we can figure out how to generalize it to work even if commit + reopen are done by separate threads/JVMs. I'll see if SearcherTaxonomyManager can be extended to support it, or a new STM is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5678) When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException
[ https://issues.apache.org/jira/browse/SOLR-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886531#comment-13886531 ] Karl Wright commented on SOLR-5678: --- Throwing SolrException (which is derived from RuntimeException) would be fine too. When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException -- Key: SOLR-5678 URL: https://issues.apache.org/jira/browse/SOLR-5678 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Karl Wright This class of exception should not be used for run-of-the-mill networking kinds of issues. SolrServerException or some variety of IOException should be thrown instead. Here's the trace: {code} java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:130) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:88) at org.apache.solr.common.cloud.ZkStateReader.init(ZkStateReader.java:148) at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:147) at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:173) at org.apache.manifoldcf.agents.output.solr.HttpPoster$SolrPing.process(HttpPoster.java:1315) at org.apache.manifoldcf.agents.output.solr.HttpPoster$StatusThread.run(HttpPoster.java:1208) Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:173) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:127) ... 6 more {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
lucene-solr pull request: Reproduces and fixes issue LUCENE-5423
GitHub user jakob-zwiener opened a pull request: https://github.com/apache/lucene-solr/pull/26 Reproduces and fixes issue LUCENE-5423 First commit adds a test that shows the unexpected behaviour. Even if the documentation of getAndSet were updated to recommend not setting bits right of wlen, the unexpected hashCode behaviour could still be triggered by manually setting wlen. Although bitsets having bits set right of wlen are somewhat invalid in themselves, fixing the hashCode method seems advisable. You can merge this pull request into a Git repository by running: $ git pull https://github.com/jakob-zwiener/lucene-solr trunk Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/26.patch commit 1802f1bc128f85af604aa75d40629eef3ac9515c Author: Jakob Zwiener jakob-zwie...@gmx.de Date: 2014-01-30T12:16:22Z LUCENE-5423 added failing test reproducing the problem commit d41d17deccc383523cdc24cecb81f269a9643c55 Author: Jakob Zwiener jakob-zwie...@gmx.de Date: 2014-01-30T12:20:18Z LUCENE-5423 fixing hashCode method to only iterate valid bits commit ab4290071d9a53716d93c625779e796600eefeab Author: Jakob Zwiener jakob-zwie...@gmx.de Date: 2014-01-30T12:23:07Z LUCENE-5423 fixed getAndSet documentation bits right of wlen should not be set
[jira] [Commented] (LUCENE-5423) OpenBitSet hashCode and equals incongruent
[ https://issues.apache.org/jira/browse/LUCENE-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886535#comment-13886535 ] Jakob Zwiener commented on LUCENE-5423: --- A pull request with a test and a possible fix has been sent on github. OpenBitSet hashCode and equals incongruent -- Key: LUCENE-5423 URL: https://issues.apache.org/jira/browse/LUCENE-5423 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 4.6.1 Reporter: Jakob Zwiener Original Estimate: 0.5h Remaining Estimate: 0.5h In org.apache.lucene.util.OpenBitSet the hashCode method might return different hash codes for equal bitsets. This happens when there are bits set in words right of wlen. This might happen through a getAndSet call (the documentation states that getAndSet may only be called on positions that are smaller than the size - which is the length of the array not wlen - this might be another issue).
[jira] [Commented] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886550#comment-13886550 ] ASF subversion and git services commented on LUCENE-5320: - Commit 1562806 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1562806 ] LUCENE-5320: Add SearcherTaxonomyManager over Directory Create SearcherTaxonomyManager over Directory - Key: LUCENE-5320 URL: https://issues.apache.org/jira/browse/LUCENE-5320 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Attachments: LUCENE-5320.patch SearcherTaxonomyManager now only allows working in NRT mode. It could be useful to have an STM which allows reopening a SearcherAndTaxonomy pair over Directories, e.g. for replication. The problem is that if the thread that calls maybeRefresh() is not the one that does the commit(), it could lead to a pair that is not synchronized. Perhaps at first we could have a simple version that works under some assumptions, i.e. that the app does the commit + reopen in the same thread in that order, so that it can be used by such apps + when replicating the indexes, and later we can figure out how to generalize it to work even if commit + reopen are done by separate threads/JVMs. I'll see if SearcherTaxonomyManager can be extended to support it, or a new STM is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886555#comment-13886555 ] ASF subversion and git services commented on LUCENE-5320: - Commit 1562808 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1562808 ] LUCENE-5320: Add SearcherTaxonomyManager over Directory Create SearcherTaxonomyManager over Directory - Key: LUCENE-5320 URL: https://issues.apache.org/jira/browse/LUCENE-5320 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Fix For: 5.0, 4.7 Attachments: LUCENE-5320.patch SearcherTaxonomyManager now only allows working in NRT mode. It could be useful to have an STM which allows reopening a SearcherAndTaxonomy pair over Directories, e.g. for replication. The problem is that if the thread that calls maybeRefresh() is not the one that does the commit(), it could lead to a pair that is not synchronized. Perhaps at first we could have a simple version that works under some assumptions, i.e. that the app does the commit + reopen in the same thread in that order, so that it can be used by such apps + when replicating the indexes, and later we can figure out how to generalize it to work even if commit + reopen are done by separate threads/JVMs. I'll see if SearcherTaxonomyManager can be extended to support it, or a new STM is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5320) Create SearcherTaxonomyManager over Directory
[ https://issues.apache.org/jira/browse/LUCENE-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-5320. Resolution: Fixed Fix Version/s: 4.7 5.0 Assignee: Shai Erera Lucene Fields: New,Patch Available (was: New) Committed to trunk and 4x. Create SearcherTaxonomyManager over Directory - Key: LUCENE-5320 URL: https://issues.apache.org/jira/browse/LUCENE-5320 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Assignee: Shai Erera Fix For: 5.0, 4.7 Attachments: LUCENE-5320.patch SearcherTaxonomyManager now only allows working in NRT mode. It could be useful to have an STM which allows reopening a SearcherAndTaxonomy pair over Directories, e.g. for replication. The problem is that if the thread that calls maybeRefresh() is not the one that does the commit(), it could lead to a pair that is not synchronized. Perhaps at first we could have a simple version that works under some assumptions, i.e. that the app does the commit + reopen in the same thread in that order, so that it can be used by such apps + when replicating the indexes, and later we can figure out how to generalize it to work even if commit + reopen are done by separate threads/JVMs. I'll see if SearcherTaxonomyManager can be extended to support it, or a new STM is required. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5654) Create a synonym filter factory that is (re)configurable, and capable of reporting its configuration, via REST API
[ https://issues.apache.org/jira/browse/SOLR-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886574#comment-13886574 ] Jack Krupansky commented on SOLR-5654: -- Two reasonable and reliable use cases I have encountered: 1. Update or replace query-time synonyms - no risk for existing indexed data. 2. Add new index-time synonyms that will apply to new indexed documents - again, no expectation that they would apply to existing documents, but reindexing would of course apply them anyway. Create a synonym filter factory that is (re)configurable, and capable of reporting its configuration, via REST API -- Key: SOLR-5654 URL: https://issues.apache.org/jira/browse/SOLR-5654 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Steve Rowe A synonym filter factory could be (re)configurable via REST API by registering with the RESTManager described in SOLR-5653, and then responding to REST API calls to modify its init params and its synonyms resource file. Read-only (GET) REST API calls should also be provided, both for init params and the synonyms resource file. It should be possible to add/remove/modify one or more entries in the synonyms resource file. We should probably use JSON for the REST request body, as is done in the Schema REST API methods. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5422) Postings lists deduplication
[ https://issues.apache.org/jira/browse/LUCENE-5422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Kan updated LUCENE-5422: --- Labels: gsoc2014 (was: ) Postings lists deduplication Key: LUCENE-5422 URL: https://issues.apache.org/jira/browse/LUCENE-5422 Project: Lucene - Core Issue Type: Improvement Components: core/codecs, core/index Reporter: Dmitry Kan Labels: gsoc2014 The context: http://markmail.org/thread/tywtrjjcfdbzww6f Robert Muir and I have discussed what Robert eventually named postings lists deduplication at Berlin Buzzwords 2013 conference. The idea is to allow multiple terms to point to the same postings list to save space. This can be achieved by new index codec implementation, but this jira is open to other ideas as well. The application / impact of this is positive for synonyms, exact / inexact terms, leading wildcard support via storing reversed term etc. For example, at the moment, when supporting exact (unstemmed) and inexact (stemmed) searches, we store both unstemmed and stemmed variant of a word form and that leads to index bloating. That is why we had to remove the leading wildcard support via reversing a token on index and query time because of the same index size considerations. Comment from Mike McCandless: Neat idea! Would this idea allow a single term to point to (the union of) N other posting lists? It seems like that's necessary e.g. to handle the exact/inexact case. And then, to produce the Docs/AndPositionsEnum you'd need to do the merge sort across those N posting lists? Such a thing might also be do-able as runtime only wrapper around the postings API (FieldsProducer), if you could at runtime do the reverse expansion (e.g. stem - all of its surface forms). Comment from Robert Muir: I think the exact/inexact is trickier (detecting it would be the hard part), and you are right, another solution might work better. 
but for the reverse wildcard and synonyms situation, it seems we could even detect it on write if we created some hash of the previous terms postings. if the hash matches for the current term, we know it might be a duplicate and would have to actually do the costly check they are the same. maybe there are better ways to do it, but it might be a fun postingformat experiment to try. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
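The hash-then-verify write path described in the comments can be sketched as follows (plain int arrays stand in for encoded postings; nothing here is Lucene codec API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Write-time sketch of the dedup idea: keep a hash of each already-written
// postings list; when a new term's postings hash to the same value, do the
// costly full comparison, and on a true match point the new term at the
// existing list instead of writing it again.
public class PostingsDedupSketch {
    private final List<int[]> written = new ArrayList<>();              // shared postings storage
    private final Map<Integer, List<Integer>> byHash = new HashMap<>(); // hash -> candidate slots
    final Map<String, Integer> termToSlot = new HashMap<>();            // term -> postings slot

    void write(String term, int[] postings) {
        int h = Arrays.hashCode(postings);
        for (int slot : byHash.getOrDefault(h, Collections.emptyList())) {
            if (Arrays.equals(written.get(slot), postings)) { // costly verify on hash match
                termToSlot.put(term, slot);                   // dedup: reuse the existing list
                return;
            }
        }
        written.add(postings);                                // genuinely new postings list
        int slot = written.size() - 1;
        byHash.computeIfAbsent(h, k -> new ArrayList<>()).add(slot);
        termToSlot.put(term, slot);
    }

    int storedLists() { return written.size(); }

    public static void main(String[] args) {
        PostingsDedupSketch w = new PostingsDedupSketch();
        w.write("dog", new int[]{1, 5, 9});
        w.write("dogs", new int[]{1, 5, 9}); // synonym-like duplicate shares storage
        w.write("cat", new int[]{2, 3});
        System.out.println(w.storedLists()); // 2, not 3
    }
}
```

This matches the synonym and reversed-term cases from the thread, where the duplicate postings are exact; the exact/inexact case would need the union/merge-sort machinery Mike describes instead.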
[jira] [Created] (SOLR-5679) Shard splitting fails to split old clusterstate.json with router as a string
Shalin Shekhar Mangar created SOLR-5679: --- Summary: Shard splitting fails to split old clusterstate.json with router as a string Key: SOLR-5679 URL: https://issues.apache.org/jira/browse/SOLR-5679 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.7 SOLR-5246 added support for splitting collections configured with a router.field but the fix was not back-compatible. After upgrading an old SolrCloud cluster to 4.6 or 4.6.1, shard splitting can fail with the following message: {quote} ERROR o.a.s.handler.admin.CoreAdminHandler - ERROR executing split: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:285) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:193) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] {quote} This happens because the cluster state still contains the router as a string. The clusterstate.json is supposed to auto-upgrade if cluster state is upgraded but according to the user report that did not happen. In any case, we need to fix the core admin split. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
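The shape of a back-compatible read (hypothetical helper, not the actual patch) is to accept the router entry as either the old plain string or the newer map before asking for its name, instead of casting straight to Map:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the back-compat idea: old clusterstate.json stores router as a
// plain string, newer ones as a map like {"name":"compositeId"}; reading
// code must handle both, or the cast to Map throws ClassCastException.
public class RouterBackCompatSketch {
    static String routerName(Object routerFromClusterState) {
        if (routerFromClusterState instanceof String) {
            return (String) routerFromClusterState;  // old format
        }
        if (routerFromClusterState instanceof Map) { // new format
            Object name = ((Map<?, ?>) routerFromClusterState).get("name");
            return name == null ? null : name.toString();
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> newStyle = new HashMap<>();
        newStyle.put("name", "compositeId");
        System.out.println(routerName("compositeId")); // old clusterstate.json
        System.out.println(routerName(newStyle));      // upgraded clusterstate.json
    }
}
```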
[jira] [Updated] (SOLR-5679) Shard splitting fails with ClassCastException on clusterstate.json with router as string
[ https://issues.apache.org/jira/browse/SOLR-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5679: Summary: Shard splitting fails with ClassCastException on clusterstate.json with router as string (was: Shard splitting fails to split old clusterstate.json with router as a string) Shard splitting fails with ClassCastException on clusterstate.json with router as string Key: SOLR-5679 URL: https://issues.apache.org/jira/browse/SOLR-5679 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.7 SOLR-5246 added support for splitting collections configured with a router.field but the fix was not back-compatible. After upgrading an old SolrCloud cluster to 4.6 or 4.6.1, shard splitting can fail with the following message: {quote} ERROR o.a.s.handler.admin.CoreAdminHandler - ERROR executing split: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:285) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:193) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] {quote} This happens because the cluster state still contains the router as a string. The clusterstate.json is supposed to auto-upgrade if cluster state is upgraded but according to the user report that did not happen. In any case, we need to fix the core admin split. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5658) commitWithin does not reflect the new documents added
[ https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886604#comment-13886604 ] ASF subversion and git services commented on SOLR-5658: --- Commit 1562836 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1562836 ] SOLR-5658: Removing System.out.println in JavaBinUpdatedRequestCodec added for debugging commitWithin does not reflect the new documents added - Key: SOLR-5658 URL: https://issues.apache.org/jira/browse/SOLR-5658 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0 Reporter: Varun Thacker Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5658.patch, SOLR-5658.patch I start 4 nodes using the setup mentioned on - https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I added a document using - curl http://localhost:8983/solr/update?commitWithin=1 -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit with openSearcher=false. In Solr 4.6.x there is only one hard commit with openSearcher=false. So even after 10 seconds, queries on none of the shards reflect the added document. 
This was also reported on the solr-user list ( http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html ) Here are the relevant logs Logs from Solr 4.5.1 Node 1: {code} 420021 [qtp619011445-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45 {code} Node 2: {code} 119896 [qtp1608701025-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:8983/solr/collection1/update.distrib=TOLEADERwt=javabinversion=2} {add=[testdoc (1458003295513608192)]} 0 348 129648 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 129679 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@e174f70 main 129680 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done. 
129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 134648 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3} commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 4 134660 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush {code} Node 3: Node 4: {code} 374545 [qtp1608701025-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:7574/solr/collection1/update.distrib=FROMLEADERwt=javabinversion=2} {add=[testdoc (1458002133233172480)]} 0 20 384545 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 384552 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@36137e08 main 384553
[jira] [Commented] (SOLR-5658) commitWithin does not reflect the new documents added
[ https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886608#comment-13886608 ] Shalin Shekhar Mangar commented on SOLR-5658: - Perhaps I should remove this println under another issue, because this one has already been released? commitWithin does not reflect the new documents added - Key: SOLR-5658 URL: https://issues.apache.org/jira/browse/SOLR-5658 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0 Reporter: Varun Thacker Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5658.patch, SOLR-5658.patch I start 4 nodes using the setup mentioned on - https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I added a document using - curl http://localhost:8983/solr/update?commitWithin=1 -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit with openSearcher=false. In Solr 4.6.x there is only one hard commit with openSearcher=false. So even after 10 seconds, queries on none of the shards reflect the added document. 
This was also reported on the solr-user list ( http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html ) Here are the relevant logs Logs from Solr 4.5.1 Node 1: {code} 420021 [qtp619011445-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45 {code} Node 2: {code} 119896 [qtp1608701025-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:8983/solr/collection1/update.distrib=TOLEADERwt=javabinversion=2} {add=[testdoc (1458003295513608192)]} 0 348 129648 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 129679 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@e174f70 main 129680 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done. 
129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 134648 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3} commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 4 134660 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush {code} Node 3: Node 4: {code} 374545 [qtp1608701025-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:7574/solr/collection1/update.distrib=FROMLEADERwt=javabinversion=2} {add=[testdoc (1458002133233172480)]} 0 20 384545 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 384552 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@36137e08 main 384553 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 384553
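The contrast between the two log sets above comes down to whether the scheduled commit opens a new searcher. A toy model of that visibility rule (class and method names invented for illustration; this is not Solr's actual UpdateHandler):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: an added document becomes searchable only after a commit that
// opens a new searcher. A commit with openSearcher=false (all that the
// 4.6.x logs show) flushes segments but leaves the current searcher, and
// therefore query results, unchanged.
public class SearcherVisibility {
    private final List<String> indexed = new ArrayList<>();
    private List<String> searcherView = new ArrayList<>();

    public void add(String doc) {
        indexed.add(doc);
    }

    public void commit(boolean openSearcher) {
        if (openSearcher) {
            // a new searcher sees everything indexed so far
            searcherView = new ArrayList<>(indexed);
        }
        // openSearcher=false: data is durable but queries still use the old view
    }

    public boolean isVisible(String doc) {
        return searcherView.contains(doc);
    }

    public static void main(String[] args) {
        SearcherVisibility node = new SearcherVisibility();
        node.add("testdoc");
        node.commit(false); // like the 4.6.x hard commit: still invisible
        System.out.println("after openSearcher=false: " + node.isVisible("testdoc"));
        node.commit(true);  // like the 4.5.1 soft commit: now visible
        System.out.println("after openSearcher=true: " + node.isVisible("testdoc"));
    }
}
```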
[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.7.0) - Build # 1248 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/1248/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseParallelGC All tests passed Build Log: [...truncated 10556 lines...] [junit4] JVM J0: stderr was not empty, see: /Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140130_143859_837.syserr [junit4] JVM J0: stderr (verbatim) [junit4] java(749,0x13c179000) malloc: *** error for object 0x113c1683a0: pointer being freed was not allocated [junit4] *** set a breakpoint in malloc_error_break to debug [junit4] JVM J0: EOF [...truncated 1 lines...] [junit4] ERROR: JVM J0 ended with an exception, command line: /Library/Java/JavaVirtualMachines/jdk1.7.0_51.jdk/Contents/Home/jre/bin/java -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/heapdumps -Dtests.prefix=tests -Dtests.seed=FFC506EA979174D6 -Xmx512M -Dtests.iters= -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random -Dtests.postingsformat=random -Dtests.docvaluesformat=random -Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.7 -Dtests.cleanthreads=perClass -Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/logging.properties -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. -Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build/solr-core/test/temp -Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/build/clover/db -Djava.security.manager=org.apache.lucene.util.TestSecurityManager -Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/junit4/tests.policy -Dlucene.version=4.7-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory -Djava.awt.headless=true -Djdk.map.althashing.threshold=0 -Dtests.disableHdfs=true -Dfile.encoding=US-ASCII -classpath
[jira] [Commented] (SOLR-5658) commitWithin does not reflect the new documents added
[ https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886676#comment-13886676 ] ASF subversion and git services commented on SOLR-5658: --- Commit 1562860 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1562860 ] SOLR-5658: Removing System.out.println in JavaBinUpdatedRequestCodec added for debugging commitWithin does not reflect the new documents added - Key: SOLR-5658 URL: https://issues.apache.org/jira/browse/SOLR-5658 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0 Reporter: Varun Thacker Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5658.patch, SOLR-5658.patch I start 4 nodes using the setup mentioned on - https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I added a document using - curl 'http://localhost:8983/solr/update?commitWithin=1' -H 'Content-Type: text/xml' --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit with openSearcher=false. In Solr 4.6.x there is only one hard commit, with openSearcher=false. So even after 10 seconds, queries on none of the shards reflect the added document.
This was also reported on the solr-user list ( http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html ) Here are the relevant logs Logs from Solr 4.5.1 Node 1: {code} 420021 [qtp619011445-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45 {code} Node 2: {code} 119896 [qtp1608701025-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:8983/solr/collection1/update.distrib=TOLEADERwt=javabinversion=2} {add=[testdoc (1458003295513608192)]} 0 348 129648 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 129679 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@e174f70 main 129680 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done. 
129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 134648 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3} commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 4 134660 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush {code} Node 3: Node 4: {code} 374545 [qtp1608701025-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:7574/solr/collection1/update.distrib=FROMLEADERwt=javabinversion=2} {add=[testdoc (1458002133233172480)]} 0 20 384545 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 384552 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@36137e08
[jira] [Updated] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Wesemann updated SOLR-5673: - Attachment: SOLR-5673.diff This diff adds a test and a patch for HTTPSolrServer. HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Frank Wesemann Priority: Minor Attachments: SOLR-5673.diff In setFollowRedirects(boolean newValue), HTTPSolrServer always sets its internal property followRedirects to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
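The report does not quote the offending line, but the behavior it describes matches a classic copy-paste setter bug. A minimal reproduction of the pattern (the class is a stand-in, not the actual HttpSolrServer source):

```java
// Minimal sketch of the bug pattern described in SOLR-5673: a setter that
// assigns a constant instead of its parameter. Names are illustrative.
public class RedirectConfig {
    private boolean followRedirects = false;

    // Buggy shape: the parameter is ignored, the field is always set to true.
    public void setFollowRedirectsBuggy(boolean followRedirects) {
        this.followRedirects = true; // bug: constant instead of parameter
    }

    // Fixed shape: store the caller-supplied value.
    public void setFollowRedirects(boolean followRedirects) {
        this.followRedirects = followRedirects;
    }

    public boolean isFollowRedirects() {
        return followRedirects;
    }

    public static void main(String[] args) {
        RedirectConfig c = new RedirectConfig();
        c.setFollowRedirectsBuggy(false);
        System.out.println("buggy setter with false -> " + c.isFollowRedirects());
        c.setFollowRedirects(false);
        System.out.println("fixed setter with false -> " + c.isFollowRedirects());
    }
}
```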
Re: Patches in Jira or pull requests on github?
Opening a pull request will send a notification to the mailing list, so that should get noticed the same as opening a JIRA. Excellent, thanks! Will also have a closer look at the developer tips. - Bram - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5679) Shard splitting fails with ClassCastException on clusterstate.json with router as string
[ https://issues.apache.org/jira/browse/SOLR-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886703#comment-13886703 ] ASF subversion and git services commented on SOLR-5679: --- Commit 1562872 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1562872 ] SOLR-5679: SOLR-5679: Shard splitting fails with ClassCastException on collections upgraded from 4.5 and earlier versions Shard splitting fails with ClassCastException on clusterstate.json with router as string Key: SOLR-5679 URL: https://issues.apache.org/jira/browse/SOLR-5679 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.7 SOLR-5246 added support for splitting collections configured with a router.field but the fix was not back-compatible. After upgrading an old SolrCloud cluster to 4.6 or 4.6.1, shard splitting can fail with the following message: {quote} ERROR o.a.s.handler.admin.CoreAdminHandler - ERROR executing split: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:285) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:193) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] {quote} This happens because the cluster state still contains the router as a string. The clusterstate.json is supposed to auto-upgrade if cluster state is upgraded but according to the user report that did not happen. In any case, we need to fix the core admin split. 
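The fix needs a back-compat check along these lines: cluster states written by Solr 4.5 and earlier store the router as a plain String, newer ones as a Map such as {"name":"compositeId"}, and casting the value straight to Map is what throws the ClassCastException above. A sketch of branching on the runtime type (names are illustrative, not Solr's actual code):

```java
import java.util.Collections;
import java.util.Map;

// Handle both router representations: a bare String (pre-4.6 cluster
// state) and a Map with a "name" key (4.6+). Blindly casting to Map is
// exactly the ClassCastException reported in SOLR-5679.
public class RouterCompat {
    public static String routerName(Object routerSpec) {
        if (routerSpec instanceof String) {
            return (String) routerSpec; // legacy pre-4.6 format
        }
        if (routerSpec instanceof Map) {
            Object name = ((Map<?, ?>) routerSpec).get("name");
            return name == null ? null : name.toString();
        }
        return null; // absent or unrecognized
    }

    public static void main(String[] args) {
        System.out.println(routerName("compositeId"));
        System.out.println(routerName(Collections.singletonMap("name", "compositeId")));
    }
}
```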
[jira] [Commented] (SOLR-5679) Shard splitting fails with ClassCastException on clusterstate.json with router as string
[ https://issues.apache.org/jira/browse/SOLR-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886707#comment-13886707 ] ASF subversion and git services commented on SOLR-5679: --- Commit 1562873 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1562873 ] SOLR-5679: Shard splitting fails with ClassCastException on collections upgraded from 4.5 and earlier versions Shard splitting fails with ClassCastException on clusterstate.json with router as string Key: SOLR-5679 URL: https://issues.apache.org/jira/browse/SOLR-5679 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.7 SOLR-5246 added support for splitting collections configured with a router.field but the fix was not back-compatible. After upgrading an old SolrCloud cluster to 4.6 or 4.6.1, shard splitting can fail with the following message: {quote} ERROR o.a.s.handler.admin.CoreAdminHandler - ERROR executing split: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:285) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:193) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] {quote} This happens because the cluster state still contains the router as a string. The clusterstate.json is supposed to auto-upgrade if cluster state is upgraded but according to the user report that did not happen. In any case, we need to fix the core admin split. 
[jira] [Resolved] (SOLR-5679) Shard splitting fails with ClassCastException on clusterstate.json with router as string
[ https://issues.apache.org/jira/browse/SOLR-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-5679. - Resolution: Fixed This is fixed. I'll investigate the auto-upgrade of clusterstate.json separately. Shard splitting fails with ClassCastException on clusterstate.json with router as string Key: SOLR-5679 URL: https://issues.apache.org/jira/browse/SOLR-5679 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.6, 4.6.1 Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.7 SOLR-5246 added support for splitting collections configured with a router.field but the fix was not back-compatible. After upgrading an old SolrCloud cluster to 4.6 or 4.6.1, shard splitting can fail with the following message: {quote} ERROR o.a.s.handler.admin.CoreAdminHandler - ERROR executing split: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at org.apache.solr.handler.admin.CoreAdminHandler.handleSplitAction(CoreAdminHandler.java:285) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:193) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) [solr-core-4.6.1.jar:4.6.1 1560866 - mark - 2014-01-23 20:21:50] {quote} This happens because the cluster state still contains the router as a string. The clusterstate.json is supposed to auto-upgrade if cluster state is upgraded but according to the user report that did not happen. In any case, we need to fix the core admin split. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Wesemann updated SOLR-5673: - Affects Version/s: (was: 4.6) HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Reporter: Frank Wesemann Priority: Minor Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Wesemann updated SOLR-5673: - Affects Version/s: 4.7 5.0 4.6 HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0, 4.7 Reporter: Frank Wesemann Priority: Minor Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5418) Don't use .advance on costly (e.g. distance range facets) filters
[ https://issues.apache.org/jira/browse/LUCENE-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886760#comment-13886760 ] David Smiley commented on LUCENE-5418: -- How ironic that I was contemplating this very same issue yesterday (shared on IRC #lucene-dev) as I work on LUCENE-5408, and now I see you guys were just thinking about it. Rob's right; the problem isn't just advance(), it's next() too. There may be a place to share some code that Mike is committing here in his facet module with a static utility class I coded yesterday in LUCENE-5408 (not yet posted). It's a BitsDocIdSet and it's roughly similar to Mike's SlowBitsDocIdSetIterator:
{code:java}
/** Utility class that wraps a {@link Bits} with a {@link DocIdSet}. */
private static class BitsDocIdSet extends DocIdSet {
  final Bits bits; // not null

  public BitsDocIdSet(Bits bits) {
    if (bits == null)
      throw new NullPointerException("bits arg should be non-null");
    this.bits = bits;
  }

  @Override
  public DocIdSetIterator iterator() throws IOException {
    return new DocIdSetIterator() {
      final Bits bits = BitsDocIdSet.this.bits; // copy reference to reduce outer class access
      int docId = -1;

      @Override
      public int docID() {
        return docId;
      }

      @Override
      public int nextDoc() throws IOException {
        return advance(docId + 1);
      }

      @Override
      public int advance(int target) throws IOException {
        for (docId = target; docId < bits.length(); docId++) {
          if (bits.get(docId)) {
            return docId;
          }
        }
        return NO_MORE_DOCS;
      }

      @Override
      public long cost() {
        return bits.length();
      }
    };
  }

  @Override
  public Bits bits() throws IOException {
    return bits; // won't be null
  }

  // we don't override isCacheable because we want the default of false
} // class BitsDocIdSet
{code}
So Mike; you've got just the DISI portion, and you're also incorporating acceptDocs. For me I elected to have acceptDocs be pre-incorporated into the Bits I pass through. I'll post my intermediate progress on LUCENE-5408.
So anyway, how about we have something in the utils package to share? Don't use .advance on costly (e.g. distance range facets) filters - Key: LUCENE-5418 URL: https://issues.apache.org/jira/browse/LUCENE-5418 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.7 Attachments: LUCENE-5418.patch If you use a distance filter today (see http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html ), then drill down on one of those ranges, under the hood Lucene is using .advance on the Filter, which is very costly because we end up computing distance on (possibly many) hits that don't match the query. It's better for performance to find the hits matching the Query first, and then check the filter. FilteredQuery can already do this today, when you use its QUERY_FIRST_FILTER_STRATEGY. This essentially accomplishes the same thing as Solr's post filters (I think?) but with a far simpler/better/less code approach. E.g., I believe ElasticSearch uses this API when it applies costly filters. Longish term, I think Query/Filter ought to know itself that it's expensive, and cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g. ConstantScoreQuery), or the Filter is a clause in BooleanFilter, or it's passed to IndexSearcher.search, we should also be smart here and not call .advance on such clauses. But that'd be a biggish change ... so for today the workaround is that the user must carefully construct the FilteredQuery themselves. In the meantime, as another workaround, I want to fix DrillSideways so that when you drill down on such filters it doesn't use .advance; this should give a good speedup for the normal path API usage with a costly filter. I'm iterating on the lucene server branch (LUCENE-5376) but once it's working I plan to merge this back to trunk / 4.7.
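The "query first" idea discussed in this thread can be sketched in a few lines: drive iteration from the cheap query's hits and consult the costly filter only as a per-document yes/no check, never via advance()/nextDoc(). Here the int[] of doc ids and the IntPredicate stand in for Lucene's Scorer and Bits; this is an illustrative sketch, not Lucene source:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Sketch of QUERY_FIRST_FILTER_STRATEGY's idea: the costly filter (e.g. a
// distance computation) runs only on documents that already matched the
// query, instead of being iterated/advanced over all candidate docs.
public class QueryFirst {
    public static List<Integer> filteredHits(int[] queryHits, IntPredicate costlyFilter) {
        List<Integer> out = new ArrayList<>();
        for (int doc : queryHits) {
            if (costlyFilter.test(doc)) { // costly work runs once per query hit only
                out.add(doc);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // e.g. a costly "distance" check, applied only to the three query hits
        System.out.println(filteredHits(new int[]{1, 4, 7}, d -> d % 2 == 1));
    }
}
```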
[jira] [Updated] (LUCENE-5408) GeometryStrategy -- match geometries in DocValues
[ https://issues.apache.org/jira/browse/LUCENE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-5408: - Attachment: LUCENE-5408_GeometryStrategy.patch This is intermediate progress; it needs to be tested. And I hope to possibly share a Bits-based DocIdSet with [~mikemccand] in LUCENE-5418. The sentiment in that issue about how to handle super-slow Filters is a problem here too. I had an epiphany last night that the current Spatial RPT grid and algorithm doesn't need to be modified to be able to differentiate the matching docs into confirmed and un-confirmed matches for common scenarios. As such, to prevent mis-use of the expensive Filter returned from this GeometryStrategy, I might force it to be paired with RecursivePrefixTreeStrategy. And then leave an expert method exposed to grab Bits or a Filter purely based on the Geometry DocValues check. ElasticSearch and Solr wouldn't use that, but someone coding directly to Lucene would have the ability to wire things together in ways more flexible than are possible in ES or Solr. The most ideal way is to compute a fast pre-filter bitset separate from the slow post-filter, with user keyword queries and other filters in the middle. But the slow post-filter, to operate best, needs a side-artifact bitset computed when the pre-filter bitset is generated. I'll eventually be more clear in javadocs. GeometryStrategy -- match geometries in DocValues - Key: LUCENE-5408 URL: https://issues.apache.org/jira/browse/LUCENE-5408 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley Assignee: David Smiley Fix For: 4.7 Attachments: LUCENE-5408_GeometryStrategy.patch I've started work on a new SpatialStrategy implementation I'm tentatively calling GeometryStrategy.
It's similar to the [JtsGeoStrategy in Spatial-Solr-Sandbox|https://github.com/ryantxu/spatial-solr-sandbox/tree/master/LSE/src/main/java/org/apache/lucene/spatial/pending/jts] but a little different in the details -- certainly faster. Using Spatial4j 0.4's BinaryCodec, it'll serialize the shape to bytes (for polygons this is internally WKB format) and the strategy will put it in a BinaryDocValuesField. In practice the shape is likely a polygon but it needn't be. Then I'll implement a Filter that returns a DocIdSetIterator that evaluates a given document passed via advance(docid) to see if the query shape matches a shape in DocValues. It's improper usage for it to be used in a situation where it will evaluate every document id via nextDoc(). And in practice the DocValues format chosen should be a disk-resident one since each value tends to be kind of big. This spatial strategy in and of itself has no _index_; it's O(N) where N is the number of documents that get passed thru it. So it should be placed last in the query/filter tree so that the other queries limit the documents it needs to see. At a minimum, another query/filter to use in conjunction is another SpatialStrategy like RecursivePrefixTreeStrategy. Eventually, once the PrefixTree grid encoding has a little bit more metadata, it will be possible to further combine the grid with this strategy in such a way that many documents won't need to be checked against the serialized geometry. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5424) FilteredQuery useRandomAccess() should use cost()
David Smiley created LUCENE-5424: Summary: FilteredQuery useRandomAccess() should use cost() Key: LUCENE-5424 URL: https://issues.apache.org/jira/browse/LUCENE-5424 Project: Lucene - Core Issue Type: Improvement Components: core/query/scoring Reporter: David Smiley Now that Lucene's DISI has a cost() method, it's possible for FilteredQuery's RANDOM_ACCESS_FILTER_STRATEGY to use a smarter algorithm in its useRandomAccess() method. In particular, it might examine filterIter.cost() to see if it is greater than the cost returned by weight.scorer().cost() of the query. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
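The heuristic the issue proposes, comparing filterIter.cost() against the query scorer's cost(), can be sketched as a simple predicate. The method name and comparison are illustrative, not FilteredQuery's actual code:

```java
// Sketch of the proposed cost comparison: cost() is (roughly) an upper
// bound on how many docs an iterator matches. When the filter matches far
// more docs than the query, leapfrogging the filter wastes work, so
// checking the filter per query hit (random access) is the cheaper plan.
// Note the caveat from the thread: this only pays off when the filter's
// per-document check is itself cheap (e.g. a FixedBitSet lookup).
public class FilterStrategyChooser {
    public static boolean useRandomAccess(long filterCost, long queryCost) {
        // dense filter + selective query -> consult the filter per query hit
        return filterCost > queryCost;
    }

    public static void main(String[] args) {
        System.out.println(useRandomAccess(1_000_000L, 100L)); // dense filter
        System.out.println(useRandomAccess(10L, 100L));        // sparse filter
    }
}
```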
[jira] [Assigned] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar reassigned SOLR-5673: --- Assignee: Shalin Shekhar Mangar HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0, 4.7 Reporter: Frank Wesemann Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886798#comment-13886798 ] ASF subversion and git services commented on SOLR-5673: --- Commit 1562898 from sha...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1562898 ] SOLR-5673: HttpSolrServer doesn't set own property correctly in setFollowRedirects HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0, 4.7 Reporter: Frank Wesemann Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886800#comment-13886800 ] ASF subversion and git services commented on SOLR-5673: --- Commit 1562899 from sha...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1562899 ] SOLR-5673: HttpSolrServer doesn't set own property correctly in setFollowRedirects HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0, 4.7 Reporter: Frank Wesemann Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5673) HTTPSolrServer doesn't set own property correctly in setFollowRedirects
[ https://issues.apache.org/jira/browse/SOLR-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5673: Fix Version/s: 5.0 HTTPSolrServer doesn't set own property correctly in setFollowRedirects --- Key: SOLR-5673 URL: https://issues.apache.org/jira/browse/SOLR-5673 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0, 4.7 Reporter: Frank Wesemann Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 5.0, 4.7 Attachments: SOLR-5673.diff in setFollowRedirects(boolean newValue) HTTPSolr sets its internal property followRedirects always to true, regardless of the given parameter. Patch and tests will follow tomorrow. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5424) FilteredQuery useRandomAccess() should use cost()
[ https://issues.apache.org/jira/browse/LUCENE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886924#comment-13886924 ] Michael McCandless commented on LUCENE-5424: +1, there is a TODO about this in the code. But I'm not sure how to translate cost to the right filter strategy; maybe we need a hasAdvance() as Rob suggested on LUCENE-5418? Also, useRandomAccess should not be used for costly filters: it should be used for cheap filters (e.g. a FixedBitSet), because this is passed down as the acceptDocs to possibly many, many postings iterators. E.g., if you run a BQ with 10 terms and pass a Filter to IS when doing the search ... if useRandomAccess is true, that filter is checked in all 10 of those DocsEnums, quite possibly many times per document. FilteredQuery useRandomAccess() should use cost() - Key: LUCENE-5424 URL: https://issues.apache.org/jira/browse/LUCENE-5424 Project: Lucene - Core Issue Type: Improvement Components: core/query/scoring Reporter: David Smiley Now that Lucene's DISI has a cost() method, it's possible for FilteredQuery's RANDOM_ACCESS_FILTER_STRATEGY to use a smarter algorithm in its useRandomAccess() method. In particular, it might examine filterIter.cost() to see if it is greater than the cost returned by weight.scorer().cost() of the query. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
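The heuristic the issue proposes can be sketched in isolation. This is a hypothetical shape, not Lucene's actual code: if the filter's estimated match count is far larger than the query's, advancing the filter is wasteful, so consult it by random access instead. The ratio parameter is an invented tuning knob whose right value, as noted above, would need benchmarking.

```java
// Hypothetical sketch of a cost()-driven useRandomAccess decision, following
// the idea in the issue: compare filterIter.cost() against the query scorer's
// cost(). 'ratio' is an invented tuning parameter, not anything in Lucene.
public class RandomAccessHeuristic {
    public static boolean useRandomAccess(long filterCost, long queryCost, double ratio) {
        // cost() approximates the number of matching docs; a filter much
        // denser than the query favors checking the filter per query hit
        // rather than driving iteration off the filter.
        return filterCost > queryCost * ratio;
    }
}
```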
[jira] [Commented] (LUCENE-5418) Don't use .advance on costly (e.g. distance range facets) filters
[ https://issues.apache.org/jira/browse/LUCENE-5418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13886925#comment-13886925 ] Michael McCandless commented on LUCENE-5418: Actually, I've removed the SlowBitsDocIdSetIterator (need to post a patch again soon...), because it's too trappy. I think it's better if the user gets an exception here than just silently run super slowly, and at least for this issue there are always ways to run the Filter quickly (use DrillSideways or DrillDownQuery, or create FilteredQuery directly). Don't use .advance on costly (e.g. distance range facets) filters - Key: LUCENE-5418 URL: https://issues.apache.org/jira/browse/LUCENE-5418 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.7 Attachments: LUCENE-5418.patch If you use a distance filter today (see http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html ), then drill down on one of those ranges, under the hood Lucene is using .advance on the Filter, which is very costly because we end up computing distance on (possibly many) hits that don't match the query. It's better performance to find the hits matching the Query first, and then check the filter. FilteredQuery can already do this today, when you use its QUERY_FIRST_FILTER_STRATEGY. This essentially accomplishes the same thing as Solr's post filters (I think?) but with a far simpler/better/less code approach. E.g., I believe ElasticSearch uses this API when it applies costly filters. Longish term, I think Query/Filter ought to know itself that it's expensive, and cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g. ConstantScoreQuery), or the Filter is a clause in BooleanFilter, or it's passed to IndexSearcher.search, we should also be smart here and not call .advance on such clauses. But that'd be a biggish change ... 
so for today the workaround is the user must carefully construct the FilteredQuery themselves. In the mean time, as another workaround, I want to fix DrillSideways so that when you drill down on such filters it doesn't use .advance; this should give a good speedup for the normal path API usage with a costly filter. I'm iterating on the lucene server branch (LUCENE-5376) but once it's working I plan to merge this back to trunk / 4.7. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5424) FilteredQuery useRandomAccess() should use cost()
[ https://issues.apache.org/jira/browse/LUCENE-5424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886963#comment-13886963 ] David Smiley commented on LUCENE-5424: -- I know I commented on LUCENE-5418 and then immediately created this issue but these are not particularly related. I totally recognize that RANDOM_ACCESS_FILTER_STRATEGY is for the typical case of fast filters. And indeed I observed the TODO comment and thought, _hey_, DISI *does* have a {{cost()}} now -- let's do this! Now there's this JIRA issue :-) Not sure how to arrive at the right tuning ratio between the cost() of both DISIs. Maybe use the benchmark module and try various filters that match 1%, 2%, etc. up to 99% of the documents, against some simple query that always matches the same 50% of the total docs? And then test this method given configurable threshold ratios of query_cost/filter_cost of 10%, 20%, ... etc. and see where the inflection point is. That's complicated, yeah. FilteredQuery useRandomAccess() should use cost() - Key: LUCENE-5424 URL: https://issues.apache.org/jira/browse/LUCENE-5424 Project: Lucene - Core Issue Type: Improvement Components: core/query/scoring Reporter: David Smiley Now that Lucene's DISI has a cost() method, it's possible for FilteredQuery's RANDOM_ACCESS_FILTER_STRATEGY to use a smarter algorithm in its useRandomAccess() method. In particular, it might examine filterIter.cost() to see if it is greater than the cost returned by weight.scorer().cost() of the query. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5680) ConcurrentUpdateSolrServer ignores HttpClient parameter
Edgar Espina created SOLR-5680: -- Summary: ConcurrentUpdateSolrServer ignores HttpClient parameter Key: SOLR-5680 URL: https://issues.apache.org/jira/browse/SOLR-5680 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 4.6 Reporter: Edgar Espina Priority: Minor Since 4.6.1 ConcurrentUpdateSolrServer ignores the HttpClient parameter. Here is the source code: {code} public ConcurrentUpdateSolrServer(String solrServerUrl, HttpClient client, int queueSize, int threadCount) { this(solrServerUrl, null, queueSize, threadCount, Executors.newCachedThreadPool(new SolrjNamedThreadFactory("concurrentUpdateScheduler"))); shutdownExecutor = true; } {code} It calls this(...) with null as the 2nd parameter, dropping the client argument. Thanks -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
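The regression reduces to a constructor-delegation slip. A stand-in sketch (simplified placeholder types, not the actual SolrJ source) of the delegation with the one-token fix applied:

```java
// Stand-in sketch of SOLR-5680: the delegating constructor dropped its
// HttpClient argument by passing null through. Types are simplified
// placeholders, not SolrJ classes.
public class ConcurrentUpdateSketch {
    static class HttpClient {}

    final HttpClient client;

    // Buggy version delegated with: this(url, null, queueSize, threads, ...)
    // The fix is simply to pass 'client' through.
    ConcurrentUpdateSketch(String url, HttpClient client, int queueSize, int threads) {
        this(url, client, queueSize, threads, true); // fixed: was null
    }

    ConcurrentUpdateSketch(String url, HttpClient client, int queueSize, int threads,
                           boolean shutdownExecutor) {
        this.client = client;
    }
}
```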
[jira] [Updated] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-5681: --- Description: Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. 
At the end, the cleanup is done from the workQueue. was: Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. With OCP tasks becoming async, it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. 
Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. *
[jira] [Created] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
Anshum Gupta created SOLR-5681: -- Summary: Make the OverseerCollectionProcessor multi-threaded Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. With OCP tasks becoming async, it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. * Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). 
The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
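The peekAfter(last) primitive proposed in the issue does not exist in ZooKeeper's distributed queue; as an assumption about its intended semantics, it can be sketched over an ordinary in-memory queue:

```java
import java.util.Deque;

// Hypothetical sketch of the proposed peekAfter(last): return the element
// that follows 'last' without removing anything, so a parent dispatcher can
// look ahead while completed tasks are still awaiting cleanup. The method
// name comes from the issue; this implementation is an assumption.
public class PeekAfterSketch {
    public static <T> T peekAfter(Deque<T> queue, T last) {
        boolean seen = (last == null); // null means "peek at the head"
        for (T item : queue) {
            if (seen) {
                return item;
            }
            if (item.equals(last)) {
                seen = true;
            }
        }
        return null; // nothing after 'last'
    }
}
```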
[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887175#comment-13887175 ] Anshum Gupta commented on SOLR-5477: Making OverseerCollectionProcessor multi-threaded. Async execution of OverseerCollectionProcessor tasks Key: SOLR-5477 URL: https://issues.apache.org/jira/browse/SOLR-5477 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Anshum Gupta Attachments: SOLR-5477-CoreAdminStatus.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch Typical collection admin commands are long running and it is very common to have the requests get timed out. It is more of a problem if the cluster is very large. Add an option to run these commands asynchronously: add an extra param async=true for all collection commands. The task is written to ZK and the caller is returned a task id. A separate collection admin command will be added to poll the status of the task: command=status&id=7657668909. If id is not passed, all running async tasks should be listed. A separate queue is created to store in-process tasks. After the tasks are completed the queue entry is removed. OverseerCollectionProcessor will perform these tasks in multiple threads -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded
[ https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13887178#comment-13887178 ] Anshum Gupta commented on SOLR-5681: Async Collection API calls. Make the OverseerCollectionProcessor multi-threaded --- Key: SOLR-5681 URL: https://issues.apache.org/jira/browse/SOLR-5681 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Assignee: Anshum Gupta Right now, the OverseerCollectionProcessor is single threaded i.e submitting anything long running would have it block processing of other mutually exclusive tasks. When OCP tasks become optionally async (SOLR-5477), it'd be good to have truly non-blocking behavior by multi-threading the OCP itself. For example, a ShardSplit call on Collection1 would block the thread and thereby, not processing a create collection task (which would stay queued in zk) though both the tasks are mutually exclusive. Here are a few of the challenges: * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An easy way to handle that is to only let 1 task per collection run at a time. * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. The task from the workQueue is only removed on completion so that in case of a failure, the new Overseer can re-consume the same task and retry. A queue is not the right data structure in the first place to look ahead i.e. get the 2nd task from the queue when the 1st one is in process. Also, deleting tasks which are not at the head of a queue is not really an 'intuitive' thing. Proposed solutions for task management: * Task funnel and peekAfter(): The parent thread is responsible for getting and passing the request to a new thread (or one from the pool). The parent method uses a peekAfter(last element) instead of a peek(). The peekAfter returns the task after the 'last element'. Maintain this request information and use it for deleting/cleaning up the workQueue. 
* Another (almost duplicate) queue: While offering tasks to workQueue, also offer them to a new queue (call it volatileWorkQueue?). The difference is, as soon as a task from this is picked up for processing by the thread, it's removed from the queue. At the end, the cleanup is done from the workQueue. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5678) When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException
[ https://issues.apache.org/jira/browse/SOLR-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-5678: --- Attachment: SOLR-5678.patch Changed the exception to a SolrException and added a test to confirm that the correct exception is thrown. When SolrJ/SolrCloud can't talk to Zookeeper, it throws a RuntimeException -- Key: SOLR-5678 URL: https://issues.apache.org/jira/browse/SOLR-5678 Project: Solr Issue Type: Bug Affects Versions: 4.6 Reporter: Karl Wright Attachments: SOLR-5678.patch This class of exception should not be used for run-of-the-mill networking kinds of issues. SolrServerException or some variety of IOException should be thrown instead. Here's the trace: {code} java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:130) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:88) at org.apache.solr.common.cloud.ZkStateReader.init(ZkStateReader.java:148) at org.apache.solr.client.solrj.impl.CloudSolrServer.connect(CloudSolrServer.java:147) at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:173) at org.apache.manifoldcf.agents.output.solr.HttpPoster$SolrPing.process(HttpPoster.java:1315) at org.apache.manifoldcf.agents.output.solr.HttpPoster$StatusThread.run(HttpPoster.java:1208) Caused by: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper localhost:2181 within 6 ms at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:173) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:127) ... 6 more {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
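The direction of the fix can be sketched without Solr on the classpath. This is a stand-in illustration of wrapping the checked connect timeout in a domain exception rather than a bare RuntimeException; `SolrConnectException` is a placeholder name, not the class the actual patch uses:

```java
import java.util.concurrent.TimeoutException;

// Stand-in sketch of the SOLR-5678 fix direction: surface the ZooKeeper
// connect timeout as a catchable, Solr-specific exception type instead of a
// bare RuntimeException. 'SolrConnectException' is a placeholder name.
public class ZkConnectSketch {
    static class SolrConnectException extends RuntimeException {
        SolrConnectException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    static void connect(boolean timesOut) {
        try {
            if (timesOut) {
                throw new TimeoutException("Could not connect to ZooKeeper");
            }
        } catch (TimeoutException e) {
            // Callers can now catch a meaningful type and still reach the
            // original cause, instead of matching on RuntimeException.
            throw new SolrConnectException("ZooKeeper connection failed", e);
        }
    }
}
```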
[jira] [Created] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
John Wang created LUCENE-5425: - Summary: Make creation of FixedBitSet in FacetsCollector overridable Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and generates a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-5425: -- Attachment: facetscollector.patch Make creation of FixedBitSet in FacetsCollector overridable --- Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: facetscollector.patch In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and generates a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
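One plausible shape for such a customization point is a protected factory method, sketched here with a plain long[] word array standing in for Lucene's FixedBitSet so the example is self-contained. The method and class names are assumptions, not necessarily what the attached patch does:

```java
// Hypothetical sketch of making per-query bitset allocation overridable.
// A long[] word array (one bit per doc, 64 docs per word) stands in for
// FixedBitSet; 'createBits' and the pooling subclass are invented names.
public class BitsFactorySketch {
    // Default behavior: a fresh allocation per query, as FacetsCollector does today.
    protected long[] createBits(int maxDoc) {
        return new long[(maxDoc + 63) >>> 6];
    }
}

// A subclass could reuse a cached array across queries to cut allocation
// and GC pressure on large indexes.
class PooledBitsFactory extends BitsFactorySketch {
    private long[] cached;

    @Override
    protected long[] createBits(int maxDoc) {
        int words = (maxDoc + 63) >>> 6;
        if (cached == null || cached.length < words) {
            cached = new long[words];
        } else {
            java.util.Arrays.fill(cached, 0, words, 0L); // clear before reuse
        }
        return cached;
    }
}
```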
[jira] [Updated] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-5426: -- Attachment: sortedsetreaderstate.patch Make SortedSetDocValuesReaderState customizable --- Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch We have a reader that has a different (in-memory) data structure, where the cost of computing ordinals per reader open is too expensive in the realtime setting. We are maintaining an in-memory data structure that supports all this functionality and would like to leverage SortedSetDocValuesAccumulator. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
John Wang created LUCENE-5426: - Summary: Make SortedSetDocValuesReaderState customizable Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch We have a reader that has a different (in-memory) data structure, where the cost of computing ordinals per reader open is too expensive in the realtime setting. We are maintaining an in-memory data structure that supports all this functionality and would like to leverage SortedSetDocValuesAccumulator. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_51) - Build # 9300 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/9300/ Java: 64bit/jdk1.7.0_51 -XX:+UseCompressedOops -XX:+UseParallelGC 1 tests failed. REGRESSION: org.apache.solr.hadoop.MapReduceIndexerToolArgumentParserTest.testArgsParserHelp Error Message: Conversion = '१' Stack Trace: java.util.UnknownFormatConversionException: Conversion = '१' at __randomizedtesting.SeedInfo.seed([7EDF355AC943FBE2:E20D9F2D8E2D97AF]:0) at java.util.Formatter.checkText(Formatter.java:2547) at java.util.Formatter.parse(Formatter.java:2523) at java.util.Formatter.format(Formatter.java:2469) at java.io.PrintWriter.format(PrintWriter.java:905) at net.sourceforge.argparse4j.helper.TextHelper.printHelp(TextHelper.java:206) at net.sourceforge.argparse4j.internal.ArgumentImpl.printHelp(ArgumentImpl.java:247) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.printArgumentHelp(ArgumentParserImpl.java:253) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.printHelp(ArgumentParserImpl.java:279) at org.apache.solr.hadoop.MapReduceIndexerTool$MyArgumentParser$1.run(MapReduceIndexerTool.java:187) at net.sourceforge.argparse4j.internal.ArgumentImpl.run(ArgumentImpl.java:425) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.processArg(ArgumentParserImpl.java:913) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.parseArgs(ArgumentParserImpl.java:810) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.parseArgs(ArgumentParserImpl.java:683) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.parseArgs(ArgumentParserImpl.java:580) at net.sourceforge.argparse4j.internal.ArgumentParserImpl.parseArgs(ArgumentParserImpl.java:573) at org.apache.solr.hadoop.MapReduceIndexerTool$MyArgumentParser.parseArgs(MapReduceIndexerTool.java:505) at org.apache.solr.hadoop.MapReduceIndexerToolArgumentParserTest.testArgsParserHelp(MapReduceIndexerToolArgumentParserTest.java:194) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
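The Jenkins failure above is reproducible without Solr: java.util.Formatter's specifier grammar only admits ASCII, so a '%' followed by a non-ASCII digit such as '१' (Devanagari one, which can leak into a generated format string under certain locales) is rejected exactly as the stack trace's checkText frame shows:

```java
import java.util.UnknownFormatConversionException;

// Minimal reproduction of the exception in the stack trace above:
// Formatter treats "%१" as an unknown conversion because its format-specifier
// grammar only accepts ASCII digits and conversion characters.
public class FormatterRepro {
    public static boolean throwsUnknownConversion(String fmt) {
        try {
            String.format(fmt, 1);
            return false;
        } catch (UnknownFormatConversionException e) {
            return true; // "Conversion = '१'" style failure
        }
    }
}
```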
[jira] [Commented] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887337#comment-13887337 ] Lei Wang commented on LUCENE-5426: -- Looks like DefaultSortedSetDocsValuesReaderState.java is missing from the patch. Forgot to attach it? Make SortedSetDocValuesReaderState customizable --- Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch We have a reader that has a different (in-memory) data structure, where the cost of computing ordinals per reader open is too expensive in the realtime setting. We are maintaining an in-memory data structure that supports all this functionality and would like to leverage SortedSetDocValuesAccumulator. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887340#comment-13887340 ] Lei Wang commented on LUCENE-5425: -- Better not to depend on the bitset; depend on a more general interface instead. In the user's application, if they can get rid of the memset part of that data structure (as we do at Twitter), they can get a 4x+ performance improvement over simply caching the bitset. Make creation of FixedBitSet in FacetsCollector overridable --- Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: facetscollector.patch In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and generates a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5682) Make the admin InfoHandler more pluggable / derivable
Gregory Chanan created SOLR-5682: Summary: Make the admin InfoHandler more pluggable / derivable Key: SOLR-5682 URL: https://issues.apache.org/jira/browse/SOLR-5682 Project: Solr Issue Type: Improvement Reporter: Gregory Chanan Priority: Minor As of SOLR-5556 a user can specify the class of the admin InfoHandler, but can't easily override the individual handlers that it provides (the PropertiesRequestHandler, LoggingHandler, ThreadDumpHandler, SystemInfoHandler). Contrast this with say, the AdminHandlers, where a user can provide his/her own implementations of the underlying request handlers easily. I've run into this limitation in the following setup: I use derived versions of the various AdminHandlers, and would like to use the same implementations for the InfoHandler. I can do this by deriving from InfoHandler, but then I'd need to duplicate the handleRequestBody dispatching code. That's doable, but not as nice as what the AdminHandlers provides. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5682) Make the admin InfoHandler more pluggable / derivable
[ https://issues.apache.org/jira/browse/SOLR-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated SOLR-5682: - Attachment: SOLR-5682.patch Here's a patch that provides this functionality along with a unit test. Make the admin InfoHandler more pluggable / derivable - Key: SOLR-5682 URL: https://issues.apache.org/jira/browse/SOLR-5682 Project: Solr Issue Type: Improvement Reporter: Gregory Chanan Priority: Minor Attachments: SOLR-5682.patch As of SOLR-5556 a user can specify the class of the admin InfoHandler, but can't easily override the individual handlers that it provides (the PropertiesRequestHandler, LoggingHandler, ThreadDumpHandler, SystemInfoHandler). Contrast this with, say, the AdminHandlers, where a user can easily provide his/her own implementations of the underlying request handlers. I've run into this limitation in the following setup: I use derived versions of the various AdminHandlers and would like to use the same implementations for the InfoHandler. I can do this by deriving from InfoHandler, but then I'd need to duplicate the handleRequestBody dispatching code. That's doable, but not as nice as what the AdminHandlers provide.
[jira] [Assigned] (SOLR-5683) Documentation of Suggester V2
[ https://issues.apache.org/jira/browse/SOLR-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur reassigned SOLR-5683: -- Assignee: Areek Zillur Documentation of Suggester V2 - Key: SOLR-5683 URL: https://issues.apache.org/jira/browse/SOLR-5683 Project: Solr Issue Type: Task Components: SearchComponents - other Reporter: Areek Zillur Assignee: Areek Zillur Fix For: 5.0, 4.7 Placeholder for documentation that will eventually end up in the Solr Ref guide.
[jira] [Created] (SOLR-5683) Documentation of Suggester V2
Areek Zillur created SOLR-5683: -- Summary: Documentation of Suggester V2 Key: SOLR-5683 URL: https://issues.apache.org/jira/browse/SOLR-5683 Project: Solr Issue Type: Task Components: SearchComponents - other Reporter: Areek Zillur Fix For: 5.0, 4.7 Placeholder for documentation that will eventually end up in the Solr Ref guide.
[jira] [Updated] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Wang updated LUCENE-5426: -- Attachment: sortedsetreaderstate.patch Make SortedSetDocValuesReaderState customizable --- Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch, sortedsetreaderstate.patch We have a reader that has a different in-memory data structure, where the cost of computing ordinals per reader open is too expensive in a realtime setting. We maintain an in-memory data structure that supports all the functionality and would like to leverage SortedSetDocValuesAccumulator.
[jira] [Commented] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887438#comment-13887438 ] John Wang commented on LUCENE-5426: --- You are right. Re-attached. Make SortedSetDocValuesReaderState customizable --- Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch, sortedsetreaderstate.patch We have a reader that has a different in-memory data structure, where the cost of computing ordinals per reader open is too expensive in a realtime setting. We maintain an in-memory data structure that supports all the functionality and would like to leverage SortedSetDocValuesAccumulator.
[jira] [Updated] (SOLR-5683) Documentation of Suggester V2
[ https://issues.apache.org/jira/browse/SOLR-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Areek Zillur updated SOLR-5683: --- Description: Placeholder for documentation that will eventually end up in the Solr Ref guide. The new Suggester Component allows Solr to fully utilize the Lucene suggesters. The main features are: - lookup pluggability (TODO: add description): -- AnalyzingInfixLookupFactory -- AnalyzingLookupFactory -- FuzzyLookupFactory -- FreeTextLookupFactory -- FSTLookupFactory -- WFSTLookupFactory -- TSTLookupFactory -- JaspellLookupFactory - Dictionary pluggability (give users the option to choose the dictionary implementation their suggesters consume) -- Input from search index --- DocumentDictionaryFactory – user can specify a suggestion field along with optional weight and payload fields from their search index. --- DocumentExpressionDictionaryFactory – same as DocumentDictionaryFactory but allows users to specify an arbitrary expression over existing numeric fields. --- HighFrequencyDictionaryFactory – user can specify a suggestion field and a threshold to prune out less frequent terms. -- Input from external files --- FileDictionaryFactory – user can specify a file which contains suggest entries, along with optional weights and payloads. Config (index time) options: - name - name of the suggester - sourceLocation - external file location (for file-based suggesters) - lookupImpl - type of lookup to use [default JaspellLookupFactory] - dictionaryImpl - type of dictionary to use (lookup input) [default (sourceLocation == null ? HighFrequencyDictionaryFactory : FileDictionaryFactory)] - storeDir - location to persist the in-memory data structure on disk - buildOnCommit - whether to build the suggester on every commit - buildOnOptimize - whether to build the suggester on every optimize Query time options: - suggest.dictionary - name of the suggester to use (can occur multiple times to batch suggester requests) - suggest.count - number of suggestions to return - suggest.q - query to use for lookup - suggest.build - command to build the suggester - suggest.reload - command to reload the suggester - buildAll – command to build all suggesters in the component - reloadAll – command to reload all suggesters in the component Example query:
{code}
http://localhost:8983/solr/suggest?suggest.dictionary=suggester1&suggest=true&suggest.build=true&suggest.q=elec
{code}
Distributed query:
{code}
http://localhost:7574/solr/suggest?suggest.dictionary=suggester2&suggest=true&suggest.build=true&suggest.q=elec&shards=localhost:8983/solr,localhost:7574/solr&shards.qt=/suggest
{code}
Response Format: The response format can be either XML or JSON. The typical response structure is as follows:
{code}
{ suggest: { suggester_name: { suggest_query: {
    numFound: ..,
    suggestions: [ {term: .., weight: .., payload: ..}, .. ]
} } } }
{code}
Example Response:
{code}
{ responseHeader: { status: 0, QTime: 3 },
  suggest: {
    suggester1: { e: { numFound: 1, suggestions: [ { term: "electronics and computer1", weight: 100, payload: "" } ] } },
    suggester2: { e: { numFound: 1, suggestions: [ { term: "electronics and computer1", weight: 10, payload: "" } ] } }
} }
{code}
Example solrconfig snippet with multiple suggester configuration:
{code}
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">suggester1</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
  <lst name="suggester">
    <str name="name">suggester2</str>
    <str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="field">product_name</str>
    <str name="weightExpression">((price * 2) + ln(popularity))</str>
    <str name="sortField">weight</str>
    <str name="sortField">price</str>
    <str name="storeDir">suggest_fuzzy_doc_expr_dict</str>
    <str name="suggestAnalyzerFieldType">text</str>
  </lst>
</searchComponent>
{code}
was: Place holder for documentation that will eventually end up in the Solr Ref guide. Documentation of Suggester V2 - Key: SOLR-5683 URL:
[jira] [Commented] (SOLR-5683) Documentation of Suggester V2
[ https://issues.apache.org/jira/browse/SOLR-5683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887442#comment-13887442 ] Areek Zillur commented on SOLR-5683: The documentation still has a lot of TODOs, but should be a good start. Documentation of Suggester V2 - Key: SOLR-5683 URL: https://issues.apache.org/jira/browse/SOLR-5683 Project: Solr Issue Type: Task Components: SearchComponents - other Reporter: Areek Zillur Assignee: Areek Zillur Fix For: 5.0, 4.7 Placeholder for documentation that will eventually end up in the Solr Ref guide.
[jira] [Commented] (LUCENE-5426) Make SortedSetDocValuesReaderState customizable
[ https://issues.apache.org/jira/browse/LUCENE-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887465#comment-13887465 ] Shai Erera commented on LUCENE-5426: I've got a few questions: * Why is this code now in the accumulator:
{code}
+if (dv.getValueCount() > Integer.MAX_VALUE) {
+  throw new IllegalArgumentException("can only handle valueCount < Integer.MAX_VALUE; got " + dv.getValueCount());
+}
{code}
I see that it's still in DefaultSSDVReaderState, i.e. you cannot construct it if the DV count is more than Integer.MAX_VALUE. It also looks odd in the accumulator - it only uses it if the given FacetArrays are {{null}}? * Can you please make sure all the added getters, such as state.getIndexReader/separatorRegex, are not called from inside loops? * Perhaps you should pull getSize() up to SSDVReaderState as well and use it instead of getDV().valueCount()? Just in case you can compute the size without obtaining the DV (i.e. lazily). Currently you're forced to pull a DV from the reader. If you do that, then please fix the Accumulator to use it too. Otherwise this looks good. The gist of this patch is that you made SSDVReaderState abstract (i.e. it could have been an interface) and DefaultSSDVReaderState is the current concrete implementation, right? Make SortedSetDocValuesReaderState customizable --- Key: LUCENE-5426 URL: https://issues.apache.org/jira/browse/LUCENE-5426 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: sortedsetreaderstate.patch, sortedsetreaderstate.patch We have a reader that has a different in-memory data structure, where the cost of computing ordinals per reader open is too expensive in a realtime setting. We maintain an in-memory data structure that supports all the functionality and would like to leverage SortedSetDocValuesAccumulator.
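Shai's reading of the patch (an abstract SSDVReaderState plus a default concrete class, with getSize() pulled up so it can be computed lazily) can be sketched as follows; names and shapes are illustrative only, not the actual patch:

```java
// Illustrative sketch: an abstract reader state (could equally be an
// interface) exposing getSize(), with the current behavior and the
// valueCount guard living in a default implementation.
abstract class SSDVReaderState {
    // Lazy-friendly: implementations may compute this without pulling doc values.
    abstract long getSize();
}

class DefaultSSDVReaderState extends SSDVReaderState {
    private final long valueCount;

    DefaultSSDVReaderState(long valueCount) {
        // The guard quoted above: ordinals must fit in an int.
        if (valueCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "can only handle valueCount < Integer.MAX_VALUE; got " + valueCount);
        }
        this.valueCount = valueCount;
    }

    @Override
    long getSize() { return valueCount; }
}
```

Custom implementations (e.g. one backed by an in-memory structure) would then subclass the abstract state and compute getSize() however is cheapest.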
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887469#comment-13887469 ] Shai Erera commented on LUCENE-5425: The current patch's intention seems to be to allow the app to cache the FixedBitSet, such that it can clear() it before each search, to save the long[] allocation? This looks fine to me. Can you please add javadocs to FC.createBitSet? bq. Better not to depend on the bitset, it's better to depend on a more general interface. We need to be careful here. Mike and I experimented with cutting the API over to a general DocIdSet, but it hurt the performance of faceted search, even with smarter bitsets. When we move to a DocIdSet, we add the DocIdSetIterator layer while iterating over the bits, which slows down the search ... at least per luceneutil benchmarks. So if we want to do that, we should do it only after proving, by means of benchmarking, that it doesn't hurt performance severely. It would actually be good to show that more compressed bitsets can even improve performance, but I would go with the change if the performance loss is marginal. And we should do it under a separate issue, so as not to block this one. Make creation of FixedBitSet in FacetsCollector overridable --- Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: facetscollector.patch In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and would create a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior.
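The clear()-before-reuse idea reads roughly like this; a self-contained sketch using java.util.BitSet as a stand-in for Lucene's FixedBitSet, with a hypothetical createBitSet hook in the collector:

```java
import java.util.BitSet;

// Stand-in for FacetsCollector: subclasses override createBitSet to
// control how the per-query bit set is allocated.
class Collector {
    protected BitSet createBitSet(int maxDoc) {
        return new BitSet(maxDoc); // default: a fresh allocation per query
    }
}

// App-side override: keep one bit set and clear() it before each search,
// saving the long[] allocation (and the garbage) on every query.
class CachingCollector extends Collector {
    private BitSet cached;

    @Override
    protected BitSet createBitSet(int maxDoc) {
        if (cached == null || cached.size() < maxDoc) {
            cached = new BitSet(maxDoc);
        } else {
            cached.clear(); // reuse the existing backing array
        }
        return cached;
    }
}
```

Note that this only saves the allocation, not the zeroing: clear() still does a memset-style pass, which is the 80% of the cost the thread discusses.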
[jira] [Updated] (SOLR-5621) Let Solr use Lucene's SeacherManager
[ https://issues.apache.org/jira/browse/SOLR-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-5621: Attachment: SOLR-5621.patch I'm uploading a patch with some changes. I added a test for SolrSearcherFactory. There are still some use cases missing: newReaderCreator: My understanding is that, during a core reload, Solr uses the reader from the old core to warm the first searcher of the new core. Right now I'm not doing that after these changes. searchersOnDeck/maxWarmingSearchers is also not implemented. Right now there can be 0 or 1 searchers warming, but no more than that. All searchers except for the first one created are warmed inside the SolrSearcherFactory and registered immediately after that; the SearcherManager won't try to create more than one searcher at a time. IndexReaderFactory is not implemented. There are still some tests failing intermittently; I'm looking into that. I think I should be able to do a much better job managing the realtime searcher vs. the regular searcher, but I haven't focused on that yet. I'm using this github repository: https://github.com/tflobbe/lucene-solr Let Solr use Lucene's SeacherManager Key: SOLR-5621 URL: https://issues.apache.org/jira/browse/SOLR-5621 Project: Solr Issue Type: Improvement Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5621.patch, SOLR-5621.patch It would be nice if Solr could take advantage of Lucene's SearcherManager and get rid of most of the logic related to managing Searchers in SolrCore. I've been taking a look at how possible it is to achieve this, and even though I haven't finished with the changes (there are some use cases that are still not working exactly the same) it looks like it is possible to do.
Some things could still use a lot of improvement (like the realtime searcher management) and some others are not yet implemented, like searchers on deck and IndexReaderFactory. I'm attaching an initial patch (many TODOs yet).
Submission to ApacheCon on Tika
Hey Guys, I submitted the talk below on Apache Tika, Nutch and Solr to ApacheCon NA 2014: Real Data Science: Exploring the FBI's Vault dataset with Apache Tika, Nutch and Solr Event ApacheCon North America Submission Type Lightning Talk Category Developer Biography Chris Mattmann has a wealth of experience in software design and in the construction of large-scale data-intensive systems. His work has infected a broad set of communities, ranging from helping NASA unlock data from its next generation of earth science system satellites, to assisting graduate students at the University of Southern California (his alma mater) in the study of software architecture, all the way to helping industry and open source as a member of the Apache Software Foundation. When he's not busy being busy, he's spending time with his lovely wife and son braving the mean streets of Southern California. Abstract Apache Tika is a content detection and analysis toolkit allowing automated MIME type identification and rapid parsing of text and metadata from over 1200 types of files, including all major file types from the Internet Assigned Numbers Authority's MIME database. In this talk I'll show you how to practically use Apache Tika to explore the FBI's vault of declassified PDF documents, how to use Apache Nutch to pull down the dataset, and how to use Solr to ingest and geoclassify the documents so that you can build a map of FBI PDF documents corresponding to your favorite conspiracies throughout the USA. I've taught this material in my CSCI 572 Search Engines class at USC and it's a big hit. These are normally three assignments, so I will do my best to boil down their essence into a 45-60 minute talk replete with danger and excitement. Audience Developers interested in using Tika, Nutch and Solr. Folks interested in the FBI vault dataset. GIS wonks. The like.
Experience Level Intermediate Benefits to the Ecosystem The core of the talk will be Tika, but there will be some Nutch magic, and some Solr magic at very basic levels. The benefit to the ecosystem will be a real display of data science on a real dataset. Technical Requirements I need an internet connection, and a projector. Status New Cheers, Chris
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887506#comment-13887506 ] Lei Wang commented on LUCENE-5425: -- Agree it can be done in a separate issue. I didn't know a wrapper would affect performance that much; it's just an additional method call to me. The default OpenBitSetIterator impl differs from nextSetBit, maybe that's the reason? Anyway, starting with this change won't hurt anything, and caching the bitset should cut about 20% of the overhead of allocating a new bitset each time (the other 80% is from the memset). After getting this in, I can do a separate test on the DocIdSet and see if we can get acceptable performance for the default behavior without reusing the memory. Make creation of FixedBitSet in FacetsCollector overridable --- Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: facetscollector.patch In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and would create a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior.
[jira] [Updated] (SOLR-5684) Shutdown SolrServer clients created in BasicDistributedZk2Test and BasicDistributedZkTest
[ https://issues.apache.org/jira/browse/SOLR-5684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-5684: Attachment: SOLR-5684.patch Shutdown SolrServer clients created in BasicDistributedZk2Test and BasicDistributedZkTest - Key: SOLR-5684 URL: https://issues.apache.org/jira/browse/SOLR-5684 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Priority: Minor Fix For: 5.0 Attachments: SOLR-5684.patch I found that the tests BasicDistributedZk2Test and BasicDistributedZkTest create multiple HttpSolrServer objects without calling their shutdown method after using them.
[jira] [Created] (SOLR-5684) Shutdown SolrServer clients created in BasicDistributedZk2Test and BasicDistributedZkTest
Tomás Fernández Löbbe created SOLR-5684: --- Summary: Shutdown SolrServer clients created in BasicDistributedZk2Test and BasicDistributedZkTest Key: SOLR-5684 URL: https://issues.apache.org/jira/browse/SOLR-5684 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Priority: Minor Fix For: 5.0 Attachments: SOLR-5684.patch I found that the tests BasicDistributedZk2Test and BasicDistributedZkTest create multiple HttpSolrServer objects without calling their shutdown method after using them.
[jira] [Commented] (SOLR-5302) Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887541#comment-13887541 ] Shawn Heisey commented on SOLR-5302: bq. This is link I meant in the pdf: https://cms.prod.bloomberg.com/team/display/fdns/Search+Analytics+Component If I had to guess, I would say that is an internal website for Bloomberg, something that only employees can get to. If they intend it for public consumption, they'll need to publish the data on a public website and fix the links in the PDF. bq. Any idea when the future 5x could be released? Quick answer: 5.0 is many months away. It's impossible to give any kind of release date prediction. Hopefully this particular feature will end up in a 4.x release, once Erick (or another committer) has the time to devote to giving the code a thorough review. Longer answer: At this time, nobody has come up with a timeframe for Solr 5.0. Once somebody decides we're going to begin the process and agrees to be the release manager, a LOT has to happen, and there's really no way to make it happen quickly. Even if we began the 5.0 release process tomorrow and everything were to be extremely smooth, I don't think you'd even see a 5.0-ALPHA release for a few months. We can't begin the release process that soon, so it's going to be even longer. One of the big items still left to do is to embed the HTTP server layer and make Solr into a standalone application. I wasn't involved with the development when 4.0 was released, so I don't know how much time passed between the beginning of the 4.0 release process and 4.0-ALPHA, but I can tell you that there were three months between 4.0-ALPHA and 4.0-FINAL.
Analytics Component --- Key: SOLR-5302 URL: https://issues.apache.org/jira/browse/SOLR-5302 Project: Solr Issue Type: New Feature Reporter: Steven Bower Assignee: Erick Erickson Attachments: SOLR-5302.patch, SOLR-5302.patch, SOLR-5302.patch, SOLR-5302.patch, Search Analytics Component.pdf, Statistical Expressions.pdf, solr_analytics-2013.10.04-2.patch This ticket is to track a replacement for the StatsComponent. The AnalyticsComponent supports the following features: * All functionality of StatsComponent (SOLR-4499) * Field Faceting (SOLR-3435) ** Support for limit ** Sorting (bucket name or any stat in the bucket) ** Support for offset * Range Faceting ** Supports all options of standard range faceting * Query Faceting (SOLR-2925) * Ability to use overall/field facet statistics as input to range/query faceting (i.e. calc min/max date and then facet over that range) * Support for more complex aggregate/mapping operations (SOLR-1622) ** Aggregations: min, max, sum, sum-of-squares, count, missing, stddev, mean, median, percentiles ** Operations: negation, abs, add, multiply, divide, power, log, date math, string reversal, string concat ** Easily pluggable framework to add additional operations * New / cleaner output format Outstanding Issues: * Multi-value field support for stats (supported for faceting) * Multi-shard support (may not be possible for some operations, e.g. median)
[jira] [Commented] (LUCENE-5425) Make creation of FixedBitSet in FacetsCollector overridable
[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887548#comment-13887548 ] Shai Erera commented on LUCENE-5425: bq. didn't know a wrapper will affect performance that much. Me neither! We were very surprised to see some of the performance implications of saving a method call or changing the order in which we iterate on results + facet requests. But it's hard to argue with consistent numbers, and IIRC they weren't in the 3-5% range, but 10+%. It could be, though, that we measured several changes at once ... so I think it's worthwhile to benchmark the move to DocIdSet. And yes, it could be OpenBitSetIterator; I don't rule it out. +1 to allow caching in this issue; we can investigate generalizing the APIs in a separate issue. Make creation of FixedBitSet in FacetsCollector overridable --- Key: LUCENE-5425 URL: https://issues.apache.org/jira/browse/LUCENE-5425 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Affects Versions: 4.6 Reporter: John Wang Attachments: facetscollector.patch In FacetsCollector, the bits in MatchingDocs are allocated per query. For large indexes where maxDoc is large, creating a bitset of maxDoc bits is expensive and would create a lot of garbage. The attached patch makes this allocation customizable while maintaining current behavior.
[jira] [Created] (SOLR-5685) Some tests are not really committing when they intend to
Tomás Fernández Löbbe created SOLR-5685: --- Summary: Some tests are not really committing when they intend to Key: SOLR-5685 URL: https://issues.apache.org/jira/browse/SOLR-5685 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 There are some tests that call SolrTestCaseJ4.commit() but never submit the resulting commit command (via assertU()), so the commit never actually runs.
[jira] [Updated] (SOLR-5685) Some tests are not really committing when they intend to
[ https://issues.apache.org/jira/browse/SOLR-5685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomás Fernández Löbbe updated SOLR-5685: Attachment: SOLR-5685.patch Adding assertU in the cases I found. This was causing the test org.apache.solr.analytics.facet.FieldFacetTest to fail. Some tests are not really committing when they intend to Key: SOLR-5685 URL: https://issues.apache.org/jira/browse/SOLR-5685 Project: Solr Issue Type: Bug Affects Versions: 5.0 Reporter: Tomás Fernández Löbbe Fix For: 5.0 Attachments: SOLR-5685.patch There are some tests that call SolrTestCaseJ4.commit() but never submit the resulting commit command (via assertU()), so the commit never actually runs.
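The bug class is easy to model: commit() merely builds an update command, and nothing reaches the server until that command is passed to assertU(). A toy, self-contained model of the idiom (not the real Solr test API; names only mimic it):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the SolrTestCaseJ4 idiom: commit() only builds the XML
// command; assertU(...) is what actually submits it.
class ToySolrTest {
    final List<String> submitted = new ArrayList<>();

    String commit() {
        return "<commit/>"; // building the command has no side effect
    }

    void assertU(String update) {
        submitted.add(update); // submission is the only thing that "runs" it
    }
}
```

So a bare `commit();` is a silent no-op, while `assertU(commit());` actually issues the commit, which is exactly the one-line class of fix the patch applies.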