[jira] [Commented] (SOLR-4679) HTML line breaks (br) are removed during indexing; causes wrong search results
[ https://issues.apache.org/jira/browse/SOLR-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626309#comment-13626309 ]

Christoph Straßer commented on SOLR-4679:
-----------------------------------------

Thank you for checking Tika. As far as I understand, http://wiki.apache.org/solr/ExtractingRequestHandler extracts XHTML, not text. The Tika XHTML output looks okay too. The root issue is - like you said - probably somewhere within Solr.

{noformat}
D:\temp\20130409>java -jar tika-app-1.3.jar --xml external.htm
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Length" content="193"/>
<meta name="Content-Encoding" content="windows-1252"/>
<meta name="Content-Type" content="text/html; charset=windows-1252"/>
<meta name="resourceName" content="external.htm"/>
<meta name="dc:title" content="Test mit HTML-Zeilenschaltungen"/>
<title>Test mit HTML-Zeilenschaltungen</title>
</head>
<body><p>word1 word2 Some other words, a special name like linz and another special name - vienna</p>
</body></html>
{noformat}

HTML line breaks (br) are removed during indexing; causes wrong search results
------------------------------------------------------------------------------
Key: SOLR-4679
URL: https://issues.apache.org/jira/browse/SOLR-4679
Project: Solr
Issue Type: Bug
Components: update
Affects Versions: 4.2
Environment: Windows Server 2008 R2, Java 6, Tomcat 7
Reporter: Christoph Straßer
Attachments: external.htm, Solr_HtmlLineBreak_Linz_NotFound.png, Solr_HtmlLineBreak_Vienna.png

HTML line breaks (<br>, <BR>, <br/>, ...) seem to be removed during extraction of content from HTML files. They need to be replaced with an empty space.

Test file:
{noformat}
<html>
<head>
<title>Test mit HTML-Zeilenschaltungen</title>
</head>
<p>word1<br>word2<br/>
Some other words, a special name like linz<br>and another special name - vienna
</p>
</html>
{noformat}

The Solr content attribute contains the following text:
{noformat}
Test mit HTML-Zeilenschaltungen word1word2 Some other words, a special name like linzand another special name - vienna
{noformat}

So we are not able to find the word "linz". We use the ExtractingRequestHandler to put content into Solr. (http://wiki.apache.org/solr/ExtractingRequestHandler)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
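The word-gluing the reporter describes is easy to reproduce outside Solr. A minimal sketch (our own code, not the Solr or Tika implementation): stripping `<br>` tags outright fuses adjacent words, while mapping them to a space first keeps "linz" as a separate token, which is the behaviour this issue asks for.

```java
// Hedged illustration of the bug, not Solr's actual extraction code.
public class BrStripDemo {
    /** Naive tag stripping: "linz<br>and" becomes "linzand". */
    static String stripTags(String html) {
        return html.replaceAll("<[^>]+>", "");
    }

    /** Map <br>, <BR>, <br/> ... to a space before stripping other tags. */
    static String brToSpace(String html) {
        return html.replaceAll("(?i)<br\\s*/?>", " ")
                   .replaceAll("<[^>]+>", "");
    }

    public static void main(String[] args) {
        String html = "word1<br>word2<br/>Some words like linz<br>and vienna";
        System.out.println(stripTags(html));  // word1word2Some words like linzand vienna
        System.out.println(brToSpace(html));  // word1 word2 Some words like linz and vienna
    }
}
```

With the second variant, a search for "linz" would match because the tokenizer sees a word boundary where the line break used to be.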
[jira] [Commented] (SOLR-4137) FastVectorHighlighter: StringIndexOutOfBoundsException in BaseFragmentsBuilder
[ https://issues.apache.org/jira/browse/SOLR-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626319#comment-13626319 ]

Markus Jelsma commented on SOLR-4137:
-------------------------------------

Simon, we'll try to reproduce the problem without LUCENE-4899 and report whether we can, and whether the patch works.

FastVectorHighlighter: StringIndexOutOfBoundsException in BaseFragmentsBuilder
------------------------------------------------------------------------------
Key: SOLR-4137
URL: https://issues.apache.org/jira/browse/SOLR-4137
Project: Solr
Issue Type: Bug
Components: highlighter
Affects Versions: 3.6.1
Reporter: Marcel

Under some circumstances the BaseFragmentsBuilder generates a StringIndexOutOfBoundsException inside the makeFragment method: the starting offset is higher than the end offset. I did a small patch checking the offsets and posted it on Stack Overflow: http://stackoverflow.com/questions/12456448/solr-highlight-bug-with-usefastvectorhighlighter The code in 4.0 seems to be the same as in 3.6.1. Example of how to reproduce the behaviour: there is a word www.DAKgesundAktivBonus.de inside the index; if you search for "dak bonus", some offset calculations go wrong.
[jira] [Commented] (SOLR-4137) FastVectorHighlighter: StringIndexOutOfBoundsException in BaseFragmentsBuilder
[ https://issues.apache.org/jira/browse/SOLR-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626322#comment-13626322 ]

Simon Willnauer commented on SOLR-4137:
---------------------------------------

Thanks Markus, this would be awesome!
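The defensive check Marcel describes boils down to clamping the fragment window before calling substring(). A hedged sketch with illustrative names (this is not the actual BaseFragmentsBuilder or Stack Overflow patch code):

```java
// Illustrative offset clamping for fragment building. A plain
// fieldValue.substring(start, end) throws StringIndexOutOfBoundsException
// whenever start > end, which is exactly the symptom in this issue.
public class FragmentGuard {
    static String safeFragment(String fieldValue, int start, int end) {
        int s = Math.max(0, Math.min(start, fieldValue.length()));
        int e = Math.max(s, Math.min(end, fieldValue.length())); // force end >= start
        return fieldValue.substring(s, e);
    }

    public static void main(String[] args) {
        // A miscalculated window (start 30, end 10) yields an empty fragment
        // instead of an exception.
        System.out.println(safeFragment("www.DAKgesundAktivBonus.de", 30, 10)); // ""
        System.out.println(safeFragment("www.DAKgesundAktivBonus.de", 4, 7));   // "DAK"
    }
}
```

Clamping hides the symptom rather than fixing the underlying offset calculation, which is presumably why the discussion above focuses on reproducing the root cause first.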
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_17) - Build # 5084 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/5084/
Java: 32bit/jdk1.7.0_17 -client -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED: org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin {#4 seed=[7DB836996B6A4C58:49C7E98A6C01A087]}

Error Message:
Didn't match org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest$ShapePair@192e709 in Rect(minX=4.0,maxX=230.0,minY=-128.0,maxY=98.0) Expect: [0] (of 1)

Stack Trace:
java.lang.AssertionError: Didn't match org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest$ShapePair@192e709 in Rect(minX=4.0,maxX=230.0,minY=-128.0,maxY=98.0) Expect: [0] (of 1)
        at __randomizedtesting.SeedInfo.seed([7DB836996B6A4C58:49C7E98A6C01A087]:0)
        at org.junit.Assert.fail(Assert.java:93)
        at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:186)
        at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin(SpatialOpRecursivePrefixTreeTest.java:83)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
        at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
        at java.lang.Thread.run(Thread.java:722)

Build Log:
[...truncated 7983 lines...]
[junit4:junit4] Suite:
Adding new functionality to avoid java.lang.OutOfMemoryError: Java heap space exception
Hi all,

We run Solr (4.2 and 5.0) in a real-time environment with big data. Each day two Solr cores are generated that can reach ~8-10 GB, depending on the insertion rates and on the hardware. Currently, all cores are loaded on Solr startup. The query rate is not high, but responses must be quick and must be returned even for old data and over a large time frame. There are a lot of simple queries (facet/facet.pivot on fields with a small distribution of values), but there are also heavy queries, such as facet.pivot on widely distributed fields. We use distributed search to query the cores, usually over 1-2 weeks of data (around 7-28 cores). After some large queries (facet.pivot on widely distributed fields) we sometimes hit a java.lang.OutOfMemoryError: Java heap space. The software is deployed at customer sites, so increasing memory is not always possible, and customers may accept slower responses for the larger queries if we can provide them.

We looked at the LotsOfCores functionality added in 4.1 and 4.2. It lets you define an upper limit on the number of online cores and unloads them on an LRU basis when the cache is full. However, our case seems to need a more general mechanism:

* Only cores that are used for updates/inserts must be loaded at all times. Other cores, which are only queried, should be loaded/unloaded on demand while the query runs, until completion, according to memory demands.
* Each facet/facet.pivot query must be estimated for memory consumption. If there is not enough memory to run the query over all cores concurrently, it must be split into sequential queries, unloading already-queried or irrelevant cores (but not permanent cores) and loading older cores to complete the query.
* Occasionally, the oldest cores should be unloaded according to a configurable policy (for example, one type of high-volume core is kept loaded for 1 week, while smaller cores can remain loaded for a month). The policy allows data we know is queried less, but is higher volume, to be kept live over shorter time periods.

We are considering adding the following functionality to Solr (optional, turned on by new configs). The flow of the SolrCore.execute() function would change to:

- Change the status of the core to "USED"
- Call a waitForResource(SolrRequestHandler, SolrQueryRequest) function:
  - Estimate the required memory for this query/handler on this core
  - If there are not enough free resources to run the query:
    - If all cores are permanent and none can be unloaded:
      - Throw an OutOfMemoryError exception // here the status of the core should change back to "UNUSED"
    - Else:
      - Try to unload unused, non-permanent cores
      - If unloading unused cores didn't release enough resources and no further core can be unloaded:
        - Throw an OutOfMemoryError exception // here the status of the core should change back to "UNUSED"
      - If unloading unused cores didn't release enough resources but there are cores that can still be unloaded:
        - Wait (with a timeout) until some resource is released
        - Check again until the required resource is available or the exception is thrown
  - Reserve the resource
- Call the current SolrCore.execute()
- Change the status of the core to "UNUSED"

We would like to get some initial feedback on the design/functionality we're proposing, as we feel this really benefits real-time, high-volume indexing systems such as ours. We are also happy to contribute the code back if you feel there is a need for this functionality.

Best regards,
Lyuba
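The reserve/wait/release steps in the proposed flow map naturally onto a counting semaphore over a memory budget. A hypothetical sketch of just that gating logic (all names here - ResourceGate, acquire, release - are ours, not Solr API; memory estimation and core unloading are elided):

```java
// Hypothetical resource gate around query execution; not Solr code.
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ResourceGate {
    private final Semaphore budgetMb;   // permits model free heap, in MB

    public ResourceGate(int totalMb) {
        budgetMb = new Semaphore(totalMb);
    }

    /** Block until the estimated memory for a query is available, or time out. */
    public boolean acquire(int estimatedMb, long timeoutMs) throws InterruptedException {
        return budgetMb.tryAcquire(estimatedMb, timeoutMs, TimeUnit.MILLISECONDS);
    }

    public void release(int estimatedMb) {
        budgetMb.release(estimatedMb);
    }

    public static void main(String[] args) throws Exception {
        ResourceGate gate = new ResourceGate(1024);   // e.g. a 1 GB budget
        if (gate.acquire(512, 100)) {                 // reserve before executing
            try {
                // the current SolrCore.execute() would run here
            } finally {
                gate.release(512);                    // core goes back to "UNUSED"
            }
        } else {
            // corresponds to the proposal's OutOfMemoryError branch:
            // unload non-permanent cores, retry, or fail the query
            throw new IllegalStateException("not enough resources for this query");
        }
    }
}
```

The timeout on tryAcquire is what implements the "wait until some resource is released, then check again" step; the real work in the proposal is the estimation function and the unload policy, which this sketch deliberately leaves out.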
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626417#comment-13626417 ]

Stein J. Gran commented on SOLR-2894:
-------------------------------------

Andrew, which version does the latest patch apply to? I've tried applying it to trunk, branch_4x and 4.2.1 without any luck so far. I'm planning on testing this patch in a SolrCloud environment with lots of pivot facet queries.

For trunk I get this:

{noformat}
patching file `solr/core/src/java/org/apache/solr/request/SimpleFacets.java'
Hunk #1 succeeded at 323 with fuzz 2 (offset 51 lines).
Hunk #2 FAILED at 374.
1 out of 2 hunks FAILED -- saving rejects to solr/core/src/java/org/apache/solr/request/SimpleFacets.java.rej
{noformat}

The .rej file seems similar for trunk and the 4.2.1 tag.

Implement distributed pivot faceting
------------------------------------
Key: SOLR-2894
URL: https://issues.apache.org/jira/browse/SOLR-2894
Project: Solr
Issue Type: Improvement
Reporter: Erik Hatcher
Fix For: 4.3
Attachments: SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894-reworked.patch

Following up on SOLR-792, pivot faceting currently only supports undistributed mode. Distributed pivot faceting needs to be implemented.
[jira] [Updated] (SOLR-4670) Core mismatch in concurrent documents creation
[ https://issues.apache.org/jira/browse/SOLR-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alberto Ferrini updated SOLR-4670:
----------------------------------
Affects Version/s: 4.2.1

Core mismatch in concurrent documents creation
----------------------------------------------
Key: SOLR-4670
URL: https://issues.apache.org/jira/browse/SOLR-4670
Project: Solr
Issue Type: Bug
Components: multicore, SolrCloud
Affects Versions: 4.0, 4.1, 4.2, 4.2.1
Environment: CPU: 32x AMD Opteron(TM) Processor 6276; RAM: 132073620 kB; OS: Red Hat Enterprise Linux Server release 5.7 (Tikanga); JDK 1.6.0_21; JBoss [EAP] 4.3.0.GA_CP09; Apache Solr 4.x; Apache ZooKeeper 3.4.5
Reporter: Alberto Ferrini
Labels: concurrency, multicore, solrcloud, zookeeper

The issue can be reproduced in this way:
- Install SolrCloud with at least 2 nodes
- Install ZooKeeper with at least 2 nodes
- Create 30 cores
- After each core creation, create 20 randomly generated documents in a random existing core, with 2 concurrent threads on all Solr nodes (for example, document 1 in core 3 on node 1, document 2 in core 5 on node 1, document 3 in core 3 on node 2, etc.)
- After all cores are created, query each core for all documents and compare the inserted data with the query results

Some documents end up in a different core than the one they were created in.
[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626429#comment-13626429 ]

Adrien Grand commented on LUCENE-4858:
--------------------------------------

Thanks for updating the patch, Shai.

bq. Adrien, do we have anything else to do here, or are we ready to go? If so, I'll add a CHANGES entry and commit later.

The patch looks good to me. Maybe NumericDocValuesSorter.getID() could just return 'fieldName'? I think it's not necessary to describe the doc values type, since doc values types are exclusive and doc values are the natural way to sort documents by field values in Lucene. Otherwise +1.

Early termination with SortingMergePolicy
-----------------------------------------
Key: LUCENE-4858
URL: https://issues.apache.org/jira/browse/LUCENE-4858
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor
Fix For: 4.3
Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch

Spin-off of LUCENE-4752; see https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565 and https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282

When an index is sorted per-segment, queries that sort according to the index sort order could be early terminated.
[jira] [Created] (LUCENE-4918) Highlighter closes the given IndexReader
Sirvan Yahyaei created LUCENE-4918:
-----------------------------------

Summary: Highlighter closes the given IndexReader
Key: LUCENE-4918
URL: https://issues.apache.org/jira/browse/LUCENE-4918
Project: Lucene - Core
Issue Type: Bug
Components: modules/highlighter
Affects Versions: 4.2
Reporter: Sirvan Yahyaei
Priority: Minor
Fix For: 4.3

If an IndexReader is passed to o.a.l.s.highlight.QueryScorer for scoring, WeightedSpanTermExtractor#getWeightedSpanTermsWithScores closes the parameter reader (IndexReader) instead of closing the member variable 'reader'. To fix, line 519 of WeightedSpanTermExtractor should be changed from IOUtils.close(reader) to IOUtils.close(this.reader).
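The one-line fix comes down to Java parameter shadowing: inside a method whose parameter is named `reader`, the bare name refers to the caller's object, and only `this.reader` reaches the field of the same name. A simplified illustration of the pattern (our own classes, not the actual WeightedSpanTermExtractor code):

```java
// Illustration of the shadowing slip behind this issue; not Lucene code.
import java.io.Closeable;

public class ShadowDemo {
    static class Resource implements Closeable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    final Resource reader = new Resource();   // internally managed resource

    void extract(Resource reader) {           // parameter shadows the field
        // BUG (the reported behaviour): this closes the CALLER's resource,
        // like IOUtils.close(reader) on line 519.
        reader.close();
        // FIX: close the member instead, like IOUtils.close(this.reader):
        // this.reader.close();
    }

    public static void main(String[] args) {
        ShadowDemo demo = new ShadowDemo();
        Resource callers = new Resource();
        demo.extract(callers);
        System.out.println("caller's reader closed: " + callers.closed);     // true
        System.out.println("member reader closed: " + demo.reader.closed);   // false
    }
}
```

Closing a reader the caller still holds is particularly nasty because the failure (AlreadyClosedException or similar) shows up later, far from the highlighter call.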
[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626435#comment-13626435 ]

Shai Erera commented on LUCENE-4858:
------------------------------------

bq. Maybe NumericDocValuesSorter.getID() could just return 'fieldName'?

The reason I did that is in case someone wants to sort by a stored field and a numeric field that have the same name. I know the chance is probably very low, but "numericdv_field" is really unique, as you cannot have two numeric DV fields with the same name but different meanings.
[jira] [Updated] (LUCENE-4918) Highlighter closes the given IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sirvan Yahyaei updated LUCENE-4918:
-----------------------------------
Attachment: LuceneHighlighter.java

I have attached a simple class to show the issue.
[jira] [Commented] (SOLR-2894) Implement distributed pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626441#comment-13626441 ]

Sviatoslav Lisenkin commented on SOLR-2894:
-------------------------------------------

Hello, everyone. I applied the latest patch two weeks ago (rev. 1465879), faced issues with merging in the SimpleFacets class near the 'incomingMinCount' variable, and fixed them manually (just renaming). Simple pivot faceting via the web UI on a sample Solr installation with two nodes worked fine. I would really appreciate it if someone had a chance to test it under load etc. I hope this patch (and feature) will be included in the upcoming release.
[jira] [Assigned] (LUCENE-4918) Highlighter closes the given IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-4918:
---------------------------------------
Assignee: Simon Willnauer
[jira] [Commented] (LUCENE-4918) Highlighter closes the given IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626442#comment-13626442 ]

Simon Willnauer commented on LUCENE-4918:
-----------------------------------------

Argh, I guess that was my fault. I will add a test and dig in.
[jira] [Created] (LUCENE-4919) IntsRef, BytesRef and CharsRef returns incorrect hashcode when filled with 0
Renaud Delbru created LUCENE-4919:
----------------------------------

Summary: IntsRef, BytesRef and CharsRef returns incorrect hashcode when filled with 0
Key: LUCENE-4919
URL: https://issues.apache.org/jira/browse/LUCENE-4919
Project: Lucene - Core
Issue Type: Bug
Components: core/other
Affects Versions: 4.2
Reporter: Renaud Delbru
Fix For: 4.3

The IntsRef, BytesRef and CharsRef implementations do not follow the Java Arrays.hashCode implementation, and return an incorrect hashcode when filled with 0. For example, an IntsRef with { 0 } will return the same hashcode as an IntsRef with { 0, 0 }.
[jira] [Updated] (LUCENE-4919) IntsRef, BytesRef and CharsRef returns incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renaud Delbru updated LUCENE-4919:
----------------------------------
Description:
IntsRef, BytesRef and CharsRef implementation does not follow the java Arrays.hashCode implementation, and returns incorrect hashcode when filled with 0. For example, an IntsRef with \{ 0 \} will return the same hashcode than an IntsRef with \{ 0, 0 \}.

was:
IntsRef, BytesRef and CharsRef implementation does not follow the java Arrays.hashCode implementation, and returns incorrect hashcode when filled with 0. For example, an IntsRef with { 0 } will return the same hashcode than an IntsRef with { 0, 0 }.
[jira] [Updated] (LUCENE-4919) IntsRef, BytesRef and CharsRef returns incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renaud Delbru updated LUCENE-4919:
----------------------------------
Attachment: LUCENE-4919.patch

Here is a patch for IntsRef, BytesRef and CharsRef, including unit tests. The new hashcode implementation is identical to the one found in Arrays.hashCode.
[jira] [Updated] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Renaud Delbru updated LUCENE-4919:
----------------------------------
Description:
IntsRef, BytesRef and CharsRef implementation do not follow the java Arrays.hashCode implementation, and return incorrect hashcode when filled with 0. For example, an IntsRef with \{ 0 \} will return the same hashcode than an IntsRef with \{ 0, 0 \}.

was:
IntsRef, BytesRef and CharsRef implementation does not follow the java Arrays.hashCode implementation, and returns incorrect hashcode when filled with 0. For example, an IntsRef with \{ 0 \} will return the same hashcode than an IntsRef with \{ 0, 0 \}.

Summary: IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0 (was: IntsRef, BytesRef and CharsRef returns incorrect hashcode when filled with 0)
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626450#comment-13626450 ]

Robert Muir commented on LUCENE-4919:
-------------------------------------
The hashcode here is not arbitrary, as mentioned in the javadocs:
{noformat}
/** Calculates the hash code as required by TermsHash during indexing.
 * <p>It is defined as:
 * <pre class="prettyprint">
 * int hash = 0;
 * for (int i = offset; i &lt; offset + length; i++) {
 *   hash = 31*hash + bytes[i];
 * }
 * </pre>
 */
{noformat}
There is code in BytesRefHash that relies upon this.
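The collision reported in this issue follows directly from the seed: the documented loop starts hash at 0, so any run of leading zeros leaves the hash unchanged, while Arrays.hashCode starts at 1 and therefore folds the array length into the result. A standalone sketch (illustrative only, not Lucene's actual classes):

```java
import java.util.Arrays;

public class ZeroFillHash {
    // TermsHash-style hash, as documented in BytesRef.hashCode: seeded with 0.
    static int refHash(int[] a) {
        int hash = 0;
        for (int v : a) {
            hash = 31 * hash + v;
        }
        return hash;
    }

    public static void main(String[] args) {
        int[] one = {0};
        int[] two = {0, 0};
        // Seeded with 0, any number of leading zeros hashes to 0:
        System.out.println(refHash(one) == refHash(two));                 // true
        // Arrays.hashCode seeds with 1, so {0} -> 31 and {0, 0} -> 961:
        System.out.println(Arrays.hashCode(one) == Arrays.hashCode(two)); // false
    }
}
```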
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626453#comment-13626453 ]

Robert Muir commented on LUCENE-4919:
-------------------------------------
This patch also doesn't fix the code in UnicodeUtil that relies upon this. I think I'm against the change: the whole issue is wrong to me, as the hashcode already does exactly what it documents it should do, and a lot of things rely upon the current function. I don't understand why the javadocs for BytesRef.hashCode make it seem like it should be doing something else.
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626454#comment-13626454 ]

Renaud Delbru commented on LUCENE-4919:
---------------------------------------
Hi Robert, from my understanding this applies only to BytesRef (even if this behavior sounds dangerous to me). However, why are IntsRef and CharsRef following the same behavior?
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626458#comment-13626458 ]

Renaud Delbru commented on LUCENE-4919:
---------------------------------------
I see that BytesRef is used a bit everywhere in various contexts, contexts which are different from the TermsHash context. This hashcode behavior might cause unexpected problems, as I am sure most users of BytesRef are unaware of this particular behavior.
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626461#comment-13626461 ]

Robert Muir commented on LUCENE-4919:
-------------------------------------
The current hashcode seems to correspond with String.hashCode. I'm not against this change on some theoretical basis; I'm only mentioning that to me there is no bug (it does exactly what it says it should do), and that changing it without being thorough will only create bugs, since things rely upon this. Any patch to change the hashcode needs to update all the additional things that rely upon it, such as the methods in UnicodeUtil, BytesRefHash collision probing, the javadocs in TermToBytesRefAttribute, and anything else: otherwise it only causes more harm than good.
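For reference, String.hashCode uses the same h = 31*h + c recurrence seeded at 0 over its characters, so it shares both the zero-prefix property and the usual multiplicative-hash collisions. A quick illustration:

```java
public class StringHashDemo {
    public static void main(String[] args) {
        // String.hashCode is h = 31*h + s.charAt(i), starting from h = 0 --
        // the same recurrence the *Ref classes apply to their elements.
        System.out.println("Aa".hashCode()); // 2112
        System.out.println("BB".hashCode()); // 2112 -- a classic collision
        // The empty string and strings of NUL chars all hash to 0,
        // mirroring the {0} vs {0, 0} case from this issue:
        System.out.println("".hashCode());             // 0
        System.out.println("\u0000\u0000".hashCode()); // 0
    }
}
```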
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626462#comment-13626462 ]

Simon Willnauer commented on LUCENE-4919:
-----------------------------------------
I am not getting why this should return the same as Arrays.hashCode. Can you elaborate on this a bit?
[jira] [Updated] (LUCENE-4918) Highlighter closes the given IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4918:
------------------------------------
    Attachment: LUCENE-4918.patch

here is a patch

Highlighter closes the given IndexReader
----------------------------------------
                Key: LUCENE-4918
                URL: https://issues.apache.org/jira/browse/LUCENE-4918
            Project: Lucene - Core
         Issue Type: Bug
         Components: modules/highlighter
   Affects Versions: 4.2
           Reporter: Sirvan Yahyaei
           Assignee: Simon Willnauer
           Priority: Minor
            Fix For: 4.3
        Attachments: LUCENE-4918.patch, LuceneHighlighter.java

If an IndexReader is passed to o.a.l.s.highlight.QueryScorer for scoring, WeightedSpanTermExtractor#getWeightedSpanTermsWithScores closes the parameter reader (IndexReader) instead of closing the member variable 'reader'. To fix, line 519 of WeightedSpanTermExtractor should be changed from IOUtils.close(reader) to IOUtils.close(this.reader).
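The underlying mistake is ordinary parameter shadowing: inside the method, the unqualified name resolves to the parameter rather than the field, so the caller's reader gets closed instead of the internally opened one. A minimal standalone sketch (hypothetical class and names, not Lucene's actual code):

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShadowingDemo {
    static final AtomicBoolean callerClosed = new AtomicBoolean(false);
    static final AtomicBoolean ownClosed = new AtomicBoolean(false);

    // Stand-in for the internally opened reader owned by this object.
    AutoCloseable reader = () -> ownClosed.set(true);

    // Buggy version: "reader" resolves to the parameter, closing the caller's resource.
    void buggyClose(AutoCloseable reader) throws Exception {
        reader.close();
    }

    // Fixed version: "this." qualification picks the internally owned reader.
    void fixedClose(AutoCloseable reader) throws Exception {
        this.reader.close();
    }

    public static void main(String[] args) throws Exception {
        AutoCloseable callersReader = () -> callerClosed.set(true);
        ShadowingDemo demo = new ShadowingDemo();
        demo.buggyClose(callersReader);
        System.out.println(callerClosed.get()); // true: caller's reader was closed
        System.out.println(ownClosed.get());    // false: the internal one leaked
    }
}
```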
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626471#comment-13626471 ]

Renaud Delbru commented on LUCENE-4919:
---------------------------------------
Ok, I understand, Robert. That sounds like a big task. I can try to make a first pass over it in the next few days if you think it is worth it (personally, I would feel more reassured knowing that the hashcode follows a more common behavior).
[jira] [Updated] (LUCENE-4918) Highlighter closes the given IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4918:
------------------------------------
        Lucene Fields: New,Patch Available  (was: New)
    Affects Version/s: 4.2.1
        Fix Version/s: 5.0
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626475#comment-13626475 ]

Robert Muir commented on LUCENE-4919:
-------------------------------------
I have no opinion: I'm not a hashing guy. I'm just mentioning that the change is pretty serious. Additionally, I'm unhappy the hashcode is part of the API, so I don't think it should be changed in a minor release (e.g. things like TermToBytesRefAttribute expose this as an API requirement). But I think trunk is fine. On the other hand, I know the current situation has some bad worst-case behavior that users might actually hit (e.g. indexing increasing numerics), but I don't see how this patch addresses that. It seems to me that if we want to go through all the trouble to improve the hashing (which would be a good thing), we should solve that too, maybe involving a totally different hashing scheme like what they did with Java (I don't know).
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626478#comment-13626478 ]

Dawid Weiss commented on LUCENE-4919:
-------------------------------------
This isn't a bug, it's a definition like any other. In general any definition of hash(X), even hash(X) = 42, is valid (obviously with poor space-distributing properties...). The question of which particular hash function to pick and what inputs it should consume (number of elements, values of elements) is kind of difficult -- when you include more elements in the computation, the distribution for certain inputs may be better, but you'll probably lose some performance in the average case.
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626477#comment-13626477 ]

Renaud Delbru commented on LUCENE-4919:
---------------------------------------
@Simon: I discovered the issue when using IntsRef. During query processing, I am streaming arrays of integers using IntsRef. I was relying on the hashCode to compute a unique identifier for the content of a particular IntsRef, until I started to see unexpected results in my unit tests. Then I saw that the same behaviour is found in the other *Ref classes. I could live without it and bypass the problem by changing my implementation (and computing my own hash code myself). But I thought this behaviour is not very clear to the user, and could potentially be dangerous, and therefore worth sharing with you.
[jira] [Resolved] (LUCENE-4918) Highlighter closes the given IndexReader
[ https://issues.apache.org/jira/browse/LUCENE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer resolved LUCENE-4918.
-------------------------------------
    Resolution: Fixed

committed -- thanks!
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626480#comment-13626480 ]

Renaud Delbru commented on LUCENE-4919:
---------------------------------------
Maybe a simpler solution would be to clearly state this behavior in the javadoc of all the methods.
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626481#comment-13626481 ]

Dawid Weiss commented on LUCENE-4919:
-------------------------------------
bq. I was relying on the hashCode to compute a unique identifier for the content of a particular IntsRef

This is generally an invalid assumption for *any* hashing function with a limited target space, unless you have something that implements minimal perfect hashing -- but this is typically data-specific (and even precomputed in advance).
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626483#comment-13626483 ]

Dawid Weiss commented on LUCENE-4919:
-------------------------------------
Btw. Arrays.hashCode is also not a unique identifier for the contents of an array, so if you're using it this way your code... well, it has a problem. :)
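Concretely, Arrays.hashCode maps arbitrarily long arrays into 32 bits, so collisions are easy to construct even for two-element inputs:

```java
import java.util.Arrays;

public class ArraysHashCollision {
    public static void main(String[] args) {
        int[] a = {1, 0};
        int[] b = {0, 31};
        // Both evaluate the recurrence h = 31*h + v starting from h = 1:
        // {1, 0}:  31*(31*1 + 1) + 0  = 992
        // {0, 31}: 31*(31*1 + 0) + 31 = 992
        System.out.println(Arrays.hashCode(a)); // 992
        System.out.println(Arrays.hashCode(b)); // 992
        System.out.println(Arrays.equals(a, b)); // false: equal hashes, unequal arrays
    }
}
```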
[jira] [Commented] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626486#comment-13626486 ]

Renaud Delbru commented on LUCENE-4919:
---------------------------------------
I agree with you, Dawid, but this particular behaviour increases the chance of getting the same hash for certain types of inputs. Anyway, I think the general decision is to not change their hashCode behaviour ;o), and I am fine with it. Feel free to close the issue. Thanks, and sorry for the distraction.
[jira] [Updated] (SOLR-4581) sort-order of facet-counts depends on facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-4581:
----------------------------------------
    Attachment: SOLR-4581.patch

The bug is reproducible even after reverting SOLR-2850. Adding facet.method=fc gives the correct response, but omitting facet.method or using facet.method=enum gives the wrong sort order. I'm not familiar enough with the faceting code to fix this. Perhaps someone else can take a look.

sort-order of facet-counts depends on facet.mincount
----------------------------------------------------
                Key: SOLR-4581
                URL: https://issues.apache.org/jira/browse/SOLR-4581
            Project: Solr
         Issue Type: Bug
   Affects Versions: 4.2
           Reporter: Alexander Buhr
        Attachments: SOLR-4581.patch, SOLR-4581.patch

I just upgraded to Solr 4.2 and cannot explain the following behaviour: I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. The solr-response for the query
{noformat}
solr/Products/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.field=ListPrice_EUR_INV&f.ListPrice_EUR_INV.facet.sort=index
{noformat}
includes the following facet-counts:
{noformat}
<lst name="ListPrice_EUR_INV">
  <int name="-420.126">1</int>
  <int name="-285.672">1</int>
  <int name="-1.218">1</int>
</lst>
{noformat}
If I also set the parameter *'facet.mincount=1'* in the query, the order of the facet-counts is reversed:
{noformat}
<lst name="ListPrice_EUR_INV">
  <int name="-1.218">1</int>
  <int name="-285.672">1</int>
  <int name="-420.126">1</int>
</lst>
{noformat}
I would have expected that the sort-order of the facet-counts is not affected by the facet.mincount parameter, as it is in Solr 4.1. Is this related to SOLR-2850?
Fwd: Adding new functionality to avoid java.lang.OutOfMemoryError: Java heap space exception
It seems like bullets don't look nice, so I'm sending the explanation without bullets.

The flow of the SolrCore.execute() function will be changed: change the status of the core to "USED" and call the waitForResource(SolrRequestHandler, SolrQueryRequest) function; after that, perform the current SolrCore.execute() flow and change the status of the core to "UNUSED". In the waitForResource(SolrRequestHandler, SolrQueryRequest) function, initially estimate the required memory for this query/handler on this core. If there are not enough free resources to run the query, and after unloading all unused, non-permanent cores there are still not enough resources, throw an OutOfMemoryError and change the status of the core to "UNUSED"; else wait with a timeout until some resource is released, then check again until the required resource is available or the exception is thrown.

Best regards,
Lyuba

---------- Forwarded message ----------
From: Lyuba Romanchuk lyuba.romanc...@gmail.com
Date: Tue, Apr 9, 2013 at 11:47 AM
Subject: Adding new functionality to avoid java.lang.OutOfMemoryError: Java heap space exception
To: dev@lucene.apache.org

Hi all,

We run Solr (4.2 and 5.0) in a real-time environment with big data. Each day two Solr cores are generated that can reach ~8-10g, depending on the insertion rates and on different hardware. Currently, all cores are loaded on Solr startup. The query rate is not high, but the response must be quick and must be returned even for old data and over a large time frame. There are a lot of simple queries (facet/facet.pivot for small distributed fields), but there are also heavy queries like facet.pivot for large-scale distributed fields. We use distributed search to query the cores, and the queries usually cover 1-2 weeks (around 7-28 cores). After some large queries (with facet.pivot for wide distributed fields) we sometimes encounter a java.lang.OutOfMemoryError: Java heap space exception.
The software is to be deployed to customer sites, so increasing memory is not always possible, and the customers may prefer getting slower responses for the larger queries if we can provide them. We looked at the LotsOfCores functionality that was added in 4.1 and 4.2. It enables defining an upper limit of online cores and unloading them on an LRU basis when the cache gets full. However, in our case a more general use case seems to be needed:
* Only cores that are used for updates/inserts must be loaded at all times. Other cores, which are queried only, should be loaded/unloaded on demand while the query runs, until completion -- according to memory demands.
* Each facet and facet.pivot must be estimated for memory consumption. In case there is not enough memory to run the query for all cores concurrently, it must be separated into sequential queries, unloading already-queried or irrelevant cores (but not permanent cores) and loading older cores to complete the query.
* Occasionally, the oldest cores should be unloaded according to a configurable policy (for example, one type of high-volume core will be kept loaded for 1 week, while smaller cores can remain loaded for a month). The policy will allow data we know is queried less but is higher volume to be kept live over shorter time periods.
We are considering adding the following functionality to Solr (optional -- turned on by new configs). The flow of the SolrCore.execute() function will be changed:
- Change the status of the core to "USED"
- Call the waitForResource(SolrRequestHandler, SolrQueryRequest) function
  - estimate the required memory for this query/handler on this core
  - if there are not enough free resources to run the query then
    - if all cores are permanent and can't be unloaded then
      - throw an OutOfMemoryError exception // here the status of the core should be changed to "UNUSED"
    - else
      - try to unload unused, non-permanent cores
      - if unloading unused cores didn't release enough resources and no core can be unloaded then
        - throw an OutOfMemoryError exception // here the status of the core should be changed to "UNUSED"
      - if unloading unused cores didn't release enough resources and there are cores that can be unloaded then
        - wait with a timeout until some resource is released
        - check again until the required resource is available or the exception is thrown
  - reserve the resource
- Call the current SolrCore.execute()
- Change the status of the core to "UNUSED"

We would like to get some initial feedback on the design/functionality we're proposing, as we feel this really benefits real-time, high-volume indexing systems such as ours. We are also happy to contribute the code back if you feel there is a need for this functionality.

Best regards,
Lyuba
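The waiting and reservation part of the flow above can be sketched as a small gate object. This is a minimal, illustrative sketch under stated assumptions: MemoryGate, its byte accounting, and the timeout handling are all hypothetical names and simplifications, not actual Solr APIs, and the core-unloading step is only indicated by a comment:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the proposed waitForResource gating, not a Solr class.
public class MemoryGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();
    private long freeBytes;

    public MemoryGate(long freeBytes) { this.freeBytes = freeBytes; }

    /** Block until the estimated bytes can be reserved, or fail after the timeout. */
    public void acquire(long estimatedBytes, long timeoutMs) throws InterruptedException {
        lock.lock();
        try {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (freeBytes < estimatedBytes) {
                // In the proposal, this is where unused, non-permanent cores
                // would be unloaded before deciding to wait or fail.
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0 || !released.await(remaining, TimeUnit.NANOSECONDS)) {
                    // Mirrors the proposal: fail the request if nothing frees up in time.
                    throw new OutOfMemoryError("not enough memory for query");
                }
            }
            freeBytes -= estimatedBytes;   // reserve for the duration of the request
        } finally {
            lock.unlock();
        }
    }

    /** Return the reservation once the request finishes (core back to "UNUSED"). */
    public void release(long estimatedBytes) {
        lock.lock();
        try {
            freeBytes += estimatedBytes;
            released.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

A request handler would call acquire() with its memory estimate before executing and release() in a finally block afterwards, which is what serializes the heavy queries instead of letting them run concurrently into an uncontrolled heap exhaustion.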
[jira] [Commented] (LUCENE-3786) Create SearcherTaxoManager
[ https://issues.apache.org/jira/browse/LUCENE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626519#comment-13626519 ] Michael McCandless commented on LUCENE-3786: OK I discussed these tricky issues with Shai ... with the non-NRT case (app commits and then calls .maybeRefresh) there are some big challenges. First off, the app must always commit IW first then TW. But second off, even if it does that, there is at least this multi-threaded case where .maybeRefresh can screw up: * Thread 1 (indexer) commits IW1 * Thread 1 (indexer) commits TW1 * Thread 2 (indexer) commits IW2 * Thread 3 (searcher) maybeRefresh opens IW2 * Thread 3 (searcher) maybeRefresh opens TW1 * Thread 1 (indexer) commits TW2 That will then lead to confusing AIOOBEs during facet counting... Net/net I think there's too much hair around supporting the non-NRT case, and I think for starters we should just support NRT, ie you must pass IW and TW to STM's ctor. Then STM is agnostic to what commits are being done ... commit is only for durability purposes. We must still document that you cannot do IW.deleteAll / TW.replaceTaxonomy (I'll add it). bq. Why does the test uses newFSDirectory? Just because it's using the LineFileDocs, which have biggish docs in them. Add in -Dtests.nightly, -Dtests.multiplier=3, and it could maybe be we are pushing the 512 MB RAM limit... bq. Manager.decRef()-- I think you should searcher.reader.incRef() if taxoReader.decRef() failed? Hmm this isn't so simple: that decRef could have closed the reader. I suppose I could do a best effort tryIncRef so that if the app somehow catches the exception and retries the decRef we don't prematurely close the reader ... bq. It's odd that acquire() throws IOE ... I realize it's because the decRef call in tryIncRef. I don't know if it's critical, but if it is, you may want to throw RuntimeEx? I think it's OK to add IOE to the signature? 
Create SearcherTaxoManager -- Key: LUCENE-3786 URL: https://issues.apache.org/jira/browse/LUCENE-3786 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Assignee: Michael McCandless Priority: Minor Fix For: 5.0, 4.3 Attachments: LUCENE-3786-3x-nocommit.patch, LUCENE-3786.patch If an application wants to use an IndexSearcher and TaxonomyReader in a SearcherManager-like fashion, it cannot use a separate SearcherManager and, say, a TaxonomyReaderManager, because the IndexSearcher and TaxoReader instances need to be in sync. That is, the IS-TR pair must match, or otherwise the category ordinals that are encoded in the search index might not match the ones in the taxonomy index. This can happen if someone reopens the IndexSearcher's IndexReader, but does not refresh the TaxonomyReader, and the category ordinals that exist in the reopened IndexReader are not yet visible to the TaxonomyReader instance. I'd like to create a SearcherTaxoManager (which is a ReferenceManager) which manages an IndexSearcher and TaxonomyReader pair. Then an application will call:
{code}
SearcherTaxoPair pair = manager.acquire();
try {
  IndexSearcher searcher = pair.searcher;
  TaxonomyReader taxoReader = pair.taxoReader;
  // do something with them
} finally {
  manager.release(pair);
  pair = null;
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
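The acquire/release contract being discussed can be modeled with a tiny JDK-only sketch (this is not the Lucene ReferenceManager API; PairManager and its fields are illustrative): both components swap atomically as one unit, and a pair handed out stays valid until every acquirer releases it.

```java
// Illustrative JDK-only sketch of a manager for a paired resource
// (e.g. searcher + taxonomy reader): acquire() bumps a ref count so a
// concurrent refresh cannot invalidate the pair mid-use, and refresh()
// swaps both components together, which is the point of pairing them.
import java.util.concurrent.atomic.AtomicInteger;

public class PairManager<A, B> {
    public static final class Pair<A, B> {
        public final A first;
        public final B second;
        final AtomicInteger refCount = new AtomicInteger(1); // manager's own ref
        Pair(A a, B b) { first = a; second = b; }
    }

    private Pair<A, B> current;

    public PairManager(A a, B b) { current = new Pair<>(a, b); }

    // Hand out the current pair with its ref count incremented.
    public synchronized Pair<A, B> acquire() {
        current.refCount.incrementAndGet();
        return current;
    }

    public void release(Pair<A, B> p) {
        p.refCount.decrementAndGet(); // real code would close at zero
    }

    // Swap in a new pair atomically; old acquirers keep their old pair.
    public synchronized void refresh(A a, B b) {
        Pair<A, B> old = current;
        current = new Pair<>(a, b);
        old.refCount.decrementAndGet();
    }
}
```

The key property, matching the issue description: a reader reopened without its companion can never be observed, because consumers only ever see a complete pair.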
[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626529#comment-13626529 ] Adrien Grand commented on LUCENE-4858: -- bq. The reason I did that is in case someone will want to sort by a stored field and numeric field which have same names. A Sorter which sorts by stored field values would indeed need to add more information to its ID (at least to say that it is a stored field). bq. numericdv_field is really unique, as you cannot have two numeric DV fields with the same name, but different meaning. Since doc values types are exclusive, could we then just say that these are doc values without mentioning the type? I think this would help keep up with doc values type evolutions (for example, there used to be BYTES_FIXED_SORTED and BYTES_VAR_SORTED, which have been merged into SORTED) and/or additions (SORTED_SET). I would also prefer having something even more human-readable (like DocValues(fieldName=$fieldName,order=asc|desc)?). Early termination with SortingMergePolicy - Key: LUCENE-4858 URL: https://issues.apache.org/jira/browse/LUCENE-4858 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.3 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch Spin-off of LUCENE-4752, see https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565 and https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282 When an index is sorted per-segment, queries that sort according to the index sort order could be early terminated.
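The early-termination idea in the issue summary can be sketched in a few lines: when a segment's documents are already stored in the query's sort order, a collector can stop after the first N hits instead of scanning the whole segment. This is a JDK-only illustration; the class names mirror the Lucene concepts but this is not the Lucene API.

```java
// Sketch of early termination over a sorted segment: collect() throws
// once enough hits are gathered, and search() treats that as a normal
// stop condition rather than an error.
import java.util.ArrayList;
import java.util.List;

public class EarlyTerminatingCollector {
    public static class CollectionTerminatedException extends RuntimeException {}

    private final int numHits;
    private final List<Integer> hits = new ArrayList<>();

    public EarlyTerminatingCollector(int numHits) { this.numHits = numHits; }

    // docs must arrive pre-sorted by the index sort, so the first
    // numHits collected are guaranteed to be the top numHits.
    public void collect(int doc) {
        hits.add(doc);
        if (hits.size() >= numHits) {
            throw new CollectionTerminatedException();
        }
    }

    public List<Integer> search(int[] sortedDocs) {
        try {
            for (int doc : sortedDocs) collect(doc);
        } catch (CollectionTerminatedException e) {
            // expected: enough hits gathered, remaining docs skipped
        }
        return hits;
    }
}
```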
[jira] [Assigned] (SOLR-4581) sort-order of facet-counts depends on facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley reassigned SOLR-4581: -- Assignee: Yonik Seeley sort-order of facet-counts depends on facet.mincount Key: SOLR-4581 URL: https://issues.apache.org/jira/browse/SOLR-4581 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Alexander Buhr Assignee: Yonik Seeley Attachments: SOLR-4581.patch, SOLR-4581.patch I just upgraded to Solr 4.2 and cannot explain the following behaviour: I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. The solr-response for the query
{noformat}solr/Products/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.field=ListPrice_EUR_INV&f.ListPrice_EUR_INV.facet.sort=index{noformat}
includes the following facet-counts:
{noformat}
<lst name="ListPrice_EUR_INV">
  <int name="-420.126">1</int>
  <int name="-285.672">1</int>
  <int name="-1.218">1</int>
</lst>
{noformat}
If I also set the parameter *'facet.mincount=1'* in the query, the order of the facet-counts is reversed.
{noformat}
<lst name="ListPrice_EUR_INV">
  <int name="-1.218">1</int>
  <int name="-285.672">1</int>
  <int name="-420.126">1</int>
</lst>
{noformat}
I would have expected that the sort-order of the facet-counts is not affected by the facet.mincount parameter, as it is in Solr 4.1. Is this related to SOLR-2850?
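For readers wondering why "index order" of a Trie double field puts -420.126 before -1.218 at all: Trie fields index doubles through a bit transform that makes the encoded order agree with numeric order, including negatives. Lucene ships this as NumericUtils.sortableDoubleBits; the sketch below reproduces the standard trick with the JDK only and is an illustration, not the Solr faceting code at issue.

```java
// Map a double to a long whose signed ordering matches the double's
// numeric ordering: positives keep their bits, negatives have their
// low 63 bits flipped so that more-negative values compare smaller.
public class SortableDouble {
    public static long sortableBits(double d) {
        long bits = Double.doubleToLongBits(d);
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }
}
```

Under this encoding the ascending index order for the three facet values is exactly -420.126, -285.672, -1.218, which is the first response above; the reversal with facet.mincount=1 is therefore the bug.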
[jira] [Commented] (LUCENE-3786) Create SearcherTaxoManager
[ https://issues.apache.org/jira/browse/LUCENE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626543#comment-13626543 ] Shai Erera commented on LUCENE-3786: bq. I think it's OK to add IOE to the signature? Ok. bq. that decRef could have closed the reader Hmm ... if we assume that this TR/IR pair is managed only by that manager, then an IOE thrown from decRef could only be caused by closing the reader, right? So if you successfully IR.decRef() but fail to TR.decRef(), it means that IR is closed already, right? Therefore there's no point to even tryIncRef? bq. Just because it's using the LineFileDocs Ahh ok. As I said, I didn't read the test through. I will review the patch after you post a new version.
Re: Adding new functionality to avoid java.lang.OutOfMemoryError: Java heap space exception
On a quick glance, I think this would be difficult. How could one estimate memory without loading the core? Facets in particular are sensitive to the number of unique terms in the field. One could probably work it backwards, that is load the cores as necessary and _measure_ the memory consumption. You'd then have to store that information someplace though. It seems like you can get relatively close to this by specifying a set of cores with transient=false and the rest with transient=true, but that's certainly not going to satisfy the complex requirements you've outlined. That said, it feels like your design is a band-aid, clients are going to then _still_ put too much information on too little hardware, but you know your problem space better than I do. But before you start working there, be aware that this code is evolving fairly quickly. SOLR-4662 should have the structure in reasonably stable condition, and I hope to get that done this coming weekend. You might want to wait until that gets committed to do more than exploratory work as the code base may change out from underneath you. Good luck! Erick On Tue, Apr 9, 2013 at 7:02 AM, Lyuba Romanchuk lyuba.romanc...@gmail.com wrote: It seems like the bullets don't look nice, so I'm sending the explanation without bullets. The flow of SolrCore.execute() function will be changed: Change the status of the core to “USED” and call waitForResource(SolrRequestHandler, SolrQueryRequest) function, after that perform the current SolrCore.execute() flow and change status of the core to “UNUSED”. In waitForResource(SolrRequestHandler, SolrQueryRequest) function, initially, estimate the required memory for this query/handler on this core.
If there are not enough free resources to run the query, and after unloading all unused, not permanent cores there is still not enough, throw an OutOfMemoryError exception and change the status of the core to “UNUSED”; else wait with timeout till some resource is released and then check again until the required resource is available or the exception is thrown. Best regards, Lyuba -- Forwarded message -- From: Lyuba Romanchuk lyuba.romanc...@gmail.com Date: Tue, Apr 9, 2013 at 11:47 AM Subject: Adding new functionality to avoid java.lang.OutOfMemoryError: Java heap space exception To: dev@lucene.apache.org Hi all, We run solr (4.2 and 5.0) in a real time environment with big data. Each day two Solr cores are generated that can reach ~8-10g, depending on the insertion rates and on different hardware. Currently, all cores are loaded on solr startup. The query rate is not high, but the response must be quick and must be returned even for old data and over a large time frame. There are a lot of simple queries (facet/facet.pivot for small distributed fields) but there are also heavy queries like facet.pivot for large-scale distributed fields. We use distributed search to query the cores and usually query over 1-2 weeks (around 7-28 cores). After some large queries (with facet.pivot for wide distributed fields) we sometimes encounter a java.lang.OutOfMemoryError: Java heap space exception.
[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626546#comment-13626546 ] Shai Erera commented on LUCENE-4858: bq. Since doc values types are exclusive, could we then just say that these are doc values without mentioning the type? +1. I mistakenly thought you can add both a numeric and binary DocValues to a document, under the same name. I prefer slightly less verbosity, but just because I think the fieldName and order part are redundant. So DocValues($field,asc|desc)?
[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626548#comment-13626548 ] Adrien Grand commented on LUCENE-4858: -- Sounds good to me!
[jira] [Updated] (LUCENE-4880) Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4880: Attachment: LUCENE-4880.patch Attached is a fix with tests. Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory Key: LUCENE-4880 URL: https://issues.apache.org/jira/browse/LUCENE-4880 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.2 Environment: Windows 7 (probably irrelevant) Reporter: Timothy Allison Attachments: LUCENE-4880.patch, MemoryIndexVsRamDirZeroLengthTermTest.java MemoryIndex skips tokens that have length == 0 when building the index; the result is that it does not increment the token offset (nor does it store the position offsets if that option is set) for tokens of length == 0. A regular index (via, say, RAMDirectory) does not appear to do this. When using the ICUFoldingFilter, it is possible to have a term of zero length (the \u0640 character separated by spaces). If that occurs in a document, the offsets returned at search time differ between the MemoryIndex and a regular index.
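The discrepancy described above can be modeled concisely: dropping a zero-length term is fine only if its position increment is carried into the next token, otherwise the two indexes end up with different absolute positions. The sketch below is a JDK-only model; Token here is a stand-in type, not the Lucene class.

```java
// Model of position bookkeeping when zero-length terms are removed:
// fold each dropped term's position increment into the next surviving
// token so absolute positions match the unfiltered stream.
import java.util.ArrayList;
import java.util.List;

public class EmptyTokenHandling {
    public static final class Token {
        public final String term;
        public final int posIncrement;
        public Token(String term, int posIncrement) {
            this.term = term;
            this.posIncrement = posIncrement;
        }
    }

    public static List<Token> skipEmptyKeepingPositions(List<Token> in) {
        List<Token> out = new ArrayList<>();
        int pending = 0; // increments accumulated from dropped empty terms
        for (Token t : in) {
            if (t.term.isEmpty()) {
                pending += t.posIncrement; // remember the gap
            } else {
                out.add(new Token(t.term, t.posIncrement + pending));
                pending = 0;
            }
        }
        return out;
    }
}
```

Skipping the `pending` bookkeeping is precisely the behavior the issue reports for MemoryIndex: the empty term vanishes without leaving a positional gap.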
[jira] [Closed] (LUCENE-4919) IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0
[ https://issues.apache.org/jira/browse/LUCENE-4919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renaud Delbru closed LUCENE-4919. - Resolution: Not A Problem IntsRef, BytesRef and CharsRef return incorrect hashcode when filled with 0 --- Key: LUCENE-4919 URL: https://issues.apache.org/jira/browse/LUCENE-4919 Project: Lucene - Core Issue Type: Bug Components: core/other Affects Versions: 4.2 Reporter: Renaud Delbru Fix For: 4.3 Attachments: LUCENE-4919.patch The IntsRef, BytesRef and CharsRef implementations do not follow the java Arrays.hashCode implementation, and return an incorrect hashcode when filled with 0. For example, an IntsRef with { 0 } will return the same hashcode as an IntsRef with { 0, 0 }.
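The collision the report describes comes from seeding a polynomial hash with 0, whereas java.util.Arrays.hashCode seeds with 1, so the array length participates in the result. A quick check of the values involved:

```java
// A polynomial hash seeded with 0 (the behavior the issue describes)
// cannot distinguish arrays of zeros: every step is 31 * 0 + 0.
// Arrays.hashCode seeds with 1, so {0} -> 31 and {0, 0} -> 961.
public class ZeroHash {
    public static int seedZeroHash(int[] a) {
        int h = 0;
        for (int v : a) h = 31 * h + v;
        return h;
    }
}
```

Whether the seed matters in practice depends on how the hash is used (the issue was closed as Not A Problem), but the arithmetic above is why the two arrays collide.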
[jira] [Commented] (SOLR-4581) sort-order of facet-counts depends on facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626592#comment-13626592 ] Yonik Seeley commented on SOLR-4581: It looks like this is a new bug due to the new faceting code introduced in SOLR-3855.
[jira] [Resolved] (LUCENE-4880) Difference in offset handling between IndexReader created by MemoryIndex and one created by RAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4880. - Resolution: Fixed Fix Version/s: 4.3 5.0 Thanks Timothy!
[jira] [Updated] (LUCENE-4858) Early termination with SortingMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4858: --- Attachment: LUCENE-4858.patch Patch adds CHANGES and improves getID impls. I think it's ready. I'll run some tests and if everything's ok, commit.
[jira] [Commented] (LUCENE-949) AnalyzingQueryParser can't work with leading wildcards.
[ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626606#comment-13626606 ] Robert Muir commented on LUCENE-949: Hello Timothy, can you turn these changes into a patch? See http://wiki.apache.org/lucene-java/HowToContribute#Creating_a_patch Thanks! AnalyzingQueryParser can't work with leading wildcards. --- Key: LUCENE-949 URL: https://issues.apache.org/jira/browse/LUCENE-949 Project: Lucene - Core Issue Type: Bug Components: core/queryparser Affects Versions: 2.2 Reporter: Stefan Klein Attachments: AnalyzingQueryParser.java The getWildcardQuery method in AnalyzingQueryParser.java needs the following changes to accept leading wildcards:
{code}
protected Query getWildcardQuery(String field, String termStr) throws ParseException {
  String useTermStr = termStr;
  String leadingWildcard = null;
  if ("*".equals(field)) {
    if ("*".equals(useTermStr)) return new MatchAllDocsQuery();
  }
  boolean hasLeadingWildcard = (useTermStr.startsWith("*") || useTermStr.startsWith("?")) ? true : false;
  if (!getAllowLeadingWildcard() && hasLeadingWildcard)
    throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
  if (getLowercaseExpandedTerms()) {
    useTermStr = useTermStr.toLowerCase();
  }
  if (hasLeadingWildcard) {
    leadingWildcard = useTermStr.substring(0, 1);
    useTermStr = useTermStr.substring(1);
  }
  List tlist = new ArrayList();
  List wlist = new ArrayList();
  /*
   * somewhat a hack: find/store wildcard chars in order to put them back
   * after analyzing
   */
  boolean isWithinToken = (!useTermStr.startsWith("?") && !useTermStr.startsWith("*"));
  isWithinToken = true;
  StringBuffer tmpBuffer = new StringBuffer();
  char[] chars = useTermStr.toCharArray();
  for (int i = 0; i < useTermStr.length(); i++) {
    if (chars[i] == '?' || chars[i] == '*') {
      if (isWithinToken) {
        tlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = false;
    } else {
      if (!isWithinToken) {
        wlist.add(tmpBuffer.toString());
        tmpBuffer.setLength(0);
      }
      isWithinToken = true;
    }
    tmpBuffer.append(chars[i]);
  }
  if (isWithinToken) {
    tlist.add(tmpBuffer.toString());
  } else {
    wlist.add(tmpBuffer.toString());
  }
  // get Analyzer from superclass and tokenize the term
  TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
  org.apache.lucene.analysis.Token t;
  int countTokens = 0;
  while (true) {
    try {
      t = source.next();
    } catch (IOException e) {
      t = null;
    }
    if (t == null) {
      break;
    }
    if (!"".equals(t.termText())) {
      try {
        tlist.set(countTokens++, t.termText());
      } catch (IndexOutOfBoundsException ioobe) {
        countTokens = -1;
      }
    }
  }
  try {
    source.close();
  } catch (IOException e) {
{code}
[jira] [Resolved] (SOLR-4677) Improve Solr's use of spec version.
[ https://issues.apache.org/jira/browse/SOLR-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-4677. --- Resolution: Fixed Improve Solr's use of spec version. --- Key: SOLR-4677 URL: https://issues.apache.org/jira/browse/SOLR-4677 Project: Solr Issue Type: Improvement Components: Build Reporter: Mark Miller Fix For: 4.3, 5.0 Attachments: SOLR-4677.patch, SOLR-4677.patch, SOLR-4677.patch Solr 4.2.1 went out with an impl version of 4.2.1 and a spec version of 4.2.0. This is because you must update the spec version in common-build.xml while the impl is set by the version you pass as a sys prop when doing prepare-release. Do we need this spec version? Does it serve any purpose? I think we should either stop dealing with it or just set it the same way as the impl...or?
[jira] [Created] (SOLR-4693) Create a collections API to delete/cleanup a Slice
Anshum Gupta created SOLR-4693: -- Summary: Create a collections API to delete/cleanup a Slice Key: SOLR-4693 URL: https://issues.apache.org/jira/browse/SOLR-4693 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Anshum Gupta Have a collections API that cleans up a given shard. Among other places, this would be useful post the shard split call to manage the parent/original slice.
[jira] [Commented] (LUCENE-4903) Add AssertingScorer
[ https://issues.apache.org/jira/browse/LUCENE-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626628#comment-13626628 ] Robert Muir commented on LUCENE-4903: - {quote} The problem is that scorers are hard to track: scoring usually happens by calling Scorer.score(Collector), which itself calls Collector.setScorer(Scorer). Since the asserting scorer delegates to the wrapped one, the asserting scorer gets lost, this is why Collector.setScorer tries to get it back by using a weak hash map. I'm not totally happy with it either and would really like to make Scorer.score(Collector) use methods from the asserting scorer directly. We can't rely on Scorer.score(Collector)'s default implementation since it relies on Scorer.nextDoc and some scorers such as BooleanScorer don't implement this method. {quote} Could we alternatively use VirtualMethod to detect if score(Collector)/score(Collector,int,int) are overridden in the underlying scorer? If they aren't, then it's safe for AssertingScorer to use its own implementation (possibly with more checks). Add AssertingScorer --- Key: LUCENE-4903 URL: https://issues.apache.org/jira/browse/LUCENE-4903 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-4903.patch I think we would benefit from having an AssertingScorer that would assert that scorers are advanced correctly, return valid scores (eg. not NaN), ...
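The override check Robert suggests boils down to asking, via reflection, whether a subclass declares its own version of a method or merely inherits the base implementation; that is the idea behind Lucene's VirtualMethod utility. A JDK-only illustration of the mechanism (the Scorer classes here are stand-ins, not Lucene's):

```java
// Detect whether a class overrides a given public method by comparing
// the method's declaring class against the base class. If the base
// class still declares it, the subclass inherited the default.
public class OverrideCheck {
    public static class Scorer {
        public void score(Object collector) { /* default bulk scoring */ }
    }
    public static class PlainScorer extends Scorer {}
    public static class BulkScorer extends Scorer {
        @Override public void score(Object collector) { /* custom bulk scoring */ }
    }

    public static boolean overrides(Class<?> base, Class<?> impl,
                                    String name, Class<?>... params) {
        try {
            return impl.getMethod(name, params).getDeclaringClass() != base;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

With this, an asserting wrapper could safely substitute its own checked score(Collector) loop whenever the wrapped scorer has not overridden the bulk-scoring methods.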
[jira] [Commented] (LUCENE-4858) Early termination with SortingMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626637#comment-13626637 ] Adrien Grand commented on LUCENE-4858: -- +1 Early termination with SortingMergePolicy - Key: LUCENE-4858 URL: https://issues.apache.org/jira/browse/LUCENE-4858 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 4.3 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch Spin-off of LUCENE-4752, see https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565 and https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282 When an index is sorted per-segment, queries that sort according to the index sort order could be early terminated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4903) Add AssertingScorer
[ https://issues.apache.org/jira/browse/LUCENE-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626638#comment-13626638 ] Adrien Grand commented on LUCENE-4903: -- This is a good idea, I didn't know of this class. I'll update the patch! Add AssertingScorer --- Key: LUCENE-4903 URL: https://issues.apache.org/jira/browse/LUCENE-4903 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-4903.patch I think we would benefit from having an AssertingScorer that would assert that scorers are advanced correctly, return valid scores (eg. not NaN), ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626639#comment-13626639 ] Jeroen Steggink commented on SOLR-2366: --- I'm also very interested in a variable range gap feature. Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor Fix For: 4.3 Attachments: SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. (Original syntax proposal removed, see discussion for concrete syntax) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4920) CLONE - TermsFilter should use AutomatonQuery
sani kumar created LUCENE-4920: --- Summary: CLONE - TermsFilter should use AutomatonQuery Key: LUCENE-4920 URL: https://issues.apache.org/jira/browse/LUCENE-4920 Project: Lucene - Core Issue Type: Improvement Reporter: sani kumar I think we could see perf gains if TermsFilter sorted the terms, built a minimal automaton, and used TermsEnum.intersect to visit the terms... This idea came up on the dev list recently. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4858) Early termination with SortingMergePolicy
[ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera resolved LUCENE-4858. Resolution: Fixed Fix Version/s: 5.0 Lucene Fields: New,Patch Available (was: New) Committed to trunk and 4x. Thanks Adrien for the fun collaboration! Early termination with SortingMergePolicy - Key: LUCENE-4858 URL: https://issues.apache.org/jira/browse/LUCENE-4858 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Fix For: 5.0, 4.3 Attachments: LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch, LUCENE-4858.patch Spin-off of LUCENE-4752, see https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565 and https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282 When an index is sorted per-segment, queries that sort according to the index sort order could be early terminated. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4663) Log an error if more than one core points to the same data dir.
[ https://issues.apache.org/jira/browse/SOLR-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-4663: - Attachment: SOLR-4663.patch I'll commit this later today unless there are objections, and assuming all the tests pass (running nightly now). Log an error if more than one core points to the same data dir. --- Key: SOLR-4663 URL: https://issues.apache.org/jira/browse/SOLR-4663 Project: Solr Issue Type: Improvement Affects Versions: 4.3, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Attachments: SOLR-4663.patch, SOLR-4663.patch, SOLR-4663.patch In large multi-core setups, having mistakes whereby two or more cores point to the same data dir seems quite possible. We should at least complain very loudly in the logs if this is detected. Should be a very straightforward check at core discovery time. Is this serious enough to keep Solr from coming up at all? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
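A minimal sketch of the kind of check core discovery could perform; the core names and paths here are hypothetical, and the real patch would compare normalized/canonical paths:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the duplicate-dataDir check proposed above: group cores by their
// data dir at discovery time and flag any dir claimed by more than one core.
public class DataDirCheck {
    // Returns a map of conflicting core -> the core that already owns the dir.
    static Map<String, String> findConflicts(Map<String, String> coreToDataDir) {
        Map<String, String> firstOwner = new HashMap<>();
        Map<String, String> conflicts = new HashMap<>();
        for (Map.Entry<String, String> e : coreToDataDir.entrySet()) {
            String prev = firstOwner.putIfAbsent(e.getValue(), e.getKey());
            if (prev != null) {
                conflicts.put(e.getKey(), prev); // same data dir claimed twice
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        Map<String, String> cores = new HashMap<>();
        cores.put("core1", "/var/solr/data/a");
        cores.put("core2", "/var/solr/data/b");
        cores.put("core3", "/var/solr/data/a"); // clashes with core1: log loudly
        System.out.println(findConflicts(cores).size()); // 1
    }
}
```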
[jira] [Commented] (SOLR-4581) sort-order of facet-counts depends on facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626682#comment-13626682 ] Yonik Seeley commented on SOLR-4581: OK, the code from SOLR-3855 had a bug where the IEEE float bits were used/compared directly for sort order, which is not correct for negative numbers. I'm testing a patch now, expect to commit shortly. sort-order of facet-counts depends on facet.mincount Key: SOLR-4581 URL: https://issues.apache.org/jira/browse/SOLR-4581 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Alexander Buhr Assignee: Yonik Seeley Attachments: SOLR-4581.patch, SOLR-4581.patch I just upgraded to Solr 4.2 and cannot explain the following behaviour: I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. The solr-response for the query {noformat}
solr/Products/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.field=ListPrice_EUR_INV&f.ListPrice_EUR_INV.facet.sort=index
{noformat} includes the following facet-counts: {noformat}
<lst name="ListPrice_EUR_INV">
  <int name="-420.126">1</int>
  <int name="-285.672">1</int>
  <int name="-1.218">1</int>
</lst>
{noformat} If I also set the parameter *'facet.mincount=1'* in the query, the order of the facet-counts is reversed. {noformat}
<lst name="ListPrice_EUR_INV">
  <int name="-1.218">1</int>
  <int name="-285.672">1</int>
  <int name="-420.126">1</int>
</lst>
{noformat} I would have expected that the sort-order of the facet-counts is not affected by the facet.mincount parameter, as it is in Solr 4.1. Is this related to SOLR-2850? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
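The bug Yonik describes can be illustrated with the standard bit-twiddling fix (the same idea Lucene's NumericUtils uses); this is only a sketch of the technique, not the actual SOLR-4581 patch:

```java
// IEEE-754 floats are sign-magnitude, so comparing their raw bit patterns as
// signed ints reverses the order of negative values -- exactly the symptom in
// this issue. Flipping the low 31 bits of negatives makes the bits sortable.
public class SortableFloatBits {
    // Raw bits: wrong order for negatives.
    static int rawBits(float f) {
        return Float.floatToIntBits(f);
    }

    // Standard fix: for negatives, invert all non-sign bits so that
    // signed-int comparison of the result matches numeric float order.
    static int sortableBits(float f) {
        int bits = Float.floatToIntBits(f);
        return bits < 0 ? bits ^ 0x7fffffff : bits;
    }

    public static void main(String[] args) {
        // -420.126 < -1.218 numerically, but their raw bits say the opposite:
        System.out.println(rawBits(-420.126f) > rawBits(-1.218f));           // true (wrong order)
        System.out.println(sortableBits(-420.126f) < sortableBits(-1.218f)); // true (correct order)
    }
}
```

This also explains why facet.mincount changed the result: the two code paths compared the values differently, and only the raw-bits path misordered the negative prices.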
[jira] [Updated] (LUCENE-4904) Sorter API: Make NumericDocValuesSorter able to sort in reverse order
[ https://issues.apache.org/jira/browse/LUCENE-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4904: --- Attachment: LUCENE-4904.patch I hacked this up real quickly, so I could be missing something. Patch adds a ReverseOrderSorter which wraps a Sorter and on sort() returns a DocMap that reverses whatever the wrapped Sorter DocMap returned. I still didn't figure out how to plug that sorter with existing tests, so it could be this approach doesn't work. Will look at it later. Sorter API: Make NumericDocValuesSorter able to sort in reverse order - Key: LUCENE-4904 URL: https://issues.apache.org/jira/browse/LUCENE-4904 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Priority: Trivial Labels: newdev Fix For: 4.3 Attachments: LUCENE-4904.patch Today it is only able to sort in ascending order. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
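The "DocMap that reverses whatever the wrapped Sorter DocMap returned" idea can be sketched as follows; the DocMap interface here is a minimal stand-in for the Sorter.DocMap of the patch, not the exact Lucene API:

```java
// Sketch of the ReverseOrderSorter idea: wrap a DocMap and mirror both
// directions around maxDoc-1, so ascending order becomes descending.
public class ReverseDocMap {
    interface DocMap {
        int oldToNew(int docID);
        int newToOld(int docID);
    }

    static DocMap reverse(final DocMap in, final int maxDoc) {
        return new DocMap() {
            @Override public int oldToNew(int docID) { return maxDoc - 1 - in.oldToNew(docID); }
            @Override public int newToOld(int docID) { return in.newToOld(maxDoc - 1 - docID); }
        };
    }

    public static void main(String[] args) {
        // Identity map over 4 docs; reversed, it maps 0<->3 and 1<->2.
        DocMap identity = new DocMap() {
            @Override public int oldToNew(int d) { return d; }
            @Override public int newToOld(int d) { return d; }
        };
        DocMap rev = reverse(identity, 4);
        System.out.println(rev.oldToNew(0)); // 3
        System.out.println(rev.newToOld(3)); // 0
    }
}
```

Note the invariant that makes this correct: for any x, rev.oldToNew(rev.newToOld(x)) == x, since the two maxDoc-1 reflections cancel out.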
[jira] [Commented] (SOLR-4581) sort-order of facet-counts depends on facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626704#comment-13626704 ] Alexander Buhr commented on SOLR-4581: -- happy to hear this :) thx! sort-order of facet-counts depends on facet.mincount Key: SOLR-4581 URL: https://issues.apache.org/jira/browse/SOLR-4581 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Alexander Buhr Assignee: Yonik Seeley Attachments: SOLR-4581.patch, SOLR-4581.patch I just upgraded to Solr 4.2 and cannot explain the following behaviour: I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. The solr-response for the query {noformat}
solr/Products/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.field=ListPrice_EUR_INV&f.ListPrice_EUR_INV.facet.sort=index
{noformat} includes the following facet-counts: {noformat}
<lst name="ListPrice_EUR_INV">
  <int name="-420.126">1</int>
  <int name="-285.672">1</int>
  <int name="-1.218">1</int>
</lst>
{noformat} If I also set the parameter *'facet.mincount=1'* in the query, the order of the facet-counts is reversed. {noformat}
<lst name="ListPrice_EUR_INV">
  <int name="-1.218">1</int>
  <int name="-285.672">1</int>
  <int name="-420.126">1</int>
</lst>
{noformat} I would have expected that the sort-order of the facet-counts is not affected by the facet.mincount parameter, as it is in Solr 4.1. Is this related to SOLR-2850? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4694) DataImporter uses wrong format for 'last_index_time'
Arul Kalaipandian created SOLR-4694: --- Summary: DataImporter uses wrong format for 'last_index_time' Key: SOLR-4694 URL: https://issues.apache.org/jira/browse/SOLR-4694 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.2 Reporter: Arul Kalaipandian Priority: Blocker DataImporter uses the wrong format for the first import (no dataimport.properties in the /conf folder). {code}
R.LAST_MODIFICATION_DATE = (TO_DATE('${dih.last_index_time}';
R.LAST_MODIFICATION_DATE = (TO_DATE('Thu Jan 01 01:00:00 CET 1970','-mm-dd hh24:mi:ss').
{code} It's similar to SOLR-1496. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4694) DataImporter uses wrong format for 'last_index_time'
[ https://issues.apache.org/jira/browse/SOLR-4694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arul Kalaipandian updated SOLR-4694: Description: DataImporter uses the wrong format for the first import (no dataimport.properties in the /conf folder). {code}
R.LAST_MODIFICATION_DATE = (TO_DATE('${dih.last_index_time}';
formatted as follows,
R.LAST_MODIFICATION_DATE = (TO_DATE('Thu Jan 01 01:00:00 CET 1970','-mm-dd hh24:mi:ss').
{code} It's similar to SOLR-1496. was: DataImporter uses the wrong format for the first import (no dataimport.properties in the /conf folder). {code}
R.LAST_MODIFICATION_DATE = (TO_DATE('${dih.last_index_time}';
R.LAST_MODIFICATION_DATE = (TO_DATE('Thu Jan 01 01:00:00 CET 1970','-mm-dd hh24:mi:ss').
{code} It's similar to SOLR-1496. DataImporter uses wrong format for 'last_index_time' Key: SOLR-4694 URL: https://issues.apache.org/jira/browse/SOLR-4694 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.2 Reporter: Arul Kalaipandian Priority: Blocker Labels: formatDate DataImporter uses the wrong format for the first import (no dataimport.properties in the /conf folder). {code}
R.LAST_MODIFICATION_DATE = (TO_DATE('${dih.last_index_time}';
formatted as follows,
R.LAST_MODIFICATION_DATE = (TO_DATE('Thu Jan 01 01:00:00 CET 1970','-mm-dd hh24:mi:ss').
{code} It's similar to SOLR-1496. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
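The mismatch can be reproduced in isolation: Date.toString() yields the "Thu Jan 01 ..." form that leaks into the query, while the TO_DATE pattern above expects a yyyy-mm-dd hh24:mi:ss style timestamp. This is only an illustration of the two formats, not the DataImporter code itself:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Shows the two formats involved: Date.toString() (what the bug substitutes
// for ${dih.last_index_time} on a first import) vs. the timestamp pattern the
// SQL expects.
public class DihDateFormat {
    static String dihFormat(Date d) {
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(d);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        System.out.println(epoch);            // "Thu Jan 01 01:00:00 CET 1970"-style (locale/zone dependent)
        System.out.println(dihFormat(epoch)); // "1970-01-01 00:00:00"
    }
}
```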
[jira] [Resolved] (SOLR-4581) sort-order of facet-counts depends on facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-4581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-4581. Resolution: Fixed Fix Version/s: 5.0 4.3 committed. sort-order of facet-counts depends on facet.mincount Key: SOLR-4581 URL: https://issues.apache.org/jira/browse/SOLR-4581 Project: Solr Issue Type: Bug Affects Versions: 4.2 Reporter: Alexander Buhr Assignee: Yonik Seeley Fix For: 4.3, 5.0 Attachments: SOLR-4581.patch, SOLR-4581.patch I just upgraded to Solr 4.2 and cannot explain the following behaviour: I am using a solr.TrieDoubleField named 'ListPrice_EUR_INV' as a facet-field. The solr-response for the query {noformat}
solr/Products/select?q=*%3A*&wt=xml&indent=true&facet=true&facet.field=ListPrice_EUR_INV&f.ListPrice_EUR_INV.facet.sort=index
{noformat} includes the following facet-counts: {noformat}
<lst name="ListPrice_EUR_INV">
  <int name="-420.126">1</int>
  <int name="-285.672">1</int>
  <int name="-1.218">1</int>
</lst>
{noformat} If I also set the parameter *'facet.mincount=1'* in the query, the order of the facet-counts is reversed. {noformat}
<lst name="ListPrice_EUR_INV">
  <int name="-1.218">1</int>
  <int name="-285.672">1</int>
  <int name="-420.126">1</int>
</lst>
{noformat} I would have expected that the sort-order of the facet-counts is not affected by the facet.mincount parameter, as it is in Solr 4.1. Is this related to SOLR-2850? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4695) Fix core admin SPLIT action to be useful with non-cloud setups
Shalin Shekhar Mangar created SOLR-4695: --- Summary: Fix core admin SPLIT action to be useful with non-cloud setups Key: SOLR-4695 URL: https://issues.apache.org/jira/browse/SOLR-4695 Project: Solr Issue Type: Bug Reporter: Shalin Shekhar Mangar Assignee: Shalin Shekhar Mangar Fix For: 4.3 The core admin SPLIT action assumes that the core being split is zookeeper aware. It will throw a NPE if invoked against a non-cloud solr setup. It should be fixed to work with non-cloud setups and documents in such an index should be distributed alternately into sub-indexes instead of using hashes. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
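The "distributed alternately" behavior proposed for non-cloud SPLIT is simple round-robin assignment: document i goes to sub-index i % n instead of being routed by hash. A minimal sketch with hypothetical integer doc IDs:

```java
import java.util.ArrayList;
import java.util.List;

// Round-robin split for non-cloud setups: no hash ranges, just alternate
// documents across the n sub-indexes in order.
public class AlternateSplit {
    static List<List<Integer>> split(List<Integer> docs, int n) {
        List<List<Integer>> subs = new ArrayList<>();
        for (int i = 0; i < n; i++) subs.add(new ArrayList<>());
        for (int i = 0; i < docs.size(); i++) {
            subs.get(i % n).add(docs.get(i)); // doc i -> sub-index i % n
        }
        return subs;
    }

    public static void main(String[] args) {
        List<Integer> docs = List.of(0, 1, 2, 3, 4);
        System.out.println(split(docs, 2)); // [[0, 2, 4], [1, 3]]
    }
}
```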
[jira] [Commented] (SOLR-3755) shard splitting
[ https://issues.apache.org/jira/browse/SOLR-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626753#comment-13626753 ] Shalin Shekhar Mangar commented on SOLR-3755: - Committed three changes: # Set update log to buffering mode before it is published (fixes bug with extra doc count on sub-shard) # Use deleteIndex=true while unloading sub-shard cores (if a sub-shard in construction state already exists at the start of the splitshard operation) # Made ChaosMonkeyShardSplitTest consistent with ShardSplitTest -- Use correct router and replica count, assert sub-shards are active, parent shards are inactive etc Anshum suggested over chat that we should think about combining ShardSplitTest and ChaosMonkeyShardSplit tests into one to avoid code duplication. I'll try to see if we can do that. shard splitting --- Key: SOLR-3755 URL: https://issues.apache.org/jira/browse/SOLR-3755 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Yonik Seeley Assignee: Shalin Shekhar Mangar Fix For: 4.3, 5.0 Attachments: SOLR-3755-combined.patch, SOLR-3755-combinedWithReplication.patch, SOLR-3755-CoreAdmin.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755.patch, SOLR-3755-testSplitter.patch, SOLR-3755-testSplitter.patch We can currently easily add replicas to handle increases in query volume, but we should also add a way to add additional shards dynamically by splitting existing shards. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_17) - Build # 5040 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/5040/ Java: 32bit/jdk1.7.0_17 -server -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 14695 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:381: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:88: The following files contain @author tags, tabs or nocommits: * solr/core/src/test/org/apache/solr/request/TestFaceting.java Total time: 52 minutes 53 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.7.0_17 -server -XX:+UseConcMarkSweepGC Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4679) HTML line breaks (br) are removed during indexing; causes wrong search results
[ https://issues.apache.org/jira/browse/SOLR-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626775#comment-13626775 ] Hoss Man commented on SOLR-4679: Right ... I wonder if somewhere in the flow of SAX events these newlines are being treated as ignorable whitespace ... I can't imagine why they would be, but that's the best guess I have at the moment. HTML line breaks (br) are removed during indexing; causes wrong search results Key: SOLR-4679 URL: https://issues.apache.org/jira/browse/SOLR-4679 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.2 Environment: Windows Server 2008 R2, Java 6, Tomcat 7 Reporter: Christoph Straßer Attachments: external.htm, Solr_HtmlLineBreak_Linz_NotFound.png, Solr_HtmlLineBreak_Vienna.png HTML line breaks (<br>, <BR>, <br/>, ...) seem to be removed during extraction of content from HTML files. They need to be replaced with an empty space. Test-File: {noformat}
<html>
<head>
<title>Test mit HTML-Zeilenschaltungen</title>
</head>
<p>
word1<br>word2<br/>
Some other words, a special name like linz<br>and another special name - vienna
</p>
</html>
{noformat} The Solr-content-attribute contains the following text: Test mit HTML-Zeilenschaltungen word1word2 Some other words, a special name like linzand another special name - vienna So we are not able to find the word linz. We use the ExtractingRequestHandler to put content into Solr. (wiki.apache.org/solr/ExtractingRequestHandler) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
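Until the extraction bug is fixed, the replacement the reporter asks for can be done as client-side pre-processing before the document is sent to the ExtractingRequestHandler; a minimal sketch (not a Solr/Tika fix):

```java
// Replace <br> variants with a space before extraction so adjacent words
// ("word1<br>word2", "linz<br>and") don't get glued together.
public class BrToSpace {
    static String brToSpace(String html) {
        // (?i) = case-insensitive; matches <br>, <BR>, <br/>, <br />, ...
        return html.replaceAll("(?i)<br\\s*/?>", " ");
    }

    public static void main(String[] args) {
        String p = "word1<br>word2<br/> a special name like linz<br>and vienna";
        // After replacement, "linz" and "and" are separate tokens again.
        System.out.println(brToSpace(p));
    }
}
```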
[jira] [Created] (LUCENE-4921) Create a DocValuesFormat for sparse doc values
Adrien Grand created LUCENE-4921: Summary: Create a DocValuesFormat for sparse doc values Key: LUCENE-4921 URL: https://issues.apache.org/jira/browse/LUCENE-4921 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Adrien Grand Priority: Trivial We could have a special DocValuesFormat in lucene/codecs to better handle sparse doc values. See http://search-lucene.com/m/HUeYW1RlEtc -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4921) Create a DocValuesFormat for sparse doc values
[ https://issues.apache.org/jira/browse/LUCENE-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626796#comment-13626796 ] Robert Muir commented on LUCENE-4921: - a good baseline could be something as simple as passing COMPACT to the default DVConsumer? or we could provide something that works entirely different... there are a lot of possibilities. Create a DocValuesFormat for sparse doc values -- Key: LUCENE-4921 URL: https://issues.apache.org/jira/browse/LUCENE-4921 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Adrien Grand Priority: Trivial Labels: gsoc2013, newdev We could have a special DocValuesFormat in lucene/codecs to better handle sparse doc values. See http://search-lucene.com/m/HUeYW1RlEtc -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-4.x-Java6 - Build # 1490 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java6/1490/ All tests passed Build Log: [...truncated 14050 lines...] BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/build.xml:381: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/build.xml:88: The following files contain @author tags, tabs or nocommits: * solr/core/src/test/org/apache/solr/request/TestFaceting.java Total time: 62 minutes 20 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4431) Developer Curb Appeal: easier URL to get to Cloud UI
[ https://issues.apache.org/jira/browse/SOLR-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Bennett updated SOLR-4431: --- Attachment: SOLR-4431.patch Adds 3 convenience URLs that generate 302 redirects. Also includes ivy entry to fetch 1 additional jetty jar and ini to include it. These are not mimicking older Solr URLs. The nice thing is that they could easily be modified in the future if we need to change the URL structure again. Developer Curb Appeal: easier URL to get to Cloud UI Key: SOLR-4431 URL: https://issues.apache.org/jira/browse/SOLR-4431 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.1 Reporter: Mark Bennett Attachments: SOLR-4431.patch Currently the URL to get the cloud UI is http://172.16.10.236:8983/solr/#/~cloud The path and anchor portion is very strange: /solr/#/~cloud Ideally it would just be /cloud Or even just /, and if it's in cloud mode, take the admin to the right place. If there's some internal important structural reason for /solr, # and ~cloud sections, perhaps each would need to be addressed. Another option would be to possibly put something the default Jetty xml file to handle this as some type of redirect or registered handle. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_17) - Build # 5090 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/5090/ Java: 32bit/jdk1.7.0_17 -client -XX:+UseConcMarkSweepGC All tests passed Build Log: [...truncated 14399 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:375: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:88: The following files contain @author tags, tabs or nocommits: * solr/core/src/test/org/apache/solr/request/TestFaceting.java Total time: 48 minutes 4 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 32bit/jdk1.7.0_17 -client -XX:+UseConcMarkSweepGC Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4904) Sorter API: Make NumericDocValuesSorter able to sort in reverse order
[ https://issues.apache.org/jira/browse/LUCENE-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4904: --- Attachment: LUCENE-4904.patch Added ReverseOrderSorter to IndexSortingTest (was after all very easy), which uncovered a bug in my original implementation. It's now working and tests are happy. I basically think this is ready, would appreciate some review. Sorter API: Make NumericDocValuesSorter able to sort in reverse order - Key: LUCENE-4904 URL: https://issues.apache.org/jira/browse/LUCENE-4904 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Priority: Trivial Labels: newdev Fix For: 4.3 Attachments: LUCENE-4904.patch, LUCENE-4904.patch Today it is only able to sort in ascending order. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-4.x-java7 - Build # 1131 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-java7/1131/ All tests passed Build Log: [...truncated 14569 lines...] BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-java7/build.xml:381: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-java7/build.xml:88: The following files contain @author tags, tabs or nocommits: * solr/core/src/test/org/apache/solr/request/TestFaceting.java Total time: 60 minutes 8 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4431) Developer Curb Appeal: easier URLs for Cloud UI, Admin, etc.
[ https://issues.apache.org/jira/browse/SOLR-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626889#comment-13626889 ] Hoss Man commented on SOLR-4431: -1 1) this patch would change the behaviour only of the example jetty server, causing Solr to (knowingly!) behave radically differently if you deployed to a different servlet container. 2) as explicitly mentioned before, this change would cause problems for people trying to create Solr cores (or handlers in the default Solr core) named cloud (or any other names that get taken up by other aliases like this that might get added if we go down this road)... bq. switching these UI URLs from things like /~cloud to /cloud would cause problems for anyone who might want to have a collection named cloud The included Admin UI is a nice-to-have, but improving its ease of use or prettiness must not come at the expense of reduced configurability or expressiveness of the underlying API URLs. If people want an admin UI for Solr that has short and pretty URLs then it should be something deployed as an independent war (or written in ruby or whatever) Developer Curb Appeal: easier URLs for Cloud UI, Admin, etc. Key: SOLR-4431 URL: https://issues.apache.org/jira/browse/SOLR-4431 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.1 Reporter: Mark Bennett Attachments: SOLR-4431.patch Currently the URL to get the cloud UI is http://172.16.10.236:8983/solr/#/~cloud The path and anchor portion is very strange: /solr/#/~cloud Ideally it would just be /cloud Or even just /, and if it's in cloud mode, take the admin to the right place. If there's some internal important structural reason for the /solr, # and ~cloud sections, perhaps each would need to be addressed. Another option would be to possibly put something in the default Jetty xml file to handle this as some type of redirect or registered handle. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.6.0_43) - Build # 5041 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/5041/ Java: 64bit/jdk1.6.0_43 -XX:+UseSerialGC All tests passed Build Log: [...truncated 13967 lines...] BUILD FAILED /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:381: The following error occurred while executing this line: /mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:88: The following files contain @author tags, tabs or nocommits: * solr/core/src/test/org/apache/solr/request/TestFaceting.java Total time: 55 minutes 40 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Description set: Java: 64bit/jdk1.6.0_43 -XX:+UseSerialGC Email was triggered for: Failure Sending email for trigger: Failure
[jira] [Created] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
David Smiley created LUCENE-4922: Summary: A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes Key: LUCENE-4922 URL: https://issues.apache.org/jira/browse/LUCENE-4922 Project: Lucene - Core Issue Type: New Feature Components: modules/spatial Reporter: David Smiley My wish-list for an ideal SpatialPrefixTree has these properties: * Hilbert Curve ordering * Variable grid size per level (ex: 256 at the top, 64 at the bottom, 16 for all in-between) * Compact binary encoding (so-called Morton number) * Works for geodetic (i.e. lat lon) and non-geodetic Some bonus wishes for use in geospatial: * Use an equal-area projection such that each cell has an equal area to all others at the same level. * When advancing a grid level, if a cell's width is less than half its height, then divide it as 4 vertically stacked cells instead of 2 by 2. The point is to avoid super-skinny cells, which occur towards the poles and degrade performance. All of this requires some basic performance benchmarks to measure the effects of these characteristics. 
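The "compact binary encoding (so-called Morton number)" wished for above can be illustrated with a small sketch, not Lucene code: a Morton number interleaves the bits of a cell's x and y grid coordinates, so that cells that are close on the Z-order (or, with extra remapping, Hilbert) curve tend to share binary prefixes. All names here are illustrative.

```java
// Sketch: a Morton number interleaves the bits of a cell's x and y
// grid coordinates. Illustrative only; not the Lucene SpatialPrefixTree API.
public class MortonSketch {
    // Interleave the low 16 bits of x and y; y bits occupy odd positions.
    static long morton(int x, int y) {
        long code = 0;
        for (int i = 0; i < 16; i++) {
            code |= ((long) (x >> i) & 1L) << (2 * i);     // x bit -> even slot
            code |= ((long) (y >> i) & 1L) << (2 * i + 1); // y bit -> odd slot
        }
        return code;
    }

    public static void main(String[] args) {
        System.out.println(morton(0b11, 0b00)); // x=3, y=0 -> binary 0101 = 5
        System.out.println(morton(0b00, 0b11)); // x=0, y=3 -> binary 1010 = 10
    }
}
```

A prefix of such a code identifies a coarser grid cell, which is exactly the property a prefix tree indexes on.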
[jira] [Updated] (LUCENE-4904) Sorter API: Make NumericDocValuesSorter able to sort in reverse order
[ https://issues.apache.org/jira/browse/LUCENE-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4904: --- Attachment: LUCENE-4904.patch Patch on latest trunk (previous one had issues applying). Sorter API: Make NumericDocValuesSorter able to sort in reverse order - Key: LUCENE-4904 URL: https://issues.apache.org/jira/browse/LUCENE-4904 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Priority: Trivial Labels: newdev Fix For: 4.3 Attachments: LUCENE-4904.patch, LUCENE-4904.patch, LUCENE-4904.patch Today it is only able to sort in ascending order. 
[jira] [Updated] (LUCENE-4922) A SpatialPrefixTree based on the Hilbert Curve and variable grid sizes
[ https://issues.apache.org/jira/browse/LUCENE-4922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4922: - Assignee: David Smiley Labels: gsoc2013 mentor newdev (was: gsoc2013 newdev)
[jira] [Created] (SOLR-4696) All threads become blocked resulting in hang when bulk adding
matt knecht created SOLR-4696: - Summary: All threads become blocked resulting in hang when bulk adding Key: SOLR-4696 URL: https://issues.apache.org/jira/browse/SOLR-4696 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.2, 4.1, 4.2.1 Environment: Ubuntu 12.04.2 LTS 3.5.0-27-generic Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) KVM, 4xCPU, 5GB RAM, 4GB heap. Reporter: matt knecht During a bulk load, after about 150,000 documents are loaded, thread usage spikes and solr no longer processes any documents. Any additional documents added result in a new thread until the pool is exhausted. 
[jira] [Updated] (SOLR-4696) All threads become blocked resulting in hang when bulk adding
[ https://issues.apache.org/jira/browse/SOLR-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] matt knecht updated SOLR-4696: -- Attachment: solr.jstack.2 solr.jstack.1 jstack from solr once problem manifests. Stopped adding documents before running out of threads. One jstack from each solr node (4 cores, 2 shards)
[jira] [Updated] (SOLR-4696) All threads become blocked resulting in hang when bulk adding
[ https://issues.apache.org/jira/browse/SOLR-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] matt knecht updated SOLR-4696: -- Attachment: screenshot-1.jpg jconsole overview. Solr stops processing new documents, CPU usage drops, threads grow as new docs are submitted that go into immediate wait.
[jira] [Updated] (SOLR-4696) All threads become blocked resulting in hang when bulk adding
[ https://issues.apache.org/jira/browse/SOLR-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] matt knecht updated SOLR-4696: -- Environment: Ubuntu 12.04.2 LTS 3.5.0-27-generic Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) KVM, 4xCPU, 5GB RAM, 4GB heap. 4 cores, 2 shards, 2 nodes, tomcat7 was: Ubuntu 12.04.2 LTS 3.5.0-27-generic Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode) KVM, 4xCPU, 5GB RAM, 4GB heap.
[jira] [Commented] (LUCENE-4904) Sorter API: Make NumericDocValuesSorter able to sort in reverse order
[ https://issues.apache.org/jira/browse/LUCENE-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626982#comment-13626982 ] Adrien Grand commented on LUCENE-4904: -- We can add this ReverseOrderSorter, but as far as NumericDocValuesSorter is concerned, I would rather have the abstraction at the level of the DocComparator rather than the Sorter. This would allow {{Sorter.sort(int,DocComparator)}} to quickly return null without allocating (potentially lots of) memory for the doc maps if the reader is already sorted. Additionally, this would allow for more readable diagnostics (such as DocValues(fieldName,desc) instead of Reverse(DocValues(fieldName,asc))).
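The comparator-level reversal Adrien describes can be sketched as follows. The DocComparator interface here is a simplified stand-in, not the actual Lucene Sorter.DocComparator API: reversing at this level just swaps the arguments of the ascending comparator, so no wrapper Sorter (and no "Reverse(...)" diagnostics string) is needed.

```java
// Sketch: reverse a doc comparator by swapping its arguments.
// DocComparator is a simplified stand-in for illustration only.
public class ReverseComparatorSketch {
    interface DocComparator {
        int compare(int docA, int docB);
    }

    // Descending order is the ascending comparator with arguments flipped.
    static DocComparator reverse(DocComparator asc) {
        return (a, b) -> asc.compare(b, a);
    }

    public static void main(String[] args) {
        int[] values = {5, 1, 3}; // per-doc values, indexed by doc id
        DocComparator asc = (a, b) -> Integer.compare(values[a], values[b]);
        DocComparator desc = reverse(asc);
        // Negative result: doc 0 (value 5) sorts before doc 1 (value 1) descending.
        System.out.println(desc.compare(0, 1));
    }
}
```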
[jira] [Commented] (SOLR-4431) Developer Curb Appeal: easier URLs for Cloud UI, Admin, etc.
[ https://issues.apache.org/jira/browse/SOLR-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626992#comment-13626992 ] Shawn Heisey commented on SOLR-4431: -1 here too, can't think of any additional reasons beyond those Hoss has stated. Everything except this patch's specific tie to jetty would be solved if we could put the API to access cores and collections into its own url path, such as /api/corename (horrible name, just for illustration purposes). That is an idea with its own problems, though. The current URL scheme is so widely used that there'd be no way we could remove backward compatibility until at least 6.0. Note: I actually think this would be a good move, but I would not expect to see a lot of support for it.
[jira] [Commented] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a managed schema facility
[ https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627022#comment-13627022 ] yuanyun.cn commented on SOLR-4658: -- Steve, Thanks for your excellent work. I met one small issue when using this feature: in our schema, we define one fieldtype with one tokenizer, MyPathHierarchyTokenizerFactory, which is in the package org.apache.lucene.analysis. -- This is not good, but the factory class has been in that package for a long time. {code:xml}
<fieldType name="text_path" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="org.apache.lucene.analysis.MyPathHierarchyTokenizerFactory" delimiter="\" replace="/"/>
  </analyzer>
</fieldType>
{code} After the upgrade, the name is shortened to solr.MyPathHierarchyTokenizerFactory due to org.apache.solr.schema.FieldType.getShortName(String): private static final Pattern SHORTENABLE_PACKAGE_PATTERN = Pattern.compile("org\\.apache\\.(?:lucene\\.analysis(?=.).*|solr\\.(?:analysis|schema))\\.([^.]+)$"); Then it fails with the following error when I restart my solr server: Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.MyPathHierarchyTokenizerFactory' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:440) This is because SolrResourceLoader.findClass tries to load the class from sub-packages of org.apache.solr; it can't find it there, so it throws ClassNotFoundException: base=org.apache.solr; String name = base + '.' + subpackage + newName; return clazz = Class.forName(name,true,classLoader).asSubclass(expectedType); I think maybe we can change SHORTENABLE_PACKAGE_PATTERN to: Pattern.compile("org\\.apache\\.(?:solr\\.(?:analysis|schema))\\.([^.]+)$"); After changing SHORTENABLE_PACKAGE_PATTERN like this, it works for me now. 
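The over-eager shortening can be reproduced in isolation. This sketch applies the quoted regex the way the report describes; the shorten() helper is hypothetical, and the real getShortName implementation may differ in detail, but the pattern itself shows why a user class under org.apache.lucene.analysis gets rewritten to an unloadable solr.* alias.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Demonstrates the shortening behavior described in the bug report:
// the pattern also matches user classes placed under
// org.apache.lucene.analysis, so their names become solr.* aliases
// that SolrResourceLoader later fails to resolve.
public class ShortNameDemo {
    static final Pattern SHORTENABLE_PACKAGE_PATTERN = Pattern.compile(
        "org\\.apache\\.(?:lucene\\.analysis(?=.).*|solr\\.(?:analysis|schema))\\.([^.]+)$");

    // Hypothetical helper mirroring the reported shortening logic.
    static String shorten(String fullyQualifiedName) {
        Matcher m = SHORTENABLE_PACKAGE_PATTERN.matcher(fullyQualifiedName);
        return m.matches() ? "solr." + m.group(1) : fullyQualifiedName;
    }

    public static void main(String[] args) {
        // A user factory that merely lives in the Lucene package is shortened too:
        System.out.println(shorten("org.apache.lucene.analysis.MyPathHierarchyTokenizerFactory"));
        // prints solr.MyPathHierarchyTokenizerFactory
    }
}
```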
In preparation for dynamic schema modification via REST API, add a managed schema facility Key: SOLR-4658 URL: https://issues.apache.org/jira/browse/SOLR-4658 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Fix For: 4.3, 5.0 Attachments: SOLR-4658.patch, SOLR-4658.patch The idea is to have a set of configuration items in {{solrconfig.xml}}: {code:xml}
<schema managed="true" mutable="true" managedSchemaResourceName="managed-schema"/>
{code} It will be a precondition for future dynamic schema modification APIs that {{mutable=true}}. {{solrconfig.xml}} parsing will fail if {{mutable=true}} but {{managed=false}}. When {{managed=true}}, and the resource named in {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade the schema to managed: the non-managed schema resource (typically {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at {{/configs/$configName/}}, and the non-managed schema resource is renamed by appending {{.bak}}, e.g. {{schema.xml.bak}}. Once the upgrade has taken place, users can get the full schema from the {{/schema?wt=schema.xml}} REST API, and can use this as the basis for modifications which can then be used to manually downgrade back to non-managed schema: put the {{schema.xml}} in place, then add {{<schema managed="false"/>}} to {{solrconfig.xml}} (or remove the whole {{<schema/>}} element, since {{managed=false}} is the default). If users take no action, then Solr behaves the same as always: the example {{solrconfig.xml}} will include {{<schema managed="false" ...>}}. 
For a discussion of rationale for this feature, see [~hossman_luc...@fucit.org]'s post to the solr-user mailing list in the thread Dynamic schema design: feedback requested [http://markmail.org/message/76zj24dru2gkop7b]: {quote} Ignoring for a moment what format is used to persist schema information, I think it's important to have a conceptual distinction between data that is managed by applications and manipulated by a REST API, and config that is managed by the user and loaded by solr on init -- or via an explicit reload config REST API. Past experience with how users perceive(d) solr.xml has heavily reinforced this opinion: on one hand, it's a place users must specify some config information -- so people want to be able to keep it in version control with other config files. On the other hand it's a live data file that is rewritten by solr when cores are added. (God help you if you want to do a rolling deploy of a new version of solr.xml where you've edited some of the config values while simultaneously clients are creating new SolrCores) As we move forward towards having REST APIs that
[jira] [Updated] (SOLR-4696) All threads become blocked resulting in hang when bulk adding
[ https://issues.apache.org/jira/browse/SOLR-4696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] matt knecht updated SOLR-4696: -- Attachment: solrconfig.xml solrconfig mostly default, except for:
<autoCommit> <!-- 30 minute auto commit -->
  <maxTime>180</maxTime>
  <maxTime>10</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>
[jira] [Commented] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a managed schema facility
[ https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627026#comment-13627026 ] Robert Muir commented on SOLR-4658: --- I mentioned this same bug as it applies to similarities on the dev list a week or so ago! In preparation for dynamic schema modification via REST API, add a managed schema facility Key: SOLR-4658 URL: https://issues.apache.org/jira/browse/SOLR-4658 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Fix For: 4.3, 5.0 Attachments: SOLR-4658.patch, SOLR-4658.patch The idea is to have a set of configuration items in {{solrconfig.xml}}: {code:xml}
<schema managed="true" mutable="true" managedSchemaResourceName="managed-schema"/>
{code} It will be a precondition for future dynamic schema modification APIs that {{mutable=true}}. {{solrconfig.xml}} parsing will fail if {{mutable=true}} but {{managed=false}}. When {{managed=true}}, and the resource named in {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade the schema to managed: the non-managed schema resource (typically {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at {{/configs/$configName/}}, and the non-managed schema resource is renamed by appending {{.bak}}, e.g. {{schema.xml.bak}}. Once the upgrade has taken place, users can get the full schema from the {{/schema?wt=schema.xml}} REST API, and can use this as the basis for modifications which can then be used to manually downgrade back to non-managed schema: put the {{schema.xml}} in place, then add {{<schema managed="false"/>}} to {{solrconfig.xml}} (or remove the whole {{<schema/>}} element, since {{managed=false}} is the default). If users take no action, then Solr behaves the same as always: the example {{solrconfig.xml}} will include {{<schema managed="false" ...>}}. 
For a discussion of rationale for this feature, see [~hossman_luc...@fucit.org]'s post to the solr-user mailing list in the thread Dynamic schema design: feedback requested [http://markmail.org/message/76zj24dru2gkop7b]: {quote} Ignoring for a moment what format is used to persist schema information, I think it's important to have a conceptual distinction between data that is managed by applications and manipulated by a REST API, and config that is managed by the user and loaded by solr on init -- or via an explicit reload config REST API. Past experience with how users perceive(d) solr.xml has heavily reinforced this opinion: on one hand, it's a place users must specify some config information -- so people want to be able to keep it in version control with other config files. On the other hand it's a live data file that is rewritten by solr when cores are added. (God help you if you want to do a rolling deploy of a new version of solr.xml where you've edited some of the config values while simultaneously clients are creating new SolrCores) As we move forward towards having REST APIs that treat schema information as data that can be manipulated, I anticipate the same types of confusion, misunderstanding, and grumblings if we try to use the same pattern of treating the existing schema.xml (or some new schema.json) as a hybrid config/data file. Edit it by hand if you want, the /schema/* REST API will too! ... Even assuming we don't make any of the same technical mistakes that have caused problems with solr.xml round tripping in the past (ie: losing comments, reading new config options that we forget to write back out, etc...) I'm fairly certain there is still going to be a lot of things that will look weird and confusing to people. 
(XML may have been designed to be both human readable/writable and machine readable/writable, but practically speaking it's hard to have a single XML file be machine and human readable/writable) I think it would make a lot of sense -- not just in terms of implementation but also for end user clarity -- to have some simple, straightforward to understand caveats about maintaining schema information... 1) If you want to keep schema information in an authoritative config file that you can manually edit, then the /schema REST API will be read only. 2) If you wish to use the /schema REST API for read and write operations, then schema information will be persisted under the covers in a data store whose format is an implementation detail just like the index file format. 3) If you are using a schema config file and you wish to switch to using the /schema
[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #296: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/296/ 1 tests failed. FAILED: org.apache.solr.cloud.ChaosMonkeyShardSplitTest.testDistribSearch Error Message: Wrong doc count on shard1_1 expected:<49> but was:<50> Stack Trace: java.lang.AssertionError: Wrong doc count on shard1_1 expected:<49> but was:<50> at __randomizedtesting.SeedInfo.seed([4DC904688FE1877E:CC2F8A70F8BEE742]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.apache.solr.cloud.ChaosMonkeyShardSplitTest.doTest(ChaosMonkeyShardSplitTest.java:274) Build Log: [...truncated 23442 lines...]
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b84) - Build # 5043 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/5043/ Java: 64bit/jdk1.8.0-ea-b84 -XX:+UseConcMarkSweepGC 1 tests failed. FAILED: org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin {#8 seed=[9E472C74036F0BA9:AB033BAA3F573984]} Error Message: Didn't match org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest$ShapePair@52e14672 in Rect(minX=16.0,maxX=232.0,minY=-54.0,maxY=128.0) Expect: [3, 4] (of 1) Stack Trace: java.lang.AssertionError: Didn't match org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest$ShapePair@52e14672 in Rect(minX=16.0,maxX=232.0,minY=-54.0,maxY=128.0) Expect: [3, 4] (of 1) at __randomizedtesting.SeedInfo.seed([9E472C74036F0BA9:AB033BAA3F573984]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:186) at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testWithin(SpatialOpRecursivePrefixTreeTest.java:83) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:487) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:722) Build Log: [...truncated 8097 lines...] [junit4:junit4]
[jira] [Commented] (LUCENE-3786) Create SearcherTaxoManager
[ https://issues.apache.org/jira/browse/LUCENE-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627065#comment-13627065 ] Michael McCandless commented on LUCENE-3786: {quote} bq. that decRef could have closed the reader Hmm ... if we assume that this TR/IR pair is managed only by that manager, then an IOE thrown from decRef could only be caused by closing the reader, right? So if you successfully IR.decRef() but fail to TR.decRef(), it means that IR is closed already right? Therefore there's no point to even tryIncRef? {quote} You're right ... so I just left the two decRefs in the patch ... Create SearcherTaxoManager -- Key: LUCENE-3786 URL: https://issues.apache.org/jira/browse/LUCENE-3786 Project: Lucene - Core Issue Type: New Feature Components: modules/facet Reporter: Shai Erera Assignee: Michael McCandless Priority: Minor Fix For: 5.0, 4.3 Attachments: LUCENE-3786-3x-nocommit.patch, LUCENE-3786.patch, LUCENE-3786.patch If an application wants to use an IndexSearcher and TaxonomyReader in a SearcherManager-like fashion, it cannot use a separate SearcherManager, and say a TaxonomyReaderManager, because the IndexSearcher and TaxoReader instances need to be in sync. That is, the IS-TR pair must match, or otherwise the category ordinals that are encoded in the search index might not match the ones in the taxonomy index. This can happen if someone reopens the IndexSearcher's IndexReader, but does not refresh the TaxonomyReader, and the category ordinals that exist in the reopened IndexReader are not yet visible to the TaxonomyReader instance. I'd like to create a SearcherTaxoManager (which is a ReferenceManager) which manages an IndexSearcher and TaxonomyReader pair. 
Then an application will call:
{code}
SearcherTaxoPair pair = manager.acquire();
try {
  IndexSearcher searcher = pair.searcher;
  TaxonomyReader taxoReader = pair.taxoReader;
  // do something with them
} finally {
  manager.release(pair);
  pair = null;
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
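The acquire/release discipline above can be sketched in plain Java. This is a hypothetical, self-contained illustration (the class names are stand-ins, not Lucene's actual implementation): the key point is that both members of the pair share one reference count, so a refresh can never hand out an IndexSearcher from one generation paired with a TaxonomyReader from another.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Stand-in for a (searcher, taxoReader) pair guarded by a single ref count.
class SearcherTaxoPairSketch {
    final String searcher;   // stand-in for IndexSearcher
    final String taxoReader; // stand-in for TaxonomyReader
    private final AtomicInteger refCount = new AtomicInteger(1);

    SearcherTaxoPairSketch(String searcher, String taxoReader) {
        this.searcher = searcher;
        this.taxoReader = taxoReader;
    }

    // Increment the ref count unless the pair has already dropped to zero (closed).
    boolean tryIncRef() {
        int count;
        while ((count = refCount.get()) > 0) {
            if (refCount.compareAndSet(count, count + 1)) return true;
        }
        return false; // already closed
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            // The real manager would close the IndexReader AND TaxonomyReader here,
            // together, so they are always released as a unit.
        }
    }
}

// Minimal manager: acquire() retries if it races with a refresh that closed the pair.
class PairManagerSketch {
    private final AtomicReference<SearcherTaxoPairSketch> current =
        new AtomicReference<>(new SearcherTaxoPairSketch("searcher-gen1", "taxo-gen1"));

    SearcherTaxoPairSketch acquire() {
        for (;;) {
            SearcherTaxoPairSketch pair = current.get();
            if (pair.tryIncRef()) return pair;
        }
    }

    void release(SearcherTaxoPairSketch pair) {
        pair.decRef();
    }
}
```

This mirrors the try/finally usage in the {code} block above: every successful acquire() must be matched by exactly one release().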
[jira] [Commented] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a managed schema facility
[ https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627087#comment-13627087 ] Steve Rowe commented on SOLR-4658: -- Hi yuanyun, Thanks for the bug report. The problem isn't that {{SHORTENABLE_PACKAGE_PATTERN}} includes factories under {{org.apache.lucene.analysis}} - most of the shared Lucene/Solr analysis factories live there now - but rather that users can use the same package for their own code, which is what you've done. The issue is serialization: as currently written, the user's class=whatever is lost, and the serialization code attempts to reconstitute it on output. I think the fix is to stop guessing what it should be, and just reuse the exact string supplied by the user in the original file when persisting the schema. I'll make a patch. In preparation for dynamic schema modification via REST API, add a managed schema facility Key: SOLR-4658 URL: https://issues.apache.org/jira/browse/SOLR-4658 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Steve Rowe Assignee: Steve Rowe Priority: Minor Fix For: 4.3, 5.0 Attachments: SOLR-4658.patch, SOLR-4658.patch The idea is to have a set of configuration items in {{solrconfig.xml}}:
{code:xml}
<schema managed="true" mutable="true" managedSchemaResourceName="managed-schema"/>
{code}
It will be a precondition for future dynamic schema modification APIs that {{mutable=true}}. {{solrconfig.xml}} parsing will fail if {{mutable=true}} but {{managed=false}}. When {{managed=true}}, and the resource named in {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade the schema to managed: the non-managed schema resource (typically {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at {{/configs/$configName/}}, and the non-managed schema resource is renamed by appending {{.bak}}, e.g. {{schema.xml.bak}}.
Once the upgrade has taken place, users can get the full schema from the {{/schema?wt=schema.xml}} REST API and can use this as the basis for modifications, which can then be used to manually downgrade back to a non-managed schema: put the {{schema.xml}} in place, then add {{<schema managed="false"/>}} to {{solrconfig.xml}} (or remove the whole {{<schema/>}} element, since {{managed=false}} is the default). If users take no action, then Solr behaves the same as always: the example {{solrconfig.xml}} will include {{<schema managed="false" ...>}}. For a discussion of the rationale for this feature, see [~hossman_luc...@fucit.org]'s post to the solr-user mailing list in the thread Dynamic schema design: feedback requested [http://markmail.org/message/76zj24dru2gkop7b]: {quote} Ignoring for a moment what format is used to persist schema information, I think it's important to have a conceptual distinction between data that is managed by applications and manipulated by a REST API, and config that is managed by the user and loaded by solr on init -- or via an explicit reload config REST API. Past experience with how users perceive(d) solr.xml has heavily reinforced this opinion: on one hand, it's a place users must specify some config information -- so people want to be able to keep it in version control with other config files. On the other hand, it's a live data file that is rewritten by solr when cores are added. (God help you if you want to do a rolling deploy of a new version of solr.xml where you've edited some of the config values while clients are simultaneously creating new SolrCores.) As we move forward towards having REST APIs that treat schema information as data that can be manipulated, I anticipate the same types of confusion, misunderstanding, and grumblings if we try to use the same pattern of treating the existing schema.xml (or some new schema.json) as a hybrid config/data file. Edit it by hand if you want, the /schema/* REST API will too! ...
Even assuming we don't make any of the same technical mistakes that have caused problems with solr.xml round-tripping in the past (i.e.: losing comments, reading new config options that we forget to write back out, etc...), I'm fairly certain there are still going to be a lot of things that will look weird and confusing to people. (XML may have been designed to be both human readable/writable and machine readable/writable, but practically speaking it's hard to have a single XML file be both machine- and human-readable/writable.) I think it would make a lot of sense -- not just in terms of implementation but also for end user clarity -- to have some
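The automatic upgrade to a managed schema described in the issue can be sketched roughly as follows. This is a hypothetical illustration, not Solr's actual code (the class and method names are invented, and real Solr parses and re-serializes the schema rather than copying the file):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the first-load upgrade: if the managed schema
// resource does not exist yet, persist the non-managed schema under the
// managed name, then rename the original with a ".bak" suffix so it is
// no longer authoritative.
class ManagedSchemaUpgradeSketch {
    static void upgradeToManaged(Path confDir, String nonManagedName, String managedName)
            throws IOException {
        Path managed = confDir.resolve(managedName);
        if (Files.exists(managed)) {
            return; // already upgraded; nothing to do
        }
        Path nonManaged = confDir.resolve(nonManagedName);
        // In Solr the schema would be parsed and re-serialized here;
        // a plain copy stands in for that step.
        Files.copy(nonManaged, managed);
        // e.g. schema.xml -> schema.xml.bak
        Files.move(nonManaged, confDir.resolve(nonManagedName + ".bak"));
    }
}
```

The {{.bak}} rename is what makes the downgrade path described above work: the original file survives and can be put back in place by hand.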
[jira] [Commented] (SOLR-4686) HTMLStripCharFilter and Highlighter generates invalid HTML
[ https://issues.apache.org/jira/browse/SOLR-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627153#comment-13627153 ] Steve Rowe commented on SOLR-4686: -- Hi Holger, I wrote the latest version of HTMLStripCharFilter, and the behavior you describe is expected, though obviously not good. The problem is that when a CharFilter replaces an input sequence with a differently-sized output sequence, it has to decide how to map the offsets back. All of the CharFilters I've looked at map the end offsets of smaller output sequences to the end offset of the larger input sequence. I suppose a CharFilter could make different choices, though, as long as it did so consistently. HTMLStripCharFilter could change offset mappings for end tags to point at the offset of the *beginning* of the input sequence, while keeping offset mappings for start tags the same as they are now for all tags: at the offset of the *end* of the input sequence. {{<a>xxx</a>}} would then be highlighted as {{<a><em>xxx</em></a>}}. But fixing this one issue won't solve the general problem. An example: if HTMLStripCharFilter were to change offset mappings for end tags as described above, {{<b>x</b><i>xx</i>}} would still result in {{<b><em>x</b><i>xx</em></i>}}, which is problematic in a way that modifications to HTMLStripCharFilter can't fix. It's worth noting that HTML Tidy can fix up your example, but doesn't properly handle my example - I tested with the command-line version on OS X. My surface reading of the Highlighter and Formatter classes makes me think that there is no natural plugin point right now for an HTML-aware boundary insertion mechanism. I suspect that the low complaint volume to date is a result of the lenient HTML parsing browsers do; even though the output HTML is invalid, it (usually?) looks okay anyway.
HTMLStripCharFilter and Highlighter generates invalid HTML -- Key: SOLR-4686 URL: https://issues.apache.org/jira/browse/SOLR-4686 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 4.1 Reporter: Holger Floerke Labels: HTML, highlighter Using the HTMLStripCharFilter may yield invalid HTML in highlights. The HTMLStripCharFilter treats inline elements (e.g. <a>, <b>, ...) specially: for these elements the CharFilter ignores the tag and does not insert any split character. If you index <a>xxx</a>, you get the word xxx starting at position 3 and ending at position 10(!). If you highlight a search on xxx, you will get <a><em>xxx</a></em>, which is invalid HTML.
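The offset behavior described in this issue can be demonstrated with a tiny self-contained stripper (this is an illustration of the mapping policy, not Lucene's HTMLStripCharFilter): tags contribute no output characters, and the end offset of the stripped text is pulled forward past any trailing tag, which is why {{xxx}} in {{<a>xxx</a>}} ends at offset 10 rather than 6.

```java
// Illustration (not Lucene code) of the "map end offsets to the end of the
// larger input sequence" convention discussed above.
class OffsetMapSketch {
    // Returns {startOffset, endOffset} of the tag-stripped text within the
    // original input. The closing '>' of a tag maps the current end offset
    // past the tag, mirroring how HTMLStripCharFilter's end offsets land at
    // the end of the larger input sequence.
    static int[] stripOffsets(String html) {
        int start = -1, end = -1;
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') { inTag = true; continue; }
            if (c == '>') { inTag = false; end = i + 1; continue; }
            if (!inTag) {
                if (start < 0) start = i; // first text character: start offset
                end = i + 1;              // extend end offset over the text
            }
        }
        return new int[] { start, end };
    }
}
```

For the input {{<a>xxx</a>}} this returns start 3 and end 10, matching the positions reported in the issue; a highlighter that wraps {{[start, end)}} in {{<em>...</em>}} therefore closes {{</em>}} after {{</a>}}, producing the invalid nesting.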
[jira] [Updated] (LUCENE-4738) Killed JVM when first commit was running will generate a corrupted index
[ https://issues.apache.org/jira/browse/LUCENE-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4738: --- Attachment: LUCENE-4738.patch New patch with several things: * I folded in Rob's patch on LUCENE-2727, to have MockDirWrapper sometimes throw IOExc in openInput and createOutput, to get better test coverage of out-of-file-descriptors-like situations * Added a new TestIndexWriterOutOfFileDescriptors * Changed DirReader.indexExists back to its behavior before LUCENE-2812; I think it's just too dangerous to try to be too smart about whether an index exists or not, so now the method returns true if it sees any segments file. (These smarts were causing failures in the new test, and caused LUCENE-4870.) * Fixed IndexWriter so that if OpenMode is CREATE, it will work even if a corrupt index is present. But if it's CREATE_OR_APPEND or APPEND, then a corrupt index will cause an exception, so the app must resolve it manually. Killed JVM when first commit was running will generate a corrupted index Key: LUCENE-4738 URL: https://issues.apache.org/jira/browse/LUCENE-4738 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.0 Environment: OS: Linux 2.6.32-220.23.1.el6.x86_64 Java: java version 1.7.0_05 Lucene: lucene-core-4.0.0 Reporter: Billow Gao Attachments: LUCENE-4738.patch, LUCENE-4738.patch, LUCENE-4738_test.patch 1. Start a NEW IndexWriterBuilder on an empty folder and add some documents to the index 2. Call commit 3. When the segments_1 file with 0 bytes was created, kill the JVM We will end up with a corrupted index with an empty segments_1. The issue only occurs when the first commit crashes. Also, if you open an IndexSearcher on a new index before the first commit on the index has finished, you will see an exception like: === org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.MMapDirectory@C:\tmp\testdir lockFactory=org.apache.lucene.store.NativeFSLockFactory@6ee00df: files: [write.lock, _0.fdt, _0.fdx] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:741) at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52) at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:65) === So when a new index is created, we should first create an empty index; we should not wait for the commit/close call to create the segments file. If there were an empty index there, a power failure during the first commit wouldn't leave a corrupted index, and a concurrent IndexSearcher could access the index (no match is better than an exception).
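The simplified existence check the patch reverts to - "return true if any segments file is seen" - can be sketched on a plain directory listing. This is a stand-in illustration, not Lucene's actual implementation (the real check lives around SegmentInfos/DirectoryReader):

```java
import java.io.File;

// Sketch of the simple "does an index exist?" policy described in the patch:
// rather than trying to be clever about partially-written indexes, report
// true as soon as ANY segments_N file is present.
class IndexExistsSketch {
    static boolean indexExists(File dir) {
        String[] files = dir.list();
        if (files == null) return false; // not a directory / unreadable
        for (String name : files) {
            // segments_1, segments_2, ... mark committed index generations
            if (name.startsWith("segments_")) return true;
        }
        return false;
    }
}
```

Under this policy the directory from the exception above ({{write.lock}}, {{_0.fdt}}, {{_0.fdx}}, but no segments file) correctly reports that no index exists, which is exactly the crashed-before-first-commit state the issue describes.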
[jira] [Commented] (LUCENE-4738) Killed JVM when first commit was running will generate a corrupted index
[ https://issues.apache.org/jira/browse/LUCENE-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627186#comment-13627186 ] Robert Muir commented on LUCENE-4738: - Patch looks great. I agree with the approach; it's way too dangerous what we try to do today. I also like the additional testing we have here (e.g. random FNFE, since so many places treat them specially). My only comment is that loadFirstCommit confuses me (as a variable name). Is there something more intuitive?
[jira] [Commented] (LUCENE-4738) Killed JVM when first commit was running will generate a corrupted index
[ https://issues.apache.org/jira/browse/LUCENE-4738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627190#comment-13627190 ] Michael McCandless commented on LUCENE-4738: bq. Is there something more intuitive? Hmm, maybe firstCommitExists? IW only sets this to false if it was unable to load the segments file in CREATE.
[jira] [Commented] (SOLR-4686) HTMLStripCharFilter and Highlighter generates invalid HTML
[ https://issues.apache.org/jira/browse/SOLR-4686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627191#comment-13627191 ] Steve Rowe commented on SOLR-4686: -- I've read that the [Jericho HTML parser|http://jericho.htmlparser.net/docs/index.html], implemented in Java, reports tag offsets, unlike many other HTML parsers; that could be useful in implementing the HTML-aware boundary insertion mechanism I mentioned earlier.