[jira] [Resolved] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul resolved SOLR-5017.
Resolution: Fixed

A parameter called 'routeField' is supported in both routers. If routeField is 'x', every document inserted must have a value for the field 'x'. The query semantics remain the same: the _route_ param can be used to restrict a search to a given shard (or shards).

Allow sharding based on the value of a field

Key: SOLR-5017
URL: https://issues.apache.org/jira/browse/SOLR-5017
Project: Solr
Issue Type: Sub-task
Reporter: Noble Paul
Assignee: Noble Paul
Fix For: 4.5, 5.0
Attachments: SOLR-5017.patch

We should be able to create a collection where sharding is done based on the value of a given field. Collections can be created with shardField=fieldName, which will be persisted in DocCollection in ZK. The implicit DocRouter would look at this field instead of the _shard_ field, and CompositeIdDocRouter can also use this field instead of looking at the id field.
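For illustration, a minimal SolrJ sketch of the behavior described above, assuming a collection created with routeField set to shard_key_field; the collection URL, field name, and route value are hypothetical:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RouteFieldDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/mycollection");

    // every document must carry a value for the configured route field
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    doc.addField("shard_key_field", "tenantA");
    server.add(doc);
    server.commit();

    // querying is unchanged; _route_ optionally restricts the search
    // to the shard(s) owning that route value
    SolrQuery q = new SolrQuery("*:*");
    q.set("_route_", "tenantA");
    System.out.println(server.query(q).getResults().getNumFound());
  }
}
{code}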
[jira] [Commented] (LUCENE-5168) ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC
[ https://issues.apache.org/jira/browse/LUCENE-5168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737898#comment-13737898 ]

Dawid Weiss commented on LUCENE-5168:

Very likely a compiler bug. It'd be best to run with tests.jvms=1, pass -XX:+PrintCompilation -XX:+PrintAssembly (requires hsdis), and capture two logs -- one for a failing run and one for a passing run. Then it's all about inspecting the assembly output via diff -- this would narrow down the scope of looking for the faulty JIT optimization. Can't do it today, but if anybody beats me to it, I'm interested in what you can find out! :)

ByteSliceReader assert trips with 32-bit oracle 1.7.0_25 + G1GC

Key: LUCENE-5168
URL: https://issues.apache.org/jira/browse/LUCENE-5168
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir

This assertion trips (sometimes from different tests) if you run the highlighting tests on branch_4x with r1512807. It reproduces about half the time, always only with 32-bit + G1GC (other combinations do not seem to trip it; I didn't try looping or anything really, though).

{noformat}
rmuir@beast:~/workspace/branch_4x$ svn up -r 1512807
rmuir@beast:~/workspace/branch_4x$ ant clean
rmuir@beast:~/workspace/branch_4x$ rm -rf .caches # this is important, otherwise master seed does not work!
rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ ant test -Dtests.jvms=2 -Dtests.seed=EBBFA6F4E80A7365 -Dargs="-server -XX:+UseG1GC"
{noformat}

Originally showed up like this:

{noformat}
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6874/
Java: 32bit/jdk1.7.0_25 -server -XX:+UseG1GC

1 tests failed.
REGRESSION: org.apache.lucene.search.postingshighlight.TestPostingsHighlighter.testUserFailedToIndexOffsets

Error Message:

Stack Trace:
java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:1FBF811885F2D611]:0)
        at org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
        at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
        at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
        at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
        at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
        at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
        at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
{noformat}
[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737904#comment-13737904 ]

Vadim Kirilchuk commented on SOLR-3076:

Thank you [~yo...@apache.org]! We have all been waiting for this for a long time!

Solr(Cloud) should support block joins

Key: SOLR-3076
URL: https://issues.apache.org/jira/browse/SOLR-3076
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Fix For: 4.5, 5.0
Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, dih-3076.patch, dih-config.xml, parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-7036-childDocs-solr-fork-trunk-patched, solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, tochild-bjq-filtered-search-fix.patch

Lucene has the ability to do block joins; we should add it to Solr.
[jira] [Comment Edited] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737904#comment-13737904 ]

Vadim Kirilchuk edited comment on SOLR-3076 at 8/13/13 6:44 AM:

Thank you [~yo...@apache.org]! We have all been waiting for this for a long time!

Btw, as there are still many things we need to address (for example, DIH support): should we create subtasks under this JIRA, or create another JIRA like "Improving block join support" with new subtasks? WDYT?
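As a usage illustration, a minimal SolrJ sketch of querying with the block join parent query parser this issue adds; the server URL and field names are hypothetical:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BlockJoinQueryDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    // return parent documents whose child documents match the inner query
    SolrQuery q = new SolrQuery("{!parent which=\"content_type:parent\"}comment_text:lucene");
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults().getNumFound());
  }
}
{code}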
Re: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_25) - Build # 6874 - Failure!
I have lately seen some issues like that with a 3rd-party collection library go away as users moved off G1GC. So if that is the case, it might be G1GC that is the problem here! Thanks for tracking it down to a reproducible state!

simon

On Mon, Aug 12, 2013 at 10:04 PM, Robert Muir <rcm...@gmail.com> wrote:

I can now reproduce this, but in a crazy way. (I reproduced it twice; the first time it failed in FastVectorHighlighter, the second time in PostingsHighlighter!) So I think this really looks like a JVM bug (I have not tried other possibilities or combinations; I will open an issue).

REPRO #1:

rmuir@beast:~/workspace/branch_4x$ svn up -r 1512807
rmuir@beast:~/workspace/branch_4x$ ant clean
rmuir@beast:~/workspace/branch_4x$ rm -rf .caches # this is important, otherwise master seed does not work!
rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ ant test -Dtests.jvms=2 -Dtests.seed=EBBFA6F4E80A7365 -Dargs="-server -XX:+UseG1GC"
[junit4] Suite: org.apache.lucene.search.vectorhighlight.FastVectorHighlighterTest
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=FastVectorHighlighterTest -Dtests.method=testCommonTermsQueryHighlightTest -Dtests.seed=EBBFA6F4E80A7365 -Dtests.slow=true -Dtests.locale=es_PA -Dtests.timezone=America/Indiana/Vincennes -Dtests.file.encoding=UTF-8
[junit4] FAILURE 0.02s J1 | FastVectorHighlighterTest.testCommonTermsQueryHighlightTest
[junit4] Throwable #1: java.lang.AssertionError
[junit4]    at __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:D307BBE9A713DA33]:0)
[junit4]    at org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
[junit4]    at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
[junit4]    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
[junit4]    at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
[junit4]    at org.apache.lucene.index.TermsHash.flush(TermsHash.java:116)
[junit4]    at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
[junit4]    at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:81)
[junit4]    at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:501)
[junit4]    at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:478)
[junit4]    at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:615)
[junit4]    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:365)
[junit4]    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
[junit4]    at org.apache.lucene.search.vectorhighlight.FastVectorHighlighterTest.testCommonTermsQueryHighlightTest(FastVectorHighlighterTest.java:285)
[junit4]    at java.lang.Thread.run(Thread.java:724)
[junit4] 2> NOTE: test params are: codec=Lucene3x, sim=DefaultSimilarity, locale=es_PA, timezone=America/Indiana/Vincennes
[junit4] 2> NOTE: Linux 3.5.0-27-generic i386/Oracle Corporation 1.7.0_25 (32-bit)/cpus=8,threads=1,free=48330984,total=67108864
[junit4] 2> NOTE: All tests run in this JVM: [WeightedFragListBuilderTest, IndexTimeSynonymTest, ScoreOrderFragmentsBuilderTest, HighlighterTest, FieldPhraseListTest, HighlighterPhraseTest, SimpleFragListBuilderTest, FieldTermStackTest, SimpleFragmentsBuilderTest, FieldQueryTest, TestPostingsHighlighter, FastVectorHighlighterTest]
[junit4] Completed on J1 in 0.16s, 6 tests, 1 failure

FAILURES!

REPRO #2:

rmuir@beast:~/workspace/branch_4x$ rm -rf .caches # this is important, otherwise master seed does not work!
rmuir@beast:~/workspace/branch_4x/lucene/highlighter$ ant test -Dtests.jvms=2 -Dtests.seed=EBBFA6F4E80A7365 -Dargs="-server -XX:+UseG1GC"
[junit4] Suite: org.apache.lucene.search.postingshighlight.TestPostingsHighlighter
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestPostingsHighlighter -Dtests.method=testUserFailedToIndexOffsets -Dtests.seed=EBBFA6F4E80A7365 -Dtests.slow=true -Dtests.locale=lt_LT -Dtests.timezone=Europe/Isle_of_Man -Dtests.file.encoding=UTF-8
[junit4] FAILURE 0.02s J1 | TestPostingsHighlighter.testUserFailedToIndexOffsets
[junit4] Throwable #1: java.lang.AssertionError
[junit4]    at __randomizedtesting.SeedInfo.seed([EBBFA6F4E80A7365:1FBF811885F2D611]:0)
[junit4]    at org.apache.lucene.index.ByteSliceReader.readByte(ByteSliceReader.java:73)
[junit4]    at org.apache.lucene.store.DataInput.readVInt(DataInput.java:108)
[junit4]    at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:453)
[junit4]    at
[jira] [Created] (LUCENE-5169) UniDic 2.1.2 support
mygithubit created LUCENE-5169:

Summary: UniDic 2.1.2 support
Key: LUCENE-5169
URL: https://issues.apache.org/jira/browse/LUCENE-5169
Project: Lucene - Core
Issue Type: Improvement
Components: modules/analysis
Reporter: mygithubit
Priority: Minor

I made some amendments to kuromoji to support UniDic 2.1.2.
[jira] [Updated] (LUCENE-5169) UniDic 2.1.2 support
[ https://issues.apache.org/jira/browse/LUCENE-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mygithubit updated LUCENE-5169:
Attachment: unidic.patch
[jira] [Updated] (LUCENE-5169) UniDic 2.1.2 support for Japanese Tokenizer (Kuromoji)
[ https://issues.apache.org/jira/browse/LUCENE-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mygithubit updated LUCENE-5169:
Summary: UniDic 2.1.2 support for Japanese Tokenizer (Kuromoji) (was: UniDic 2.1.2 support)
[jira] [Updated] (LUCENE-5169) UniDic 2.1.2 support for Japanese Tokenizer (Kuromoji)
[ https://issues.apache.org/jira/browse/LUCENE-5169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

mygithubit updated LUCENE-5169:
Description: I made some amendments to kuromoji to support UniDic 2.1.2. The attached patch is against the lucene_solr_4_4 branch. (was: I made some amendments to support UniDic 2.1.2 into kuromoji.)
[jira] [Commented] (LUCENE-5166) PostingsHighlighter fails with IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737983#comment-13737983 ]

Manuel Amoabeng commented on LUCENE-5166:

Thank you for the quick help!

PostingsHighlighter fails with IndexOutOfBoundsException

Key: LUCENE-5166
URL: https://issues.apache.org/jira/browse/LUCENE-5166
Project: Lucene - Core
Issue Type: Bug
Components: modules/highlighter
Affects Versions: 4.4
Reporter: Manuel Amoabeng
Fix For: 5.0, 4.5
Attachments: LUCENE-5166-2.patch, LUCENE-5166.patch, LUCENE-5166.patch, LUCENE-5166.patch, LUCENE-5166.patch, LUCENE-5166.patch, LUCENE-5166.patch

Given a document with a match at a startIndex < PostingsHighlighter.maxLength and an endIndex > PostingsHighlighter.maxLength, DefaultPassageFormatter will throw an IndexOutOfBoundsException when DefaultPassageFormatter.append() is invoked.
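For concreteness, a minimal sketch of the pre-fix failure condition (field name, content, and cutoff chosen purely for illustration): a single term whose start offset is below the highlighter's maxLength but whose end offset is beyond it:

{code:java}
import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.postingshighlight.PostingsHighlighter;
import org.apache.lucene.store.*;
import org.apache.lucene.util.Version;

public class MaxLengthBoundaryRepro {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory();
    IndexWriter iw = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_44, new WhitespaceAnalyzer(Version.LUCENE_44)));
    FieldType ft = new FieldType(TextField.TYPE_STORED);
    ft.setIndexOptions(FieldInfo.IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
    Document doc = new Document();
    // "averyveryverylongtoken" starts at offset 18 and ends at offset 40
    doc.add(new Field("body", "some words then a averyveryverylongtoken here", ft));
    iw.addDocument(doc);
    iw.close();

    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
    PostingsHighlighter highlighter = new PostingsHighlighter(20); // maxLength = 20
    Query query = new TermQuery(new Term("body", "averyveryverylongtoken"));
    TopDocs topDocs = searcher.search(query, 10);
    // before the fix, this threw IndexOutOfBoundsException in
    // DefaultPassageFormatter.append()
    String[] snippets = highlighter.highlight("body", query, searcher, topDocs);
    System.out.println(snippets[0]);
  }
}
{code}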
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738031#comment-13738031 ]

Elran Dvir commented on SOLR-5084:

The patch is finally attached. I'll attach a patch with unit tests ASAP.

new field type - EnumField

Key: SOLR-5084
URL: https://issues.apache.org/jira/browse/SOLR-5084
Project: Solr
Issue Type: New Feature
Reporter: Elran Dvir
Attachments: enumsConfig.xml, schema_example.xml, Solr-5084.patch, Solr-5084.patch, Solr-5084.patch

We have encountered a use case in our system where we have a few fields (Severity, Risk, etc.) with a closed set of values, where the sort order for these values is pre-determined but not lexicographic (Critical is higher than High). Generically, this is very close to how enums work. To implement this, I have prototyped a new type of field: EnumField, where the inputs are a closed, predefined set of strings in a special configuration file (similar to currency.xml). The code is based on 4.2.1.
[jira] [Updated] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Elran Dvir updated SOLR-5084:
Attachment: Solr-5084.patch
[jira] [Updated] (LUCENE-5170) Add getter for reuse strategy to Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-5170:
Fix Version/s: 4.5, 5.0
Assignee: Uwe Schindler
[jira] [Created] (LUCENE-5170) Add getter for reuse strategy to Analyzer
Uwe Schindler created LUCENE-5170:

Summary: Add getter for reuse strategy to Analyzer
Key: LUCENE-5170
URL: https://issues.apache.org/jira/browse/LUCENE-5170
Project: Lucene - Core
Issue Type: Bug
Reporter: Uwe Schindler

If you write an Analyzer that wraps another one (but without using AnalyzerWrapper), you may need to use the same reuse strategy in your wrapper. This is currently not possible, as there is no way to get the reuse strategy (private field and no getter). An example is ES's NamedAnalyzer; see my comment: [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java] This would add a getter, just a 3-liner.
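The requested "3-liner" would look roughly like this (a sketch; it assumes the private field is named reuseStrategy):

{code:java}
// in org.apache.lucene.analysis.Analyzer: expose the strategy passed to
// the constructor (field name assumed)
public final ReuseStrategy getReuseStrategy() {
  return reuseStrategy;
}
{code}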
[jira] [Updated] (LUCENE-5170) Add getter for reuse strategy to Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-5170:
Attachment: LUCENE-5170.patch

Patch. Maybe we should rethink AnalyzerWrapper, too, so it would use the strategy of the wrapped Analyzer unless you have something field-specific? In that case you would pass an explicit reuse strategy in the ctor, but the default would be the one of the inner analyzer.
[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738077#comment-13738077 ]

Simon Willnauer commented on LUCENE-5170:

+1
[jira] [Commented] (SOLR-5057) queryResultCache should not be related to the order of the fq list
[ https://issues.apache.org/jira/browse/SOLR-5057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738081#comment-13738081 ]

Erick Erickson commented on SOLR-5057:

I'll give this another go-over in the next day or two.

queryResultCache should not be related to the order of the fq list

Key: SOLR-5057
URL: https://issues.apache.org/jira/browse/SOLR-5057
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 4.0, 4.1, 4.2, 4.3
Reporter: Feihong Huang
Assignee: Erick Erickson
Priority: Minor
Attachments: SOLR-5057.patch, SOLR-5057.patch, SOLR-5057.patch
Original Estimate: 48h
Remaining Estimate: 48h

There are two queries below with the same meaning, but case 2 can't use the queryResultCache after case 1 is executed.

case1: q=*:*&fq=field1:value1&fq=field2:value2
case2: q=*:*&fq=field2:value2&fq=field1:value1

I think the queryResultCache should not depend on the order of the fq list.
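To illustrate the idea (a sketch of one possible approach, not necessarily what the attached patch does): the cache key can canonicalize the filter list so that fq order no longer matters, e.g. by sorting the parsed fq queries before they participate in equals/hashCode:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.search.Query;

final class FilterOrder {
  // hypothetical helper: canonicalize the parsed fq clauses before they
  // are used in the QueryResultKey, so fq order no longer affects cache hits
  static List<Query> canonicalOrder(List<Query> rawFilters) {
    List<Query> filters = new ArrayList<Query>(rawFilters);
    Collections.sort(filters, new Comparator<Query>() {
      @Override
      public int compare(Query a, Query b) {
        return a.toString().compareTo(b.toString());
      }
    });
    return filters;
  }
}
{code}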
[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738089#comment-13738089 ]

Robert Muir commented on LUCENE-5170:

+1. I agree we should rethink AnalyzerWrapper too. My preference: just make it a mandatory arg to the protected ctor of this class.
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738092#comment-13738092 ]

Erick Erickson commented on SOLR-5084:

@Elran
bq. Why do you say the assumption is the type is restricted to single value?...

Parts of the discussion mentioned sorting, which is undefined on multivalued fields. If sorting is _required_ for an enum-type field, then it shouldn't be multiValued. There's no reason it _needs_ to be restricted to single values; it's fine for the enum type to be just like any other field, and it's up to the user to only put one value in the field if it's to be used for sorting. Mostly I'm getting it straight in my head what the characteristics are, not saying it _should_ be single-valued-only...

Erick
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738096#comment-13738096 ]

Robert Muir commented on SOLR-5084:

Wait: I said sort order (not sorting). So to me the multivalued case of an enum field makes total sense (it is kinda like Java's EnumSet). And the sort order defines what is used in faceting, range queries, and so on.
[jira] [Commented] (LUCENE-4906) PostingsHighlighter's PassageFormatter should allow for rendering to arbitrary objects
[ https://issues.apache.org/jira/browse/LUCENE-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738097#comment-13738097 ]

Michael McCandless commented on LUCENE-4906:

I think the challenge here is that these are not just advanced uses; they are super-expert uses, and I don't feel that justifies the added cost of generics for normal users. There are definitely times when generics make sense, but I don't think this case applies ... I agree the Object approach is rather old-fashioned ... but it should still work for these super-expert cases. So, it's not ideal, but it's a step forward at least (progress not perfection) ... I'd like to commit the Object approach so we move forward. If future use cases emerge that make the generics use case more common, we can always revisit this (this API is experimental; we are free to change it), so none of this is set in stone ...

PostingsHighlighter's PassageFormatter should allow for rendering to arbitrary objects

Key: LUCENE-4906
URL: https://issues.apache.org/jira/browse/LUCENE-4906
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Attachments: LUCENE-4906.patch, LUCENE-4906.patch

For example, in a server I may want to render the highlight result to a JsonObject to send back to the front-end. Today, since we render to String, I have to render to a JSON string and then re-parse it to a JsonObject, which is inefficient... Or, if (Rob's idea:) we make a query that's like MoreLikeThis but pulls terms from snippets instead, so you get proximity-influenced salient/expanded terms, then perhaps that renders to just an array of tokens or fragments or something from each snippet.
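A minimal sketch of what the Object-returning approach allows, assuming format() is relaxed to return Object as proposed here (the formatter and its output shape are illustrative):

{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.search.postingshighlight.Passage;
import org.apache.lucene.search.postingshighlight.PassageFormatter;

// renders each passage to a raw snippet string and returns the list,
// instead of concatenating everything into one String
public class SnippetListFormatter extends PassageFormatter {
  @Override
  public Object format(Passage[] passages, String content) {
    List<String> snippets = new ArrayList<String>();
    for (Passage p : passages) {
      snippets.add(content.substring(p.getStartOffset(), p.getEndOffset()));
    }
    return snippets;
  }
}
{code}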
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738098#comment-13738098 ]

Erick Erickson commented on SOLR-5084:

Ahhh, OK. Then Hoss says sorting, so no wonder I'm confused! There's no reason one couldn't sort by a field of this type, right? Frankly, though, it seems kind of low-utility since there are probably only going to be a few values in the common use case, but I'd guess it's still a possibility...
[jira] [Updated] (LUCENE-4583) StraightBytesDocValuesField fails if bytes > 32k
[ https://issues.apache.org/jira/browse/LUCENE-4583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4583:
Attachment: LUCENE-4583.patch

bq. I'm confused by the following comment:

I fixed the comment; it's because those DVFormats use PagedBytes.fillSlice, which cannot handle more than 2 pages. New patch w/ that fix ...

StraightBytesDocValuesField fails if bytes > 32k

Key: LUCENE-4583
URL: https://issues.apache.org/jira/browse/LUCENE-4583
Project: Lucene - Core
Issue Type: Bug
Components: core/index
Affects Versions: 4.0, 4.1, 5.0
Reporter: David Smiley
Assignee: Michael McCandless
Priority: Critical
Fix For: 5.0, 4.5
Attachments: LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch, LUCENE-4583.patch

I didn't observe any limitations on the size of a bytes-based DocValues field value in the docs. It appears that the limit is 32k, although I didn't get any friendly error telling me that was the limit. 32k is kind of small IMO; I suspect this limit is unintended and as such is a bug. The following test fails:

{code:java}
public void testBigDocValue() throws IOException {
  Directory dir = newDirectory();
  // writerConfig(...) is a helper in the reporter's test class
  IndexWriter writer = new IndexWriter(dir, writerConfig(false));

  Document doc = new Document();
  BytesRef bytes = new BytesRef((4 + 4) * 4097); // 4096 works
  bytes.length = bytes.bytes.length; // byte data doesn't matter
  doc.add(new StraightBytesDocValuesField("dvField", bytes));
  writer.addDocument(doc);
  writer.commit();
  writer.close();

  DirectoryReader reader = DirectoryReader.open(dir);
  DocValues docValues = MultiDocValues.getDocValues(reader, "dvField");
  // FAILS IF BYTES IS BIG!
  docValues.getSource().getBytes(0, bytes);

  reader.close();
  dir.close();
}
{code}
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738105#comment-13738105 ]

Han Jiang commented on LUCENE-3069:

Hi, currently we have a problem migrating the code to trunk: the API refactoring on PostingsReader/WriterBase now splits term metadata into two parts, a monotonic long[] and a generic byte[]; the former is known to the term dictionary so it can be d-gap encoded better. So we need a 'longsSize' in the field summary to tell the reader the fixed length of this monotonic long[]. However, this API change actually breaks backwards compatibility: old 4.x indices don't support this, and for codecs like Lucene40, whose writer parts are already deprecated, the tests won't pass. It seems like we could put all the metadata in the generic byte[] and let the PBF do its own buffering (like we did in the old API: nextTerm()), but then we'd have to add logic for this in every PBF. So... can we solve this problem more elegantly?

Lucene should have an entirely memory resident term dictionary

Key: LUCENE-3069
URL: https://issues.apache.org/jira/browse/LUCENE-3069
Project: Lucene - Core
Issue Type: Improvement
Components: core/index, core/search
Affects Versions: 4.0-ALPHA
Reporter: Simon Willnauer
Assignee: Han Jiang
Labels: gsoc2013
Fix For: 5.0, 4.5
Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch

The FST-based TermDictionary has been a great improvement, yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST-based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds an FST from the entire term, not just the delta.
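For illustration, a sketch of the metadata split being described (all names hypothetical; this is not the actual codec code): because the terms dictionary knows longsSize per field, it can d-gap encode the monotonic part itself and treat the remainder as opaque bytes:

{code:java}
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

final class TermMetaWriterSketch {
  // prevLongs carries the previous term's values, so monotonic components
  // shrink to small deltas under VLong encoding
  static void writeTermMeta(DataOutput out, long[] longs, long[] prevLongs,
                            byte[] bytes, int longsSize) throws IOException {
    for (int i = 0; i < longsSize; i++) {
      out.writeVLong(longs[i] - prevLongs[i]); // monotonic -> d-gap
      prevLongs[i] = longs[i];
    }
    out.writeVInt(bytes.length);
    out.writeBytes(bytes, 0, bytes.length);    // codec-specific, opaque
  }
}
{code}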
[jira] [Commented] (SOLR-5084) new field type - EnumField
[ https://issues.apache.org/jira/browse/SOLR-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738106#comment-13738106 ]

Robert Muir commented on SOLR-5084:

I think sorting is a major use case. With some of these previous examples like risk or issue-tracker status, you want to sort by the field and have 'high' risk sort after 'low', maybe 'closed' after 'created', and so on.
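As a usage sketch (the server URL and field/value names are hypothetical): with a field of the proposed EnumField type, a sort follows the configured enum order rather than the lexicographic one:

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EnumSortDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("*:*");
    // with an EnumField this yields Critical before High before Low,
    // not a lexicographic ordering of the labels
    q.setSort("severity", SolrQuery.ORDER.desc);
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResults());
  }
}
{code}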
[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738113#comment-13738113 ]

Uwe Schindler commented on LUCENE-5170:

Robert: After reviewing the code: the fixed, non-changeable default in AnalyzerWrapper is PerField, which has a large overhead and should only be used in stuff like PerFieldAnalyzerWrapper (this class should call super(PerField) in its own ctor). For other use cases of AnalyzerWrapper I have to use the global strategy, or the one of a wrapped analyzer. It looks like the current impl in AnalyzerWrapper is somehow assuming you want to wrap per field.

I would suggest making it mandatory in Lucene trunk, and adding the missing ctor in Lucene 4.x, too. The default one should be deprecated with a hint that it might be a bad idea to use this default.

My use case is: I have lots of predefined Analyzers for several languages or functionalities in my search application. I have some additional AnalyzerWrappers around that simply turn any other analyzer into a phonetic or ASCII-folding one (so I can use it with another field). So my wrapper just takes one of these per-language Analyzers and wraps it with one additional TokenFilter. As the underlying Analyzer is global-reuse, I need to make the wrapper global, too - currently impossible. Per-field is a waste of resources in this case.

So I would suggest that the base class AnalyzerWrapper copy the ctor of the superclass Analyzer, and that we deprecate the default ctor in 4.x. For my above example (to wrap another analyzer), I still need the reuse strategy of the inner analyzer, so I need the getter on Analyzer.java, too (see current patch).
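A sketch of that use case, assuming both changes under discussion land (the Analyzer.getReuseStrategy() getter and an AnalyzerWrapper ctor taking a ReuseStrategy); the wrapper name is illustrative:

{code:java}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;

// wraps any analyzer and adds ASCII folding on top of its chain, while
// inheriting the wrapped analyzer's reuse strategy instead of the
// hardcoded per-field default
public final class FoldingAnalyzerWrapper extends AnalyzerWrapper {
  private final Analyzer delegate;

  public FoldingAnalyzerWrapper(Analyzer delegate) {
    super(delegate.getReuseStrategy()); // proposed ctor + proposed getter
    this.delegate = delegate;
  }

  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    return delegate;
  }

  @Override
  protected TokenStreamComponents wrapComponents(String fieldName,
                                                 TokenStreamComponents components) {
    return new TokenStreamComponents(components.getTokenizer(),
        new ASCIIFoldingFilter(components.getTokenStream()));
  }
}
{code}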
[jira] [Updated] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configurable
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-5170:
Component/s: modules/analysis, core/other
Affects Version/s: 4.4
[jira] [Updated] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configurable
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-5170:
Summary: Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configurable (was: Add getter for reuse strategy to Analyzer)
[jira] [Comment Edited] (LUCENE-5170) Add getter for reuse strategy to Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738113#comment-13738113 ]

Uwe Schindler edited comment on LUCENE-5170 at 8/13/13 11:58 AM:

The edit adds one sentence after the use-case paragraph of the comment above: Only PerFieldAnalyzerWrapper should use the PerField strategy hardcoded (as it is per-field), not the base class!
[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configurable
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738120#comment-13738120 ]

Robert Muir commented on LUCENE-5170:

{quote}
I would suggest making it mandatory in Lucene trunk, and adding the missing ctor in Lucene 4.x, too. The default one should be deprecated with a hint that it might be a bad idea to use this default.
{quote}

Yes, this is exactly what I think we should do. It really should be a mandatory parameter today (but that cannot really work without also having the getter available!)
[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configurable
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738132#comment-13738132 ]

Uwe Schindler commented on LUCENE-5170:

There is a major problem: *the ReuseStrategy is no strategy at all, it holds state!* So my idea to make the getter available is wrong, because it would make the private state of the analyzer public to the outside! This is a misdesign in the API.

The correct way to do this would be: make the strategy an enum-like class (no state). The ThreadLocal should not be sitting on the strategy; the strategy should only implement the strategy, not also take care of storing the data in the ThreadLocal.

I have no idea how to fix this - it looks like we need a backwards break to fix this!
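A sketch of the stateless design being proposed (hypothetical signatures, not a committed API): the strategy only decides how components are keyed, while any per-thread storage moves onto the Analyzer:

{code:java}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Analyzer.TokenStreamComponents;

// hypothetical: the strategy is pure policy (enum-like, no ThreadLocal);
// whatever per-thread storage is needed would live on the Analyzer itself
public abstract class ReuseStrategy {
  public abstract TokenStreamComponents getReusableComponents(
      Analyzer analyzer, String fieldName);
  public abstract void setReusableComponents(
      Analyzer analyzer, String fieldName, TokenStreamComponents components);
}
{code}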
[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configurable
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738143#comment-13738143 ]

Uwe Schindler commented on LUCENE-5170:

The strategy pattern is defined like this, no state involved: http://en.wikipedia.org/wiki/Strategy_pattern
[jira] [Comment Edited] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configurable
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738143#comment-13738143 ] Uwe Schindler edited comment on LUCENE-5170 at 8/13/13 12:39 PM: - The definition of the strategy pattern can be found here, no state involved: http://en.wikipedia.org/wiki/Strategy_pattern was (Author: thetaphi): The strategy pattern is defined like this, no state involved: http://en.wikipedia.org/wiki/Strategy_pattern Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configurable -- Key: LUCENE-5170 URL: https://issues.apache.org/jira/browse/LUCENE-5170 Project: Lucene - Core Issue Type: Bug Components: core/other, modules/analysis Affects Versions: 4.4 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, 4.5 Attachments: LUCENE-5170.patch If you write an Analyzer that wraps another one (but without using AnalyzerWrapper) you may need to use the same reuse strategy in your wrapper. This is not possible as there is no way to get the reuse strategy (private field and no getter). An example is ES's NamedAnalyzer, see my comment: [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java] This would add a getter, just a 3-liner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5171) AnalyzingSuggester and FuzzySuggester should be able to share same FST
Anna Björk Nikulásdóttir created LUCENE-5171: Summary: AnalyzingSuggester and FuzzySuggester should be able to share same FST Key: LUCENE-5171 URL: https://issues.apache.org/jira/browse/LUCENE-5171 Project: Lucene - Core Issue Type: Improvement Components: modules/other Affects Versions: 4.3.1, 4.4 Reporter: Anna Björk Nikulásdóttir In my code I use both suggesters for the same FST. I use AnalyzingSuggester#store() to create the FST and later on AnalyzingSuggester#load() and FuzzySuggester#load() to use it. This approach works very well but it unnecessarily creates two FST instances, resulting in 2x memory consumption. It seems that for the time being both suggesters use the same FST format. The following trivial method in AnalyzingSuggester provides the possibility to share the same FST among different instances of AnalyzingSuggester. It has been tested in the above scenario: public boolean shareFstFrom(AnalyzingSuggester instance) { if (instance.fst == null) { return false; } this.fst = instance.fst; this.maxAnalyzedPathsForOneInput = instance.maxAnalyzedPathsForOneInput; this.hasPayloads = instance.hasPayloads; return true; } One could use it like this: analyzingSugg = new AnalyzingSuggester(...); fuzzySugg = new FuzzySuggester(...); analyzingSugg.load(someInputStream); fuzzySugg.shareFstFrom(analyzingSugg); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4231 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4231/ All tests passed Build Log: [...truncated 34759 lines...] -documentation-lint: [jtidy] Checking for broken html (such as invalid tags)... [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/lucene/build/jtidy_tmp [echo] Checking for broken links... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [echo] Checking for malformed docs... [exec] [exec] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/solr/build/docs/solr-core/overview-summary.html [exec] missing: org.apache.solr.search.join [exec] [exec] Missing javadocs were found! BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/build.xml:389: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/build.xml:60: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/solr/build.xml:563: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/solr/build.xml:579: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/lucene/common-build.xml:2149: exec returned: 1 Total time: 80 minutes 13 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5171) AnalyzingSuggester and FuzzySuggester should be able to share same FST
[ https://issues.apache.org/jira/browse/LUCENE-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anna Björk Nikulásdóttir updated LUCENE-5171: - Priority: Minor (was: Major) AnalyzingSuggester and FuzzySuggester should be able to share same FST -- Key: LUCENE-5171 URL: https://issues.apache.org/jira/browse/LUCENE-5171 Project: Lucene - Core Issue Type: Improvement Components: modules/other Affects Versions: 4.4, 4.3.1 Reporter: Anna Björk Nikulásdóttir Priority: Minor In my code I use both suggesters for the same FST. I use AnalyzingSuggester#store() to create the FST and later on AnalyzingSuggester#load() and FuzzySuggester#load() to use it. This approach works very well but it unnecessarily creates two FST instances, resulting in 2x memory consumption. It seems that for the time being both suggesters use the same FST format. The following trivial method in AnalyzingSuggester provides the possibility to share the same FST among different instances of AnalyzingSuggester. It has been tested in the above scenario: public boolean shareFstFrom(AnalyzingSuggester instance) { if (instance.fst == null) { return false; } this.fst = instance.fst; this.maxAnalyzedPathsForOneInput = instance.maxAnalyzedPathsForOneInput; this.hasPayloads = instance.hasPayloads; return true; } One could use it like this: analyzingSugg = new AnalyzingSuggester(...); fuzzySugg = new FuzzySuggester(...); analyzingSugg.load(someInputStream); fuzzySugg.shareFstFrom(analyzingSugg); -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 727 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/727/ Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseSerialGC All tests passed Build Log: [...truncated 34761 lines...] -documentation-lint: [jtidy] Checking for broken html (such as invalid tags)... [delete] Deleting directory /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/jtidy_tmp [echo] Checking for broken links... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [echo] Checking for malformed docs... [exec] [exec] /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/docs/solr-core/overview-summary.html [exec] missing: org.apache.solr.search.join [exec] [exec] Missing javadocs were found! BUILD FAILED /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/build.xml:389: The following error occurred while executing this line: /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/build.xml:60: The following error occurred while executing this line: /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build.xml:563: The following error occurred while executing this line: /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build.xml:579: The following error occurred while executing this line: /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/common-build.xml:2149: exec returned: 1 Total time: 164 minutes 25 seconds Build step 'Invoke Ant' marked build as failure Description set: Java: 64bit/jdk1.7.0 -XX:+UseCompressedOops -XX:+UseSerialGC Archiving artifacts Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5172) FuzzySuggester should boost terms with minimal Levenshtein Distance
Anna Björk Nikulásdóttir created LUCENE-5172: Summary: FuzzySuggester should boost terms with minimal Levenshtein Distance Key: LUCENE-5172 URL: https://issues.apache.org/jira/browse/LUCENE-5172 Project: Lucene - Core Issue Type: Improvement Components: modules/other Affects Versions: 4.3.1, 4.4 Reporter: Anna Björk Nikulásdóttir For my use case I need both suggesters, AnalyzingSuggester and FuzzySuggester, because FuzzySuggester does not boost terms with minimal Levenshtein distance. Post-processing of FuzzySuggester results is somewhat heavy if one only wants to find direct prefix suggestions. So I first use AnalyzingSuggester to find prefix suggestions and optionally FuzzySuggester afterwards if AnalyzingSuggester did not yield appropriate results. It would be really useful if FuzzySuggester could boost/sort suggestion results in order of Levenshtein distances. Then I would only need FuzzySuggester. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
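Until the suggester supports this natively, one possible client-side workaround (a sketch in plain Java, not a suggester API) is to re-rank the fuzzy results by their edit distance to the query, breaking ties by the suggester's weight:

{code}
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.search.suggest.Lookup.LookupResult;

final class EditDistanceRerank {
  // classic two-row Levenshtein dynamic program
  static int levenshtein(CharSequence a, CharSequence b) {
    int[] prev = new int[b.length() + 1];
    int[] cur = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) prev[j] = j;
    for (int i = 1; i <= a.length(); i++) {
      cur[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
        cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev; prev = cur; cur = tmp;
    }
    return prev[b.length()];
  }

  // smallest edit distance first; ties broken by descending suggester weight
  static void rerank(List<LookupResult> results, final String query) {
    Collections.sort(results, new Comparator<LookupResult>() {
      public int compare(LookupResult x, LookupResult y) {
        int d = levenshtein(query, x.key) - levenshtein(query, y.key);
        if (d != 0) return d;
        return y.value < x.value ? -1 : (y.value == x.value ? 0 : 1);
      }
    });
  }
}
{code}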
[jira] [Created] (LUCENE-5173) Add checkindex piece of LUCENE-5116
Robert Muir created LUCENE-5173: --- Summary: Add checkindex piece of LUCENE-5116 Key: LUCENE-5173 URL: https://issues.apache.org/jira/browse/LUCENE-5173 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff). I considered it just an inconsistency, but it could cause confusing exceptions to real users too if there was a regression here. (see solr users list:Split Shard Error - maxValue must be non-negative). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5174) On disk FST objects
Anna Björk Nikulásdóttir created LUCENE-5174: Summary: On disk FST objects Key: LUCENE-5174 URL: https://issues.apache.org/jira/browse/LUCENE-5174 Project: Lucene - Core Issue Type: Improvement Components: core/FSTs Affects Versions: 4.3.1, 4.4 Reporter: Anna Björk Nikulásdóttir If one wants to support multiple language suggestions at the same time via AnalyzingSuggester/FuzzySuggester on Android, it is currently almost impossible, because all suggesters use memory-resident FSTs. And of course each language needs its own FST. On Android there are VM memory restrictions of 32MB for older devices like the Nexus S. Doing the math: a good language FST is roughly 11-15MB in size. Supporting even two languages at the same time is therefore difficult, taking into account that FSTs are not the only part of a common Android app. A possible approach to a solution via memory mapping and DirectByteBuffer has been proposed by Mike McCandless on the Lucene ML: [http://mail-archives.apache.org/mod_mbox/lucene-java-user/201308.mbox/%3CCAL8PwkbHdeEvk+e47H6v6_=Ln36yhE2RY=m7rqbfp+h50u5...@mail.gmail.com%3E] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
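The underlying technique is plain NIO memory mapping; a minimal sketch of that part (this is not a Lucene API, and an FST reader that consumes the mapped buffer would still have to be written):

{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

final class MmapSketch {
  /** Map a serialized FST file into off-heap memory instead of loading it onto the heap. */
  static MappedByteBuffer map(String path) throws Exception {
    RandomAccessFile raf = new RandomAccessFile(path, "r");
    try {
      FileChannel ch = raf.getChannel();
      // the OS pages bytes in on demand; the VM heap only holds the buffer object itself
      return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    } finally {
      raf.close(); // the mapping stays valid after the channel is closed
    }
  }
}
{code}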
[jira] [Commented] (LUCENE-5116) IW.addIndexes doesn't prune all deleted segments
[ https://issues.apache.org/jira/browse/LUCENE-5116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738221#comment-13738221 ] ASF subversion and git services commented on LUCENE-5116: - Commit 1513487 from [~thetaphi] in branch 'dev/trunk' [ https://svn.apache.org/r1513487 ] LUCENE-5116: Simplify test to use MatchNoBits instead own impl IW.addIndexes doesn't prune all deleted segments Key: LUCENE-5116 URL: https://issues.apache.org/jira/browse/LUCENE-5116 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5116.patch, LUCENE-5116_test.patch at the least, this can easily create segments with maxDoc == 0. It seems buggy: elsewhere we prune these segments out, so it's expected to have a commit point with no segments rather than a segment with 0 documents... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5173) Add checkindex piece of LUCENE-5116
[ https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5173: Attachment: LUCENE-5116.patch Simple patch: also adds an assert to SegmentMerger. We can only check this if the index is 4.5+, because that's when LUCENE-5116 was fixed. Add checkindex piece of LUCENE-5116 --- Key: LUCENE-5173 URL: https://issues.apache.org/jira/browse/LUCENE-5173 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5116.patch LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff). I considered it just an inconsistency, but it could cause confusing exceptions to real users too if there was a regression here. (see solr users list:Split Shard Error - maxValue must be non-negative). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5116) IW.addIndexes doesn't prune all deleted segments
[ https://issues.apache.org/jira/browse/LUCENE-5116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738225#comment-13738225 ] ASF subversion and git services commented on LUCENE-5116: - Commit 1513488 from [~thetaphi] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1513488 ] Merged revision(s) 1513487 from lucene/dev/trunk: LUCENE-5116: Simplify test to use MatchNoBits instead own impl IW.addIndexes doesn't prune all deleted segments Key: LUCENE-5116 URL: https://issues.apache.org/jira/browse/LUCENE-5116 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Assignee: Shai Erera Fix For: 5.0, 4.5 Attachments: LUCENE-5116.patch, LUCENE-5116_test.patch at the least, this can easily create segments with maxDoc == 0. It seems buggy: elsewhere we prune these segments out, so it's expected to have a commit point with no segments rather than a segment with 0 documents... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5173) Add checkindex piece of LUCENE-5116
[ https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738244#comment-13738244 ] Robert Muir commented on LUCENE-5173: - deleted the wrongly-named patch, sorry :) Add checkindex piece of LUCENE-5116 --- Key: LUCENE-5173 URL: https://issues.apache.org/jira/browse/LUCENE-5173 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5173.patch LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff). I considered it just an inconsistency, but it could cause confusing exceptions to real users too if there was a regression here. (see solr users list:Split Shard Error - maxValue must be non-negative). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5173) Add checkindex piece of LUCENE-5116
[ https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5173: Attachment: LUCENE-5173.patch just a slight simplification of the logic Add checkindex piece of LUCENE-5116 --- Key: LUCENE-5173 URL: https://issues.apache.org/jira/browse/LUCENE-5173 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5173.patch LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff). I considered it just an inconsistency, but it could cause confusing exceptions to real users too if there was a regression here. (see solr users list:Split Shard Error - maxValue must be non-negative). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5173) Add checkindex piece of LUCENE-5116
[ https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5173: Attachment: (was: LUCENE-5116.patch) Add checkindex piece of LUCENE-5116 --- Key: LUCENE-5173 URL: https://issues.apache.org/jira/browse/LUCENE-5173 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5173.patch LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff). I considered it just an inconsistency, but it could cause confusing exceptions to real users too if there was a regression here. (see solr users list:Split Shard Error - maxValue must be non-negative). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5170) Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configurable
[ https://issues.apache.org/jira/browse/LUCENE-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738258#comment-13738258 ] Robert Muir commented on LUCENE-5170: - {quote} Make the strategy an ENUM-like class (no state). The ThreadLocal should not be sitting on the strategy; the strategy should only implement the strategy logic, not also take care of storing the data in the ThreadLocal. {quote} I like this idea, I think it could simplify the thing a lot. {quote} I have no idea how to fix this - it looks like we need a backwards break to fix this! {quote} Personally I support that in this case: because I think we can minimize the breaks at the end of the day. For example if we switch to enums, in 4.x, we could still allow 'instantiation' but it's just useless (since the object is stateless) and deprecated. And the 'constants' would be declared like MultiTermQuery rewrite? Add getter for reuse strategy to Analyzer, make AnalyzerWrapper's reuse strategy configurable -- Key: LUCENE-5170 URL: https://issues.apache.org/jira/browse/LUCENE-5170 Project: Lucene - Core Issue Type: Bug Components: core/other, modules/analysis Affects Versions: 4.4 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 5.0, 4.5 Attachments: LUCENE-5170.patch If you write an Analyzer that wraps another one (but without using AnalyzerWrapper) you may need to use the same reuse strategy in your wrapper. This is not possible as there is no way to get the reuse strategy (private field and no getter). An example is ES's NamedAnalyzer, see my comment: [https://github.com/elasticsearch/elasticsearch/commit/b9a2fbd8741aa1b9beffb7d2922fc9b4525397e4#src/main/java/org/elasticsearch/index/analysis/NamedAnalyzer.java] This would add a getter, just a 3-liner. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
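The 4.x transition described above could look roughly like this hypothetical sketch (names invented), declaring the stateless strategies as constants the same way MultiTermQuery exposes its rewrite methods, while keeping a deprecated constructor for back-compat:

{code}
// Hypothetical back-compat sketch: enum-like constants, MultiTermQuery-rewrite style,
// with instantiation still allowed in 4.x but deprecated and useless.
public abstract class ReuseStrategy {
  public static final ReuseStrategy GLOBAL_REUSE_STRATEGY = new ReuseStrategy() {};
  public static final ReuseStrategy PER_FIELD_REUSE_STRATEGY = new ReuseStrategy() {};

  /** @deprecated Don't instantiate: the strategies are stateless singletons now. */
  @Deprecated
  protected ReuseStrategy() {}
}
{code}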
[jira] [Commented] (LUCENE-5173) Add checkindex piece of LUCENE-5116
[ https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738263#comment-13738263 ] Uwe Schindler commented on LUCENE-5173: --- Patch is fine. I like that the checkindex allows older segments with empty size, but once a segment was merged it can no longer be empty. Maybe the assert in SegmentMerger should be a hard check, unless SegmentMerger always strictly throws away empty segments (so that nobody can somehow, with some crazy alcoholic mergepolicy, create those segments again). Add checkindex piece of LUCENE-5116 --- Key: LUCENE-5173 URL: https://issues.apache.org/jira/browse/LUCENE-5173 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5173.patch LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff). I considered it just an inconsistency, but it could cause confusing exceptions to real users too if there was a regression here. (see solr users list:Split Shard Error - maxValue must be non-negative). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Han Jiang updated LUCENE-3069: -- Attachment: LUCENE-3069.patch Patch with backward compatibility fix on Lucene41PBF (TempPostingsReader is actually a fork of Lucene41PostingsReader). Lucene should have an entirely memory resident term dictionary -- Key: LUCENE-3069 URL: https://issues.apache.org/jira/browse/LUCENE-3069 Project: Lucene - Core Issue Type: Improvement Components: core/index, core/search Affects Versions: 4.0-ALPHA Reporter: Simon Willnauer Assignee: Han Jiang Labels: gsoc2013 Fix For: 5.0, 4.5 Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds an FST from the entire term not just the delta. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
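For readers following along, a minimal sketch of the 4.x FST building blocks this term dictionary work sits on top of, mapping each term's bytes to a long output (standing in here for packed term metadata); this follows the pattern from the FST package javadocs:

{code}
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

final class FstSketch {
  static FST<Long> build() throws Exception {
    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
    Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);
    IntsRef scratch = new IntsRef();
    // terms must be added in sorted order
    builder.add(Util.toIntsRef(new BytesRef("lucene"), scratch), 42L);
    builder.add(Util.toIntsRef(new BytesRef("solr"), scratch), 7L);
    return builder.finish();
  }

  static Long lookup(FST<Long> fst, String term) throws Exception {
    return Util.get(fst, new BytesRef(term)); // null if the term is absent
  }
}
{code}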
[jira] [Commented] (LUCENE-5173) Add checkindex piece of LUCENE-5116
[ https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738299#comment-13738299 ] Robert Muir commented on LUCENE-5173: - {quote} Maybe the assert in SegmentMerger should be a hard check, unless SegmentMerger always strictly throws away empty segments (so that nobody can somehow, with some crazy alcoholic mergepolicy, create those segments again). {quote} Or, maybe mergeState.segmentInfo.setDocCount(setDocMaps()) should happen in the ctor of SegmentMerger instead of line 1 of merge()? And it could be a simple boolean method like shouldMerge(): returns docCount > 0, called by addIndexes and mergeMiddle? This way the logic added to addIndexes in LUCENE-5116 wouldn't even need to be there, and we'd feel better that we aren't writing such 0-document segments (which codecs are not prepared to handle today). Add checkindex piece of LUCENE-5116 --- Key: LUCENE-5173 URL: https://issues.apache.org/jira/browse/LUCENE-5173 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5173.patch LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff). I considered it just an inconsistency, but it could cause confusing exceptions to real users too if there was a regression here. (see solr users list:Split Shard Error - maxValue must be non-negative). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
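In other words, something like this hypothetical sketch of the suggestion (not the committed change):

{code}
// Sketch: compute the merged doc count once, up front (in SegmentMerger's ctor via
// setDocMaps()), and let callers skip the merge entirely when it would be empty.
final class SegmentMergerSketch {
  private final int mergedDocCount;

  SegmentMergerSketch(int docCountAfterDeletes) {
    this.mergedDocCount = docCountAfterDeletes; // previously set on line 1 of merge()
  }

  /** Callers (addIndexes, mergeMiddle) check this and never write a 0-document segment. */
  boolean shouldMerge() {
    return mergedDocCount > 0;
  }
}
{code}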
[jira] [Updated] (SOLR-5139) Make Core Admin more user friendly when in SolrCloud mode.
[ https://issues.apache.org/jira/browse/SOLR-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-5139: Component/s: web gui Make Core Admin more user friendly when in SolrCloud mode. -- Key: SOLR-5139 URL: https://issues.apache.org/jira/browse/SOLR-5139 Project: Solr Issue Type: Improvement Components: SolrCloud, web gui Reporter: Mark Miller Fix For: 4.5, 5.0 The CoreAdmin in the UI can easily get users into trouble - especially since we don't yet have a collection management API. The info displayed is useful though, and sometimes it makes sense to have access to the commands on a per core level as well. We should improve the situation though. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5101) Invalid UTF-8 character 0xfffe during shard update
[ https://issues.apache.org/jira/browse/SOLR-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738302#comment-13738302 ] Federico Chiacchiaretta commented on SOLR-5101: --- Hi, should this issue be reopened or filed elsewhere? I'd like to track changes to Solr that may affect this issue (i.e. switch to javabin for updates). Thanks, Federico Chiacchiaretta Invalid UTF-8 character 0xfffe during shard update -- Key: SOLR-5101 URL: https://issues.apache.org/jira/browse/SOLR-5101 Project: Solr Issue Type: Bug Components: clients - java Affects Versions: 4.3 Environment: Ubuntu 12.04.2 java version 1.6.0_27 OpenJDK Runtime Environment (IcedTea6 1.12.5) (6b27-1.12.5-0ubuntu0.12.04.1) OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode) Reporter: Federico Chiacchiaretta On data import from a PostgreSQL db, I get the following error in solr.log: ERROR - 2013-08-01 09:51:00.217; org.apache.solr.common.SolrException; shard update error RetryNode: http://172.16.201.173:8983/solr/archive/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Invalid UTF-8 character 0xfffe at char #416, byte #127) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:402) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332) at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:679) This prevents the document from being successfully added to the index, and a few documents targeting the same shard are also missing. This happens silently, because data import completes successfully, and the whole number of documents reported as Added includes those who failed (and are actually lost). Is there a known workaround for this issue? Regards, Federico Chiacchiaretta -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
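A client-side workaround that is sometimes used (a sketch only; it masks the bad characters rather than fixing the transport behavior) is to strip Unicode non-characters such as U+FFFE from field values before handing documents to SolrJ:

{code}
// Sketch: drop the BMP non-characters U+FFFE/U+FFFF from field values before indexing.
final class FieldSanitizer {
  static String stripNonCharacters(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c != '\uFFFE' && c != '\uFFFF') {
        sb.append(c);
      }
    }
    return sb.toString();
  }
}
{code}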
[jira] [Commented] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to ArithmeticException: / by zero
[ https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738311#comment-13738311 ] James Dyer commented on SOLR-5122: -- Hoss, I appreciate your reporting this and taking care of this as much as possible. Do you know offhand a failing seed for this test? (I've been away for awhile and might not have the jenkins log easily available.) I will look at this. Likely, I need to require docs to be collected in order and mistakenly thought this was unnecessary. spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to ArithmeticException: / by zero Key: SOLR-5122 URL: https://issues.apache.org/jira/browse/SOLR-5122 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Hoss Man Attachments: SOLR-5122.patch As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, and this (apparently) led to a failure in testEstimatedHitCounts. As far as I can tell: the test assumes that specific values would be returned as the _estimated_ hits for a collation, and it appears that the change in MergePolicy however resulted in different segments with different term stats, causing the estimation code to produce different values than what is expected. I made a quick attempt to improve the test to: * expect explicit exact values only when spellcheck.collateMaxCollectDocs is set such that the 'estimate' should actually be exact (ie: collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num docs in the index) * randomize the values used for collateMaxCollectDocs and confirm that the estimates are never more than the num docs in the index This led to an odd ArithmeticException: / by zero error in the test, which seems to suggest that there is a genuine bug in the code for estimating the hits that only gets tickled in certain mergepolicy/segment/collateMaxCollectDocs combinations. *Update:* This appears to be a general problem with collecting docs out of order and the estimation of hits -- I believe even if there is no divide by zero error, the estimates are largely meaningless since the docs are collected out of order. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to ArithmeticException: / by zero
[ https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer reassigned SOLR-5122: Assignee: James Dyer spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to ArithmeticException: / by zero Key: SOLR-5122 URL: https://issues.apache.org/jira/browse/SOLR-5122 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Hoss Man Assignee: James Dyer Attachments: SOLR-5122.patch As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, and this (apparently) led to a failure in testEstimatedHitCounts. As far as I can tell: the test assumes that specific values would be returned as the _estimated_ hits for a collation, and it appears that the change in MergePolicy however resulted in different segments with different term stats, causing the estimation code to produce different values than what is expected. I made a quick attempt to improve the test to: * expect explicit exact values only when spellcheck.collateMaxCollectDocs is set such that the 'estimate' should actually be exact (ie: collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num docs in the index) * randomize the values used for collateMaxCollectDocs and confirm that the estimates are never more than the num docs in the index This led to an odd ArithmeticException: / by zero error in the test, which seems to suggest that there is a genuine bug in the code for estimating the hits that only gets tickled in certain mergepolicy/segment/collateMaxCollectDocs combinations. *Update:* This appears to be a general problem with collecting docs out of order and the estimation of hits -- I believe even if there is no divide by zero error, the estimates are largely meaningless since the docs are collected out of order. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738321#comment-13738321 ] Jack Krupansky commented on SOLR-5017: -- Is this feature intended for both traditional Solr sharding as well as SolrCloud? If it is intended for SolrCloud as well, how does delete-by-id work, in the sense that the delete command does not include the field needed to determine routing? Allow sharding based on the value of a field Key: SOLR-5017 URL: https://issues.apache.org/jira/browse/SOLR-5017 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Fix For: 4.5, 5.0 Attachments: SOLR-5017.patch We should be able to create a collection where sharding is done based on the value of a given field collections can be created with shardField=fieldName, which will be persisted in DocCollection in ZK implicit DocRouter would look at this field instead of _shard_ field CompositeIdDocRouter can also use this field instead of looking at the id field. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5173) Add checkindex piece of LUCENE-5116
[ https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738344#comment-13738344 ] Michael McCandless commented on LUCENE-5173: +1, I like consolidating the logic into a single shouldMerge(). And I don't think codecs should be required to handle the 0 doc segment case: we should never send such a segment to them. Add checkindex piece of LUCENE-5116 --- Key: LUCENE-5173 URL: https://issues.apache.org/jira/browse/LUCENE-5173 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5173.patch LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff). I considered it just an inconsistency, but it could cause confusing exceptions to real users too if there was a regression here. (see solr users list:Split Shard Error - maxValue must be non-negative). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738375#comment-13738375 ] Noble Paul commented on SOLR-5017: -- This is only for SolrCloud. deleteById/getById would expect the param \_route_ or shard.keys (deprecated), without which it will have to fan out a distributed request. It works without complaining, but will be inefficient. Allow sharding based on the value of a field Key: SOLR-5017 URL: https://issues.apache.org/jira/browse/SOLR-5017 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Fix For: 4.5, 5.0 Attachments: SOLR-5017.patch We should be able to create a collection where sharding is done based on the value of a given field collections can be created with shardField=fieldName, which will be persisted in DocCollection in ZK implicit DocRouter would look at this field instead of _shard_ field CompositeIdDocRouter can also use this field instead of looking at the id field. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
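With SolrJ, passing the route explicitly looks roughly like the sketch below (the route value being whatever shard key the collection routes on):

{code}
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.UpdateRequest;

final class RoutedDelete {
  /** Delete by id without fanning out: pass the route so only the owning shard is hit. */
  static void deleteDoc(CloudSolrServer server, String id, String routeValue) throws Exception {
    UpdateRequest req = new UpdateRequest();
    req.deleteById(id);
    req.setParam("_route_", routeValue); // omit this and the delete is broadcast to all shards
    req.process(server);
  }
}
{code}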
[jira] [Commented] (LUCENE-5173) Add checkindex piece of LUCENE-5116
[ https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738388#comment-13738388 ] Uwe Schindler commented on LUCENE-5173: --- I agree with both. My complaint was the following: The assert was not correct, as asserts should only be used for real assertions within the same class. For this special check, there is something outside of SegmentMerger that could maybe insert empty readers into the merge queue, so those should be thrown away while merging or when SegmentMerger initializes (so moving it to the ctor is a good idea). I am thinking about crazy stuff like a merge policy that wraps with a FilterAtomicReader to filter while merging (like IndexSorter) - which is possible with the current API. So the segments should be removed on creating the SegmentMerger when all readers to merge are already in the List<AtomicReader>. In IndexWriter#addIndexes we may then just need the top-level check to not even start a merge. Add checkindex piece of LUCENE-5116 --- Key: LUCENE-5173 URL: https://issues.apache.org/jira/browse/LUCENE-5173 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Attachments: LUCENE-5173.patch LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff). I considered it just an inconsistency, but it could cause confusing exceptions to real users too if there was a regression here. (see solr users list:Split Shard Error - maxValue must be non-negative). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to ArithmeticException: / by zero
[ https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738395#comment-13738395 ] Hoss Man commented on SOLR-5122: The initial jenkins failure I saw was at revision 1511278... https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/343/ https://mail-archives.apache.org/mod_mbox/lucene-dev/201308.mbox/%3Calpine.DEB.2.02.1308070919170.13959@frisbee%3E {quote} I can reproduce this -- it's probably related to the MP randomization I put in ... looks like it's doing exact numeric comparisons based on term stats. I'll take a look later today... ant test -Dtestcase=SpellCheckCollatorTest -Dtests.method=testEstimatedHitCounts -Dtests.seed=16B4D8F74E59EE10 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=nl -Dtests.timezone=America/Dawson -Dtests.file.encoding=US-ASCII {quote} ...regardless of the initial failure though, if you try out the patch I attached to try and improve the test coverage, then the reproduce line from the failure I posted along with that patch still reproduces on trunk (but you do have to manually uncomment the {{@Ignore}})... {code} ant test -Dtestcase=SpellCheckCollatorTest -Dtests.method=testEstimatedHitCounts -Dtests.seed=16B4D8F74E59EE10 -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=nl -Dtests.timezone=America/Dawson -Dtests.file.encoding=US-ASCII {code} spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to ArithmeticException: / by zero Key: SOLR-5122 URL: https://issues.apache.org/jira/browse/SOLR-5122 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Hoss Man Assignee: James Dyer Attachments: SOLR-5122.patch As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, and this (apparently) led to a failure in testEstimatedHitCounts. As far as I can tell: the test assumes that specific values would be returned as the _estimated_ hits for a collation, and it appears that the change in MergePolicy however resulted in different segments with different term stats, causing the estimation code to produce different values than what is expected. I made a quick attempt to improve the test to: * expect explicit exact values only when spellcheck.collateMaxCollectDocs is set such that the 'estimate' should actually be exact (ie: collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num docs in the index) * randomize the values used for collateMaxCollectDocs and confirm that the estimates are never more than the num docs in the index This led to an odd ArithmeticException: / by zero error in the test, which seems to suggest that there is a genuine bug in the code for estimating the hits that only gets tickled in certain mergepolicy/segment/collateMaxCollectDocs combinations. *Update:* This appears to be a general problem with collecting docs out of order and the estimation of hits -- I believe even if there is no divide by zero error, the estimates are largely meaningless since the docs are collected out of order. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to ArithmeticException: / by zero
[ https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5122: --- Attachment: SOLR-5122.patch updated patch to trunk and included the commenting out of the {{@Ignore}} so all you need to do is apply this patch to reproduce with the previously mentioned seed. spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to ArithmeticException: / by zero Key: SOLR-5122 URL: https://issues.apache.org/jira/browse/SOLR-5122 Project: Solr Issue Type: Bug Affects Versions: 4.4 Reporter: Hoss Man Assignee: James Dyer Attachments: SOLR-5122.patch, SOLR-5122.patch As part of SOLR-4952 SpellCheckCollatorTest started using RandomMergePolicy, and this (apparently) led to a failure in testEstimatedHitCounts. As far as I can tell: the test assumes that specific values would be returned as the _estimated_ hits for a collation, and it appears that the change in MergePolicy however resulted in different segments with different term stats, causing the estimation code to produce different values than what is expected. I made a quick attempt to improve the test to: * expect explicit exact values only when spellcheck.collateMaxCollectDocs is set such that the 'estimate' should actually be exact (ie: collateMaxCollectDocs == 0 or collateMaxCollectDocs greater than the num docs in the index) * randomize the values used for collateMaxCollectDocs and confirm that the estimates are never more than the num docs in the index This led to an odd ArithmeticException: / by zero error in the test, which seems to suggest that there is a genuine bug in the code for estimating the hits that only gets tickled in certain mergepolicy/segment/collateMaxCollectDocs combinations. *Update:* This appears to be a general problem with collecting docs out of order and the estimation of hits -- I believe even if there is no divide by zero error, the estimates are largely meaningless since the docs are collected out of order. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)
Tom Burton-West created LUCENE-5175: --- Summary: Add parameter to lower-bound TF normalization for BM25 (for long documents) Key: LUCENE-5175 URL: https://issues.apache.org/jira/browse/LUCENE-5175 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Tom Burton-West Priority: Minor In the article "When Documents Are Very Long, BM25 Fails!" a fix for the problem is documented. There was a TODO note in BM25Similarity to add this fix. I will attach a patch that implements the fix shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated LUCENE-5175: Attachment: LUCENE-5175.patch Patch adds an optional parameter delta to lower-bound tf normalization. Also attached are unit tests. Still need to add tests of the explanation/scoring for cases 1) no norms, and 2) no delta. If no delta parameter is supplied, the math works out to the equivalent of the regular BM25 formula as far as the score goes, but I think there is an extra step or two to get there. I'll see if I can get some benchmarks running to see if there is any significant performance issue. Add parameter to lower-bound TF normalization for BM25 (for long documents) --- Key: LUCENE-5175 URL: https://issues.apache.org/jira/browse/LUCENE-5175 Project: Lucene - Core Issue Type: Improvement Components: core/search Reporter: Tom Burton-West Priority: Minor Attachments: LUCENE-5175.patch In the article "When Documents Are Very Long, BM25 Fails!" a fix for the problem is documented. There was a TODO note in BM25Similarity to add this fix. I will attach a patch that implements the fix shortly. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
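For reference, the fix from the cited article (Lv & Zhai's lower-bounded BM25, often called BM25+) adds a constant delta inside the tf normalization so a matching term's contribution cannot decay toward zero for very long documents; with delta = 0 it reduces to plain BM25. A sketch of the per-term score with the usual parameter names:

{code}
// Sketch of the lower-bounded BM25 ("BM25+") per-term score from the cited article.
// With delta = 0 this is exactly classic BM25.
final class Bm25Plus {
  static float termScore(float idf, float tf, float docLen, float avgDocLen,
                         float k1, float b, float delta) {
    float norm = k1 * ((1 - b) + b * docLen / avgDocLen);
    return idf * ((tf * (k1 + 1)) / (tf + norm) + delta);
  }
}
{code}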
[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738415#comment-13738415 ] ASF subversion and git services commented on SOLR-3076: --- Commit 1513577 from [~yo...@apache.org] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1513577 ] SOLR-3076: block join parent and child queries Solr(Cloud) should support block joins -- Key: SOLR-3076 URL: https://issues.apache.org/jira/browse/SOLR-3076 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Fix For: 4.5, 5.0 Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, dih-3076.patch, dih-config.xml, parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-7036-childDocs-solr-fork-trunk-patched, solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, tochild-bjq-filtered-search-fix.patch Lucene has the ability to do block joins, we should add it to Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738422#comment-13738422 ] Shalin Shekhar Mangar commented on SOLR-5017: - Shard splitting doesn't support collections configured with a hash router and routeField. I'll put up a test and fix. Allow sharding based on the value of a field Key: SOLR-5017 URL: https://issues.apache.org/jira/browse/SOLR-5017 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Fix For: 4.5, 5.0 Attachments: SOLR-5017.patch We should be able to create a collection where sharding is done based on the value of a given field collections can be created with shardField=fieldName, which will be persisted in DocCollection in ZK implicit DocRouter would look at this field instead of _shard_ field CompositeIdDocRouter can also use this field instead of looking at the id field. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4799) SQLEntityProcessor for zipper join
[ https://issues.apache.org/jira/browse/SOLR-4799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738424#comment-13738424 ] James Dyer commented on SOLR-4799: -- Mikhail, This seems like a great feature, but I haven't looked at it. As I said, I do not feel it wise to add features that won't neatly plug into the current DIH infrastructure until we improve the code. Really, I would love to chop out features (Debug mode, delta updates, streaming from a POST request, etc.), and make it work independently of Solr before we build more into it. But I've been busy with other things and haven't had much time. By the way, have you any experience with Apache Flume? In your opinion, could it become DIH's successor? A Solr Sink was added earlier in the year that will index disparate data. I haven't looked much at it, but my first impression is that it is a big, complicated tool whereas DIH is smaller and simpler, and the two would have different use-cases. Also, I'm not so sure it has any support yet for RDBMS. SQLEntityProcessor for zipper join -- Key: SOLR-4799 URL: https://issues.apache.org/jira/browse/SOLR-4799 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Mikhail Khludnev Priority: Minor Labels: dih Attachments: SOLR-4799.patch DIH is mostly considered a playground tool, and real usages end up with SolrJ. I want to contribute a few improvements targeting DIH performance. This one provides a performant approach for joining SQL entities with minimal memory, in contrast to http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor The idea is: * the parent table is explicitly ordered by its PK in SQL * the children table is explicitly ordered by the parent_id FK in SQL * the children entity processor joins the ordered resultsets with a 'zipper' algorithm. Do you think it's worth contributing to DIH? cc: [~goksron] [~jdyer] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
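The 'zipper' itself is an ordinary merge join over two resultsets that each arrive sorted by the join key; a sketch of the algorithm (not the attached DIH patch):

{code}
import java.util.Iterator;

final class ZipperJoin {
  /** Parent rows expose their PK; child rows expose their parent_id FK. */
  interface Keyed { long key(); }
  interface Attach<P, C> { void accept(P parent, C child); }

  /** Single forward pass over two key-ordered resultsets; O(1) memory. */
  static <P extends Keyed, C extends Keyed> void join(
      Iterator<P> parents, Iterator<C> children, Attach<P, C> attach) {
    C child = children.hasNext() ? children.next() : null;
    while (parents.hasNext()) {
      P parent = parents.next();
      // skip children whose key sorts before this parent
      while (child != null && child.key() < parent.key()) {
        child = children.hasNext() ? children.next() : null;
      }
      // consume every child matching this parent
      while (child != null && child.key() == parent.key()) {
        attach.accept(parent, child);
        child = children.hasNext() ? children.next() : null;
      }
    }
  }
}
{code}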
[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper
[ https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738434#comment-13738434 ] Alan Woodward commented on SOLR-4718: - Rather than adding Zookeeper stuff to ConfigSolr.fromSolrHome(), can we create a new static method ConfigSolr.fromZookeeper()? And then push the system property checks back out into SolrDispatchFilter or wherever fromSolrHome is being called. Keeps each fromXXX method just doing one thing. I wonder if it's worth refactoring the ByteArrayInputStream re-reading dance into fromInputStream as well. It's a bit of a hack anyway, and I don't like having it in more than one place. Allow solr.xml to be stored in zookeeper Key: SOLR-4718 URL: https://issues.apache.org/jira/browse/SOLR-4718 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.3, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-4718.patch, SOLR-4718.patch So the near-final piece of this puzzle is to make solr.xml be storable in Zookeeper. Code-wise in terms of Solr, this doesn't look very difficult, I'm working on it now. More interesting is how to get the configuration into ZK in the first place, enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this patch. Second level is how to tell Solr to get the file from ZK. Some possibilities: 1 A system prop, -DzkSolrXmlPath=blah where blah is the path _on zk_ where the file is. Would require -DzkHost or -DzkRun as well. pros - simple, I can wrap my head around it. - easy to script cons - can't run multiple JVMs pointing to different files. Is this really a problem? 2 New solr.xml element. Something like: <solr> <solrcloud> <str name="zkHost">zkurl</str> <str name="zkSolrXmlPath">whatever</str> </solrcloud> </solr> Really, this form would hinge on the presence or absence of zkSolrXmlPath. If present, go up and look for the indicated solr.xml file on ZK. Any properties in the ZK version would overwrite anything in the local copy. NOTE: I'm really not very interested in supporting this as an option for old-style solr.xml unless it's _really_ easy. For instance, what if the local solr.xml is new-style and the one in ZK is old-style? Or vice-versa? Since old-style is going away, this doesn't seem like it's worth the effort. pros - No new mechanisms cons - once again requires that there be a solr.xml file on each client. Admittedly for installations that didn't care much about multiple JVMs, it could be a stock file that didn't change... For now, I'm going to just manually push solr.xml to ZK, then read it based on a sysprop. That'll get the structure in place while we debate. Not going to check this in until there's some consensus though. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
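Alan's suggestion would look roughly like this hypothetical sketch (method and helper names invented; it assumes the existing fromInputStream factory and would live alongside fromSolrHome in ConfigSolr):

{code}
// Hypothetical sketch: one static factory per config source, each doing one thing.
// Names may differ in the actual patch.
public static ConfigSolr fromZookeeper(SolrResourceLoader loader, SolrZkClient zkClient,
                                       String zkPath) throws Exception {
  byte[] data = zkClient.getData(zkPath, null, null, true);
  return fromInputStream(loader, new java.io.ByteArrayInputStream(data));
}
{code}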
[jira] [Commented] (SOLR-4952) audit test configs to use solrconfig.snippet.randomindexconfig.xml in more tests
[ https://issues.apache.org/jira/browse/SOLR-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738454#comment-13738454 ] ASF subversion and git services commented on SOLR-4952:
---
Commit 1513586 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1513586 ]
SOLR-4952: use solrconfig.snippet.randomindexconfig.xml in the QueryElevation tests

audit test configs to use solrconfig.snippet.randomindexconfig.xml in more tests

Key: SOLR-4952
URL: https://issues.apache.org/jira/browse/SOLR-4952
Project: Solr
Issue Type: Sub-task
Reporter: Hoss Man
Assignee: Hoss Man

In SOLR-4942 I updated every solrconfig.xml to either...
* include solrconfig.snippet.randomindexconfig.xml where it was easy to do so
* use the useCompoundFile sys prop if it already had an {{indexConfig}} section, or if including the snippet wasn't going to be easy (ie: contrib tests)

As an improvement on this:
* audit all core configs not already using solrconfig.snippet.randomindexconfig.xml and either:
** make them use it, ignoring any previously unimportant explicit indexConfig settings
** make them use it, using explicit sys props to overwrite random values in cases where explicit indexConfig values are important for the test
** add a comment why it's not using the include snippet in cases where the explicit parsing is part of the test
* try to figure out a way for contrib tests to easily include the same file and/or apply the same rules as above
[jira] [Commented] (SOLR-4952) audit test configs to use solrconfig.snippet.randomindexconfig.xml in more tests
[ https://issues.apache.org/jira/browse/SOLR-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738459#comment-13738459 ] ASF subversion and git services commented on SOLR-4952:
---
Commit 1513587 from hoss...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1513587 ]
SOLR-4952: use solrconfig.snippet.randomindexconfig.xml in the QueryElevation tests (merge r1513586)
[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper
[ https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738465#comment-13738465 ] Mark Miller commented on SOLR-4718:
---
+1
[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac
[ https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738471#comment-13738471 ] Steve Rowe commented on SOLR-4856:
--
I don't use Eclipse, so it may be that something else is wrong that isn't apparent on casual inspection, but I can't reproduce the problem you're reporting here. On my Macbook Pro with OS X 10.8.4, when I run {{ant eclipse}} from a Bash cmdline on {{branch_4x}} (using ant v1.8.2 and Oracle Java 1.7.0_25), the generated {{.project}} file contents start with:
{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
  <name>branch_4x</name>
  <comment></comment>
  <projects>
  </projects>
  <buildSpec>
    <buildCommand>
      <name>org.eclipse.jdt.core.javabuilder</name>
      <arguments>
      </arguments>
    </buildCommand>
  </buildSpec>
  <natures>
    <nature>org.eclipse.jdt.core.javanature</nature>
  </natures>
  <filteredResources>
  ...
{code}

ant eclipse is not generating .project file correctly on mac

Key: SOLR-4856
URL: https://issues.apache.org/jira/browse/SOLR-4856
Project: Solr
Issue Type: Bug
Components: Build
Affects Versions: 4.4
Environment: Mac OS X 10.8.2, Eclipse Juno Service Release 2, Build id: 20130225-0426
Reporter: Kranti Parisa
Priority: Minor

STEPS:
- Checkout from branch_4x (using Subclipse inside Eclipse Juno)
- On the Terminal (command line), ran ant eclipse
- Generated the Eclipse .project, .classpath, .settings files
- Refreshed the project in Eclipse (I can see the files in the Navigator View) along with the actual source code checked out from SVN
- Opened the .project file: there are no buildSpec or natures elements in there
- Hence, not able to build it properly or use ctrl+clicks for the references

I manually edited the .project file to have the following:
<buildSpec>
  <buildCommand>
    <name>org.eclipse.jdt.core.javabuilder</name>
    <arguments>
    </arguments>
  </buildCommand>
</buildSpec>
<natures>
  <nature>org.eclipse.jdt.core.javanature</nature>
</natures>

Shouldn't this be automatically added to the .project file in the first place when we run ant eclipse?
[jira] [Commented] (SOLR-4718) Allow solr.xml to be stored in zookeeper
[ https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738470#comment-13738470 ] Mark Miller commented on SOLR-4718:
---
We should also probably be strict about the property values for the setting -- e.g. "zookeeper" works, "solrhome" works, null works (as solrhome), and anything else fails with an error.
[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738474#comment-13738474 ] Robert Muir commented on LUCENE-5175:
-
I can benchmark your patch with luceneutil, Tom. I know this thing is sensitive for some reason. Really, if there is a performance issue, worst case we can just call it BM25L or something? Thanks for doing this!

Add parameter to lower-bound TF normalization for BM25 (for long documents)
---
Key: LUCENE-5175
URL: https://issues.apache.org/jira/browse/LUCENE-5175
Project: Lucene - Core
Issue Type: Improvement
Components: core/search
Reporter: Tom Burton-West
Priority: Minor
Attachments: LUCENE-5175.patch

In the article "When Documents Are Very Long, BM25 Fails!" a fix for the problem is documented. There was a TODO note in BM25Similarity to add this fix. I will attach a patch that implements the fix shortly.
[jira] [Updated] (LUCENE-5173) Add checkindex piece of LUCENE-5116
[ https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5173:
Attachment: LUCENE-5173_ugly.patch

Here is an ugly patch; there must be a better way... sorry :) I wonder if it's too paranoid: however, playing with the old patch, I think I hit my own assert with testThreadInterruptDeadLock... I will investigate that more, to see under what conditions we are doing these 0-doc merges today.

Add checkindex piece of LUCENE-5116
---
Key: LUCENE-5173
URL: https://issues.apache.org/jira/browse/LUCENE-5173
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
Attachments: LUCENE-5173.patch, LUCENE-5173_ugly.patch

LUCENE-5116 fixes addIndexes(Reader) to never write a 0-document segment (in the case you merge in empty or all-deleted stuff). I considered it just an inconsistency, but it could cause confusing exceptions for real users too if there was a regression here (see solr users list: "Split Shard Error - maxValue must be non-negative").
[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738627#comment-13738627 ] Tom Burton-West commented on LUCENE-5175:
-
Thanks Robert,

In the article, they claim that the change doesn't have a performance impact. On the other hand, I'm not familiar enough with Java performance to be able to eyeball it, and it looks to me like we added one or more floating point operations, so it would be good to benchmark, especially since the scoring algorithm gets run against every hit, and we might have millions of hits for a poorly chosen query. (And if we switch to page-level indexing, we could have hundreds of millions of hits.)

I was actually considering making it a subclass instead of just modifying BM25Similarity, so that it would be easy to benchmark and, if it turns out there is a significant perf difference, so that users could choose which implementation to use. I saw that computeWeight in BM25Similarity is final and decided I didn't know enough about why it is final to either refactor to create a base class or change the method in order to subclass.

Is luceneutil the same as lucene benchmark? I've been wanting to learn how to use lucene benchmark for some time.

Tom
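For readers without the paper at hand, the fix under discussion, as I understand it (check the attached patch for the exact form Tom used), lower-bounds the length-normalized term frequency with a small constant delta:
{noformat}
c'(t,d) = \frac{f(t,d)}{1 - b + b\,|d|/\mathrm{avgdl}}

\mathrm{score}(t,d) = \mathrm{IDF}(t) \cdot
    \frac{(k_1 + 1)\,\bigl(c'(t,d) + \delta\bigr)}{k_1 + c'(t,d) + \delta},
    \qquad \delta > 0 \ (\text{0.5 in the paper})
{noformat}
The extra addition of delta in each scored term is the handful of floating point operations Tom mentions above; with delta = 0 the formula reduces to standard BM25.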
[jira] [Commented] (SOLR-4952) audit test configs to use solrconfig.snippet.randomindexconfig.xml in more tests
[ https://issues.apache.org/jira/browse/SOLR-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738640#comment-13738640 ] ASF subversion and git services commented on SOLR-4952:
---
Commit 1513611 from hoss...@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1513611 ]
SOLR-4952: get all manged schema tests using solrconfig.snippet.randomindexconfig.xml - mainly by removing several solrconfig-*-managed-schema.xml files and using sys props in solrconfig-managed-schema.xml
[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738664#comment-13738664 ] Robert Muir commented on LUCENE-5175:
-
Hi Tom: I know for a fact I tried to remove the crazy cache (I created the monster) that this thing creates, and it always hurts performance, for example. But I don't think we need to worry too much because:
# We should benchmark it the way you have it first and just see what we are dealing with.
# If there is a problem, we could try to open it up to subclassing better; maybe it even improves the API.
# There is also the option of just having specialized SimScorers for the delta=0 case.

So I am confident we will find a good solution.

As far as luceneutil goes, we tried creating a README (http://code.google.com/a/apache-extras.org/p/luceneutil/source/browse/README.txt) to get started. The basic idea is you pull down two different checkouts of lucene-trunk and set up a competition between the two. There are two options important here: one is to set the similarity for each competitor, the other can disable score comparisons (I haven't yet examined the patch to tell if they might differ slightly, e.g. order of floating point ops and stuff). But that's typically how I benchmark two Sim impls against each other.
[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac
[ https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738672#comment-13738672 ] Uwe Schindler commented on SOLR-4856:
-
Hi, could it be that the problem is that you checked out from inside Eclipse using Subclipse? Ant does not overwrite an already existing project file, so as not to lose custom project settings. Make sure that after checkout all Eclipse files are deleted; you can use ant clean-eclipse to do this.
[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac
[ https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738682#comment-13738682 ] Kranti Parisa commented on SOLR-4856:
-
Yes, it's fine now. Thanks.
[jira] [Closed] (SOLR-4856) ant eclipse is not generating .project file correctly on mac
[ https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kranti Parisa closed SOLR-4856.
---
Resolution: Not A Problem

It was an environmental issue.
[jira] [Commented] (SOLR-4952) audit test configs to use solrconfig.snippet.randomindexconfig.xml in more tests
[ https://issues.apache.org/jira/browse/SOLR-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738687#comment-13738687 ] ASF subversion and git services commented on SOLR-4952:
---
Commit 1513616 from hoss...@apache.org in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1513616 ]
SOLR-4952: get all manged schema tests using solrconfig.snippet.randomindexconfig.xml - mainly by removing several solrconfig-*-managed-schema.xml files and using sys props in solrconfig-managed-schema.xml (merge r1513611)
[jira] [Commented] (SOLR-5122) spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to ArithmeticException: / by zero
[ https://issues.apache.org/jira/browse/SOLR-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738708#comment-13738708 ] James Dyer commented on SOLR-5122:
--
The scenarios tested in testEstimatedHitCounts() seem to always pick a collector that does not accept docs out-of-order (TopFieldCollector$OneComparatorNonScoringCollector). The problem looks like this: when a new segment/scorer is set, we get a new set of doc ids. Prior to random merges, the test naively assumed everything was on one segment. Now, with multiple segments, all bets are off, and I don't think we can be estimating hits.

I think the best fix is to dial back the functionality here and not offer hit estimates at all. The functionality would still be beneficial in cases where the user does not require hit counts to be returned at all (for instance, [~rmuir] mentioned using this feature with suggesters). Another option is to add together the doc ids for the various scorers that are looked at and pretend this is your max doc id. I'm torn here because I'd hate to remove functionality that has been released, but on the other hand, if it is always going to give lousy estimates, then why fool people? Thoughts?

spellcheck.collateMaxCollectDocs estimates seem to be meaningless -- can lead to ArithmeticException: / by zero

Key: SOLR-5122
URL: https://issues.apache.org/jira/browse/SOLR-5122
Project: Solr
Issue Type: Bug
Affects Versions: 4.4
Reporter: Hoss Man
Assignee: James Dyer
Attachments: SOLR-5122.patch, SOLR-5122.patch

As part of SOLR-4952, SpellCheckCollatorTest started using RandomMergePolicy, and this (apparently) led to a failure in testEstimatedHitCounts. As far as I can tell, the test assumes that specific values would be returned as the _estimated_ hits for a collation, and it appears that the change in MergePolicy resulted in different segments with different term stats, causing the estimation code to produce different values than expected. I made a quick attempt to improve the test to:
* expect explicit exact values only when spellcheck.collateMaxCollectDocs is set such that the "estimate" should actually be exact (ie: collateMaxCollectDocs == 0, or collateMaxCollectDocs greater than the num docs in the index)
* randomize the values used for collateMaxCollectDocs and confirm that the estimates are never more than the num docs in the index

This led to an odd "ArithmeticException: / by zero" error in the test, which seems to suggest that there is a genuine bug in the code for estimating the hits that only gets tickled in certain mergepolicy/segment/collateMaxCollectDocs combinations.

*Update:* This appears to be a general problem with collecting docs out of order and the estimation of hits -- I believe even if there is no divide-by-zero error, the estimates are largely meaningless, since the docs are collected out of order.
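To make the failure mode concrete, here is the kind of linear extrapolation being discussed -- an illustrative sketch of the general technique, not the actual SpellCheckCollator code:
{code:java}
// Naive hit estimation from a truncated collection (illustrative only).
// Extrapolates: "we saw hitsCollected hits in the first maxDocIdSeen docs,
// so scale up to the whole index".
static int estimateHits(int hitsCollected, int maxDocIdSeen, int maxDoc) {
  // With out-of-order collection, maxDocIdSeen no longer tracks how far
  // through the index collection got; it is essentially arbitrary, and if
  // it is 0 this divides by zero -- the exception reported in this issue.
  return (int) ((long) hitsCollected * maxDoc / maxDocIdSeen);
}
{code}
The extrapolation is only meaningful when doc ids arrive in increasing order within a single segment, which is exactly the assumption the random merge policies broke.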
[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac
[ https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738720#comment-13738720 ] Steve Rowe commented on SOLR-4856:
--
Kranti, can you describe the environmental issue? Someone else encountering the problem might benefit.
[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738729#comment-13738729 ] Uwe Schindler commented on SOLR-3076:
-
Hi Yonik, hi Mikhail,

the committed version seems much better than the top-level-cache one! Many, many thanks for committing that one! That was my only problem with it. But as you say, we should really work on getting Solr to no longer use top-level caches for filters and facets. Filters should also stop using OpenBitSet, in favor of FixedBitSet or one of the new compressed bitsets (maybe off-heap). FixedBitSet is also better supported by internal APIs, as some algorithms can use it directly (e.g. in BooleanFilter); not sure if this is relevant for Solr.

Solr(Cloud) should support block joins
--
Key: SOLR-3076
URL: https://issues.apache.org/jira/browse/SOLR-3076
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Fix For: 4.5, 5.0
Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, dih-3076.patch, dih-config.xml, parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch (x13), SOLR-7036-childDocs-solr-fork-trunk-patched, solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, tochild-bjq-filtered-search-fix.patch

Lucene has the ability to do block joins; we should add it to Solr.
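For context, here is a minimal sketch of the kind of per-segment filter construction Uwe is advocating, using FixedBitSet rather than OpenBitSet -- my illustration, not code from this issue; the matches() predicate is hypothetical:
{code:java}
// Building a per-segment filter result with FixedBitSet (Lucene 4.x).
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.util.Bits;
import org.apache.lucene.util.FixedBitSet;

DocIdSet matchingDocs(AtomicReaderContext context, Bits acceptDocs) {
  FixedBitSet bits = new FixedBitSet(context.reader().maxDoc());
  for (int doc = 0; doc < bits.length(); doc++) {
    // matches() is a hypothetical per-doc predicate standing in for real logic
    if ((acceptDocs == null || acceptDocs.get(doc)) && matches(doc)) {
      bits.set(doc);
    }
  }
  return bits; // in Lucene 4.x, FixedBitSet is itself a DocIdSet
}
{code}
Because FixedBitSet is sized to the segment's maxDoc and recognized by internal APIs (e.g. BooleanFilter can operate on it directly), it avoids both the growability overhead of OpenBitSet and the top-level caching this thread is moving away from.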
[jira] [Created] (SOLR-5141) the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ
Erick Erickson created SOLR-5141:
Summary: the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ
Key: SOLR-5141
URL: https://issues.apache.org/jira/browse/SOLR-5141
Project: Solr
Issue Type: Improvement
Reporter: Erick Erickson
Assignee: Erick Erickson
Fix For: 4.5, 5.0

IRC chat with Steve Rowe pointed me at how to fix this, will check in momentarily.
[jira] [Commented] (SOLR-5141) the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ
[ https://issues.apache.org/jira/browse/SOLR-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738740#comment-13738740 ] ASF subversion and git services commented on SOLR-5141:
---
Commit 1513628 from [~erickoerickson] in branch 'dev/trunk' [ https://svn.apache.org/r1513628 ]
SOLR-5141. lucene.IOUtils needs to be available for VelocityResopnseWriter in IntelliJ
[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac
[ https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738752#comment-13738752 ] Kranti Parisa commented on SOLR-4856:
-
Earlier, I did the checkout using Subclipse. But now I tried an svn checkout on the command line, then ran ant eclipse, and the .project file does have the javabuilder commands. Not sure what was wrong with my Eclipse environment for the Subclipse-based checkout. Anyway, I think a command-line checkout is safer and cleaner.
[jira] [Commented] (SOLR-4856) ant eclipse is not generating .project file correctly on mac
[ https://issues.apache.org/jira/browse/SOLR-4856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738758#comment-13738758 ] Uwe Schindler commented on SOLR-4856:
-
As I said before: ant eclipse does not overwrite an already existing project file. When you check out from inside Eclipse, one is generated by Eclipse itself. Use ant clean-eclipse to remove all Eclipse-specific files from the checkout before regenerating.
lazily-loaded cores and SolrCloud
There was a question on the user's list today about making lazily-loaded (aka transient) cores work with SolrCloud, where I basically punted and said "not designed with that in mind". I've kind of avoided thinking about this as the use-case; the transient code wasn't written with SolrCloud in mind. But what is the general reaction to that pairing? Mostly I'm looking for feedback at the level of "no way that could work without invasive changes to SolrCloud, don't even go there" or "sure, just allow ZK to get a list of all cores and it'll be fine; the user is responsible for the quirks though".

Some questions that come to my mind:

- Should a core that's not loaded be considered live by ZK?
- Would simply returning a list of all cores (both loaded and not loaded) be sufficient for ZK? (This list is already available so the admin UI can list all cores.)
- Does SolrCloud distributed update processing go through (or could it be made to go through) the path that autoloads a core? Ditto for querying. I suspect the answer to both is that it'll just happen.
- Would the idea of waiting for all the cores to load on all the nodes for an update be totally unacceptable? We already have the distributed-deadlock potential; this seems to make that more likely by lengthening the time the semaphore in question is held.
- Would re-synching/leader election be an absolute nightmare? I can imagine that if all the cores for a particular shard weren't loaded at startup, there'd be a terrible time waiting for leader election, for instance.
- Stuff I haven't thought of...

Mostly I'm trying to get a sense of the community here about whether supporting transient cores in SolrCloud mode would be something that would be easy/do-able/really_hard/totally_unacceptable.

Thanks,
Erick
[jira] [Created] (SOLR-5142) Block Indexing / Join Improvements
Yonik Seeley created SOLR-5142:
--
Summary: Block Indexing / Join Improvements
Key: SOLR-5142
URL: https://issues.apache.org/jira/browse/SOLR-5142
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Fix For: 4.5, 5.0

Follow-on main issue for general block indexing / join improvements
[jira] [Resolved] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-3076.
Resolution: Fixed

Closing... I opened SOLR-5142 for additional work.
[jira] [Commented] (SOLR-5141) the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ
[ https://issues.apache.org/jira/browse/SOLR-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738789#comment-13738789 ] ASF subversion and git services commented on SOLR-5141:
---
Commit 1513640 from [~erickoerickson] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1513640 ]
=SOLR-5141 the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ
[jira] [Resolved] (SOLR-5141) the VelocityResponseWriter can't find lucene.IOUtils from within IntelliJ
[ https://issues.apache.org/jira/browse/SOLR-5141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-5141.
--
Resolution: Fixed

Thanks for the pointers Steve!
[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738801#comment-13738801 ] Mikhail Khludnev commented on SOLR-3076:
Yonik, thanks and congratulations!
[jira] [Closed] (SOLR-4836) overwrite=true support for block updates
[ https://issues.apache.org/jira/browse/SOLR-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev closed SOLR-4836.
--
Resolution: Won't Fix

As far as I understand, it's already covered by SOLR-3076.

overwrite=true support for block updates

Key: SOLR-4836
URL: https://issues.apache.org/jira/browse/SOLR-4836
Project: Solr
Issue Type: Sub-task
Components: update
Reporter: Mikhail Khludnev
Fix For: 4.5, 5.0

A functional extension of SOLR-3076. I just want to propose an approach for the subject: we can treat uniqueKey as a key for the whole block, not for a single document; sadly, that's not really backward compatible. Otherwise, we can introduce a uniqueBlockKey tag in schema.xml.
[jira] [Updated] (LUCENE-5173) Add checkindex piece of LUCENE-5116
[ https://issues.apache.org/jira/browse/LUCENE-5173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5173:
Attachment: LUCENE-5173.patch

Here's a cleaned-up version... maybe it's OK. As for the stuff I saw with the first patch on this issue, maybe it was due to running tests from Eclipse (I beasted TestIndexWriter with it out of curiosity, but nothing came out)... it's old news anyway, I guess.
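For readers who haven't opened the patch, the checkindex piece presumably amounts to a validation of the LUCENE-5116 invariant. This is my guess at its shape, a sketch under assumptions rather than the attached code:
{code:java}
// Hypothetical CheckIndex-style validation: now that addIndexes(...) never
// writes 0-document segments, any such segment in the index is corruption.
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;

static void checkDocCounts(SegmentInfos infos) {
  for (SegmentCommitInfo info : infos) {
    if (info.info.getDocCount() <= 0) {
      throw new RuntimeException(
          "illegal number of documents: " + info.info.getDocCount());
    }
  }
}
{code}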
Re: lazily-loaded cores and SolrCloud
At a high level, I think the idea is fine (and I've seen a number of people that wanted it). The question is more around one of implementation... would it make a mess of things or not. The answer to that I think is probably mostly related to issues around how zookeeper is currently handled. I don't see any issues with other things like spinning up a core when a request comes in for it.

-Yonik
http://lucidworks.com

On Tue, Aug 13, 2013 at 4:26 PM, Erick Erickson erickerick...@gmail.com wrote: [quoted message elided]
[jira] [Commented] (SOLR-3076) Solr(Cloud) should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738797#comment-13738797 ] Yonik Seeley commented on SOLR-3076:
bq. Filters should not use OpenBitSet anymore, instead FixedBitSet
Hey, it isn't my fault that Lucene chose to fork OpenBitSet ;-)
[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)
[ https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13738818#comment-13738818 ] Tom Burton-West commented on LUCENE-5175:
-
I wondered about that crazy cache, in that it makes the implementation dependent on the norms implementation.

BTW: it looks to me like, with Lucene's default norms, there are only about 130 or so distinct document lengths. If there is no boosting going on, the byte value has to get to 124 for a doclength of 1, so there are only 255 - 124 = 131 possible different lengths:
{noformat}
i=124 norm=1.0, doclen=1.0
{noformat}
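A quick way to check Tom's count for yourself -- my own sketch, not from the patch. With no boosting, the byte norm encodes roughly 1/sqrt(doclen), so inverting each decodable byte enumerates every representable document length:
{code:java}
// Enumerate the document lengths representable by Lucene's byte315 norm
// encoding (unboosted docs: norm = 1/sqrt(len), so norm <= 1.0).
import org.apache.lucene.util.SmallFloat;

public class NormLengths {
  public static void main(String[] args) {
    int distinct = 0;
    for (int i = 0; i < 256; i++) {
      float norm = SmallFloat.byte315ToFloat((byte) i);
      if (norm > 0 && norm <= 1.0f) {
        float doclen = 1f / (norm * norm); // invert 1/sqrt(numTerms)
        distinct++;
        if (doclen == 1.0f) {
          System.out.println("i=" + i + " norm=" + norm + " doclen=" + doclen);
        }
      }
    }
    System.out.println(distinct + " distinct representable lengths");
  }
}
{code}
This coarse quantization of document length is also why the cached NORM_TABLE approach works at all: there are only 256 byte values, hence only a couple hundred distinct normalization factors to precompute.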
[jira] [Updated] (SOLR-4718) Allow solr.xml to be stored in zookeeper
[ https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-4718:
Attachment: SOLR-4718.patch
I have to run right now, but is this what you two had in mind? Not all of the new tests run, but I have to leave for the evening and wanted to see if this is down the right path. Haven't dealt with the bytestream yet.
Allow solr.xml to be stored in zookeeper
Key: SOLR-4718 URL: https://issues.apache.org/jira/browse/SOLR-4718 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.3, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch
So the near-final piece of this puzzle is to make solr.xml storable in ZooKeeper. Code-wise, in terms of Solr, this doesn't look very difficult; I'm working on it now. More interesting is how to get the configuration into ZK in the first place: enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this patch.
The second level is how to tell Solr to get the file from ZK. Some possibilities:
1. A system prop, -DzkSolrXmlPath=blah, where blah is the path _on zk_ where the file is. Would require -DzkHost or -DzkRun as well.
pros:
- simple, I can wrap my head around it
- easy to script
cons:
- can't run multiple JVMs pointing to different files. Is this really a problem?
2. A new solr.xml element. Something like:
<solr>
  <solrcloud>
    <str name="zkHost">zkurl</str>
    <str name="zkSolrXmlPath">whatever</str>
  </solrcloud>
</solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. If present, go up and look for the indicated solr.xml file on ZK. Any properties in the ZK version would overwrite anything in the local copy. NOTE: I'm really not very interested in supporting this as an option for old-style solr.xml unless it's _really_ easy. For instance, what if the local solr.xml is new-style and the one in ZK is old-style? Or vice versa? Since old-style is going away, this doesn't seem worth the effort.
pros:
- no new mechanisms
cons:
- once again requires that there be a solr.xml file on each client. Admittedly, for installations that didn't care much about multiple JVMs, it could be a stock file that didn't change...
For now, I'm going to just manually push solr.xml to ZK, then read it based on a sysprop. That'll get the structure in place while we debate. Not going to check this in until there's some consensus, though.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
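To make the "manually push solr.xml to ZK, then read it based on a sysprop" workflow concrete, here is a sketch along the lines of option 1 above. The ZkCLI invocation is an assumption (a putfile-style command; the classpath, znode path, and ports are illustrative), not something taken from the patch:
{noformat}
# Push a local solr.xml to the /solr.xml znode with ZkCLI
# (putfile-style command assumed; paths and ports are illustrative)
java -classpath "example/solr-webapp/webapp/WEB-INF/lib/*" \
     org.apache.solr.cloud.ZkCLI -zkhost localhost:9983 \
     -cmd putfile /solr.xml /local/conf/solr.xml

# Start Solr and tell it where to find solr.xml in ZK (option 1's sysprop)
java -DzkHost=localhost:9983 -DzkSolrXmlPath=/solr.xml -jar start.jar
{noformat}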
[jira] [Updated] (SOLR-4718) Allow solr.xml to be stored in zookeeper
[ https://issues.apache.org/jira/browse/SOLR-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-4718:
Attachment: SOLR-4718.patch
Another version with a quick hack (rushing out the door, may be totally wrong!) for the bytestream stuff. [~Alan Woodward], do you have a moment to check the refactoring of the bytestream? I haven't even run any tests on it; all I know is that it compiles.
Allow solr.xml to be stored in zookeeper
Key: SOLR-4718 URL: https://issues.apache.org/jira/browse/SOLR-4718 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.3, 5.0 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch, SOLR-4718.patch
So the near-final piece of this puzzle is to make solr.xml storable in ZooKeeper. Code-wise, in terms of Solr, this doesn't look very difficult; I'm working on it now. More interesting is how to get the configuration into ZK in the first place: enhancements to ZkCli? Or bootstrap-conf? Other? I'm punting on that for this patch.
The second level is how to tell Solr to get the file from ZK. Some possibilities:
1. A system prop, -DzkSolrXmlPath=blah, where blah is the path _on zk_ where the file is. Would require -DzkHost or -DzkRun as well.
pros:
- simple, I can wrap my head around it
- easy to script
cons:
- can't run multiple JVMs pointing to different files. Is this really a problem?
2. A new solr.xml element. Something like:
<solr>
  <solrcloud>
    <str name="zkHost">zkurl</str>
    <str name="zkSolrXmlPath">whatever</str>
  </solrcloud>
</solr>
Really, this form would hinge on the presence or absence of zkSolrXmlPath. If present, go up and look for the indicated solr.xml file on ZK. Any properties in the ZK version would overwrite anything in the local copy. NOTE: I'm really not very interested in supporting this as an option for old-style solr.xml unless it's _really_ easy. For instance, what if the local solr.xml is new-style and the one in ZK is old-style? Or vice versa? Since old-style is going away, this doesn't seem worth the effort.
pros:
- no new mechanisms
cons:
- once again requires that there be a solr.xml file on each client. Admittedly, for installations that didn't care much about multiple JVMs, it could be a stock file that didn't change...
For now, I'm going to just manually push solr.xml to ZK, then read it based on a sysprop. That'll get the structure in place while we debate. Not going to check this in until there's some consensus, though.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org