[jira] [Created] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
Robert Muir created LUCENE-5677:
---
Summary: Simplify position handling in DefaultIndexingChain
Key: LUCENE-5677
URL: https://issues.apache.org/jira/browse/LUCENE-5677
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir
Attachments: LUCENE-5677.patch

There are currently a ton of conditionals checking for various problems, as well as a horribly confusing unbalanced decrement + increment, and in general the code is a nightmare to follow. To make it worse, besides being confusing it doesn't handle all cases: e.g. a negative position increment gap from the analyzer will just result in total chaos (corruption etc). I think an easier way to implement this is to init FieldInvertState.position to -1, and for the logic to be:
{code}
position += posIncr;
// check that position >= 0
// check that position >= lastPosition
lastPosition = position;
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
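The proposed accounting is small enough to sketch standalone. This is plain Java, not the actual DefaultIndexingChain code; the class and method names are hypothetical, and the exact exception types are an assumption:

```java
// Standalone sketch of the proposed position accounting: start at -1 so the
// first token's increment (usually 1) lands on position 0, and reject any
// token whose computed position is negative or moves backwards.
public class PositionAccounting {
    private int position = -1;      // as proposed in the issue: init to -1
    private int lastPosition = -1;

    public int advance(int posIncr) {
        position += posIncr;
        if (position < 0) {
            // also catches a first token with posIncr == 0: -1 + 0 == -1
            throw new IllegalArgumentException("position must be >= 0, got " + position);
        }
        if (position < lastPosition) {
            throw new IllegalArgumentException("position went backwards: " + position);
        }
        lastPosition = position;
        return position;
    }

    public static void main(String[] args) {
        PositionAccounting acc = new PositionAccounting();
        System.out.println(acc.advance(1)); // first token: position 0
        System.out.println(acc.advance(2)); // increment gap of 2: position 2
        try {
            acc.advance(-5); // negative increment past the start: rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a single pair of checks after the increment, the negative-increment-gap corruption case described above falls out of the same code path as every other bad increment.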
[jira] [Updated] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-5677:
Attachment: LUCENE-5677.patch

Here's a quick prototype. Tests seem happy.
Re: BaseTokenStreamTestCase
Got it. Will do so, and amend my JIRA ticket to include this as well as tests. Thanks!

On Sat, May 17, 2014 at 2:21 AM, Uwe Schindler u...@thetaphi.de wrote:

Hi,

you have to capture state on the first token before inserting new ones. When inserting a new token, **solely** call restoreState(); clearAttributes() is not needed before restoreState(). If you don't do this, your filter will work incorrectly if other filters come **after** it. The assertion in BaseTokenStreamTestCase is therefore correct and really mandatory. There are many filters that show how to do this token inserting correctly.

Uwe
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

*From:* Nitzan Shaked [mailto:nitzan.sha...@gmail.com]
*Sent:* Friday, May 16, 2014 6:28 AM
*To:* dev@lucene.apache.org
*Subject:* BaseTokenStreamTestCase

Hi all,

While writing the unit tests for a new token filter I came across an issue(?) with BaseTokenStreamTestCase.assertTokenStreamContents(): it goes to some length to ensure that clearAttributes() was called for every token produced by the filter under test. I suppose this helps most of the time, but my filter sometimes produces more than one output token for a given input token. I don't want to care about what attributes the input token carries, and so I don't clear attributes between producing the output tokens for a given input token: I only change the attributes I care about (in my case this is charTerm right now, and nothing else, not even positionIncrement). This makes my unit tests unable to use BaseTokenStreamTestCase.assertTokenStreamContents(). I certainly do not want to add captureState() and clearAttributes()/restoreState() calls just so I can pass the unit tests. I would rather change assertTokenStreamContents() to support my use case, by adding a boolean and making the required changes everywhere else. Thoughts?

Nitzan
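The capture/restore pattern Uwe describes can be illustrated without Lucene at all. Below, a plain mutable map stands in for Lucene's AttributeSource, and the class and field names are hypothetical; the point is the order of operations: snapshot the source token's full state first, then build each injected token from that snapshot and overwrite only the attributes that differ:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogue of captureState()/restoreState() in a token filter that
// injects an extra token per input token (e.g. a synonym-like variant).
public class InjectingFilter {
    // A token's attributes, modeled as a simple mutable map (hypothetical
    // stand-in for Lucene's AttributeSource state).
    static Map<String, Object> token(String term, int posIncr) {
        Map<String, Object> t = new HashMap<>();
        t.put("term", term);
        t.put("posIncr", posIncr);
        return t;
    }

    static List<Map<String, Object>> filter(List<Map<String, Object>> input) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> tok : input) {
            out.add(tok); // emit the original token unchanged
            // captureState(): snapshot BEFORE synthesizing the new token
            Map<String, Object> captured = new HashMap<>(tok);
            // restoreState() + overwrite only what this filter cares about:
            // the injected token inherits everything else from the snapshot
            Map<String, Object> injected = new HashMap<>(captured);
            injected.put("term", captured.get("term") + "_alt");
            injected.put("posIncr", 0); // stacked at the same position
            out.add(injected);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter(List.of(token("foo", 1))));
    }
}
```

Because every injected token starts from a full snapshot rather than from whatever a downstream filter last left in place, filters that come **after** this one see consistent attribute state, which is exactly what the BaseTokenStreamTestCase assertion enforces.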
[jira] [Commented] (LUCENE-5663) Fix FSDirectory.open API
[ https://issues.apache.org/jira/browse/LUCENE-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000695#comment-14000695 ]
Uwe Schindler commented on LUCENE-5663:
---

I think the main problem here is just the name open(). The problem is that NIOFSDir.open() reads like "open a NIOFSDir". I agree with Hoss that this is the standard, well-known factory pattern, and this problem with it applies to other cases, too (you can also call {{Lucene43Codec.forName(Lucene3x)}}, which is also bullshit). But there it is obvious from the method name that forName relates to a factory. So people should really listen to their Eclipse warning (better would be to have it as an error, and Java should not allow access to static methods on subclasses). The better fix, in my opinion, is to just rename the method to a better name: {{FSDirectory.newPlatformDefault(...);}} Then there is no need to shadow them, and it's more obvious that this is a factory method. In 4.x we can still provide a deprecated open(), which is shadowed in the subclasses and throws UOE there.

Fix FSDirectory.open API
---
Key: LUCENE-5663
URL: https://issues.apache.org/jira/browse/LUCENE-5663
Project: Lucene - Core
Issue Type: Bug
Reporter: Robert Muir

Spinoff from LUCENE-5658
{quote}
This does not use NIOFSDir! open() is a static factory method on FSDirectory, just inherited to NIOFSDirectory.
{quote}
I think it's confusing we have this method on FSDirectory, so it's visible in subclasses. We should at least consider doing this in another way so it's not confusing.
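The trap being discussed is a plain Java language behavior and is easy to reproduce in miniature. The class names below mirror the thread but are hypothetical stand-ins, not the real Lucene classes:

```java
// Demonstrates why an inherited static factory is confusing: the call in main
// *reads* like it opens an NIOFSDir, but static methods are not overridden --
// the subclass name merely forwards to the factory defined on the base class.
class FSDir {
    static FSDir open() {
        return new FSDir(); // picks a "platform default", not the callee's type
    }
}

class NIOFSDir extends FSDir {
}

public class StaticShadowDemo {
    public static void main(String[] args) {
        FSDir d = NIOFSDir.open(); // compiles fine, but does NOT return an NIOFSDir
        System.out.println(d instanceof NIOFSDir); // prints false
    }
}
```

Renaming the factory to something like newPlatformDefault(), as suggested above, removes the misleading "open a NIOFSDir" reading entirely, since the name no longer matches an action a subclass could plausibly perform on itself.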
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000706#comment-14000706 ]
Uwe Schindler commented on LUCENE-4371:
---

Looks cool. I was a bit confused about ByteBufferIndexInput, because this one already has {{slice(...)}}. We should add {{@Override}} here, because it now implements an abstract method. I still have to think about whether close works as expected, but this did not change from before. Maybe this is my misunderstanding, but it is really confusing: are slices always closed by consumer code (unlike clones), or not? If yes, all looks fine, but we should document this: clones do not need to be closed, but what about slices? I think we use the same FileDescriptor, so we also don't need to close the slices?

consider refactoring slicer to indexinput.slice
---
Key: LUCENE-4371
URL: https://issues.apache.org/jira/browse/LUCENE-4371
Project: Lucene - Core
Issue Type: Task
Reporter: Robert Muir
Attachments: LUCENE-4371.patch, LUCENE-4371.patch, LUCENE-4371.patch, LUCENE-4371.patch

From LUCENE-4364:
{quote}
In my opinion, we should maybe check if we can remove the whole Slicer in all IndexInputs? Just make the slice(...) method return the current BufferedIndexInput-based one. This could be another issue, once this is in.
{quote}
Re: svn commit: r1595425 - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/core/ lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java
Grrr, thanks Rob.

Mike McCandless
http://blog.mikemccandless.com

On Sat, May 17, 2014 at 1:22 AM, rm...@apache.org wrote:

Author: rmuir
Date: Sat May 17 05:22:33 2014
New Revision: 1595425
URL: http://svn.apache.org/r1595425
Log: improve test

Modified:
  lucene/dev/branches/branch_4x/ (props changed)
  lucene/dev/branches/branch_4x/lucene/ (props changed)
  lucene/dev/branches/branch_4x/lucene/core/ (props changed)
  lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java

Modified: lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java?rev=1595425&r1=1595424&r2=1595425&view=diff
==
--- lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java (original)
+++ lucene/dev/branches/branch_4x/lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java Sat May 17 05:22:33 2014
@@ -36,6 +36,7 @@
 import org.apache.lucene.analysis.MockTo
 import org.apache.lucene.analysis.Token;
 import org.apache.lucene.analysis.TokenFilter;
 import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.Analyzer.TokenStreamComponents;
 import org.apache.lucene.document.BinaryDocValuesField;
 import org.apache.lucene.document.Document;
 import org.apache.lucene.document.Field;
@@ -1509,6 +1510,7 @@ public class TestIndexWriterExceptions e
       String value = null;
       doc.add(new StoredField("foo", value));
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (IllegalArgumentException expected) {}
     iw.close();
     // make sure we see our good doc
@@ -1532,6 +1534,7 @@ public class TestIndexWriterExceptions e
       // set to null value
       theField.setStringValue(null);
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (IllegalArgumentException expected) {}
     iw.close();
     // make sure we see our good doc
@@ -1556,6 +1559,7 @@ public class TestIndexWriterExceptions e
       Field theField = new StoredField("foo", v);
       doc.add(theField);
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (NullPointerException expected) {}
     iw.close();
     // make sure we see our good doc
@@ -1580,6 +1584,7 @@ public class TestIndexWriterExceptions e
       byte v[] = null;
       theField.setBytesValue(v);
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (NullPointerException expected) {}
     iw.close();
     // make sure we see our good doc
@@ -1604,6 +1609,7 @@ public class TestIndexWriterExceptions e
       Field theField = new StoredField("foo", v);
       doc.add(theField);
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (IllegalArgumentException expected) {}
     iw.close();
     // make sure we see our good doc
@@ -1628,6 +1634,7 @@ public class TestIndexWriterExceptions e
       BytesRef v = null;
       theField.setBytesValue(v);
       iw.addDocument(doc);
+      fail("didn't get expected exception");
     } catch (IllegalArgumentException expected) {}
     iw.close();
     // make sure we see our good doc

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000715#comment-14000715 ]
Uwe Schindler commented on LUCENE-4371:
---

Btw, thanks for hiding the concrete FSDir IndexInputs and especially making them final! Great step. The protected access annoyed me for a long time, but for backwards compatibility I never removed it (although I am sure nobody was ever able to subclass them correctly!). In ByteBufferIndexInput.slice() the return value is a package-protected class, so we should change this to the general IndexInput as in the abstract base class, otherwise the Javadocs will look broken. This applies to the other classes and their clone(), too. The caller only needs the abstract IndexInput (especially if the impl class is invisible).
[jira] [Commented] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000713#comment-14000713 ]
Michael McCandless commented on LUCENE-5677:
---

+1, much better!
Re: Consolidate IndexWriter.deleteDocuments()
+1

Mike McCandless
http://blog.mikemccandless.com

On Fri, May 16, 2014 at 7:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: I was looking at IW.deleteDocs() API, and was wondering why do we have both
: deleteDocuments(Term) and deleteDocuments(Term...). Why can't we have just
: the vararg one? Same applies to deleteDocuments(Query).

+1

I think those method signatures just haven't been cleaned up since the introduction of varargs? (ie: Lucene 2.9 was Java 1.4 compatible and had Array versions of both of those methods instead of the more general vararg versions we have now)

-Hoss
http://www.lucidworks.com/
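The consolidation proposed in the thread is a one-line change in spirit: a single varargs overload accepts both the one-term and many-term calls. A minimal sketch, with a hypothetical Term class standing in for the real org.apache.lucene.index.Term:

```java
// One varargs signature subsumes deleteDocuments(Term) + deleteDocuments(Term...):
// Java packs a lone argument into a length-1 array automatically.
public class VarargsDemo {
    static class Term {
        final String field, text;
        Term(String field, String text) { this.field = field; this.text = text; }
    }

    static int deleteDocuments(Term... terms) {
        // a real IndexWriter would buffer each term as a pending delete;
        // here we just report how many terms were passed
        return terms.length;
    }

    public static void main(String[] args) {
        System.out.println(deleteDocuments(new Term("id", "1")));                      // 1
        System.out.println(deleteDocuments(new Term("id", "1"), new Term("id", "2"))); // 2
    }
}
```

One caveat worth remembering when removing the single-argument overload: existing compiled callers of deleteDocuments(Term) would need a recompile, since the varargs form has a different method descriptor at the bytecode level.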
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000722#comment-14000722 ]
Michael McCandless commented on LUCENE-4371:
---

+1, this is an awesome simplification!
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000737#comment-14000737 ]
Robert Muir commented on LUCENE-4371:
---

{quote}
We should add @Override here, because it now implements abstract method.
{quote}
Oh, thanks, I forgot this.
{quote}
I think we use the same FileDescriptor, so we also don't need to close the slices?
{quote}
Slices are just like clones. So for example CFSDirectory holds an input over the entire .cfs file, and when you ask to open a file within the cfs it returns a slice (clone) of it. When you close the cfs it closes the real one.
{quote}
In ByteBufferIndexInput.slice() the return value is a package-protected class, so we should change this to the general IndexInput like in the abstract base class, otherwise the Javadocs will look broken.
{quote}
What javadocs? This is not a public class :)
Re: svn commit: r1595425 - in /lucene/dev/branches/branch_4x: ./ lucene/ lucene/core/ lucene/core/src/test/org/apache/lucene/index/TestIndexWriterExceptions.java
it was my bug, i recently added these tests

On Sat, May 17, 2014 at 4:32 AM, Michael McCandless luc...@mikemccandless.com wrote:

Grrr, thanks Rob.

Mike McCandless
http://blog.mikemccandless.com
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_55) - Build # 10320 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10320/
Java: 32bit/jdk1.7.0_55 -client -XX:+UseG1GC

1 tests failed.
REGRESSION: org.apache.lucene.uninverting.TestFieldCacheVsDocValues.testHugeBinaryValues

Error Message:

Stack Trace:
java.lang.AssertionError
  at __randomizedtesting.SeedInfo.seed([B2F2E797848F720B:51C934959CBDA284]:0)
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertFalse(Assert.java:68)
  at org.junit.Assert.assertFalse(Assert.java:79)
  at org.apache.lucene.uninverting.TestFieldCacheVsDocValues.testHugeBinaryValues(TestFieldCacheVsDocValues.java:188)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
  at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
  at java.lang.Thread.run(Thread.java:745)

Build Log:
[...truncated 8634 lines...]
[junit4] Suite: org.apache.lucene.uninverting.TestFieldCacheVsDocValues
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestFieldCacheVsDocValues -Dtests.method=testHugeBinaryValues -Dtests.seed=B2F2E797848F720B -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=sr_RS -Dtests.timezone=America/Cambridge_Bay -Dtests.file.encoding=UTF-8
[junit4] FAILURE
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000748#comment-14000748 ]
Uwe Schindler commented on LUCENE-4371:
---

bq. What javadocs? This is not a public class

You are right, because MMapIndexInput is private, too!
[jira] [Updated] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-4371:
Attachment: LUCENE-4371.patch

Added missing @Override.

By the way, I noticed something when refactoring the code: slicing/cloning currently has no safety (except for MMap). We should think about this for NIO/Simple too: simple range checks that the slice is in bounds, and maybe that the channel is open. CFSDir could check some of this too, because its handle is now an ordinary input. But I didn't want to stir up controversy in this refactor (it is unrelated to this patch). I think there is no performance impact of adding such checks to NIO/Simple because they already must suffer a buffer refill here anyway. So maybe we can just open a followup.
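The range checks suggested here are cheap because they happen once at slice() time rather than on every read. A minimal sketch of the idea with a plain-Java stand-in for IndexInput (hypothetical class names, not Lucene code):

```java
// Sketch of bounds-checked slicing: validate offset/length against the parent
// when the slice is created, instead of failing obscurely later on a read.
public class SliceDemo {
    static class Input {
        final byte[] data;
        final int offset, length;

        Input(byte[] data, int offset, int length) {
            this.data = data;
            this.offset = offset;
            this.length = length;
        }

        Input slice(int off, int len) {
            if (off < 0 || len < 0 || off + len > this.length) {
                throw new IllegalArgumentException(
                    "slice(" + off + "," + len + ") out of bounds for length " + this.length);
            }
            // like a clone: shares the underlying data, no separate close needed
            return new Input(data, this.offset + off, len);
        }

        byte byteAt(int pos) {
            return data[offset + pos];
        }
    }

    public static void main(String[] args) {
        Input whole = new Input(new byte[]{10, 20, 30, 40}, 0, 4);
        Input s = whole.slice(1, 2);
        System.out.println(s.byteAt(0)); // 20
        try {
            whole.slice(3, 5); // extends past the end: rejected up front
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that because the slice shares the parent's backing data, the slice-of-a-slice case composes naturally (offsets accumulate), which also matches the "slices are just like clones" lifecycle described earlier in the thread.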
Re:
For sure. Lucene's explain is really expensive and is not intended for production use, only for rare troubleshooting. As a mitigation measure, you can scroll the result set in small portions more efficiently, like Hoss recently explained at SearchHub. For this kind of problem it's usually possible to create specialized custom collectors doing something particular. Have a good day!

On Sat, May 17, 2014 at 3:01 AM, Tom Burton-West tburt...@umich.edu wrote:

Hello all,

I'm trying to get relevance scoring information for each of 1,000 docs returned for each of 250 queries. If I run the query (appended below) without debugQuery=on, I have no problem getting all the results with under 4GB of memory use. If I add the parameter debugQuery=on, memory use goes up continuously, and after about 20 queries (with 1,000 results each) memory use reaches about 29.1 GB and the garbage collector gives up:

org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded

I've attached a jmap -histo, excerpt below. Is this a known issue with debugQuery?

Tom

query: q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2&debugQuery=on
without debugQuery=on: q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2

num   #instances  #bytes          Class description
--
1:    585,559     10,292,067,456  byte[]
2:    743,639     18,874,349,592  char[]
3:    53,821      91,936,328      long[]
4:    70,430      69,234,400      int[]
5:    51,348      27,111,744      org.apache.lucene.util.fst.FST$Arc[]
6:    286,357     20,617,704      org.apache.lucene.util.fst.FST$Arc
7:    715,364     17,168,736      java.lang.String
8:    79,561      12,547,792      * ConstMethodKlass
9:    18,909      11,404,696      short[]
10:   345,854     11,067,328      java.util.HashMap$Entry
11:   8,823       10,351,024      * ConstantPoolKlass
12:   79,561      10,193,328      * MethodKlass
13:   228,587     9,143,480       org.apache.lucene.document.FieldType
14:   228,584     9,143,360       org.apache.lucene.document.Field
15:   368,423     8,842,152       org.apache.lucene.util.BytesRef
16:   210,342     8,413,680       java.util.TreeMap$Entry
17:   81,576      8,204,648       java.util.HashMap$Entry[]
18:   107,921     7,770,312       org.apache.lucene.util.fst.FST$Arc
19:   13,020      6,874,560       org.apache.lucene.util.fst.FST$Arc[]

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
[jira] [Updated] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-5677:
Attachment: LUCENE-5677.patch

Slightly tweaked patch: it just handles the offsets with the same logic, for consistency, and adds a test for a crazy offset gap. This removes another conditional and makes it simpler. I also pulled out the 'boost omitNorms check' into the caller, because it's unrelated to inverting the token stream. We should try to keep invert() simple.
Re: [JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_55) - Build # 10320 - Failure!
Test bug: i committed a fix. On Sat, May 17, 2014 at 8:12 AM, Policeman Jenkins Server jenk...@thetaphi.de wrote: Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/10320/ Java: 32bit/jdk1.7.0_55 -client -XX:+UseG1GC 1 tests failed. REGRESSION: org.apache.lucene.uninverting.TestFieldCacheVsDocValues.testHugeBinaryValues Error Message: Stack Trace: java.lang.AssertionError at __randomizedtesting.SeedInfo.seed([B2F2E797848F720B:51C934959CBDA284]:0) at org.junit.Assert.fail(Assert.java:92) at org.junit.Assert.assertTrue(Assert.java:43) at org.junit.Assert.assertFalse(Assert.java:68) at org.junit.Assert.assertFalse(Assert.java:79) at org.apache.lucene.uninverting.TestFieldCacheVsDocValues.testHugeBinaryValues(TestFieldCacheVsDocValues.java:188) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360) at java.lang.Thread.run(Thread.java:745) Build Log: [...truncated 8634 lines...] [junit4] Suite: org.apache.lucene.uninverting.TestFieldCacheVsDocValues [junit4] 2 NOTE: reproduce with: ant test -Dtestcase=TestFieldCacheVsDocValues
[jira] [Commented] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000783#comment-14000783 ] ASF subversion and git services commented on LUCENE-5677: - Commit 1595469 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1595469 ] LUCENE-5677: simplify position handling in DefaultIndexingChain -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5666) Add UninvertingReader
[ https://issues.apache.org/jira/browse/LUCENE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000782#comment-14000782 ] David Smiley commented on LUCENE-5666: -- Oh, right. I'll repost it here for everyone's benefit:

{noformat}
* LUCENE-5666: Change uninverted access (sorting, faceting, grouping, etc) to use the DocValues API instead of FieldCache. For FieldCache functionality, use UninvertingReader in lucene/misc (or implement your own FilterReader). UninvertingReader is more efficient: supports multi-valued numeric fields, detects when a multi-valued field is single-valued, reuses caches of compatible types (e.g. SORTED also supports BINARY and SORTED_SET access without insanity). Insanity is no longer possible unless you explicitly want it. Rename FieldCache* and DocTermOrds* classes in the search package to DocValues*. Move SortedSetSortField to core and add SortedSetFieldSource to queries/, which takes the same selectors. Add helper methods to DocValues.java that are better suited for search code (never return null, etc). (Mike McCandless, Robert Muir)
{noformat}

I looked up DocValues, which is new to me, but the commit message references LUCENE-5573, which seems mis-attributed. I'm kinda surprised FieldCache isn't deprecated. It could be marked \@lucene.internal. At least... its name doesn't seem appropriate anymore. Maybe UninvertedCache. But perhaps a rename like that would introduce too much change for now, even though it's trunk. It could use some javadocs stating that DocValues.java should generally be used instead.

Add UninvertingReader - Key: LUCENE-5666 URL: https://issues.apache.org/jira/browse/LUCENE-5666 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir Fix For: 5.0 Attachments: LUCENE-5666.patch Currently the fieldcache is not pluggable at all. It would be better if everything used the docvalues apis.
This would allow people to customize the implementation, extend the classes with custom subclasses with additional stuff, etc etc. FieldCache can be accessed via the docvalues apis, using the FilterReader api. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000786#comment-14000786 ] ASF subversion and git services commented on LUCENE-5677: - Commit 1595475 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1595475 ] LUCENE-5677: simplify position handling in DefaultIndexingChain -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-5677) Simplify position handling in DefaultIndexingChain
[ https://issues.apache.org/jira/browse/LUCENE-5677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5677. - Resolution: Fixed Fix Version/s: 4.9, 5.0 -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5666) Add UninvertingReader
[ https://issues.apache.org/jira/browse/LUCENE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000784#comment-14000784 ] Robert Muir commented on LUCENE-5666: - I think you missed the point: it does not have any javadocs because it's package-private. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4371. - Resolution: Fixed Fix Version/s: 5.0 consider refactoring slicer to indexinput.slice --- Key: LUCENE-4371 URL: https://issues.apache.org/jira/browse/LUCENE-4371 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Fix For: 5.0 Attachments: LUCENE-4371.patch, LUCENE-4371.patch, LUCENE-4371.patch, LUCENE-4371.patch, LUCENE-4371.patch From LUCENE-4364: {quote} In my opinion, we should maybe check, if we can remove the whole Slicer in all Indexinputs? Just make the slice(...) method return the current BufferedIndexInput-based one. This could be another issue, once this is in. {quote} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4371) consider refactoring slicer to indexinput.slice
[ https://issues.apache.org/jira/browse/LUCENE-4371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000794#comment-14000794 ] ASF subversion and git services commented on LUCENE-4371: - Commit 1595480 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1595480 ] LUCENE-4371: Replace IndexInputSlicer with IndexInput.slice -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
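The refactoring committed above (replacing a separate IndexInputSlicer object with a slice(...) method on the input itself) can be illustrated with a self-contained sketch in plain Java. This is a toy over a byte[], not Lucene's actual IndexInput API; the names are assumptions for illustration:

```java
// Minimal stand-in for the IndexInput.slice(...) idea: a random-access
// view over bytes where slice(offset, length) returns another view of
// the same type sharing the backing data, so no slicer object is needed.
class ByteSliceInput {
    private final byte[] data;
    private final int offset;
    private final int length;

    ByteSliceInput(byte[] data) { this(data, 0, data.length); }

    private ByteSliceInput(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    int length() { return length; }

    byte readByte(int pos) {
        if (pos < 0 || pos >= length) throw new IndexOutOfBoundsException("pos=" + pos);
        return data[offset + pos];
    }

    /** The core of the refactoring: slicing returns the same type, and slices nest. */
    ByteSliceInput slice(int sliceOffset, int sliceLength) {
        if (sliceOffset < 0 || sliceLength < 0 || sliceOffset + sliceLength > length) {
            throw new IllegalArgumentException("invalid slice");
        }
        return new ByteSliceInput(data, offset + sliceOffset, sliceLength);
    }
}
```

Because a slice is itself sliceable and bounds-checked against its own window, compound-file readers can hand out sub-file views without any extra factory type.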
[jira] [Created] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
Uwe Schindler created LUCENE-5678: - Summary: Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput Key: LUCENE-5678 URL: https://issues.apache.org/jira/browse/LUCENE-5678 Project: Lucene - Core Issue Type: Bug Components: core/store Reporter: Uwe Schindler Assignee: Uwe Schindler We no longer allow seeking in IndexOutput, so there is no need to use RandomAccessFile. We can change this with a 1 KiB patch. Further improvements would be to merge this with OutputStreamIndexOutput, so we get many simplifications. There is also no reason anymore to separate DataOutput from IndexOutput. The only additional thing is IndexOutput#getFilePointer(), which is handled by an internal counter (does not use getFilePointer of the underlying RAF). -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678: -- Attachment: LUCENE-5678.patch Very simple patch. [~mikemccand]: It would be good to compare performance as a first review. We can then merge this with OutputStreamDataOutput. An alternative would be to nuke BufferedIndexOutput completely and use BufferedOutputStream! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2809) searcher leases
[ https://issues.apache.org/jira/browse/SOLR-2809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-2809: --- Attachment: SOLR-2809.patch OK, here's a proof-of-concept lease manager that implements the core functionality. It's nice and small since it just uses the existing searcher management code. The remaining work would be integration with the HTTP API:
- SolrCore would have a LeaseManager
- if a lease key is passed in, look up the searcher in the lease manager rather than getting the most recently registered searcher
- at the end of a request, take out the lease if requested, and return the lease key to the client

searcher leases --- Key: SOLR-2809 URL: https://issues.apache.org/jira/browse/SOLR-2809 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Attachments: SOLR-2809.patch Leases/reservations on searcher instances would give us the ability to use the same searcher across phases of a distributed search, or for clients to send multiple requests and have them hit a consistent/unchanging view of the index. The latter requires something extra to ensure that the load balancer contacts the same replicas as before. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
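The lease idea above can be modeled with a self-contained sketch in plain Java (hypothetical names, not the actual SOLR-2809 patch): a lease maps a generated key to a searcher so later phases of a distributed request, or follow-up client requests, can look up the same snapshot instead of the most recently registered one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Toy lease manager: lease(searcher) hands back a key; lookup(key)
// returns the same searcher until release(key) lets it go. A real
// implementation would also reference-count and expire leases.
class LeaseManager<S> {
    private final Map<Long, S> leases = new HashMap<>();
    private final AtomicLong nextKey = new AtomicLong();

    synchronized long lease(S searcher) {
        long key = nextKey.incrementAndGet();
        leases.put(key, searcher);
        return key;
    }

    /** Returns the leased searcher, or null if the lease was released or never existed. */
    synchronized S lookup(long key) { return leases.get(key); }

    synchronized void release(long key) { leases.remove(key); }
}
```

In the integration described above, the key returned by lease(...) is what would travel back to the client and be passed in on subsequent requests.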
[jira] [Commented] (SOLR-5970) Create collection API always has status 0
[ https://issues.apache.org/jira/browse/SOLR-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000814#comment-14000814 ] Mark Miller commented on SOLR-5970: --- Collections API responses really need an overhaul, I think. One of those things that has gotten no real attention. Very hard to process the response currently, I think. I do think we need fine-grained results available of some kind, unless we change how things work - for instance, you can create a collection and it fails to create on 4 nodes and succeeds on 3 - that collection will exist regardless, the way things currently work - it just won't be what you wanted. That's a lot more effort to improve, I think, but an all-or-nothing system would be nicer at some point IMO.

Create collection API always has status 0 - Key: SOLR-5970 URL: https://issues.apache.org/jira/browse/SOLR-5970 Project: Solr Issue Type: Bug Reporter: Abraham Elmahrek The responses below are from a successful create collection API (https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CreateormodifyanAliasforaCollection) call and an unsuccessful create collection API call. It seems the 'status' is always 0.

Success: {u'responseHeader': {u'status': 0, u'QTime': 4421}, u'success': {u'': {u'core': u'test1_shard1_replica1', u'responseHeader': {u'status': 0, u'QTime': 3449}}}}

Failure: {u'failure': {u'': u"org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Error CREATEing SolrCore 'test43_shard1_replica1': Unable to create core: test43_shard1_replica1 Caused by: Could not find configName for collection test43 found:[test1]"}, u'responseHeader': {u'status': 0, u'QTime': 17149}}

It seems like the status should be 400 or something similar for an unsuccessful attempt? -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
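The reporter's complaint can be illustrated with a small self-contained sketch (plain Java over generic maps; the helper and the 400/500 codes are assumptions for illustration, not Solr code): derive an overall status from the presence of a non-empty 'failure' section instead of trusting the header's status 0, which only reflects that the request itself was received.

```java
import java.util.Map;

// Toy version of the check the reporter wants: a create-collection
// response whose body contains a "failure" section should not report
// an overall status of 0 just because the request was accepted.
class CollectionResponseStatus {
    static int effectiveStatus(Map<String, Object> response) {
        Object failure = response.get("failure");
        if (failure instanceof Map && !((Map<?, ?>) failure).isEmpty()) {
            return 400;  // at least one core failed to create
        }
        Object header = response.get("responseHeader");
        if (header instanceof Map) {
            Object status = ((Map<?, ?>) header).get("status");
            if (status instanceof Integer) return (Integer) status;
        }
        return 500;  // malformed response: no usable header
    }
}
```

As Mark notes above, per-node results still matter: a derived non-zero status tells the client something failed, but the 'failure' map is what says where.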
Re:
Thanks Mikhail, I understand it's expensive, but it appears that it is not freeing up memory after each debugQuery is run. That seems like it should be avoidable (I say that without having looked at the code). Should I open a JIRA about a possible memory leak? Tom

On Sat, May 17, 2014 at 8:20 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: For sure. Lucene's explain is really expensive and is not purposed for production use, but only for rare troubleshooting. As a mitigation measure you can scroll the result set in small portions more efficiently, as Hoss recently explained at SearchHub. In this kind of problem, it's usually possible to create specialized custom collectors doing something particular. Have a good day!

On Sat, May 17, 2014 at 3:01 AM, Tom Burton-West tburt...@umich.edu wrote: Hello all, I'm trying to get relevance scoring information for each of 1,000 docs returned for each of 250 queries. If I run the query (appended below) without debugQuery=on, I have no problem with getting all the results with under 4GB of memory use. If I add the parameter debugQuery=on, memory use goes up continuously, and after about 20 queries (with 1,000 results each) memory use reaches about 29.1 GB and the garbage collector gives up: org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded I've attached a jmap -histo, excerpt below. Is this a known issue with debugQuery?
Tom

query: q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2&debugQuery=on
without debugQuery=on: q=Abraham+Lincoln&fl=id,score&indent=on&wt=json&start=0&rows=1000&version=2.2

 num     #instances          #bytes  Class description
 -----------------------------------------------------
   1:       585,559  10,292,067,456  byte[]
   2:       743,639  18,874,349,592  char[]
   3:        53,821      91,936,328  long[]
   4:        70,430      69,234,400  int[]
   5:        51,348      27,111,744  org.apache.lucene.util.fst.FST$Arc[]
   6:       286,357      20,617,704  org.apache.lucene.util.fst.FST$Arc
   7:       715,364      17,168,736  java.lang.String
   8:        79,561      12,547,792  * ConstMethodKlass
   9:        18,909      11,404,696  short[]
  10:       345,854      11,067,328  java.util.HashMap$Entry
  11:         8,823      10,351,024  * ConstantPoolKlass
  12:        79,561      10,193,328  * MethodKlass
  13:       228,587       9,143,480  org.apache.lucene.document.FieldType
  14:       228,584       9,143,360  org.apache.lucene.document.Field
  15:       368,423       8,842,152  org.apache.lucene.util.BytesRef
  16:       210,342       8,413,680  java.util.TreeMap$Entry
  17:        81,576       8,204,648  java.util.HashMap$Entry[]
  18:       107,921       7,770,312  org.apache.lucene.util.fst.FST$Arc
  19:        13,020       6,874,560  org.apache.lucene.util.fst.FST$Arc[]

- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re:
On Sat, May 17, 2014 at 12:11 PM, Tom Burton-West tburt...@umich.edu wrote: I understand it's expensive, but it appears that it is not freeing up memory after each debugQuery is run. That seems like it should be avoidable (I say that without having looked at the code). Should I open a JIRA about a possible memory leak? Yes, please do! -Yonik http://heliosearch.org - facet functions, subfacets, off-heap filters & fieldcache - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678: -- Description: We no longer allow seeking in IndexOutput, so there is no need to use RandomAccessFile. We can change this with a 1 KiB patch. Further improvements would be to merge this with OutputStreamIndexOutput, so we get many simplifications. There is also no reason anymore to separate DataOutput from IndexOutput. The only additional thing is IndexOutput#getFilePointer(), which is handled by an internal counter (does not use getFilePointer of the underlying RAF) and checksums. was: We no longer allow seeking in IndexOutput, so there is no need to use RandomAccessFile. We can change this with a 1 KiB patch. Further improvements would be to merge this with OutputStreamIndexOutput, so we get many simplifications. There is also no reason anymore to separate DataOutput from IndexOutput. The only additional thing is IndexOutput#getFilePointer(), which is handled by an internal counter (does not use getFilePointer of the underlying RAF).
-- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000801#comment-14000801 ] Uwe Schindler edited comment on LUCENE-5678 at 5/17/14 5:56 PM: Very simple patch. [~mikemccand]: It would be good to compare performance as a first review. We can then merge this with OutputStreamDataOutput. An alternative would be to nuke BufferedIndexOutput completely and use BufferedOutputStream in combination with java.util.zip.CheckedOutputStream (for the checksum)! was (Author: thetaphi): Very simple patch. [~mikemccand]: It would be good to compare performance as a first review. We can then merge this with OutputStreamDataOutput. An alternative would be to nuke BufferedIndexOutput completely and use BufferedOutputStream! -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
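The combination Uwe describes can be sketched as a self-contained stand-in in plain Java (assumed class and method names; the real FSIndexOutput differs): a FileOutputStream wrapped in BufferedOutputStream and java.util.zip.CheckedOutputStream, with getFilePointer() served from an internal counter since seeking is no longer allowed.

```java
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;

// Sketch of an append-only output over FileOutputStream: no seeking,
// position tracked by an internal counter (never asks the file), and
// a running CRC32 maintained by CheckedOutputStream as suggested above.
class CountingChecksumOutput implements Closeable {
    private final CheckedOutputStream out;
    private long bytesWritten;   // internal counter backing getFilePointer()

    CountingChecksumOutput(File file) throws IOException {
        this.out = new CheckedOutputStream(
            new BufferedOutputStream(new FileOutputStream(file)), new CRC32());
    }

    void writeByte(byte b) throws IOException {
        out.write(b & 0xFF);
        bytesWritten++;
    }

    void writeBytes(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        bytesWritten += len;
    }

    long getFilePointer() { return bytesWritten; }

    long getChecksum() { return out.getChecksum().getValue(); }

    @Override public void close() throws IOException { out.close(); }
}
```

Because the stream is append-only, the counter and the file position can never disagree, which is exactly why RandomAccessFile buys nothing here.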
[jira] [Commented] (LUCENE-5675) ID postings format
[ https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000850#comment-14000850 ] ASF subversion and git services commented on LUCENE-5675: - Commit 1595530 from [~mikemccand] in branch 'dev/branches/lucene5675' [ https://svn.apache.org/r1595530 ] LUCENE-5675: checkpoint current dirty state

ID postings format Key: LUCENE-5675 URL: https://issues.apache.org/jira/browse/LUCENE-5675 Project: Lucene - Core Issue Type: New Feature Reporter: Robert Muir

Today the primary key lookup in lucene is not that great for systems like solr and elasticsearch that have versioning in front of IndexWriter. To some extent BlockTree can sometimes help avoid seeks by telling you the term does not exist for a segment. But this technique (based on FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory. I don't think we are using everything we know: particularly the version semantics. Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer that there is no term T with version V in that segment very efficiently. Also, ID fields don't need postings lists, and they don't need stats like docfreq/totaltermfreq, etc; this stuff is all implicit. As far as the API goes, I think for users to provide IDs with versions to such a PF, a start would be to set a payload or whatever on the term field to get it through IndexWriter to the codec. And a consumer of the codec can just cast the Terms to a subclass that exposes the FST to do this version check efficiently. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
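The max-version pruning idea can be modeled without an FST in a self-contained sketch (hypothetical names, plain Java): keep the maximum version stored in a segment; if the queried version is newer than that maximum, the segment cannot possibly hold the id at that version, so the lookup is skipped entirely.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the version-aware lookup: a per-segment map of id -> version
// plus a precomputed max version. A real implementation would push the max
// down FST subtrees for finer pruning, but the cheap rejection test is the same.
class VersionedIdSegment {
    private final Map<String, Long> versions = new HashMap<>();
    private long maxVersion = Long.MIN_VALUE;

    void add(String id, long version) {
        versions.put(id, version);
        maxVersion = Math.max(maxVersion, version);
    }

    /** True only if this segment might hold id at a version >= minVersion. */
    boolean mightContain(String id, long minVersion) {
        if (minVersion > maxVersion) return false;  // cheap prune: no seek at all
        Long v = versions.get(id);
        return v != null && v >= minVersion;
    }
}
```

The prune is what replaces a bloom filter here: for freshly versioned writes (minVersion above every stored version), whole segments are rejected in O(1) with no per-term memory.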
[jira] [Updated] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678: -- Attachment: (was: LUCENE-5678.patch) -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5678) Investigate using FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000867#comment-14000867 ] Michael McCandless commented on LUCENE-5678: I tested index time for full Wikipedia; it's output intensive, and it looks like no perf change w/ the patch, though the numbers are a little noisy from run to run ... -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5678) Investigate to use FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000885#comment-14000885 ] Michael McCandless commented on LUCENE-5678:
Indexing perf of new patch looks fine too!
[jira] [Updated] (LUCENE-5678) Investigate to use FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678:
Attachment: LUCENE-5678.patch
[jira] [Updated] (LUCENE-5678) Investigate to use FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678:
Attachment: LUCENE-5678.patch
Hi, I cleaned up most of the code. This now makes BufferedIndexOutput obsolete (once I fix RateLimiter, which buffers a second time!). But before I do this, we should check the perf, because this is now completely different code. I also fixed HdfsDirectory to use this new class. The only remaining use of BufferedIndexOutput is in RateLimiter. I think in the future we should plug the rate limiter in deeper, at the OutputStream level (subclass BufferedOutputStream to limit the rate), and allow plugging that into the FSDir impl.
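The suggestion to push rate limiting down to the OutputStream level might look something like the following (a hypothetical sketch, not code from the patch; maybePause() and the pause bookkeeping are invented for illustration):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: rate-limit by subclassing BufferedOutputStream,
// so there is only one layer of buffering instead of two.
class RateLimitedOutputStream extends BufferedOutputStream {
    private final long minPauseCheckBytes; // bytes between pause checks
    private long bytesSinceLastPause;
    int pauseCount; // for illustration; a real limiter would sleep instead

    RateLimitedOutputStream(OutputStream out, long minPauseCheckBytes) {
        super(out);
        this.minPauseCheckBytes = minPauseCheckBytes;
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) throws IOException {
        super.write(b, off, len);
        bytesSinceLastPause += len;
        if (bytesSinceLastPause >= minPauseCheckBytes) {
            bytesSinceLastPause = 0;
            maybePause();
        }
    }

    // Placeholder: a real implementation would compute a sleep time
    // from the configured MB/sec and Thread.sleep() accordingly.
    protected void maybePause() {
        pauseCount++;
    }
}
```

Since the subclass piggybacks on the buffer the stream already has, the double buffering the comment complains about disappears, and the FSDirectory implementation could accept such a stream as a pluggable wrapper.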
[jira] [Commented] (SOLR-5650) When mixing adds and deletes, it appears there is a corner case where peersync can bring back a deleted update.
[ https://issues.apache.org/jira/browse/SOLR-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000891#comment-14000891 ] ASF subversion and git services commented on SOLR-5650:
Commit 1595547 from [~rjernst] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1595547 ]
SOLR-5650: add changes entries

When mixing adds and deletes, it appears there is a corner case where peersync can bring back a deleted update.
Key: SOLR-5650
URL: https://issues.apache.org/jira/browse/SOLR-5650
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
Fix For: 4.7, 5.0
Attachments: SOLR-5650.patch, SOLR-5650.patch, solr.log.tar.gz
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000889#comment-14000889 ] Ryan Ernst commented on LUCENE-5650:
Sorry about that. The nocommit was left by mistake. The failure was a goof on my part. I've put a fix for it in the branch.

createTempDir and associated functions no longer create java.io.tmpdir
Key: LUCENE-5650
URL: https://issues.apache.org/jira/browse/LUCENE-5650
Project: Lucene - Core
Issue Type: Improvement
Components: general/test
Reporter: Ryan Ernst
Assignee: Dawid Weiss
Priority: Minor
Fix For: 4.9, 5.0
Attachments: LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch, LUCENE-5650.patch

The recent refactoring of all the create temp file/dir functions (which is great!) has a minor regression from what existed before. With the old {{LuceneTestCase.TEMP_DIR}}, the directory was created if it did not exist. So, if you set {{java.io.tmpdir}} to {{./temp}}, it would create that dir within the per-JVM working dir. However, {{getBaseTempDirForClass()}} now asserts that the dir exists, is a dir, and is writable. Lucene uses {{.}} as {{java.io.tmpdir}}, and in the test security manager the per-JVM cwd has read/write/execute permissions. However, this allows tests to write to their cwd, which I'm trying to protect against (by setting cwd to read/execute in my test security manager).
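The pre-refactoring behavior being described, creating the base temp dir on demand instead of only asserting on it, could be sketched like this (a hypothetical illustration; TempDirHelper and baseTempDir are invented names, not Lucene's actual test code):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: create the temp base dir if missing,
// then validate it, mirroring the old LuceneTestCase.TEMP_DIR behavior.
class TempDirHelper {
    static Path baseTempDir(String tmpDir) throws IOException {
        Path base = Paths.get(tmpDir);
        // the old TEMP_DIR created the directory if it did not exist
        Files.createDirectories(base); // no-op when it already exists
        if (!Files.isDirectory(base) || !Files.isWritable(base)) {
            throw new IOException("temp dir not usable: " + base);
        }
        return base;
    }
}
```

Creating first and validating second is what makes a relative setting like {{./temp}} work, since the dir materializes inside the per-JVM working dir before any assertion sees it.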
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000888#comment-14000888 ] ASF subversion and git services commented on LUCENE-5650:
Commit 1595546 from [~rjernst] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1595546 ]
LUCENE-5650: fix some solr tests
[jira] [Updated] (LUCENE-5678) Investigate to use FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678:
Attachment: LUCENE-5678.patch
New patch to make sure the BufferedOutputStream is flushed on close(), without ignoring exceptions.
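The flush-on-close behavior described here can be illustrated with a small sketch (hypothetical code, not the patch; SafeClosingOutput is an invented name): flush explicitly before closing so a failed flush surfaces as an exception instead of being swallowed.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical sketch: close() flushes the buffer first and lets any
// IOException propagate, rather than ignoring it.
class SafeClosingOutput implements AutoCloseable {
    private final BufferedOutputStream out;

    SafeClosingOutput(OutputStream target) {
        this.out = new BufferedOutputStream(target);
    }

    void writeByte(int b) throws IOException {
        out.write(b);
    }

    @Override
    public void close() throws IOException {
        // flush() first so a short or failed write is reported here;
        // then close the underlying stream. Neither error is ignored.
        out.flush();
        out.close();
    }
}
```

Silently dropping a flush failure on close is dangerous for an index file, since the caller would believe bytes reached disk that never left the buffer.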
[jira] [Commented] (LUCENE-5675) ID postings format
[ https://issues.apache.org/jira/browse/LUCENE-5675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000896#comment-14000896 ] ASF subversion and git services commented on LUCENE-5675:
Commit 1595548 from [~mikemccand] in branch 'dev/branches/lucene5675' [ https://svn.apache.org/r1595548 ]
LUCENE-5675: testRandom seems to be passing

ID postings format
Key: LUCENE-5675
URL: https://issues.apache.org/jira/browse/LUCENE-5675
Project: Lucene - Core
Issue Type: New Feature
Reporter: Robert Muir

Today the primary-key lookup in Lucene is not that great for systems like Solr and Elasticsearch that have versioning in front of IndexWriter. To some extent BlockTree can help avoid seeks by telling you the term does not exist for a segment, but this technique (based on the FST prefix) is fragile. The only other choice today is bloom filters, which use up huge amounts of memory. I don't think we are using everything we know: particularly the version semantics. Instead, if the FST for the terms index used an algebra that represents the max version for any subtree, we might be able to answer very efficiently that there is no term T with version V in that segment. Also, ID fields don't need postings lists, and they don't need stats like docfreq/totaltermfreq; this stuff is all implicit. As far as the API, I think for users to provide IDs with versions to such a PF, a start would be to set a payload or whatever on the term field to get it through IndexWriter to the codec. A consumer of the codec can then just cast the Terms to a subclass that exposes the FST to do this version check efficiently.
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000902#comment-14000902 ] ASF subversion and git services commented on LUCENE-5650:
Commit 1595551 from [~rjernst] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1595551 ]
LUCENE-5650: fix one more solr test
[jira] [Updated] (LUCENE-5678) Investigate to use FileOutputStream instead of RandomAccessFile in FSIndexOutput
[ https://issues.apache.org/jira/browse/LUCENE-5678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-5678:
Attachment: LUCENE-5678.patch
New patch: now fixes RateLimiter and nukes BufferedIndexOutput. The RateLimiter was quite easy to fix. I only changed the single-byte write so that it does not perform a volatile read of getMinPauseCheckBytes() on every call. With this small change we no longer need to double-buffer using BufferedIndexOutput. I think this should be fine now.
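The single-byte-write optimization described here can be sketched as follows (a hypothetical illustration of the general technique, not the actual RateLimiter code; ByteRateTracker and its fields are invented names): keep a plain, non-volatile countdown and only touch the volatile threshold when the countdown runs out.

```java
// Hypothetical sketch: avoid a volatile read per byte by refilling a
// plain countdown ("budget") from the volatile only when it hits zero.
class ByteRateTracker {
    private volatile long minPauseCheckBytes; // may be updated by another thread
    private long budget;  // plain field, cheap to decrement per byte
    long volatileReads;   // for illustration: counts reads of the volatile

    ByteRateTracker(long minPauseCheckBytes) {
        this.minPauseCheckBytes = minPauseCheckBytes;
    }

    void writeByte() {
        if (budget == 0) {
            volatileReads++;
            budget = minPauseCheckBytes; // the only volatile read on this path
            // a real limiter would also decide here whether to pause
        }
        budget--;
    }
}
```

The hot path is then a plain-field decrement, which is why the extra buffering layer in BufferedIndexOutput stops paying for itself.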
[jira] [Commented] (LUCENE-5666) Add UninvertingReader
[ https://issues.apache.org/jira/browse/LUCENE-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000961#comment-14000961 ] David Smiley commented on LUCENE-5666:
Ok, I see that now; it's good.

Add UninvertingReader
Key: LUCENE-5666
URL: https://issues.apache.org/jira/browse/LUCENE-5666
Project: Lucene - Core
Issue Type: Improvement
Reporter: Robert Muir
Fix For: 5.0
Attachments: LUCENE-5666.patch

Currently the FieldCache is not pluggable at all. It would be better if everything used the docvalues APIs. This would allow people to customize the implementation, extend the classes with custom subclasses with additional stuff, etc. FieldCache can be accessed via the docvalues APIs, using the FilterReader API.
[jira] [Commented] (SOLR-5854) facet.limit can limit the output of facet.pivot when facet.sort is on
[ https://issues.apache.org/jira/browse/SOLR-5854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000992#comment-14000992 ] Brett Lucey commented on SOLR-5854:
Hmm. I think you might be using the facet.sort parameter incorrectly. If you visit http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort it states that the two expected values are count or index. I've been in the facet code somewhat recently and haven't seen anything that would imply that what you are trying to do with facet.sort would work.

facet.limit can limit the output of facet.pivot when facet.sort is on
Key: SOLR-5854
URL: https://issues.apache.org/jira/browse/SOLR-5854
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.4
Reporter: Gennaro Frazzingaro

Given the query
{code}
{
  facet: true,
  facet.pivot: field1,field2,
  facet.pivot.mincount: 1,
  facet.sort: field1 asc, field2 asc,
  q:,
  rows: 1000,
  start: 0,
}
{code}
not all results are returned. Removing facet.sort or setting facet.limit=-1 corrects the problem.
[jira] [Commented] (SOLR-5079) Create ngroups for pivot faceting
[ https://issues.apache.org/jira/browse/SOLR-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000993#comment-14000993 ] Brett Lucey commented on SOLR-5079:
I don't think this patch would work (as is) if the dataset is sharded. Addressing the issue in a sharded dataset could be somewhat challenging as well, since you would need some mechanism to avoid double-counting a value that might be present on more than one shard. Have you considered this use case?

Create ngroups for pivot faceting
Key: SOLR-5079
URL: https://issues.apache.org/jira/browse/SOLR-5079
Project: Solr
Issue Type: Improvement
Affects Versions: 4.5, 5.0
Reporter: Sandro Mario Zbinden
Labels: facet, pivot
Attachments: SOLR-5079.patch
Original Estimate: 4h
Remaining Estimate: 4h

To save network traffic it would be great to know how many entries a facet list contains without loading the complete facet list. This issue is created because of an out-of-memory in loading the pivot facet with facet.limit set to -1.
The facet.pivot result would then look like q=&facet.pivot=cat,id&*facet.pivot.ngroup=true*
{code:xml}
<arr name="cat,id">
  <lst>
    <str name="field">cat</str>
    <str name="value">a</str>
    <int name="count">20</int>
    <arr name="pivot">
      <lst>
        <str name="field">id</str>
        <int name="value">69</int>
        <int name="count">10</int>
      </lst>
      <lst>
        <str name="field">id</str>
        <int name="value">71</int>
        <int name="count">10</int>
      </lst>
    </arr>
    <int name="ngroup">2</int> <!-- The new ngroup param -->
  </lst>
</arr>
{code}
If you add another new param, for example facet.pivot.visible, the result could create less traffic, especially if there are a lot of ids and facet.limit=-1 is set: q=&facet.pivot=cat,id&*facet.ngroup=true&f.id.facet.pivot.visible=false*
{code:xml}
<arr name="cat,id">
  <lst>
    <str name="field">cat</str>
    <str name="value">a</str>
    <int name="count">20</int>
    <!-- No pivot list of id -->
    <int name="ngroup">2</int>
  </lst>
</arr>
{code}
[jira] [Commented] (LUCENE-5650) createTempDir and associated functions no longer create java.io.tmpdir
[ https://issues.apache.org/jira/browse/LUCENE-5650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000997#comment-14000997 ] ASF subversion and git services commented on LUCENE-5650:
Commit 1595562 from [~rjernst] in branch 'dev/branches/lucene5650' [ https://svn.apache.org/r1595562 ]
LUCENE-5650: fix solrj test
[jira] [Created] (LUCENE-5679) Consolidate IndexWriter.deleteDocuments()
Shai Erera created LUCENE-5679:
Summary: Consolidate IndexWriter.deleteDocuments()
Key: LUCENE-5679
URL: https://issues.apache.org/jira/browse/LUCENE-5679
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Shai Erera

Spinoff from here: http://markmail.org/message/7kjlaizqdh7kst4d. We should consolidate the various IW.deleteDocuments().