[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938977#comment-13938977 ]
ASF subversion and git services commented on LUCENE-5513:
Commit 1578784 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1578784 ]
LUCENE-5513: add IndexWriter.updateBinaryDocValue

Binary DocValues Updates
Key: LUCENE-5513
URL: https://issues.apache.org/jira/browse/LUCENE-5513
Project: Lucene - Core
Issue Type: Wish
Components: core/index
Reporter: Mikhail Khludnev
Priority: Minor
Attachments: LUCENE-5513.patch, LUCENE-5513.patch, LUCENE-5513.patch, LUCENE-5513.patch
LUCENE-5189 was a great move toward. I wish to continue. The reason for having this feature is to have join-index - to write children docnums into parent's binaryDV. I can try to proceed the implementation, but I'm not so experienced in such deep Lucene internals. [~shaie], any hint to begin with is much appreciated.

--
This message was sent by Atlassian JIRA (v6.2#6252)
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
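The issue description gives a join index as the motivation: the parent document's binary doc value would hold the docnums of its children. One possible way to pack such a list into a single byte[] is sketched below. This is only an illustration in plain Java, not code from the patch: the class `ChildDocsCodec` and its methods are hypothetical, and the delta plus variable-length-int layout is an assumption chosen because it keeps ascending docnum lists small.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Lucene): packs ascending child docnums into one byte[]. */
final class ChildDocsCodec {

  /** Encodes ascending docnums as deltas, each delta written as a variable-length int. */
  static byte[] encode(int[] sortedDocs) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int doc : sortedDocs) {
      int delta = doc - prev; // non-negative because the input is ascending
      while ((delta & ~0x7F) != 0) {
        out.write((delta & 0x7F) | 0x80); // low 7 bits, continuation bit set
        delta >>>= 7;
      }
      out.write(delta); // last byte of this value, continuation bit clear
      prev = doc;
    }
    return out.toByteArray();
  }

  /** Reverses encode(): reads each variable-length delta and accumulates the docnums. */
  static int[] decode(byte[] bytes) {
    List<Integer> docs = new ArrayList<>();
    int pos = 0;
    int prev = 0;
    while (pos < bytes.length) {
      int delta = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        delta |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      prev += delta;
      docs.add(prev);
    }
    int[] result = new int[docs.size()];
    for (int i = 0; i < result.length; i++) {
      result[i] = docs.get(i);
    }
    return result;
  }
}
```

With a layout like this, the byte[] returned by `ChildDocsCodec.encode` would be the value handed to a binary doc-values update for the parent document, and `decode` would be applied to the bytes read back at search time.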
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938992#comment-13938992 ]
ASF subversion and git services commented on LUCENE-5513:
Commit 1578790 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578790 ]
LUCENE-5513: add IndexWriter.updateBinaryDocValue
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939017#comment-13939017 ]
ASF subversion and git services commented on LUCENE-5513:
Commit 1578803 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578803 ]
LUCENE-5513: suppress codecs in test
Assignee: Shai Erera
Fix For: 4.8, 5.0
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13939088#comment-13939088 ]
ASF subversion and git services commented on LUCENE-5513:
Commit 1578831 from [~shaie] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1578831 ]
LUCENE-5513: fix bad svn merge
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938009#comment-13938009 ]
Michael McCandless commented on LUCENE-5513:
Hmm, on beasting (TestSortingMergePolicy.testSortingMP -seed 2B748835E48BB14A -jvms 6 -noc) I hit this failure:
{noformat}
.mar 17, 2014 5:41:41 EM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
VARNING: Uncaught exception in thread: Thread[Lucene Merge Thread #8,6,TGRP-TestSortingMergePolicy]
org.apache.lucene.index.MergePolicy$MergeException: java.lang.NullPointerException
  at __randomizedtesting.SeedInfo.seed([2B748835E48BB14A]:0)
  at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.NullPointerException
  at org.apache.lucene.index.IndexWriter.commitMergedDeletesAndUpdates(IndexWriter.java:3468)
  at org.apache.lucene.index.IndexWriter.commitMerge(IndexWriter.java:3530)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4222)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3679)
  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
EENOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory.
NOTE: reproduce with: ant test -Dtestcase=TestSortingMergePolicy -Dtests.method=testSortingMP -Dtests.seed=2B748835E48BB14A -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt -Dtests.locale=sv_SE -Dtests.timezone=Africa/Malabo -Dtests.file.encoding=UTF-8
NOTE: test params are: codec=Lucene46: {s=PostingsFormat(name=FSTPulsing41)}, docValues:{ndv=DocValuesFormat(name=SimpleText)}, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=sv_SE, timezone=Africa/Malabo
NOTE: Linux 3.5.0-47-generic amd64/Oracle Corporation 1.7.0_60-ea (64-bit)/cpus=8,threads=1,free=401611472,total=515375104
NOTE: All tests run in this JVM: [TestSortingMergePolicy]
Time: 2.051
There were 2 failures:
1) testSortingMP(org.apache.lucene.index.sorter.TestSortingMergePolicy)
java.lang.AssertionError: ndv(89)=8960834324998763998,ndv(90)=-6235091358187467651
  at org.junit.Assert.fail(Assert.java:93)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.apache.lucene.index.sorter.TestSortingMergePolicy.assertSorted(TestSortingMergePolicy.java:162)
  at org.apache.lucene.index.sorter.TestSortingMergePolicy.testSortingMP(TestSortingMergePolicy.java:171)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1617)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:826)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:862)
  at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:876)
  at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
  at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
  at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
  at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
  at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
  at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
  at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
  at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:359)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:783)
  at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:443)
  at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:835)
  at
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938416#comment-13938416 ]
Michael McCandless commented on LUCENE-5513:
+1 to commit... patch looks great. And beasting isn't uncovering any more failures...
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938808#comment-13938808 ]
Shai Erera commented on LUCENE-5513:
Thanks Mike, will wrap up and commit.

One thing I wanted to note, and specifically emphasized in the javadocs, is that IW.updateBinaryDocValue *replaces* the existing byte[] value of all affected documents. We could also easily implement an _append_ type of update, where the given bytes are appended to the existing value of every affected document. It's only a matter of defining that on the update itself; then in ReaderAndUpdates, instead of overriding a document's value, we would read its current value from the reader and append the new bytes.

Unlike for NDV updates, append has more value for Binary (and SortedSet), since it lets you add values to documents whose existing values may not currently be identical, whereas the current implementation ignores _all_ existing values and makes all affected documents identical. Perhaps that's acceptable, depending on the nature of the update (e.g. update by PK), but I think we should explore adding update capabilities to Binary and SortedSet DV. We should also extend the IW.update API to allow updating by more than just a Term, e.g. as discussed in this thread: http://markmail.org/message/2wmpvksuwc5t57pg. These are all for separate issues though.
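The difference between the replace semantics of IW.updateBinaryDocValue described in this comment and the proposed append semantics can be sketched with a toy in-memory model. This is plain Java and not Lucene code: `BinaryColumnModel`, `replace`, and `append` are hypothetical names used only for illustration, and the map stands in for a per-document binary doc-values column.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Toy model of a binary doc-values column keyed by docnum; not Lucene code. */
final class BinaryColumnModel {
  private final Map<Integer, byte[]> values = new HashMap<>();

  void put(int doc, byte[] value) {
    values.put(doc, value.clone());
  }

  byte[] get(int doc) {
    return values.get(doc);
  }

  /** Replace semantics: every affected doc ends up with exactly newValue. */
  void replace(int[] affectedDocs, byte[] newValue) {
    for (int doc : affectedDocs) {
      values.put(doc, newValue.clone());
    }
  }

  /** Append semantics (hypothetical): existing bytes are kept and extra is added at the end. */
  void append(int[] affectedDocs, byte[] extra) {
    for (int doc : affectedDocs) {
      byte[] current = values.get(doc);
      if (current == null) {
        current = new byte[0]; // a doc without a value starts from empty
      }
      byte[] merged = Arrays.copyOf(current, current.length + extra.length);
      System.arraycopy(extra, 0, merged, current.length, extra.length);
      values.put(doc, merged);
    }
  }
}
```

In this model, appending {9} to documents holding {1} and {2, 3} leaves them different ({1, 9} and {2, 3, 9}), while replacing with {7} makes both identical, which is the distinction the comment draws.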
Re: [jira] [Commented] (LUCENE-5513) Binary DocValues Updates
SHL
On 13.03.2014 at 2:00, Shai Erera (JIRA) j...@apache.org wrote:
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932464#comment-13932464 ]
Shai Erera commented on LUCENE-5513:
[~mkhludnev], I started working on this and made some progress. But I've identified a need to do some refactoring to how the updates are represented internally today in order to keep the code (and more importantly, me!) sane. So let me know if you've started to work on it as well, so we can sync.
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932942#comment-13932942 ]
Mikhail Khludnev commented on LUCENE-5513:
[~shaie] I started too. However, I can't spend much time on it and have no deep understanding of the core. So far I have copied testSimple() from the Numeric DV update tests and checked that it fails (red). Now I'm working my way through the layers, mostly copying the Numeric DV update logic into Binary DV logic. Not much progress so far.
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932947#comment-13932947 ]
Shai Erera commented on LUCENE-5513:
I'll upload a patch a bit later -- I made some good progress last night and have already started to cut over tests. The only thing I don't handle yet is updates that come in while a merge is in flight, as this requires some refactoring of the code. I will handle that too, but will upload a patch before that to checkpoint progress.
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933060#comment-13933060 ]
Michael McCandless commented on LUCENE-5513:
Looks good Shai! I agree we need a refactoring of commitMergedDeletes, and we shouldn't worry about stacking/optimizing now.
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13932464#comment-13932464 ]
Shai Erera commented on LUCENE-5513:
[~mkhludnev], I started working on this and made some progress. But I've identified a need to do some refactoring to how the updates are represented internally today in order to keep the code (and more importantly, me!) sane. So let me know if you've started to work on it as well, so we can sync.
[jira] [Commented] (LUCENE-5513) Binary DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925695#comment-13925695 ]
Shai Erera commented on LUCENE-5513:
[~mkhludnev], great that you want to take a crack at that. I will help as best I can. I would start with the high-level IW.updateNumericDV API and follow the breadcrumb trail to add the BDV support. The main classes are IW, DW, BufferedDeletes, BufferedDeleteStream, NumericFieldUpdates and ReaderAndUpdates (and of course the tests: TestNumericDocValuesUpdates and TestIndexWriterExceptions.testNoLostDeletesOrUpdates).
At first I think it's best to just follow the NumericDV approach (which copies the entire NDV to the update file, altering the affected documents' values). We can then consider other approaches (as BinaryDV is more expensive than NumericDV to just copy around). But I'm fine if we do it in incremental steps.
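The "copy the entire column, altering the affected documents' values" approach mentioned in this comment, and the reason it costs more for binary values, can be sketched as follows. This is a simplified in-memory illustration in plain Java, not the Lucene implementation: `ColumnRewriteSketch`, `rewrite`, and `totalBytes` are hypothetical names, and a byte[][] indexed by docnum stands in for the doc-values file.

```java
import java.util.Map;

/** Sketch of copy-on-update for a binary column; hypothetical, not Lucene code. */
final class ColumnRewriteSketch {

  /**
   * Builds a complete new column: every doc's value is carried over,
   * except docs present in updates, which take the new value.
   * The input column is left untouched.
   */
  static byte[][] rewrite(byte[][] current, Map<Integer, byte[]> updates) {
    byte[][] next = new byte[current.length][];
    for (int doc = 0; doc < current.length; doc++) {
      byte[] updated = updates.get(doc);
      byte[] source = (updated != null) ? updated : current[doc];
      next[doc] = source.clone(); // unchanged docs are copied too
    }
    return next;
  }

  /** Total payload size of a column; the cost of one rewrite grows with this, not with the update count. */
  static long totalBytes(byte[][] column) {
    long sum = 0;
    for (byte[] value : column) {
      sum += value.length;
    }
    return sum;
  }
}
```

Even a single-document update makes `rewrite` copy every value in the column, so the work scales with the total byte size of the field rather than with the number of updated documents. For fixed-width numeric values that total is bounded per document, while binary values can be arbitrarily long.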