[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row
[ https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001541#comment-16001541 ]

Vincent Poon commented on PHOENIX-3824:
---

[~lhofhansl] Yes, it's a hot code path in that it's called any time we want to generate an index update. However, the result is cached per mutation, so it's only called once for a mutation with many versions.

It's not actually sorting. There's a call to Guava's Ordering#min(), which does a single O(N) pass over the cell lists. For the comparison, I only look at the last cell in each list, since the cells are assumed to be ordered from newest to oldest. So overall it's an O(N) operation where N is the number of cell lists (families), not the number of cells. If you have one family with a huge number of cells, it should return in constant time.

> Mutable Index partial rebuild adds more than one index row for updated data row
> ---
>
>                 Key: PHOENIX-3824
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3824
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.10.0
>            Reporter: Vincent Poon
>            Assignee: Vincent Poon
>             Fix For: 4.11.0
>
>         Attachments: PHOENIX-3824.v1.patch, PHOENIX-3824.v2.patch
>
>
> If you follow this sequence:
> 1) disable index
> 2) write an update to a data table row
> 3) trigger the BuildIndexScheduleTask partial rebuild
> then you end up with two index rows for the one data table row.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
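The "minimum over cell lists, comparing only the last cell" idea from the comment above can be sketched roughly as follows. This is an illustration, not the code from the patch: the class and method names are hypothetical, each cell is reduced to a bare timestamp, and java.util.Collections.min stands in for Guava's Ordering#min() so the example has no external dependency. It assumes every list is non-empty and ordered newest to oldest.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class OldestTimestampSketch {

    // Each inner list stands in for one column family's cells, reduced to
    // timestamps and assumed to be non-empty and ordered newest to oldest.
    public static long oldestTimestamp(Collection<List<Long>> cellLists) {
        // Compare lists by their last element only (the oldest cell in each list).
        Comparator<List<Long>> byLastTimestamp =
                Comparator.comparingLong(list -> list.get(list.size() - 1));
        // Collections.min makes one linear pass over the lists; nothing is sorted.
        List<Long> oldestList = Collections.min(cellLists, byLastTimestamp);
        return oldestList.get(oldestList.size() - 1);
    }

    public static void main(String[] args) {
        // Three hypothetical families with 2, 3 and 1 cells respectively.
        List<List<Long>> families = Arrays.asList(
                Arrays.asList(50L, 40L),
                Arrays.asList(45L, 20L, 5L),
                Arrays.asList(30L));
        System.out.println(oldestTimestamp(families)); // prints 5
    }
}
```

The cost depends on the number of lists, not on how many cells each list holds, which matches the point that a single family with many cells is handled in constant time.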
[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row
[ https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001493#comment-16001493 ]

Lars Hofhansl commented on PHOENIX-3824:

[~vincentpoon], is getOldestTimestamp on a hot code path? It seems we're doing a sort, O(N log N), when we only need the minimum, O(N), and it was O(1) before. (It's possible I am missing something.)

Looks good otherwise.
[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row
[ https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998911#comment-15998911 ]

Hadoop QA commented on PHOENIX-3824:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12866655/PHOENIX-3824.v2.patch
  against master branch at commit f51c0db9f2d2ee261e602a114d47dd63353bbba8.
  ATTACHMENT ID: 12866655

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 47 warning messages.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.

    {color:red}-1 core tests{color}. The patch failed these unit tests:
        ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.tx.TxCheckpointIT
        ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexFailureIT
        ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.AlterTableWithViewsIT

Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/846//testReport/
Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/846//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/846//console

This message is automatically generated.
[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row
[ https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998780#comment-15998780 ]

James Taylor commented on PHOENIX-3824:
---

+1. Looks really good, [~vincentpoon]. Thanks for the contribution. I'll get this committed to our 4.x and master branches.
[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row
[ https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998646#comment-15998646 ]

Hadoop QA commented on PHOENIX-3824:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12866645/PHOENIX-3824.v1.patch
  against master branch at commit f51c0db9f2d2ee261e602a114d47dd63353bbba8.
  ATTACHMENT ID: 12866645

    {color:green}+1 @author{color}. The patch does not contain any @author tags.

    {color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests.

    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 47 warning messages.

    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
        +// Since we're replaying mutations, we want the oldest timestamp (as anything newer we be replayed)
        +private static final String TEST_TABLE_DDL = "CREATE TABLE IF NOT EXISTS " + TEST_TABLE_STRING + " (\n" +
        +private static final String TEST_TABLE_INDEX_DDL = "CREATE INDEX IF NOT EXISTS " + TEST_TABLE_INDEX_STRING
        + * called, where any read requests to {@link LocalTable#getCurrentRowState(Mutation, Collection, boolean)}

    {color:red}-1 core tests{color}. The patch failed these unit tests:
        org.apache.phoenix.hbase.index.covered.TestNonTxIndexBuilder

Test results: https://builds.apache.org/job/PreCommit-PHOENIX-Build/845//testReport/
Javadoc warnings: https://builds.apache.org/job/PreCommit-PHOENIX-Build/845//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-PHOENIX-Build/845//console

This message is automatically generated.
[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row
[ https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997685#comment-15997685 ]

Vincent Poon commented on PHOENIX-3824:
---

[~jamestaylor] Can you take a look at the logic in the patch? Previously, when we ignoredNewerMutations, we set the scanner max timestamp to the first entry in the cell list, which is the newest timestamp. But when replaying mutations, it seems we should use the oldest timestamp in the current mutation; otherwise we'll fetch data that is part of the mutation being replayed.
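To make the newest-versus-oldest distinction above concrete, here is a small model of the lookup. It is only a sketch under stated assumptions: the class, method and values are hypothetical, a TreeMap stands in for the stored versions of one data row, and the scan's upper time bound is treated as exclusive. It does not use the HBase or Phoenix APIs.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class ReplayTimeRangeSketch {

    // storedVersions maps timestamp -> row value for one data row.
    // Returns the newest version strictly older than the bound (the bound is
    // assumed to be exclusive), or null if no such version exists.
    public static String stateBefore(NavigableMap<Long, String> storedVersions,
                                     long maxTimestampExclusive) {
        Map.Entry<Long, String> entry = storedVersions.lowerEntry(maxTimestampExclusive);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, String> stored = new TreeMap<>();
        stored.put(10L, "v1"); // written before the index was disabled
        stored.put(20L, "v2"); // written while the index was disabled
        stored.put(30L, "v3"); // written while the index was disabled

        // The mutation being replayed carries the versions at ts=20 and ts=30.
        long newestInMutation = 30L;
        long oldestInMutation = 20L;

        // Bounding by the newest timestamp still sees v2, which belongs to
        // the mutation being replayed.
        System.out.println(stateBefore(stored, newestInMutation)); // prints v2
        // Bounding by the oldest timestamp sees only the prior state.
        System.out.println(stateBefore(stored, oldestInMutation)); // prints v1
    }
}
```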
[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row
[ https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997681#comment-15997681 ]

ASF GitHub Bot commented on PHOENIX-3824:
---

GitHub user vincentpoon opened a pull request:

    https://github.com/apache/phoenix/pull/244

    PHOENIX-3824 Mutable Index partial rebuild adds more than one index r…

    …ow for updated data row

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vincentpoon/phoenix PHOENIX-3824

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #244

commit 501c11f8a7b4119a96dc4766574d4b33864c7a66
Author: Vincent
Date:   2017-05-05T00:58:23Z

    PHOENIX-3824 Mutable Index partial rebuild adds more than one index row for updated data row
[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row
[ https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994307#comment-15994307 ]

Vincent Poon commented on PHOENIX-3824:
---

[~lhofhansl] It turns out that the two are related.

Short summary: normally, when you update a data table row, you generate the index update in the preBatchMutate hook (so you can write it to the WAL). To get the index update, you grab the current state of the row (since you're in preBatchMutate, it's the pre-update state of the row). That way you can figure out the existing index row, issue a Delete for it, and then Put the new index row.

When you're doing an index rebuild, however, all your data table rows are already written. So when you "grab the current state of the row", it's the same as the mutation you're replaying. Since nothing has "changed", so to speak, the Delete isn't issued. Hence you end up with the extra index row.

PHOENIX-3806 then gets triggered because there's some logic to handle out-of-order updates. It works like this: if you get a mutation that doesn't have the latest timestamp (i.e. it's backwards in time), the code then rolls up through each version up to the present. That way you know the present index state, and if it has changed, you hide your current (back-in-time) index update by issuing a Delete after your Put. If you have many versions, this "roll up" ends up being done for each one, hence the arithmetic summation problem (for N versions, on the order of 1 + 2 + ... + N roll-up steps).

I believe the simple fix is to make sure you don't scan for newer versions when you "grab the current state of the row". There's actually code that tries to do that, but I think it has a bug. I'm still writing proper tests, etc., but I think that should fix it.

I haven't figured out PHOENIX-3825, though. I don't know if the code is built to handle that, and it's actually tricky to make it work with this one.
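The Delete-then-Put behaviour described above, and why the Delete goes missing during a rebuild, can be modelled in a few lines. This is a simplified sketch, not Phoenix code: the class and method names, the "indexedValue|rowKey" index key layout, and the string form of the updates are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class IndexUpdateSketch {

    // Builds the index updates for one data-row write.
    // currentIndexedValue is what "grab the current state of the row" returned
    // (null if the row did not exist); newIndexedValue comes from the mutation.
    public static List<String> indexUpdates(String rowKey,
                                            String currentIndexedValue,
                                            String newIndexedValue) {
        List<String> updates = new ArrayList<>();
        // A Delete is issued only if the observed state differs from the new value.
        if (currentIndexedValue != null && !currentIndexedValue.equals(newIndexedValue)) {
            updates.add("DELETE " + currentIndexedValue + "|" + rowKey);
        }
        updates.add("PUT " + newIndexedValue + "|" + rowKey);
        return updates;
    }

    public static void main(String[] args) {
        // Normal write path: the current state is the pre-update value "a".
        System.out.println(indexUpdates("row1", "a", "b"));
        // prints [DELETE a|row1, PUT b|row1]

        // Rebuild: the data row is already written, so the state read back
        // equals the replayed mutation and no Delete is generated.
        System.out.println(indexUpdates("row1", "b", "b"));
        // prints [PUT b|row1]
    }
}
```

In the second call the old index row "a|row1" is never deleted, so the index ends up with two rows for the one data row, which is the symptom reported in this issue.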
[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row
[ https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994299#comment-15994299 ]

Lars Hofhansl commented on PHOENIX-3824:

Is this what causes PHOENIX-3806?