[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row

2017-05-08 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001541#comment-16001541
 ] 

Vincent Poon commented on PHOENIX-3824:
---

[~lhofhansl] Yes, it's a hot code path in that it's called any time we want to 
generate an index update.  However, the results are cached per mutation, so it's 
only called once for a mutation with many versions.

It's not actually sorting.  There's a call to guava's Ordering#min(), which 
just does an O(N) comparison of cell lists.  And for the comparison, I only 
compare the last cell in each list, since it's assumed they're already ordered 
from newest to oldest.  So overall an O(N) operation where N is your number of 
cell lists (families), not number of cells.  If you have one family with a huge 
number of cells it should return in constant time.
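
A minimal sketch of the lookup described above, for illustration only.  It is 
not the code from the patch: it uses a plain JDK loop in place of guava's 
Ordering#min(), and plain long timestamps in place of HBase cells.  The class 
and method names are hypothetical, and it assumes each list is already ordered 
from newest to oldest, so only the last entry of each list is read.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in: each inner list holds the cell timestamps of one
// column family, assumed to be ordered from newest to oldest.
class OldestTimestampSketch {

    // One comparison per family: only the last (oldest) entry of each list is
    // read, so the cost grows with the number of families, not the number of cells.
    static long oldestTimestamp(Collection<List<Long>> families) {
        long oldest = Long.MAX_VALUE;
        for (List<Long> cells : families) {
            if (cells.isEmpty()) {
                continue; // nothing to compare for an empty family
            }
            long last = cells.get(cells.size() - 1);
            if (last < oldest) {
                oldest = last;
            }
        }
        return oldest;
    }

    public static void main(String[] args) {
        List<List<Long>> families = Arrays.asList(
                Arrays.asList(30L, 20L, 10L),
                Arrays.asList(25L, 15L));
        System.out.println(oldestTimestamp(families)); // prints 10
    }
}
```

With a single family the loop body runs once regardless of how many cells the 
list holds, which matches the constant-time case mentioned above.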

> Mutable Index partial rebuild adds more than one index row for updated data 
> row
> ---
>
> Key: PHOENIX-3824
> URL: https://issues.apache.org/jira/browse/PHOENIX-3824
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.10.0
> Reporter: Vincent Poon
> Assignee: Vincent Poon
> Fix For: 4.11.0
>
> Attachments: PHOENIX-3824.v1.patch, PHOENIX-3824.v2.patch
>
>
> If you follow this sequence:
> 1) disable index
> 2) write an update to a data table row
> 3) trigger the BuildIndexScheduleTask partial rebuild
> then you end up with two index rows for the one data table row.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row

2017-05-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16001493#comment-16001493
 ] 

Lars Hofhansl commented on PHOENIX-3824:


[~vincentpoon], is getOldestTimestamp on a hot code path? It seems we're doing 
a sort, O(N log N), when we just need the minimum, O(N), and it was O(1) before. 
(It's possible I am missing something.)

Looks good otherwise.






[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row

2017-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998911#comment-15998911
 ] 

Hadoop QA commented on PHOENIX-3824:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12866655/PHOENIX-3824.v2.patch
  against master branch at commit f51c0db9f2d2ee261e602a114d47dd63353bbba8.
  ATTACHMENT ID: 12866655

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
47 warning messages.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.tx.TxCheckpointIT
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.MutableIndexFailureIT
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.AlterTableWithViewsIT

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/846//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/846//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/846//console

This message is automatically generated.






[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row

2017-05-05 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998780#comment-15998780
 ] 

James Taylor commented on PHOENIX-3824:
---

+1. Looks really good, [~vincentpoon]. Thanks for the contribution. I'll get 
this committed to our 4.x and master branches.






[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row

2017-05-05 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15998646#comment-15998646
 ] 

Hadoop QA commented on PHOENIX-3824:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12866645/PHOENIX-3824.v1.patch
  against master branch at commit f51c0db9f2d2ee261e602a114d47dd63353bbba8.
  ATTACHMENT ID: 12866645

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
47 warning messages.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+// Since we're replaying mutations, we want the oldest timestamp (as anything newer we be replayed)
+private static final String TEST_TABLE_DDL = "CREATE TABLE IF NOT EXISTS " + TEST_TABLE_STRING + " (\n" +
+private static final String TEST_TABLE_INDEX_DDL = "CREATE INDEX IF NOT EXISTS " + TEST_TABLE_INDEX_STRING
+ * called, where any read requests to {@link LocalTable#getCurrentRowState(Mutation, Collection, boolean)}

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.phoenix.hbase.index.covered.TestNonTxIndexBuilder

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/845//testReport/
Javadoc warnings: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/845//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/845//console

This message is automatically generated.






[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row

2017-05-04 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997685#comment-15997685
 ] 

Vincent Poon commented on PHOENIX-3824:
---

[~jamestaylor] Can you take a look at the logic in the patch?  Previously, when 
we ignoredNewerMutations, we set the scanner max timestamp to the first entry 
in the cell list, which is the newest timestamp.  But it seems for replaying of 
mutations, we should be getting the oldest timestamp in the current mutation, 
otherwise we'll fetch data that is in the current mutation being replayed.
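
A toy illustration of the difference between the two bounds, not the patch 
itself.  It assumes the scan's upper time bound is exclusive, and it models 
stored versions as bare timestamps; the class and method names are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScanBoundSketch {

    // Returns the stored timestamps a scan would see with an exclusive upper bound.
    static List<Long> visible(List<Long> stored, long maxExclusive) {
        List<Long> out = new ArrayList<>();
        for (long ts : stored) {
            if (ts < maxExclusive) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Versions already written to the data table.
        List<Long> stored = Arrays.asList(10L, 20L, 30L);
        // Timestamps in the mutation being replayed, newest to oldest.
        List<Long> replayed = Arrays.asList(30L, 20L);

        long newest = replayed.get(0);
        long oldest = replayed.get(replayed.size() - 1);

        // Bounding by the newest timestamp still returns 20, which belongs to
        // the mutation being replayed.
        System.out.println(visible(stored, newest)); // prints [10, 20]
        // Bounding by the oldest timestamp returns only the earlier state.
        System.out.println(visible(stored, oldest)); // prints [10]
    }
}
```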






[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row

2017-05-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15997681#comment-15997681
 ] 

ASF GitHub Bot commented on PHOENIX-3824:
-

GitHub user vincentpoon opened a pull request:

https://github.com/apache/phoenix/pull/244

PHOENIX-3824 Mutable Index partial rebuild adds more than one index r…

…ow for updated data row

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vincentpoon/phoenix PHOENIX-3824

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/phoenix/pull/244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #244


commit 501c11f8a7b4119a96dc4766574d4b33864c7a66
Author: Vincent 
Date:   2017-05-05T00:58:23Z

PHOENIX-3824 Mutable Index partial rebuild adds more than one index row for 
updated data row









[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row

2017-05-02 Thread Vincent Poon (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994307#comment-15994307
 ] 

Vincent Poon commented on PHOENIX-3824:
---

[~lhofhansl] it turned out that the two are related.  Short summary is, 
normally when you do an update to a data table row, in the preBatchMutate hook 
you generate the index update (so you can write it to WAL).  To get the index 
update, you grab the current state of the row (since you're in preBatchMutate, 
it's the pre-update state of the row).  That way, you can figure out the 
existing index row, and issue a Delete for it, and then Put the new index row.

Well, when you're doing an index rebuild, all your data table rows are written 
already.  So when you "grab the current state of the row", it's the same as the 
mutation you're replaying.  Since nothing has 'changed', so to speak, the 
delete isn't issued.  Hence you end up with the extra index row.
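
The two cases above can be sketched with a toy model.  This is an assumption-
laden illustration, not Phoenix code: the index is modelled as a set of indexed 
values, the "current state of the row" is a single string, and all names are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model: the index is a set of indexed values, one entry per index row.
class IndexReplaySketch {

    // Derives index operations from the row state read before the write
    // (currentValue) and the value carried by the mutation (newValue).
    static List<String> indexUpdates(String currentValue, String newValue) {
        List<String> ops = new ArrayList<>();
        // A Delete for the old index row is only issued when the state differs.
        if (currentValue != null && !currentValue.equals(newValue)) {
            ops.add("DELETE " + currentValue);
        }
        ops.add("PUT " + newValue);
        return ops;
    }

    static void apply(Set<String> index, List<String> ops) {
        for (String op : ops) {
            if (op.startsWith("DELETE ")) {
                index.remove(op.substring("DELETE ".length()));
            } else {
                index.add(op.substring("PUT ".length()));
            }
        }
    }

    public static void main(String[] args) {
        // Normal write: the row still holds "a" when the hook runs.
        Set<String> live = new TreeSet<>();
        live.add("a");
        apply(live, indexUpdates("a", "b"));
        System.out.println(live); // prints [b]

        // Rebuild: the data table already holds "b", the index still holds "a".
        Set<String> rebuilt = new TreeSet<>();
        rebuilt.add("a");
        apply(rebuilt, indexUpdates("b", "b"));
        System.out.println(rebuilt); // prints [a, b]
    }
}
```

In the rebuild case the state read back equals the replayed mutation, so no 
Delete is produced and the stale entry survives next to the new one.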

PHOENIX-3806 then gets triggered because there's some logic to handle 
out-of-order updates.  The way they handle out-of-order updates is, if you get 
a mutation that isn't the latest timestamp (i.e. backwards in time), the code 
then rolls up through each version up to present.  That way you know the present 
index state, and if it has changed, you hide your current (back in time) index 
update by issuing a Delete after your Put.  If you have many versions, this 
"roll up" ends up being done for each one, hence the arithmetic summation 
problem.
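
To make the summation concrete, here is a small counting sketch.  It assumes a 
simplified cost model in which each replayed version rolls through every newer 
version exactly once; the class and method names are hypothetical and the real 
code path may do more or less work per step.

```java
class RollUpCostSketch {

    // Counts roll-up steps when every replayed version walks through all
    // versions newer than itself.
    static long rollUpSteps(int versions) {
        long steps = 0;
        for (int i = 1; i <= versions; i++) {
            steps += versions - i; // newer versions that version i rolls through
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(rollUpSteps(4));    // 3 + 2 + 1 + 0 = 6
        System.out.println(rollUpSteps(1000)); // 1000 * 999 / 2 = 499500
    }
}
```

Under this model the total is n(n-1)/2 for n versions, so the work grows 
quadratically with the number of versions being replayed.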

I believe the simple fix is to make sure you don't scan for newer versions when 
you "grab the current state of the row".  There's actually code that tries to 
do that but I think there's a bug.  I'm still writing proper tests, etc, but I 
think that should fix it.

I haven't figured out PHOENIX-3825, though.  I don't know if the code is built 
to handle that, and actually it's tricky to make it work with this one.






[jira] [Commented] (PHOENIX-3824) Mutable Index partial rebuild adds more than one index row for updated data row

2017-05-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994299#comment-15994299
 ] 

Lars Hofhansl commented on PHOENIX-3824:


Is this what causes PHOENIX-3806?



