[jira] [Created] (PHOENIX-4914) There is one corner case in which BaseResultIterators.getParallelScans() returns wrong result of the last guide post info update timestamp.

2018-09-20 Thread Bin Shi (JIRA)
Bin Shi created PHOENIX-4914:


 Summary: There is one corner case in which 
BaseResultIterators.getParallelScans() returns wrong result of the last guide 
post info update timestamp.
 Key: PHOENIX-4914
 URL: https://issues.apache.org/jira/browse/PHOENIX-4914
 Project: Phoenix
  Issue Type: Bug
Reporter: Bin Shi
Assignee: Bin Shi


When I add the following test case to testSelectQueriesWithFilters(...) in 
[ExplainPlanWithStatsEnabledIT.java|https://github.com/apache/phoenix/pull/347/files#diff-21d3742c352623e12ec4889b0ac4f5d2]
 in my clean local repository (without any local changes), the assertion below 
(commented out) is hit, which indicates a bug in 
BaseResultIterators.getParallelScans() in the current code base.

{code:java}
// Query with multiple scan ranges, and each range's start key and end key
// are both between data
sql = "SELECT a FROM " + tableName + " WHERE K <= 103 AND K >= 101 OR K <= 108 AND K >= 106";
rs = conn.createStatement().executeQuery(sql);
i = 0;
numRows = 6;
int[] result = new int[] { 101, 102, 103, 106, 107, 108 };
while (rs.next()) {
    assertEquals(result[i++], rs.getInt(1));
}
assertEquals(numRows, i);
info = getByteRowEstimates(conn, sql, binds);
// TODO: the original code before this change hits the following assertion.
// Need to investigate it.
// assertTrue(info.getEstimateInfoTs() > 0);
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4913) UPDATE STATISTICS should run raw scan to collect the deleted rows

2018-09-20 Thread Bin Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bin Shi updated PHOENIX-4913:
-
Description: 
To truly measure the size of data when calculating guide posts, UPDATE 
STATISTICS should run a raw scan so that deleted rows are taken into account.

Deleted rows contribute to the estimated size of a guide post, but not to its 
row count.

  was:
To truly measure the size of data when calculating guide posts, UPDATE 
STATISTICS should run a raw scan to take all versions of cells into account.

Deleted rows contribute to the estimated size of a guide post, but not to its 
row count.


> UPDATE STATISTICS should run raw scan to collect the deleted rows
> -
>
> Key: PHOENIX-4913
> URL: https://issues.apache.org/jira/browse/PHOENIX-4913
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.0.0
>Reporter: Bin Shi
>Assignee: Bin Shi
>Priority: Major
>
> To truly measure the size of data when calculating guide posts, UPDATE 
> STATISTICS should run a raw scan so that deleted rows are taken into account.
> Deleted rows contribute to the estimated size of a guide post, but not to 
> its row count.





[jira] [Created] (PHOENIX-4913) UPDATE STATISTICS should run raw scan to collect the deleted rows

2018-09-20 Thread Bin Shi (JIRA)
Bin Shi created PHOENIX-4913:


 Summary: UPDATE STATISTICS should run raw scan to collect the 
deleted rows
 Key: PHOENIX-4913
 URL: https://issues.apache.org/jira/browse/PHOENIX-4913
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 5.0.0
Reporter: Bin Shi
Assignee: Bin Shi


To truly measure the size of data when calculating guide posts, UPDATE 
STATISTICS should run a raw scan to take all versions of cells into account.

Deleted rows contribute to the estimated size of a guide post, but not to its 
row count.
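The accounting rule above can be sketched as a tiny model. This is a hypothetical illustration, not Phoenix code: the class `GuidePostStats` and the method `addRow` are invented names; the sketch only shows that a deleted row adds to the byte estimate but not to the row count.

```java
// Hypothetical sketch (invented names, not Phoenix code) of the accounting
// rule: a deleted row still contributes its bytes to the guide-post size
// estimate, but not to the guide-post row count.
class GuidePostStats {
    long byteCount; // estimated bytes covered by this guide post
    long rowCount;  // live (non-deleted) rows covered by this guide post

    // 'bytes' is the on-disk size of the row's cells as seen by a raw scan;
    // 'deleted' is true when the row only survives as delete markers.
    void addRow(long bytes, boolean deleted) {
        byteCount += bytes;   // deleted rows still occupy space a scan must read
        if (!deleted) {
            rowCount++;       // but they are invisible to query results
        }
    }
}
```

This is why a non-raw scan under-measures: it never sees the deleted rows, so the byte estimate misses space the region server still has to read.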





Re: [jira] [Commented] (PHOENIX-4008) UPDATE STATISTIC should collect all versions of cells

2018-09-20 Thread Bin Shi
Hi community,

I checked the following two test failures; they fail even in my clean
local repository (just cloned from master, without any local changes). Any
suggestions?

Test Result (2 failures / ±0):

   - org.apache.phoenix.end2end.ConcurrentMutationsIT.testRowLockDuringPreBatchMutateWhenIndexed
   - org.apache.phoenix.end2end.ConcurrentMutationsIT.testLockUntilMVCCAdvanced



Thanks,

Bin Shi | LinkedIn
PMTS | Infrastructure Engineering
Salesforce (Seattle|Bellevue)
Cell: 425-247-4348
Email: b...@salesforce.com


On Thu, Sep 20, 2018 at 3:07 PM Hadoop QA (JIRA)  wrote:

>
> [
> https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16622782#comment-16622782
> ]
>
> Hadoop QA commented on PHOENIX-4008:
> 
>
> {color:red}-1 overall{color}.  Here are the results of testing the latest
> attachment
>
> http://issues.apache.org/jira/secure/attachment/12940630/PHOENIX-4008_0920.patch
>   against master branch at commit 91f085a902357b0a589b6367fb9c5f40b9781c8f.
>   ATTACHMENT ID: 12940630
>
> {color:green}+1 @author{color}.  The patch does not contain any
> @author tags.
>
> {color:red}-1 tests included{color}.  The patch doesn't appear to
> include any new or modified tests.
> Please justify why no new tests are needed for
> this patch.
> Also please list what manual steps were performed
> to verify this patch.
>
> {color:green}+1 javac{color}.  The applied patch does not increase the
> total number of javac compiler warnings.
>
> {color:green}+1 release audit{color}.  The applied patch does not
> increase the total number of release audit warnings.
>
> {color:red}-1 lineLengths{color}.  The patch introduces the following
> lines longer than 100:
> +new
> GuidePostsKey(Bytes.toBytes(tableName), Bytes.toBytes(familyName)),
> +assertTrue(emptyGuidePostExpected ?
> gps.isEmptyGuidePost() : !gps.isEmptyGuidePost());
> +verifyGuidePostGenerated(queryServices, tableName, new
> String[] {"C1", "C2"}, guidePostWidth, true);
> +"CREATE TABLE " + tableName + " (k INTEGER PRIMARY
> KEY, c1.a bigint, c2.b bigint)"
> +// The table only has one row. All cells just has one
> version, and the data size of the row
> +verifyGuidePostGenerated(queryServices, tableName, new
> String[] {"C1", "C2"}, guidePostWidth, true);
> +verifyGuidePostGenerated(queryServices, tableName, new
> String[] {"C1", "C2"}, guidePostWidth, false);
>
>  {color:red}-1 core tests{color}.  The patch failed these unit tests:
>
>  
> ./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.ConcurrentMutationsIT
>
> Test results:
> https://builds.apache.org/job/PreCommit-PHOENIX-Build/2050//testReport/
> Console output:
> https://builds.apache.org/job/PreCommit-PHOENIX-Build/2050//console
>
> This message is automatically generated.
>
> > UPDATE STATISTIC should collect all versions of cells
> > -
> >
> > Key: PHOENIX-4008
> > URL: https://issues.apache.org/jira/browse/PHOENIX-4008
> > Project: Phoenix
> >  Issue Type: Bug
> >Reporter: Samarth Jain
> >Assignee: Bin Shi
> >Priority: Major
> > Attachments: PHOENIX-4008_0918.patch, PHOENIX-4008_0920.patch
> >
> >
> > In order to truly measure the size of data when calculating guide posts,
> UPDATE STATISTIC should take into account all versions of cells. We should
> also be setting the max versions on the scan.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>


[jira] [Updated] (PHOENIX-4912) Make Table Sampling algorithm to accommodate to the imbalance row distribution across guide posts

2018-09-20 Thread Bin Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bin Shi updated PHOENIX-4912:
-
Description: 
The current implementation of table sampling is based on the assumption that 
every two consecutive guide posts contain an equal number of rows, which isn't 
accurate in practice; once we collect multiple versions of cells and deleted 
rows, this will get worse.

In detail, the current implementation of table sampling (see 
BaseResultIterators.getParallelScans(), which calls sampleScans(...) at the end 
of the function) works as follows:
 # Iterate over all the parallel scans generated;
 # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
tableSamplingRate (see TableSamplerPredicate.java), pick this scan; otherwise 
discard it.

The problem can be formalized as: we have a group of scans, each defined as a 
pair <start row key Ki, count of rows Ci>. We want to randomly pick X scans so 
that the sum of the row counts of the selected scans is close to Y, where Y = 
(the total row count of all scans T) * (the table sampling rate R).

One algorithm we can consider to resolve the above problem is described below:

{code}
ArrayList<Scan> TableSampling(ArrayList<Scan> scans, T, R) {
    ArrayList<Scan> pickedScans = new ArrayList<>();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // pick this scan, and adjust T, R, Y accordingly
            pickedScans.add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = Y / T;
            }
        }
    }
    return pickedScans;
}
{code}

  was:
The current implementation of table sampling is based on the assumption that 
every two consecutive guide posts contain an equal number of rows, which isn't 
accurate in practice; once we collect multiple versions of cells and deleted 
rows, this will get worse.

In detail, the current implementation of table sampling (see 
BaseResultIterators.getParallelScans(), which calls sampleScans(...) at the end 
of the function) works as follows:
 # Iterate over all the parallel scans generated;
 # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
tableSamplingRate (see TableSamplerPredicate.java), pick this scan; otherwise 
discard it.

The problem can be formalized as: we have a group of scans, each defined as a 
pair <start row key Ki, count of rows Ci>. We want to randomly pick X scans so 
that the sum of the row counts of the selected scans is close to Y, where Y = 
(the total row count of all scans T) * (the table sampling rate R).

One algorithm we can consider to resolve the above problem is described below:

{code}
ArrayList<Scan> TableSampling(ArrayList<Scan> scans, T, R) {
    ArrayList<Scan> pickedScans = new ArrayList<>();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // pick this scan, and adjust T, R, Y accordingly
            pickedScans.add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) R = Y / T;
        }
    }
    return pickedScans;
}
{code}


> Make Table Sampling algorithm to accommodate to the imbalance row 
> distribution across guide posts
> -
>
> Key: PHOENIX-4912
> URL: https://issues.apache.org/jira/browse/PHOENIX-4912
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.0.0
>Reporter: Bin Shi
>Assignee: Bin Shi
>Priority: Major
>
> The current implementation of table sampling is based on the assumption 
> that every two consecutive guide posts contain an equal number of rows, 
> which isn't accurate in practice; once we collect multiple versions of 
> cells and deleted rows, this will get worse.
> In detail, the current implementation of table sampling (see 
> BaseResultIterators.getParallelScans(), which calls sampleScans(...) at the 
> end of the function) works as follows:
>  # Iterate over all the parallel scans generated;
>  # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
> tableSamplingRate (see TableSamplerPredicate.java), pick this scan; 
> otherwise discard it.
> The problem can be formalized as: we have a group of scans, each defined as 
> a pair <start row key Ki, count of rows Ci>. We want to randomly pick X 
> scans so that the sum of the row counts of the selected scans is close to 
> Y, where Y = (the total row count of all scans T) * (the table sampling 
> rate R).
> One algorithm we can consider to resolve the above problem is described 
> below:
> ArrayList<Scan> TableSampling(ArrayList<Scan> scans, T, R)
> {  
>     ArrayList<Scan> pickedScans = new ArrayList<>();
>     Y = T * R;
>     for (scan in scans) {
>         if (Y <= 0) break;
>         

[jira] [Created] (PHOENIX-4912) Make Table Sampling algorithm to accommodate to the imbalance row distribution across guide posts

2018-09-20 Thread Bin Shi (JIRA)
Bin Shi created PHOENIX-4912:


 Summary: Make Table Sampling algorithm to accommodate to the 
imbalance row distribution across guide posts
 Key: PHOENIX-4912
 URL: https://issues.apache.org/jira/browse/PHOENIX-4912
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 5.0.0
Reporter: Bin Shi
Assignee: Bin Shi


The current implementation of table sampling is based on the assumption that 
every two consecutive guide posts contain an equal number of rows, which isn't 
accurate in practice; once we collect multiple versions of cells and deleted 
rows, this will get worse.

In detail, the current implementation of table sampling (see 
BaseResultIterators.getParallelScans(), which calls sampleScans(...) at the end 
of the function) works as follows:
 # Iterate over all the parallel scans generated;
 # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
tableSamplingRate (see TableSamplerPredicate.java), pick this scan; otherwise 
discard it.

The problem can be formalized as: we have a group of scans, each defined as a 
pair <start row key Ki, count of rows Ci>. We want to randomly pick X scans so 
that the sum of the row counts of the selected scans is close to Y, where Y = 
(the total row count of all scans T) * (the table sampling rate R).

One algorithm we can consider to resolve the above problem is described below:

{code}
ArrayList<Scan> TableSampling(ArrayList<Scan> scans, T, R) {
    ArrayList<Scan> pickedScans = new ArrayList<>();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // pick this scan, and adjust T, R, Y accordingly
            pickedScans.add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = Y / T;
            }
        }
    }
    return pickedScans;
}
{code}
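The pseudocode above can be turned into a self-contained Java sketch. This is a hypothetical illustration, not Phoenix code: ScanInfo, startKeyHash, and rowCount are invented stand-ins for a parallel scan, getHashCode(Ki), and Ci. One assumption: since the pick test compares MOD 100 against R, the sketch treats R as a percentage, so the target is Y = T * R / 100 and the re-balanced rate is R = 100 * Y / T.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (invented names, not Phoenix code) of the adaptive
// sampling described above. ScanInfo stands in for one parallel scan:
// startKeyHash plays the role of getHashCode(Ki), rowCount the role of Ci.
class TableSamplingSketch {
    static final class ScanInfo {
        final int startKeyHash;
        final long rowCount;
        ScanInfo(int startKeyHash, long rowCount) {
            this.startKeyHash = startKeyHash;
            this.rowCount = rowCount;
        }
    }

    // total = T (row count over all scans), rate = R as a percentage in [0, 100).
    static List<ScanInfo> sample(List<ScanInfo> scans, long total, double rate) {
        List<ScanInfo> picked = new ArrayList<>();
        double target = total * rate / 100.0; // Y: rows we still want to cover
        for (ScanInfo scan : scans) {
            if (target <= 0) {
                break; // enough rows already covered
            }
            if (Math.floorMod(scan.startKeyHash, 100) < rate) {
                picked.add(scan); // pick this scan, then adjust T, Y, R
                total -= scan.rowCount;
                target -= scan.rowCount;
                if (total > 0 && target > 0) {
                    // re-balance the rate toward the remaining target, so scans
                    // with unusually many rows don't skew the sample size
                    rate = 100.0 * target / total;
                }
            }
        }
        return picked;
    }
}
```

The re-balancing step is the point of the proposal: after picking a scan, the rate is recomputed from the rows still needed, so imbalanced row distribution across guide posts no longer makes the sample drift away from T * R rows.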





[jira] [Updated] (PHOENIX-3163) Split during global index creation may cause ERROR 201 error

2018-09-20 Thread Lars Hofhansl (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated PHOENIX-3163:
---
Comment: was deleted

(was: I suggest we roll this one back.

[~sergey.soldatov], [~rajeshbabu], [~jamestaylor], this breaks the general 
UPSERT SELECT case now. Also this has no test, so it's hard to make sure any 
changes we do in PHOENIX-4849 will not break this case again.

Edit: Since the change never resets the stopKey of the scan (see newScan in 
the code), this wouldn't fix the problem even in theory: we'd pass a new scan 
with the old stopKey and the server would again flag it as a stale region 
cache.

I'm going to revert.)

> Split during global index creation may cause ERROR 201 error
> 
>
> Key: PHOENIX-3163
> URL: https://issues.apache.org/jira/browse/PHOENIX-3163
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.8.0
>Reporter: Sergey Soldatov
>Assignee: Sergey Soldatov
>Priority: Major
> Fix For: 4.14.0, 5.0.0
>
> Attachments: PHOENIX-3163_addendum1.patch, PHOENIX-3163_v1.patch, 
> PHOENIX-3163_v3.patch, PHOENIX-3163_v4.patch, PHOENIX-3163_v5.patch, 
> PHOENIX-3163_v6.patch
>
>
> When we create global index and split happen meanwhile there is a chance to 
> fail with ERROR 201:
> {noformat}
> 2016-08-08 15:55:17,248 INFO  [Thread-6] 
> org.apache.phoenix.iterate.BaseResultIterators(878): Failed to execute task 
> during cancel
> java.util.concurrent.ExecutionException: java.sql.SQLException: ERROR 201 
> (22000): Illegal data.
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>   at 
> org.apache.phoenix.iterate.BaseResultIterators.close(BaseResultIterators.java:872)
>   at 
> org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:809)
>   at 
> org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:713)
>   at 
> org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
>   at 
> org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
>   at 
> org.apache.phoenix.compile.UpsertCompiler$2.execute(UpsertCompiler.java:815)
>   at 
> org.apache.phoenix.compile.DelegateMutationPlan.execute(DelegateMutationPlan.java:31)
>   at 
> org.apache.phoenix.compile.PostIndexDDLCompiler$1.execute(PostIndexDDLCompiler.java:124)
>   at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.updateData(ConnectionQueryServicesImpl.java:2823)
>   at 
> org.apache.phoenix.schema.MetaDataClient.buildIndex(MetaDataClient.java:1079)
>   at 
> org.apache.phoenix.schema.MetaDataClient.createIndex(MetaDataClient.java:1382)
>   at 
> org.apache.phoenix.compile.CreateIndexCompiler$1.execute(CreateIndexCompiler.java:85)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:343)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:331)
>   at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:330)
>   at 
> org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1440)
>   at 
> org.apache.phoenix.hbase.index.write.TestIndexWriter$1.run(TestIndexWriter.java:93)
> Caused by: java.sql.SQLException: ERROR 201 (22000): Illegal data.
>   at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:441)
>   at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
>   at 
> org.apache.phoenix.schema.types.PDataType.newIllegalDataException(PDataType.java:287)
>   at 
> org.apache.phoenix.schema.types.PUnsignedSmallint$UnsignedShortCodec.decodeShort(PUnsignedSmallint.java:146)
>   at 
> org.apache.phoenix.schema.types.PSmallint.toObject(PSmallint.java:104)
>   at org.apache.phoenix.schema.types.PSmallint.toObject(PSmallint.java:28)
>   at 
> org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:980)
>   at 
> org.apache.phoenix.schema.types.PUnsignedSmallint.toObject(PUnsignedSmallint.java:102)
>   at 
> org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:980)
>   at 
> org.apache.phoenix.schema.types.PDataType.toObject(PDataType.java:992)
>   at 
> org.apache.phoenix.schema.types.PDataType.coerceBytes(PDataType.java:830)
>   at 
> org.apache.phoenix.schema.types.PDecimal.coerceBytes(PDecimal.java:342)
>   at 
> org.apache.phoenix.schema.types.PDataType.coerceBytes(PDataType.java:810)
>   at 
> 

[jira] [Updated] (PHOENIX-4908) [Apache Spark Plugin Doc] update save api when using spark dataframe

2018-09-20 Thread Sandeep Nemuri (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandeep Nemuri updated PHOENIX-4908:

Attachment: PHOENIX-4908.002.patch

> [Apache Spark Plugin Doc] update save api when using spark dataframe
> 
>
> Key: PHOENIX-4908
> URL: https://issues.apache.org/jira/browse/PHOENIX-4908
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Sandeep Nemuri
>Assignee: Sandeep Nemuri
>Priority: Major
> Attachments: PHOENIX-4908.001.patch, PHOENIX-4908.002.patch
>
>
>  
> Error when saving the DataFrame to a Phoenix table as described in 
> [https://phoenix.apache.org/phoenix_spark.html] (in the Saving 
> DataFrames section):
> {code:java}
> scala> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
> "INPUT_TABLE", | "zkUrl" -> "c221-node4.com:2181")) 
> warning: there was one deprecation warning; re-run with -deprecation for 
> details df: org.apache.spark.sql.DataFrame = [ID: bigint, COL1: string ... 1 
> more field] 
> scala> dfin.show() 
> +---+----------+----+
> | ID|      COL1|COL2|
> +---+----------+----+
> |  1|test_row_1|   1|
> +---+----------+----+
>  
> scala> df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> 
> "OUTPUT_TABLE","zkUrl" -> "c221-node4.com:2181")) :32: error: value 
> save is not a member of org.apache.spark.sql.DataFrame 
> df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> 
> "OUTPUT_TABLE","zkUrl" -> "c221-node4.com:2181")) ^
>  
> {code}
>  





[jira] [Assigned] (PHOENIX-4908) [Apache Spark Plugin Doc] update save api when using spark dataframe

2018-09-20 Thread Ankit Singhal (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Singhal reassigned PHOENIX-4908:
--

Assignee: Sandeep Nemuri

> [Apache Spark Plugin Doc] update save api when using spark dataframe
> 
>
> Key: PHOENIX-4908
> URL: https://issues.apache.org/jira/browse/PHOENIX-4908
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Sandeep Nemuri
>Assignee: Sandeep Nemuri
>Priority: Major
> Attachments: PHOENIX-4908.001.patch
>
>
>  
> Error when saving the DataFrame to a Phoenix table as described in 
> [https://phoenix.apache.org/phoenix_spark.html] (in the Saving 
> DataFrames section):
> {code:java}
> scala> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> 
> "INPUT_TABLE", | "zkUrl" -> "c221-node4.com:2181")) 
> warning: there was one deprecation warning; re-run with -deprecation for 
> details df: org.apache.spark.sql.DataFrame = [ID: bigint, COL1: string ... 1 
> more field] 
> scala> dfin.show() 
> +---+--++ 
> | ID| COL1|COL2| 
> +---+--++ 
> | 1|test_row_1| 1| 
> +---+--++ 
>  
> scala> df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> 
> "OUTPUT_TABLE","zkUrl" -> "c221-node4.com:2181")) :32: error: value 
> save is not a member of org.apache.spark.sql.DataFrame 
> df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> 
> "OUTPUT_TABLE","zkUrl" -> "c221-node4.com:2181")) ^
>  
> {code}
>  
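`DataFrame.save` was removed from the Spark 2.x API, so the write path now goes through `DataFrameWriter`. A minimal sketch of the corrected call, assuming a Spark shell with the phoenix-spark connector on the classpath and reusing the illustrative table and zkUrl values from the report:

```scala
// Sketch only: requires a running Spark shell with the phoenix-spark
// connector and a reachable Phoenix/ZooKeeper quorum. The table name and
// zkUrl below are the illustrative values from the report, not defaults.
import org.apache.spark.sql.SaveMode

df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .option("table", "OUTPUT_TABLE")
  .option("zkUrl", "c221-node4.com:2181")
  .save()
```

The same `option(...)` pairs replace the `Map` argument of the removed `save` overload.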



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4008) UPDATE STATISTIC should collect all versions of cells

2018-09-20 Thread Bin Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bin Shi updated PHOENIX-4008:
-
Attachment: PHOENIX-4008_0920.patch

> UPDATE STATISTIC should collect all versions of cells
> -
>
> Key: PHOENIX-4008
> URL: https://issues.apache.org/jira/browse/PHOENIX-4008
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Samarth Jain
>Assignee: Bin Shi
>Priority: Major
> Attachments: PHOENIX-4008_0918.patch, PHOENIX-4008_0920.patch
>
>
> In order to truly measure the size of the data when calculating guide posts, 
> UPDATE STATISTICS should take into account all versions of cells. We should 
> also set the max versions on the scan.
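A sketch of how the stats-collection scan could be configured to see every cell version; the class and method below are illustrative assumptions, not Phoenix's actual stats code, though the `Scan` setters are the real HBase client API:

```java
// Illustrative sketch only: not Phoenix's actual stats-collection code.
// Requires hbase-client on the classpath.
import org.apache.hadoop.hbase.client.Scan;

public class StatsScanConfig {
    // Configure a scan so guide-post size estimates see all data.
    public static Scan configureForStats(Scan scan) {
        scan.setRaw(true);          // raw scan: include delete markers too
        scan.setMaxVersions();      // read every cell version, not just latest
        scan.setCacheBlocks(false); // a full stats scan shouldn't pollute the block cache
        return scan;
    }
}
```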





[jira] [Assigned] (PHOENIX-4911) Local index has stale data upon deletion of rows

2018-09-20 Thread Rajeshbabu Chintaguntla (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajeshbabu Chintaguntla reassigned PHOENIX-4911:


Assignee: Rajeshbabu Chintaguntla

> Local index has stale data upon deletion of rows
> 
>
> Key: PHOENIX-4911
> URL: https://issues.apache.org/jira/browse/PHOENIX-4911
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.2-cdh5.11.2, 5.0.0, 5.1.0
>Reporter: Ievgen Nekrashevych
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
>
> When rows are deleted from the main table, the index appears to retain stale 
> data, and when a row is upserted again the index returns wrong values.
> Reproducible with this script:
> {code}
> create schema if not exists TS
> create table if not exists TS.TEST (STR varchar not null,INTCOL bigint not 
> null, STARTTIME integer, DUMMY integer default 0 CONSTRAINT PK PRIMARY KEY 
> (STR, INTCOL))
> create local index if not exists "TEST_INDEX" on TS.TEST (STR,STARTTIME)
> -- optional delete
> -- delete from TS.TEST
> upsert into TS.TEST(STR,INTCOL,STARTTIME,DUMMY) values ('TEST',4,1,3)
> delete from TS.TEST
> upsert into TS.TEST(STR, INTCOL, STARTTIME, DUMMY) values ('TEST',4,2,4)
> delete from TS.TEST
> upsert into TS.TEST(STR, INTCOL, DUMMY) values ('TEST',4,5)
>  
> SELECT /*+NO_INDEX*/* FROM TS.TEST where STR = 'TEST'
> -- yields: STARTTIME = null
> SELECT /*+TEST_INDEX*/ * FROM TS.TEST where STR = 'TEST'
> -- yields: STARTTIME = 2
> {code}





[jira] [Updated] (PHOENIX-4874) psql doesn't support date/time with values smaller than milliseconds

2018-09-20 Thread Rajeshbabu Chintaguntla (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajeshbabu Chintaguntla updated PHOENIX-4874:
-
Attachment: PHOENIX-4874_v2.patch

> psql doesn't support date/time with values smaller than milliseconds
> 
>
> Key: PHOENIX-4874
> URL: https://issues.apache.org/jira/browse/PHOENIX-4874
> Project: Phoenix
>  Issue Type: Task
>Reporter: Josh Elser
>Assignee: Rajeshbabu Chintaguntla
>Priority: Major
> Attachments: PHOENIX-4874.patch, PHOENIX-4874_v2.patch
>
>
> [https://phoenix.apache.org/tuning.html] lacks entries for 
> phoenix.query.timeFormat and phoenix.query.timestampFormat, which are used by 
> psql to parse TIME and TIMESTAMP data types.
> Add them.
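For reference, these properties can be set in hbase-site.xml on the client side. The format values below are illustrative assumptions (the nanosecond pattern shows sub-millisecond precision), not documented defaults:

```xml
<!-- Illustrative only: property names are from the issue; the pattern
     values are assumptions showing sub-millisecond precision. -->
<property>
  <name>phoenix.query.timeFormat</name>
  <value>HH:mm:ss.SSS</value>
</property>
<property>
  <name>phoenix.query.timestampFormat</name>
  <value>yyyy-MM-dd HH:mm:ss.SSSSSSSSS</value>
</property>
```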





[jira] [Updated] (PHOENIX-4911) Local index has stale data upon deletion of rows

2018-09-20 Thread Josh Elser (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser updated PHOENIX-4911:

Summary: Local index has stale data upon deletion of rows  (was: phoenix 
index has stale data upon deletion of rows)

> Local index has stale data upon deletion of rows
> 
>
> Key: PHOENIX-4911
> URL: https://issues.apache.org/jira/browse/PHOENIX-4911
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.2-cdh5.11.2, 5.0.0, 5.1.0
>Reporter: Ievgen Nekrashevych
>Priority: Major
>
> When rows are deleted from the main table, the index appears to retain stale 
> data, and when a row is upserted again the index returns wrong values.
> Reproducible with this script:
> {code}
> create schema if not exists TS
> create table if not exists TS.TEST (STR varchar not null,INTCOL bigint not 
> null, STARTTIME integer, DUMMY integer default 0 CONSTRAINT PK PRIMARY KEY 
> (STR, INTCOL))
> create local index if not exists "TEST_INDEX" on TS.TEST (STR,STARTTIME)
> -- optional delete
> -- delete from TS.TEST
> upsert into TS.TEST(STR,INTCOL,STARTTIME,DUMMY) values ('TEST',4,1,3)
> delete from TS.TEST
> upsert into TS.TEST(STR, INTCOL, STARTTIME, DUMMY) values ('TEST',4,2,4)
> delete from TS.TEST
> upsert into TS.TEST(STR, INTCOL, DUMMY) values ('TEST',4,5)
>  
> SELECT /*+NO_INDEX*/* FROM TS.TEST where STR = 'TEST'
> -- yields: STARTTIME = null
> SELECT /*+TEST_INDEX*/ * FROM TS.TEST where STR = 'TEST'
> -- yields: STARTTIME = 2
> {code}





[jira] [Updated] (PHOENIX-4911) phoenix index has stale data upon deletion of rows

2018-09-20 Thread Ievgen Nekrashevych (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ievgen Nekrashevych updated PHOENIX-4911:
-
Description: 
When rows are deleted from the main table, the index appears to retain stale 
data, and when a row is upserted again the index returns wrong values.

Reproducible with this script:

{code}
create schema if not exists TS
create table if not exists TS.TEST (STR varchar not null,INTCOL bigint not 
null, STARTTIME integer, DUMMY integer default 0 CONSTRAINT PK PRIMARY KEY 
(STR, INTCOL))
create local index if not exists "TEST_INDEX" on TS.TEST (STR,STARTTIME)

-- optional delete
-- delete from TS.TEST
upsert into TS.TEST(STR,INTCOL,STARTTIME,DUMMY) values ('TEST',4,1,3)
delete from TS.TEST
upsert into TS.TEST(STR, INTCOL, STARTTIME, DUMMY) values ('TEST',4,2,4)
delete from TS.TEST
upsert into TS.TEST (STR, INTCOL, DUMMY) values ('TEST',4,5)
 
SELECT /*+NO_INDEX*/* FROM TS.TEST where STR = 'TEST'
-- yields: STARTTIME = null
SELECT /*+TEST_INDEX*/ * FROM TS.TEST where STR = 'TEST'
-- yields: STARTTIME = 2
{code}


  was:
When rows are deleted from the main table, the index appears to retain stale 
data, and when a row is upserted again the index returns wrong values.

Reproducible with this script:

{code}
create schema if not exists TS
create table if not exists TS.TEST (STR varchar not null,INTCOL bigint not 
null, STARTTIME integer, DUMMY integer default 0 CONSTRAINT PK PRIMARY KEY 
(STR, INTCOL))
create local index if not exists "TEST_INDEX" on TS.TEST (STR,STARTTIME)

-- optional delete
-- delete from TS.TEST
upsert into TS.TEST(STR,INTCOL,STARTTIME,DUMMY) values ('TEST',4,1,3)
delete from TS.TEST
upsert into TS.TEST( STR, INTCOL, STARTTIME, DUMMY) values ('TEST',4,2,4)
delete from TS.TEST
upsert into TS.TEST ( STR, INTCOL, DUMMY) values ('TEST',4,5)
 
SELECT /*+NO_INDEX*/* FROM TS.TEST where STR = 'TEST'
-- yields: STARTTIME = null
SELECT /*+TEST_INDEX*/ * FROM TS.TEST where STR = 'TEST'
-- yields: STARTTIME = 2
{code}



> phoenix index has stale data upon deletion of rows
> --
>
> Key: PHOENIX-4911
> URL: https://issues.apache.org/jira/browse/PHOENIX-4911
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.2-cdh5.11.2, 5.0.0, 5.1.0
>Reporter: Ievgen Nekrashevych
>Priority: Major
>
> When rows are deleted from the main table, the index appears to retain stale 
> data, and when a row is upserted again the index returns wrong values.
> Reproducible with this script:
> {code}
> create schema if not exists TS
> create table if not exists TS.TEST (STR varchar not null,INTCOL bigint not 
> null, STARTTIME integer, DUMMY integer default 0 CONSTRAINT PK PRIMARY KEY 
> (STR, INTCOL))
> create local index if not exists "TEST_INDEX" on TS.TEST (STR,STARTTIME)
> -- optional delete
> -- delete from TS.TEST
> upsert into TS.TEST(STR,INTCOL,STARTTIME,DUMMY) values ('TEST',4,1,3)
> delete from TS.TEST
> upsert into TS.TEST(STR, INTCOL, STARTTIME, DUMMY) values ('TEST',4,2,4)
> delete from TS.TEST
> upsert into TS.TEST (STR, INTCOL, DUMMY) values ('TEST',4,5)
>  
> SELECT /*+NO_INDEX*/* FROM TS.TEST where STR = 'TEST'
> -- yields: STARTTIME = null
> SELECT /*+TEST_INDEX*/ * FROM TS.TEST where STR = 'TEST'
> -- yields: STARTTIME = 2
> {code}





[jira] [Updated] (PHOENIX-4911) phoenix index has stale data upon deletion of rows

2018-09-20 Thread Ievgen Nekrashevych (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ievgen Nekrashevych updated PHOENIX-4911:
-
Description: 
When rows are deleted from the main table, the index appears to retain stale 
data, and when a row is upserted again the index returns wrong values.

Reproducible with this script:

{code}
create schema if not exists TS
create table if not exists TS.TEST (STR varchar not null,INTCOL bigint not 
null, STARTTIME integer, DUMMY integer default 0 CONSTRAINT PK PRIMARY KEY 
(STR, INTCOL))
create local index if not exists "TEST_INDEX" on TS.TEST (STR,STARTTIME)

-- optional delete
-- delete from TS.TEST
upsert into TS.TEST(STR,INTCOL,STARTTIME,DUMMY) values ('TEST',4,1,3)
delete from TS.TEST
upsert into TS.TEST(STR, INTCOL, STARTTIME, DUMMY) values ('TEST',4,2,4)
delete from TS.TEST
upsert into TS.TEST(STR, INTCOL, DUMMY) values ('TEST',4,5)
 
SELECT /*+NO_INDEX*/* FROM TS.TEST where STR = 'TEST'
-- yields: STARTTIME = null
SELECT /*+TEST_INDEX*/ * FROM TS.TEST where STR = 'TEST'
-- yields: STARTTIME = 2
{code}


  was:
When rows are deleted from the main table, the index appears to retain stale 
data, and when a row is upserted again the index returns wrong values.

Reproducible with this script:

{code}
create schema if not exists TS
create table if not exists TS.TEST (STR varchar not null,INTCOL bigint not 
null, STARTTIME integer, DUMMY integer default 0 CONSTRAINT PK PRIMARY KEY 
(STR, INTCOL))
create local index if not exists "TEST_INDEX" on TS.TEST (STR,STARTTIME)

-- optional delete
-- delete from TS.TEST
upsert into TS.TEST(STR,INTCOL,STARTTIME,DUMMY) values ('TEST',4,1,3)
delete from TS.TEST
upsert into TS.TEST(STR, INTCOL, STARTTIME, DUMMY) values ('TEST',4,2,4)
delete from TS.TEST
upsert into TS.TEST (STR, INTCOL, DUMMY) values ('TEST',4,5)
 
SELECT /*+NO_INDEX*/* FROM TS.TEST where STR = 'TEST'
-- yields: STARTTIME = null
SELECT /*+TEST_INDEX*/ * FROM TS.TEST where STR = 'TEST'
-- yields: STARTTIME = 2
{code}



> phoenix index has stale data upon deletion of rows
> --
>
> Key: PHOENIX-4911
> URL: https://issues.apache.org/jira/browse/PHOENIX-4911
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.13.2-cdh5.11.2, 5.0.0, 5.1.0
>Reporter: Ievgen Nekrashevych
>Priority: Major
>
> When rows are deleted from the main table, the index appears to retain stale 
> data, and when a row is upserted again the index returns wrong values.
> Reproducible with this script:
> {code}
> create schema if not exists TS
> create table if not exists TS.TEST (STR varchar not null,INTCOL bigint not 
> null, STARTTIME integer, DUMMY integer default 0 CONSTRAINT PK PRIMARY KEY 
> (STR, INTCOL))
> create local index if not exists "TEST_INDEX" on TS.TEST (STR,STARTTIME)
> -- optional delete
> -- delete from TS.TEST
> upsert into TS.TEST(STR,INTCOL,STARTTIME,DUMMY) values ('TEST',4,1,3)
> delete from TS.TEST
> upsert into TS.TEST(STR, INTCOL, STARTTIME, DUMMY) values ('TEST',4,2,4)
> delete from TS.TEST
> upsert into TS.TEST(STR, INTCOL, DUMMY) values ('TEST',4,5)
>  
> SELECT /*+NO_INDEX*/* FROM TS.TEST where STR = 'TEST'
> -- yields: STARTTIME = null
> SELECT /*+TEST_INDEX*/ * FROM TS.TEST where STR = 'TEST'
> -- yields: STARTTIME = 2
> {code}





[jira] [Created] (PHOENIX-4911) phoenix index has stale data upon deletion of rows

2018-09-20 Thread Ievgen Nekrashevych (JIRA)
Ievgen Nekrashevych created PHOENIX-4911:


 Summary: phoenix index has stale data upon deletion of rows
 Key: PHOENIX-4911
 URL: https://issues.apache.org/jira/browse/PHOENIX-4911
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 5.0.0, 4.13.2-cdh5.11.2, 5.1.0
Reporter: Ievgen Nekrashevych


When rows are deleted from the main table, the index appears to retain stale 
data, and when a row is upserted again the index returns wrong values.

Reproducible with this script:

{code}
create schema if not exists TS
create table if not exists TS.TEST (STR varchar not null,INTCOL bigint not 
null, STARTTIME integer, DUMMY integer default 0 CONSTRAINT PK PRIMARY KEY 
(STR, INTCOL))
create local index if not exists "TEST_INDEX" on TS.TEST (STR,STARTTIME)

-- optional delete
-- delete from TS.TEST
upsert into TS.TEST(STR,INTCOL,STARTTIME,DUMMY) values ('TEST',4,1,3)
delete from TS.TEST
upsert into TS.TEST( STR, INTCOL, STARTTIME, DUMMY) values ('TEST',4,2,4)
delete from TS.TEST
upsert into TS.TEST ( STR, INTCOL, DUMMY) values ('TEST',4,5)
 
SELECT /*+NO_INDEX*/* FROM TS.TEST where STR = 'TEST'
-- yields: STARTTIME = null
SELECT /*+TEST_INDEX*/ * FROM TS.TEST where STR = 'TEST'
-- yields: STARTTIME = 2
{code}



