[jira] [Updated] (PHOENIX-4912) Make Table Sampling algorithm to accommodate to the imbalance row distribution across guide posts

2018-09-21 Thread Karan Mehta (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta updated PHOENIX-4912:
-
Description: 
The current implementation of table sampling is based on the assumption that 
every two consecutive guide posts contain an equal number of rows, which isn't 
accurate in practice; once we collect multiple versions of cells and deleted 
rows, this gets worse.

In detail, the current implementation of table sampling (see 
BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end 
of the function) works as described below:
 # Iterate over all generated parallel scans;
 # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
tableSamplingRate (see TableSamplerPredicate.java) then pick this scan; 
otherwise discard it.

The problem can be formalized as: we have a group of scans, and each scan is 
defined as <Ki, Ci>, where Ki is the start row key of the scan and Ci is its 
count of rows. We want to randomly pick a subset of scans so that the sum of 
the row counts of the selected scans is close to Y, where Y = T * R / 100, 
with T denoting the total count of rows of all scans and R the table sampling 
rate (a percentage in [0, 100]).

One algorithm we can consider to resolve the above problem is described below:
{code:java}
ArrayList TableSampling(ArrayList scans, long T, double R) {
    ArrayList pickedScans = new ArrayList();
    double Y = T * R / 100.0;        // target number of rows to sample
    for (scan in scans) {            // each scan carries its <Ki, Ci>
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // pick this scan, then adjust T, R and Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                // re-balance the sampling rate over the remaining scans
                R = 100.0 * Y / T;
            }
        }
    }
    return pickedScans;
}
{code}
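
For illustration only, a minimal self-contained Java sketch of the same idea is 
shown below. ScanInfo, startKey and rowCount are hypothetical stand-ins for the 
per-scan data (they are not Phoenix APIs), and Arrays.hashCode is just a 
placeholder for whatever hash is applied to the start row key.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ScanInfo {
    final byte[] startKey;   // Ki: start row key of the scan
    final long rowCount;     // Ci: count of rows covered by the scan
    ScanInfo(byte[] startKey, long rowCount) {
        this.startKey = startKey;
        this.rowCount = rowCount;
    }
}

final class TableSampler {
    /** Picks scans whose summed row count stays close to totalRows * ratePercent / 100. */
    static List<ScanInfo> sample(List<ScanInfo> scans, long totalRows, double ratePercent) {
        List<ScanInfo> picked = new ArrayList<>();
        long remaining = totalRows;                        // T
        double rate = ratePercent;                         // R, a percentage in [0, 100]
        double target = totalRows * ratePercent / 100.0;   // Y
        for (ScanInfo scan : scans) {
            if (target <= 0) {
                break;
            }
            int bucket = Math.floorMod(Arrays.hashCode(scan.startKey), 100);
            if (bucket < rate) {
                picked.add(scan);
                remaining -= scan.rowCount;
                target -= scan.rowCount;
                if (remaining != 0 && target > 0) {
                    rate = 100.0 * target / remaining;     // re-balance over the rest
                }
            }
        }
        return picked;
    }
}
{code}
Because the rate is re-computed after every pick, the sum of rowCount over the 
picked scans stays close to T * R / 100 even when row counts are skewed across 
guide posts.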

  was:
The current implementation of table sampling is based on the assumption that 
every two consecutive guide posts contain an equal number of rows, which isn't 
accurate in practice; once we collect multiple versions of cells and deleted 
rows, this gets worse.

In detail, the current implementation of table sampling (see 
BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end 
of the function) works as described below:
 # Iterate over all generated parallel scans;
 # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
tableSamplingRate (see TableSamplerPredicate.java) then pick this scan; 
otherwise discard it.

The problem can be formalized as: we have a group of scans, and each scan is 
defined as <Ki, Ci>, where Ki is the start row key of the scan and Ci is its 
count of rows. We want to randomly pick a subset of scans so that the sum of 
the row counts of the selected scans is close to Y, where Y = the total count 
of rows of all scans T * the table sampling rate R.

One algorithm we can consider to resolve the above problem is described below:

ArrayList TableSampling(ArrayList scans, T, R)
{  
    ArrayList pickedScans = new ArrayList();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) {
                R = Y / T;
            }
        }
    }
    return pickedScans;
}


> Make Table Sampling algorithm to accommodate to the imbalance row 
> distribution across guide posts
> -
>
> Key: PHOENIX-4912
> URL: https://issues.apache.org/jira/browse/PHOENIX-4912
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.0.0, 4.15.0
>Reporter: Bin Shi
>Assignee: Bin Shi
>Priority: Major
>
> The current implementation of table sampling is based on the assumption that 
> every two consecutive guide posts contain an equal number of rows, which isn't 
> accurate in practice; once we collect multiple versions of cells and deleted 
> rows, this gets worse.
> In detail, the current implementation of table sampling (see 
> BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end 
> of the function) works as described below:
>  # Iterate over all generated parallel scans;
>  # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
> tableSamplingRate (see TableSamplerPredicate.java) then pick this scan; 
> otherwise discard it.
> The problem can be formalized as: we have a group of scans, and each scan is 
> defined as <Ki, Ci>, where Ki is the start row key of the scan and Ci is its 
> count of rows. We want to randomly pick a subset of scans so that the sum of 
> the row counts of the selected scans is close to Y, where Y = the total count 
> of rows of all scans T * the table sampling rate R.
> One algorithm we can consider to resolve the above problem is described below:
> {code:java}
> ArrayList TableSampling(ArrayList scans, T, R) {  
>     ArrayList pickedScans = new ArrayList();
>     Y = T * R;
> 

[jira] [Updated] (PHOENIX-4008) UPDATE STATISTIC should collect all versions of cells

2018-09-21 Thread Karan Mehta (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta updated PHOENIX-4008:
-
Fix Version/s: 5.1.0
   4.15.0

> UPDATE STATISTIC should collect all versions of cells
> -
>
> Key: PHOENIX-4008
> URL: https://issues.apache.org/jira/browse/PHOENIX-4008
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Samarth Jain
>Assignee: Bin Shi
>Priority: Major
> Fix For: 4.15.0, 5.1.0
>
> Attachments: PHOENIX-4008_0918.patch, PHOENIX-4008_0920.patch
>
>
> In order to truly measure the size of data when calculating guide posts, 
> UPDATE STATISTICS should take all versions of cells into account. We should 
> also be setting the max versions on the scan.
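
As a rough illustration only (not the actual patch), requesting every cell 
version on the stats-collection scan with the HBase 1.x client API could look 
like the sketch below; the class and method names are hypothetical.
{code:java}
import org.apache.hadoop.hbase.client.Scan;

class StatsScanSketch {
    // Sketch only: return all versions of each cell so that guide-post sizes
    // reflect the full amount of stored data, not just the latest version.
    static Scan newStatsScan() {
        Scan scan = new Scan();
        scan.setMaxVersions();      // no argument = return all versions (HBase 1.x API)
        scan.setCacheBlocks(false); // stats scans should not churn the block cache
        return scan;
    }
}
{code}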



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4913) UPDATE STATISTICS should run raw scan to collect the deleted rows

2018-09-21 Thread Karan Mehta (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta updated PHOENIX-4913:
-
Affects Version/s: 4.15.0

> UPDATE STATISTICS should run raw scan to collect the deleted rows
> -
>
> Key: PHOENIX-4913
> URL: https://issues.apache.org/jira/browse/PHOENIX-4913
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.0.0, 4.15.0
>Reporter: Bin Shi
>Assignee: Bin Shi
>Priority: Major
>
> In order to truly measure the size of data when calculating guide posts, 
> UPDATE STATISTICS should run a raw scan to take the deleted rows into account.
> Deleted rows will contribute to the estimated size of a guide post, but they 
> make no contribution to its count of rows.
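
A hedged illustration (not the actual patch) of how a raw scan could let 
deleted data count toward guide-post size but not toward the row count, using 
the HBase client API; the class and method names are hypothetical.
{code:java}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;

class RawStatsSketch {
    long estimatedSize = 0;
    long rowCount = 0;

    // Sketch only: a raw scan also returns delete markers and deleted cells.
    Scan newStatsScan() {
        Scan scan = new Scan();
        scan.setRaw(true);        // include deleted cells and delete markers
        scan.setMaxVersions();    // include all cell versions (HBase 1.x API)
        return scan;
    }

    void accumulate(Result result) {
        boolean hasLiveCell = false;
        for (Cell cell : result.rawCells()) {
            estimatedSize += CellUtil.estimatedSerializedSizeOf(cell); // deletes still add to size
            if (!CellUtil.isDelete(cell)) {
                hasLiveCell = true;
            }
        }
        if (hasLiveCell) {
            rowCount++;           // a fully deleted row does not add to the row count
        }
    }
}
{code}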



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4912) Make Table Sampling algorithm to accommodate to the imbalance row distribution across guide posts

2018-09-21 Thread Karan Mehta (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karan Mehta updated PHOENIX-4912:
-
Affects Version/s: 4.15.0

> Make Table Sampling algorithm to accommodate to the imbalance row 
> distribution across guide posts
> -
>
> Key: PHOENIX-4912
> URL: https://issues.apache.org/jira/browse/PHOENIX-4912
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.0.0, 4.15.0
>Reporter: Bin Shi
>Assignee: Bin Shi
>Priority: Major
>
> The current implementation of table sampling is based on the assumption that 
> every two consecutive guide posts contain an equal number of rows, which isn't 
> accurate in practice; once we collect multiple versions of cells and deleted 
> rows, this gets worse.
> In detail, the current implementation of table sampling (see 
> BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end 
> of the function) works as described below:
>  # Iterate over all generated parallel scans;
>  # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
> tableSamplingRate (see TableSamplerPredicate.java) then pick this scan; 
> otherwise discard it.
> The problem can be formalized as: we have a group of scans, and each scan is 
> defined as <Ki, Ci>, where Ki is the start row key of the scan and Ci is its 
> count of rows. We want to randomly pick a subset of scans so that the sum of 
> the row counts of the selected scans is close to Y, where Y = the total count 
> of rows of all scans T * the table sampling rate R.
> One algorithm we can consider to resolve the above problem is described below:
> ArrayList TableSampling(ArrayList scans, T, R)
> {  
>     ArrayList pickedScans = new ArrayList();
>     Y = T * R;
>     for (scan in scans) {
>         if (Y <= 0) break;
>         if (getHashCode(Ki) MOD 100 < R) {
>             // then pick this scan, and adjust T, R, Y accordingly
>             pickedScans.Add(scan);
>             T -= Ci;
>             Y -= Ci;
>             if (T != 0 && Y > 0) {
>                 R = Y / T;
>             }
>         }
>     }
>     return pickedScans;
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4815) support alter table modify column

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4815:

Affects Version/s: 4.13.0
   4.14.0

> support alter table modify column 
> --
>
> Key: PHOENIX-4815
> URL: https://issues.apache.org/jira/browse/PHOENIX-4815
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.12.0, 4.13.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Major
> Attachments: PHOENIX-4815.patch
>
>
> If we want to change the max length or scale of fields of a variable-length 
> type (for example varchar, char, decimal, etc.), we cannot drop the column and 
> recreate a new one when the table holds massive data, since that may affect 
> the online service and is also very expensive. So this function is sometimes 
> very useful.
> Taking the ORACLE dialect as a reference:
> {code:java}
> alter table
>table_name
> modify
>column_name  datatype;
> {code}
> reference link: 
> https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_3001.htm#i2103956



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PHOENIX-4915) The client gets stuck when multi threads concurrently writing data table(has global index) with same rows

2018-09-21 Thread Jaanai (JIRA)
Jaanai created PHOENIX-4915:
---

 Summary: The client gets stuck when multi threads concurrently 
writing data table(has global index) with same rows 
 Key: PHOENIX-4915
 URL: https://issues.apache.org/jira/browse/PHOENIX-4915
 Project: Phoenix
  Issue Type: Bug
Affects Versions: 4.14.0, 4.12.0
Reporter: Jaanai
Assignee: Jaanai


The client gets stuck when multiple threads concurrently write the same rows 
into a data table which has a global index.

I find that the row locks of the data table are not released under heavy write 
load, and the server throws "ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to 
find cached index metadata." Most of the threads are waiting to acquire the row 
lock in the jstack output.
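
A minimal, hypothetical repro sketch of the scenario is shown below; the DDL, 
table/index names and thread counts are illustrative only (this is not the 
attached test.java/test.sql), though the table name LOCK mirrors the region 
names in the server logs.
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: many threads repeatedly upsert the SAME rows into a
// table that has a global index, which is the pattern that triggered the hang.
public class ConcurrentUpsertRepro {
    public static void main(String[] args) throws Exception {
        try (Connection ddl = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            ddl.createStatement().execute(
                "CREATE TABLE IF NOT EXISTS LOCK (ID VARCHAR PRIMARY KEY, V VARCHAR)");
            ddl.createStatement().execute(
                "CREATE INDEX IF NOT EXISTS LOCK_IDX ON LOCK (V)"); // global index
            ddl.commit();
        }
        ExecutorService pool = Executors.newFixedThreadPool(16);
        for (int t = 0; t < 16; t++) {
            pool.submit(() -> {
                try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
                    PreparedStatement ps =
                        conn.prepareStatement("UPSERT INTO LOCK (ID, V) VALUES (?, ?)");
                    for (int i = 0; i < 100000; i++) {
                        ps.setString(1, "row-" + (i % 100)); // same small set of row keys
                        ps.setString(2, "value-" + i);
                        ps.execute();
                        if (i % 100 == 0) conn.commit();
                    }
                    conn.commit();
                } catch (Exception e) {
                    e.printStackTrace();
                }
                return null;
            });
        }
        pool.shutdown();
    }
}
{code}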

The following are exceptions on the server side:

{code:java}
[B.defaultRpcServer.handler=37,queue=1,port=16020] 
regionserver.RSRpcServices(103): Failed doing multi operation, current call is 
: callId: 3455 service: ClientService meth
odName: Multi size: 23.1 K connection: 192.168.199.7:52050 param: 
actionCount=44#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 
(INT10): Unable to find cached index metadata.  key=-727998515684050837 
region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f
1c7a20c73a2.host=hb-bp1v2q830426r6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
 Index update failed
at 
org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:88)
at 
org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
at 
org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:87)
at 
org.apache.phoenix.index.PhoenixIndexMetaData.(PhoenixIndexMetaData.java:103)
at 
org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:95)
at 
org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:80)
at 
org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:528)
at 
org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:374)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1032)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1714)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:1028)
at 
org.apache.hadoop.hbase.regionserver.HRegion.asyncBatchMutate(HRegion.java:3236)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doAsyncBatchOp(RSRpcServices.java:2147)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchMutationCrossRegions(RSRpcServices.java:2308)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2578)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32303)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:174)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$CallHandler.run(RpcExecutor.java:178)
Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached 
index metadata.  key=-727998515684050837 
region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f1c7a20c73a2.host=hb-bp1v2q830426r
6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
at 
org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:493)
at 
org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
at 
org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:85)

2018-09-20 17:35:39,254 INFO  
[B.defaultRpcServer.handler=13,queue=1,port=16020] 
regionserver.RSRpcServices(103): Failed doing multi operation, current call is 
: callId: 3848 service: ClientService meth
odName: Multi size: 27.2 K connection: 192.168.199.7:52042 param: 
actionCount=52#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Timed out waiting for 
lock for row: 0e 30 64 32 65 34 35 63 37 2d 63 63 33 64 2d 34 36 61 35 2d 61 34 
38 64 2d 31 38 61 62 36 31 61 31 30 63 30 39
at 
org.apache.phoenix.hbase.index.LockManager.lockRow(LockManager.java:96)
at 

[jira] [Updated] (PHOENIX-4915) The client gets stuck when using same rows concurrently writing data table

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4915:

Summary: The client gets stuck when using same rows concurrently writing 
data table  (was: The client gets stuck when multi threads concurrently writing 
data table(has global index) with same rows )

> The client gets stuck when using same rows concurrently writing data table
> --
>
> Key: PHOENIX-4915
> URL: https://issues.apache.org/jira/browse/PHOENIX-4915
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Blocker
>
> The client gets stuck when multiple threads concurrently write the same rows 
> into a data table which has a global index.
> I find that the row locks of the data table are not released under heavy 
> write load, and the server throws "ERROR 2008 (INT10): ERROR 2008 (INT10): 
> Unable to find cached index metadata." Most of the threads are waiting to 
> acquire the row lock in the jstack output.
> The following are exceptions on the server side:
> {code:java}
> [B.defaultRpcServer.handler=37,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3455 service: ClientService meth
> odName: Multi size: 23.1 K connection: 192.168.199.7:52050 param: 
> actionCount=44#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 
> (INT10): Unable to find cached index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f
> 1c7a20c73a2.host=hb-bp1v2q830426r6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
>  Index update failed
> at 
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:88)
> at 
> org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:87)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.(PhoenixIndexMetaData.java:103)
> at 
> org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:95)
> at 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:80)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:528)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:374)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1032)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1714)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:1028)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.asyncBatchMutate(HRegion.java:3236)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doAsyncBatchOp(RSRpcServices.java:2147)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchMutationCrossRegions(RSRpcServices.java:2308)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2578)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32303)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:174)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$CallHandler.run(RpcExecutor.java:178)
> Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached 
> index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f1c7a20c73a2.host=hb-bp1v2q830426r
> 6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:493)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:85)
> 2018-09-20 17:35:39,254 INFO  
> [B.defaultRpcServer.handler=13,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 

[jira] [Updated] (PHOENIX-4915) The client gets stuck when using same rows concurrently writing data table

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4915:

Attachment: (was: Screen Shot 2018-09-21 at 19.19.28.png)

> The client gets stuck when using same rows concurrently writing data table
> --
>
> Key: PHOENIX-4915
> URL: https://issues.apache.org/jira/browse/PHOENIX-4915
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Blocker
> Attachments: image-2018-09-21-19-30-12-989.png, test.java, test.sql
>
>
> The client gets stuck when multiple threads concurrently write the same rows 
> into a data table which has a global index.
> I find that the row locks of the data table are not released under heavy 
> write load, and the server throws "ERROR 2008 (INT10): ERROR 2008 (INT10): 
> Unable to find cached index metadata." Most of the threads are waiting to 
> acquire the row lock in the jstack output.
> The following are exceptions on the server side:
> {code:java}
> [B.defaultRpcServer.handler=37,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3455 service: ClientService meth
> odName: Multi size: 23.1 K connection: 192.168.199.7:52050 param: 
> actionCount=44#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 
> (INT10): Unable to find cached index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f
> 1c7a20c73a2.host=hb-bp1v2q830426r6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
>  Index update failed
> at 
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:88)
> at 
> org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:87)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.(PhoenixIndexMetaData.java:103)
> at 
> org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:95)
> at 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:80)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:528)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:374)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1032)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1714)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:1028)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.asyncBatchMutate(HRegion.java:3236)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doAsyncBatchOp(RSRpcServices.java:2147)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchMutationCrossRegions(RSRpcServices.java:2308)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2578)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32303)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:174)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$CallHandler.run(RpcExecutor.java:178)
> Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached 
> index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f1c7a20c73a2.host=hb-bp1v2q830426r
> 6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:493)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:85)
> 2018-09-20 17:35:39,254 INFO  
> [B.defaultRpcServer.handler=13,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3848 service: ClientService meth
> odName: Multi size: 27.2 K 

[jira] [Updated] (PHOENIX-4904) NPE exception when use non-existing fields in function

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4904:

Summary: NPE exception when use non-existing fields in function   (was: NPE 
exception when use non-existing filed in function )

> NPE exception when use non-existing fields in function 
> ---
>
> Key: PHOENIX-4904
> URL: https://issues.apache.org/jira/browse/PHOENIX-4904
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0, 4.13.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Major
>
> Use the following SQL to reproduce the error:
> {code:sql}
> create table "test_truncate"("ROW" varchar primary key,"f"."0" 
> varchar,"f"."1" varchar);
> select * from "test_truncate" order by TO_NUMBER("f.1");
> {code}
> Exception information:
> {code}
> java.lang.NullPointerException     at 
> org.apache.phoenix.util.SchemaUtil.getSchemaNameFromFullName(SchemaUtil.java:632)
>      at 
> org.apache.phoenix.schema.TableNotFoundException.(TableNotFoundException.java:44)
>      at 
> org.apache.phoenix.compile.FromCompiler$MultiTableColumnResolver.resolveTable(FromCompiler.java:858)
>      at 
> org.apache.phoenix.compile.FromCompiler$ProjectedTableColumnResolver.resolveColumn(FromCompiler.java:984)
>      at 
> org.apache.phoenix.compile.ExpressionCompiler.resolveColumn(ExpressionCompiler.java:372)
>      at 
> org.apache.phoenix.compile.ExpressionCompiler.visit(ExpressionCompiler.java:408)
>      at 
> org.apache.phoenix.compile.ExpressionCompiler.visit(ExpressionCompiler.java:146)
>      at 
> org.apache.phoenix.parse.ColumnParseNode.accept(ColumnParseNode.java:56)     
> at 
> org.apache.phoenix.parse.CompoundParseNode.acceptChildren(CompoundParseNode.java:64)
>      at 
> org.apache.phoenix.parse.FunctionParseNode.accept(FunctionParseNode.java:84)  
>    at 
> org.apache.phoenix.compile.OrderByCompiler.compile(OrderByCompiler.java:123)  
>    at 
> org.apache.phoenix.compile.QueryCompiler.compileSingleFlatQuery(QueryCompiler.java:562)
>      at 
> org.apache.phoenix.compile.QueryCompiler.compileSingleQuery(QueryCompiler.java:507)
>      at 
> org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:202)
>      at 
> org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:157)     
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:478)
>      at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:444)
>      a
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4915) The client gets stuck when using same rows concurrently writing data table

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4915:

Attachment: Screen Shot 2018-09-21 at 19.19.28.png

> The client gets stuck when using same rows concurrently writing data table
> --
>
> Key: PHOENIX-4915
> URL: https://issues.apache.org/jira/browse/PHOENIX-4915
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Blocker
> Attachments: Screen Shot 2018-09-21 at 19.19.28.png, test.java, 
> test.sql
>
>
> The client gets stuck when multiple threads concurrently write the same rows 
> into a data table which has a global index.
> I find that the row locks of the data table are not released under heavy 
> write load, and the server throws "ERROR 2008 (INT10): ERROR 2008 (INT10): 
> Unable to find cached index metadata." Most of the threads are waiting to 
> acquire the row lock in the jstack output.
> The following are exceptions on the server side:
> {code:java}
> [B.defaultRpcServer.handler=37,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3455 service: ClientService meth
> odName: Multi size: 23.1 K connection: 192.168.199.7:52050 param: 
> actionCount=44#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 
> (INT10): Unable to find cached index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f
> 1c7a20c73a2.host=hb-bp1v2q830426r6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
>  Index update failed
> at 
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:88)
> at 
> org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:87)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.(PhoenixIndexMetaData.java:103)
> at 
> org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:95)
> at 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:80)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:528)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:374)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1032)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1714)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:1028)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.asyncBatchMutate(HRegion.java:3236)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doAsyncBatchOp(RSRpcServices.java:2147)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchMutationCrossRegions(RSRpcServices.java:2308)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2578)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32303)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:174)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$CallHandler.run(RpcExecutor.java:178)
> Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached 
> index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f1c7a20c73a2.host=hb-bp1v2q830426r
> 6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:493)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:85)
> 2018-09-20 17:35:39,254 INFO  
> [B.defaultRpcServer.handler=13,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3848 service: ClientService meth
> odName: Multi size: 27.2 K 

[jira] [Updated] (PHOENIX-4915) The client gets stuck when using same rows concurrently writing data table

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4915:

Attachment: test.sql
test.java

> The client gets stuck when using same rows concurrently writing data table
> --
>
> Key: PHOENIX-4915
> URL: https://issues.apache.org/jira/browse/PHOENIX-4915
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Blocker
> Attachments: test.java, test.sql
>
>
> The client gets stuck when multiple threads concurrently write the same rows 
> into a data table which has a global index.
> I find that the row locks of the data table are not released under heavy 
> write load, and the server throws "ERROR 2008 (INT10): ERROR 2008 (INT10): 
> Unable to find cached index metadata." Most of the threads are waiting to 
> acquire the row lock in the jstack output.
> The following are exceptions on the server side:
> {code:java}
> [B.defaultRpcServer.handler=37,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3455 service: ClientService meth
> odName: Multi size: 23.1 K connection: 192.168.199.7:52050 param: 
> actionCount=44#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 
> (INT10): Unable to find cached index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f
> 1c7a20c73a2.host=hb-bp1v2q830426r6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
>  Index update failed
> at 
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:88)
> at 
> org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:87)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.(PhoenixIndexMetaData.java:103)
> at 
> org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:95)
> at 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:80)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:528)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:374)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1032)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1714)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:1028)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.asyncBatchMutate(HRegion.java:3236)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doAsyncBatchOp(RSRpcServices.java:2147)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchMutationCrossRegions(RSRpcServices.java:2308)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2578)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32303)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:174)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$CallHandler.run(RpcExecutor.java:178)
> Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached 
> index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f1c7a20c73a2.host=hb-bp1v2q830426r
> 6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:493)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:85)
> 2018-09-20 17:35:39,254 INFO  
> [B.defaultRpcServer.handler=13,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3848 service: ClientService meth
> odName: Multi size: 27.2 K connection: 192.168.199.7:52042 param: 
> 

[jira] [Updated] (PHOENIX-4915) The client gets stuck when using same rows concurrently writing data table

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4915:

Attachment: image-2018-09-21-19-21-41-898.png

> The client gets stuck when using same rows concurrently writing data table
> --
>
> Key: PHOENIX-4915
> URL: https://issues.apache.org/jira/browse/PHOENIX-4915
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Blocker
> Attachments: Screen Shot 2018-09-21 at 19.19.28.png, 
> image-2018-09-21-19-21-41-898.png, test.java, test.sql
>
>
> The client gets stuck when multiple threads concurrently write the same rows 
> into a data table which has a global index.
> I find that the row locks of the data table are not released under heavy 
> write load, and the server throws "ERROR 2008 (INT10): ERROR 2008 (INT10): 
> Unable to find cached index metadata." Most of the threads are waiting to 
> acquire the row lock in the jstack output.
> The following are exceptions on the server side:
> {code:java}
> [B.defaultRpcServer.handler=37,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3455 service: ClientService meth
> odName: Multi size: 23.1 K connection: 192.168.199.7:52050 param: 
> actionCount=44#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 
> (INT10): Unable to find cached index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f
> 1c7a20c73a2.host=hb-bp1v2q830426r6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
>  Index update failed
> at 
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:88)
> at 
> org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:87)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.(PhoenixIndexMetaData.java:103)
> at 
> org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:95)
> at 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:80)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:528)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:374)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1032)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1714)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:1028)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.asyncBatchMutate(HRegion.java:3236)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doAsyncBatchOp(RSRpcServices.java:2147)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchMutationCrossRegions(RSRpcServices.java:2308)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2578)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32303)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:174)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$CallHandler.run(RpcExecutor.java:178)
> Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached 
> index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f1c7a20c73a2.host=hb-bp1v2q830426r
> 6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:493)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:85)
> 2018-09-20 17:35:39,254 INFO  
> [B.defaultRpcServer.handler=13,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3848 service: ClientService meth
> 

[jira] [Updated] (PHOENIX-4915) The client gets stuck when using same rows concurrently writing data table

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4915:

Attachment: (was: image-2018-09-21-19-21-41-898.png)

> The client gets stuck when using same rows concurrently writing data table
> --
>
> Key: PHOENIX-4915
> URL: https://issues.apache.org/jira/browse/PHOENIX-4915
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Blocker
> Attachments: Screen Shot 2018-09-21 at 19.19.28.png, 
> image-2018-09-21-19-30-12-989.png, test.java, test.sql
>
>
> The client gets stuck when multiple threads concurrently write the same rows 
> into a data table which has a global index.
> I find that the row locks of the data table are not released under heavy 
> write load, and the server throws "ERROR 2008 (INT10): ERROR 2008 (INT10): 
> Unable to find cached index metadata." Most of the threads are waiting to 
> acquire the row lock in the jstack output.
> The following are exceptions on the server side:
> {code:java}
> [B.defaultRpcServer.handler=37,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3455 service: ClientService meth
> odName: Multi size: 23.1 K connection: 192.168.199.7:52050 param: 
> actionCount=44#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 
> (INT10): Unable to find cached index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f
> 1c7a20c73a2.host=hb-bp1v2q830426r6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
>  Index update failed
> at 
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:88)
> at 
> org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:87)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.(PhoenixIndexMetaData.java:103)
> at 
> org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:95)
> at 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:80)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:528)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:374)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1032)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1714)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:1028)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.asyncBatchMutate(HRegion.java:3236)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doAsyncBatchOp(RSRpcServices.java:2147)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchMutationCrossRegions(RSRpcServices.java:2308)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2578)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32303)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:174)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$CallHandler.run(RpcExecutor.java:178)
> Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached 
> index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f1c7a20c73a2.host=hb-bp1v2q830426r
> 6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:493)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:85)
> 2018-09-20 17:35:39,254 INFO  
> [B.defaultRpcServer.handler=13,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3848 service: 

[jira] [Updated] (PHOENIX-4915) The client gets stuck when using same rows concurrently writing data table

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4915:

Attachment: image-2018-09-21-19-30-12-989.png

> The client gets stuck when using same rows concurrently writing data table
> --
>
> Key: PHOENIX-4915
> URL: https://issues.apache.org/jira/browse/PHOENIX-4915
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Blocker
> Attachments: Screen Shot 2018-09-21 at 19.19.28.png, 
> image-2018-09-21-19-30-12-989.png, test.java, test.sql
>
>
> The client gets stuck when multiple threads concurrently write the same rows 
> into a data table which has a global index.
> I find that the row locks of the data table are not released under heavy 
> write load, and the server throws "ERROR 2008 (INT10): ERROR 2008 (INT10): 
> Unable to find cached index metadata." Most of the threads are waiting to 
> acquire the row lock in the jstack output.
> The following are exceptions on the server side:
> {code:java}
> [B.defaultRpcServer.handler=37,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3455 service: ClientService meth
> odName: Multi size: 23.1 K connection: 192.168.199.7:52050 param: 
> actionCount=44#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
> org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 
> (INT10): Unable to find cached index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f
> 1c7a20c73a2.host=hb-bp1v2q830426r6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
>  Index update failed
> at 
> org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:88)
> at 
> org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:87)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.(PhoenixIndexMetaData.java:103)
> at 
> org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:95)
> at 
> org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:80)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:528)
> at 
> org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:374)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1032)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1714)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
> at 
> org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:1028)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion.asyncBatchMutate(HRegion.java:3236)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doAsyncBatchOp(RSRpcServices.java:2147)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchMutationCrossRegions(RSRpcServices.java:2308)
> at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2578)
> at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32303)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:174)
> at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$CallHandler.run(RpcExecutor.java:178)
> Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached 
> index metadata.  key=-727998515684050837 
> region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f1c7a20c73a2.host=hb-bp1v2q830426r
> 6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
> at 
> org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:493)
> at 
> org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
> at 
> org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:85)
> 2018-09-20 17:35:39,254 INFO  
> [B.defaultRpcServer.handler=13,queue=1,port=16020] 
> regionserver.RSRpcServices(103): Failed doing multi operation, current call 
> is : callId: 3848 service: ClientService meth
> 

[jira] [Updated] (PHOENIX-4915) The client gets stuck when using same rows concurrently writing data table

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4915:

Description: 
The client gets stuck when multiple threads concurrently write the same rows 
into a data table which has a global index.

I find that the row locks of the data table are not released under heavy write 
load, and the server throws "ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to 
find cached index metadata." Most of the threads are waiting to acquire the row 
lock in the jstack output.

The following are exceptions on the server side:
{code:java}
[B.defaultRpcServer.handler=37,queue=1,port=16020] 
regionserver.RSRpcServices(103): Failed doing multi operation, current call is 
: callId: 3455 service: ClientService meth
odName: Multi size: 23.1 K connection: 192.168.199.7:52050 param: 
actionCount=44#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
org.apache.hadoop.hbase.DoNotRetryIOException: ERROR 2008 (INT10): ERROR 2008 
(INT10): Unable to find cached index metadata.  key=-727998515684050837 
region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f
1c7a20c73a2.host=hb-bp1v2q830426r6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
 Index update failed
at 
org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:88)
at 
org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:62)
at 
org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:87)
at 
org.apache.phoenix.index.PhoenixIndexMetaData.(PhoenixIndexMetaData.java:103)
at 
org.apache.phoenix.index.PhoenixIndexBuilder.getIndexMetaData(PhoenixIndexBuilder.java:95)
at 
org.apache.phoenix.hbase.index.builder.IndexBuildManager.getIndexUpdate(IndexBuildManager.java:80)
at 
org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:528)
at 
org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:374)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1032)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$RegionOperation.call(RegionCoprocessorHost.java:1714)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1789)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1746)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preBatchMutate(RegionCoprocessorHost.java:1028)
at 
org.apache.hadoop.hbase.regionserver.HRegion.asyncBatchMutate(HRegion.java:3236)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doAsyncBatchOp(RSRpcServices.java:2147)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicBatchMutationCrossRegions(RSRpcServices.java:2308)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2578)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32303)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2394)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:174)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor$CallHandler.run(RpcExecutor.java:178)
Caused by: java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached 
index metadata.  key=-727998515684050837 
region=LOCK,\x0E,1537434393195.f4de29d4b36775589a49f1c7a20c73a2.host=hb-bp1v2q830426r
6763-004.hbase.rds.aliyuncs.com,16020,1537434304031
at 
org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:493)
at 
org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
at 
org.apache.phoenix.index.PhoenixIndexMetaData.getIndexMetaData(PhoenixIndexMetaData.java:85)

2018-09-20 17:35:39,254 INFO  
[B.defaultRpcServer.handler=13,queue=1,port=16020] 
regionserver.RSRpcServices(103): Failed doing multi operation, current call is 
: callId: 3848 service: ClientService meth
odName: Multi size: 27.2 K connection: 192.168.199.7:52042 param: 
actionCount=52#regionCount=8#LOCK,\x02,1537434393195.ee6d441a04ee6a59b24262f22f618d88.#
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Timed out waiting for 
lock for row: 0e 30 64 32 65 34 35 63 37 2d 63 63 33 64 2d 34 36 61 35 2d 61 34 
38 64 2d 31 38 61 62 36 31 61 31 30 63 30 39
at 
org.apache.phoenix.hbase.index.LockManager.lockRow(LockManager.java:96)
at 
org.apache.phoenix.hbase.index.Indexer.preBatchMutateWithExceptions(Indexer.java:425)
at 
org.apache.phoenix.hbase.index.Indexer.preBatchMutate(Indexer.java:374)
at 
org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$37.call(RegionCoprocessorHost.java:1032)
at 

[jira] [Updated] (PHOENIX-4904) NPE exception when use non-existing fields in function

2018-09-21 Thread Jaanai (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaanai updated PHOENIX-4904:

Attachment: PHOENIX-4904-master.patch

> NPE exception when use non-existing fields in function 
> ---
>
> Key: PHOENIX-4904
> URL: https://issues.apache.org/jira/browse/PHOENIX-4904
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0, 4.13.0, 4.14.0
>Reporter: Jaanai
>Assignee: Jaanai
>Priority: Major
> Attachments: PHOENIX-4904-master.patch
>
>
> Use the following SQL to reproduce the error:
> {code:sql}
> create table "test_truncate"("ROW" varchar primary key,"f"."0" 
> varchar,"f"."1" varchar);
> select * from "test_truncate" order by TO_NUMBER("f.1");
> {code}
> Exception information:
> {code}
> java.lang.NullPointerException     at 
> org.apache.phoenix.util.SchemaUtil.getSchemaNameFromFullName(SchemaUtil.java:632)
>      at 
> org.apache.phoenix.schema.TableNotFoundException.(TableNotFoundException.java:44)
>      at 
> org.apache.phoenix.compile.FromCompiler$MultiTableColumnResolver.resolveTable(FromCompiler.java:858)
>      at 
> org.apache.phoenix.compile.FromCompiler$ProjectedTableColumnResolver.resolveColumn(FromCompiler.java:984)
>      at 
> org.apache.phoenix.compile.ExpressionCompiler.resolveColumn(ExpressionCompiler.java:372)
>      at 
> org.apache.phoenix.compile.ExpressionCompiler.visit(ExpressionCompiler.java:408)
>      at 
> org.apache.phoenix.compile.ExpressionCompiler.visit(ExpressionCompiler.java:146)
>      at 
> org.apache.phoenix.parse.ColumnParseNode.accept(ColumnParseNode.java:56)     
> at 
> org.apache.phoenix.parse.CompoundParseNode.acceptChildren(CompoundParseNode.java:64)
>      at 
> org.apache.phoenix.parse.FunctionParseNode.accept(FunctionParseNode.java:84)  
>    at 
> org.apache.phoenix.compile.OrderByCompiler.compile(OrderByCompiler.java:123)  
>    at 
> org.apache.phoenix.compile.QueryCompiler.compileSingleFlatQuery(QueryCompiler.java:562)
>      at 
> org.apache.phoenix.compile.QueryCompiler.compileSingleQuery(QueryCompiler.java:507)
>      at 
> org.apache.phoenix.compile.QueryCompiler.compileSelect(QueryCompiler.java:202)
>      at 
> org.apache.phoenix.compile.QueryCompiler.compile(QueryCompiler.java:157)     
> at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:478)
>      at 
> org.apache.phoenix.jdbc.PhoenixStatement$ExecutableSelectStatement.compilePlan(PhoenixStatement.java:444)
>      a
> {code}
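
Judging from the quoted stack trace, the NPE surfaces while building a 
TableNotFoundException for the unresolvable column reference "f.1": the exception's 
constructor calls SchemaUtil.getSchemaNameFromFullName, which apparently dereferences a 
null full name. The snippet below is only a self-contained illustration of that failure 
pattern and of a defensive guard; it is an assumption, not the actual Phoenix code or patch.
{code:java}
// Illustration only (not Phoenix's SchemaUtil): splitting a possibly-null full name.
public class FullNameSplitDemo {

    // Hypothetical helper mirroring a "schema.table" split; the null guard is the
    // assumed fix. Without it, fullName.indexOf('.') throws NullPointerException.
    static String getSchemaNameFromFullName(String fullName) {
        if (fullName == null) {
            return "";
        }
        int idx = fullName.indexOf('.');
        return idx < 0 ? "" : fullName.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(getSchemaNameFromFullName("S.T"));  // prints "S"
        System.out.println(getSchemaNameFromFullName(null));   // prints "" instead of an NPE
    }
}
{code}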



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4912) Make Table Sampling algorithm to accommodate to the imbalance row distribution across guide posts

2018-09-21 Thread Bin Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bin Shi updated PHOENIX-4912:
-
Description: 
The current implementation of table sampling is based on the assumption that "every 
two consecutive guide posts contain an equal number of rows", which isn't accurate in 
practice; once we collect multiple versions of cells and deleted rows, things get 
even worse.

In detail, the current implementation of table sampling (see 
BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the 
function) works as described below:
 # Iterate over all parallel scans generated;
 # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; 
otherwise discard it.

The problem can be formalized as: we have a group of scans, and each scan is defined 
as <Ki, Ci>, where Ki is the start row key of the scan and Ci is the count of rows in 
the scan. Now we want to randomly pick a subset of scans so that the sum of the row 
counts of the selected scans is close to Y, where Y = T * R, T denotes the total count 
of rows of all scans, and R denotes the table sampling rate (0 <= R <= 100).

To resolve the above problem, one of the algorithms we can consider is described 
below. The core idea is to adjust T, R, and Y after each pick, so that what remains is 
a sub-problem of the original problem.
{code:java}
ArrayList TableSampling(ArrayList scans, T, R) {  
    ArrayList pickedScans = new ArrayList();
    Y = T * R / 100.00;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) { 
                R = 100.00 * Y / T;
            }
        }
    }
    return pickedScans;
}
{code}
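
For clarity, a self-contained sketch of the same idea in plain Java is given below. The 
class and field names (GuidePostScan, startRowKey, rowCount) are illustrative only, not 
Phoenix's actual types, and the hash below is merely a placeholder for whatever 
TableSamplerPredicate really uses.
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Self-contained sketch of the proposed sampling algorithm (illustrative names only).
public class TableSamplingSketch {

    static class GuidePostScan {
        final byte[] startRowKey; // Ki
        final long rowCount;      // Ci
        GuidePostScan(byte[] startRowKey, long rowCount) {
            this.startRowKey = startRowKey;
            this.rowCount = rowCount;
        }
    }

    // t = total row count across all scans, r = sampling rate in percent (0 <= r <= 100).
    static List<GuidePostScan> tableSampling(List<GuidePostScan> scans, long t, double r) {
        List<GuidePostScan> picked = new ArrayList<>();
        double y = t * r / 100.00;                       // target number of rows to sample
        for (GuidePostScan scan : scans) {
            if (y <= 0) {
                break;
            }
            // Pick this scan iff hash(Ki) mod 100 falls under the current rate R.
            if (Math.floorMod(Arrays.hashCode(scan.startRowKey), 100) < r) {
                picked.add(scan);
                t -= scan.rowCount;                      // T -= Ci
                y -= scan.rowCount;                      // Y -= Ci
                if (t != 0 && y > 0) {
                    r = 100.00 * y / t;                  // re-balance R for the remaining scans
                }
            }
        }
        return picked;
    }

    public static void main(String[] args) {
        // Deliberately imbalanced row counts across guide posts.
        List<GuidePostScan> scans = Arrays.asList(
                new GuidePostScan("a".getBytes(), 100),
                new GuidePostScan("m".getBytes(), 900),
                new GuidePostScan("t".getBytes(), 50));
        System.out.println(tableSampling(scans, 1050, 20).size());
    }
}
{code}
In this toy run Y starts at 210; if, say, the 900-row scan is picked, Y immediately drops 
below zero and the loop stops, which is how the adjustment compensates for the imbalance 
across guide posts.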

  was:
The current implementation of table sampling is based on the assumption "Every 
two consecutive guide posts contains the equal number of rows" which isn't 
accurate in practice, and once we collect multiple versions of cells and the 
deleted rows, the thing will become worse.

In details, the current implementation of table sampling is (see 
BaseResultIterators.getParallelScan() which calls sampleScans(...) at the end 
of function) as described below:
 # Iterate all parallel scans generated;
 # For each scan, if getHashHode(start row key of the scan) MOD 100 < 
tableSamplingRate (See TableSamplerPredicate.java) then pick this scan; 
otherwise discard this scan.

The problem can be formalized as: We have a group of scans and each scan is 
defined as . 
Now we want to randomly pick X groups so that the sum of count of rows in the 
selected groups is close to Y, where Y = the total count of rows of all scans T 
* table sampling rate R.

To resolve the above problem, one of algorithms that we can consider are 
described below:
{code:java}
ArrayList TableSampling(ArrayList scans, T, R) {  
    ArrayList pickedScans = new ArrayList();
    Y = T * R;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) { 
                R = Y / T;
             }
        }
    }
    return pickedScans;
}
{code}


> Make Table Sampling algorithm to accommodate to the imbalance row 
> distribution across guide posts
> -
>
> Key: PHOENIX-4912
> URL: https://issues.apache.org/jira/browse/PHOENIX-4912
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.0.0, 4.15.0
>Reporter: Bin Shi
>Assignee: Bin Shi
>Priority: Major
>
> The current implementation of table sampling is based on the assumption 
> that "every two consecutive guide posts contain an equal number of rows", which 
> isn't accurate in practice; once we collect multiple versions of cells 
> and deleted rows, things get even worse.
> In detail, the current implementation of table sampling (see 
> BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end 
> of the function) works as described below:
>  # Iterate over all parallel scans generated;
>  # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
> tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; 
> otherwise discard it.
> The problem can be formalized as: we have a group of scans, and each scan is 
> defined as <Ki, Ci>, where Ki is the start row key of the scan and Ci is the 
> count of rows in the scan. Now we want to randomly pick a subset of scans so 
> that the sum of the row counts of the selected scans is close to Y, where 
> Y = T * R, T denotes the total count of rows of all scans, and R denotes the 
> table sampling rate (0 <= R <= 100).
> To resolve the above 

[jira] [Created] (PHOENIX-4916) When collecting statistics, the estimated size of a guide post may only count part of cells of the last row

2018-09-21 Thread Bin Shi (JIRA)
Bin Shi created PHOENIX-4916:


 Summary: When collecting statistics, the estimated size of a guide 
post may only count part of cells of the last row
 Key: PHOENIX-4916
 URL: https://issues.apache.org/jira/browse/PHOENIX-4916
 Project: Phoenix
  Issue Type: Bug
Reporter: Bin Shi
Assignee: Bin Shi


In DefaultStatisticsCollector.collectStatistics(...), we iterate over all cells of 
the current row; once the accumulated estimated size plus the size of the current 
cell >= the guide post width, all the remaining cells are skipped. The result is 
that the estimated size of a guide post may only count part of the cells of the 
last row.

This problem can be ignored in clusters with real data, where the guide post width 
is much bigger than the row size, but it does have an impact on unit tests and 
integration tests, because we use a very small guide post width in tests, which 
results in an inaccurate estimated size for the query.
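
A minimal, self-contained sketch of the intended behavior (illustrative names only, 
not DefaultStatisticsCollector's actual code): accumulate every cell of the current 
row and only cut a guide post on a row boundary, so the last row's cells are fully 
counted.
{code:java}
import java.util.Arrays;
import java.util.List;

// Illustration only: count all cells of a row before deciding whether the
// accumulated size has crossed the guide post width.
public class GuidePostSizeSketch {

    static long estimateGuidePosts(List<long[]> rows, long guidePostWidth) {
        long accumulated = 0;                    // size accumulated into the current guide post
        long guidePosts = 0;
        for (long[] cellSizes : rows) {          // each long[] = the cell sizes of one row
            for (long cellSize : cellSizes) {
                accumulated += cellSize;         // count every cell of the row
            }
            if (accumulated >= guidePostWidth) { // cut only after the whole row is counted
                guidePosts++;
                accumulated = 0;
            }
        }
        return guidePosts;
    }

    public static void main(String[] args) {
        List<long[]> rows = Arrays.asList(
                new long[]{40, 40, 40},          // a row whose trailing cells would be skipped today
                new long[]{10, 10});
        System.out.println(estimateGuidePosts(rows, 100)); // 1 guide post; all 120 bytes counted
    }
}
{code}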



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PHOENIX-4912) Make Table Sampling algorithm to accommodate to the imbalance row distribution across guide posts

2018-09-21 Thread Bin Shi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PHOENIX-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bin Shi updated PHOENIX-4912:
-
Description: 
The current implementation of table sampling is based on the assumption that "every 
two consecutive guide posts contain an equal number of rows", which isn't accurate in 
practice; once we collect multiple versions of cells and deleted rows, things get 
even worse.

In detail, the current implementation of table sampling (see 
BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end of the 
function) works as described below:
 # Iterate over all parallel scans generated;
 # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; 
otherwise discard it.

The problem can be formalized as: we have a group of scans, and each scan is defined 
as <Ki, Ci>, where Ki is the start row key of the scan and Ci is the count of rows in 
the scan. Now we want to randomly pick a subset of scans so that the sum of the row 
counts of the selected scans is close to Y, where Y = T * R / 100.00, T denotes the 
total count of rows of all scans, and R denotes the table sampling rate (0 <= R <= 100).

To resolve the above problem, one of the algorithms we can consider is described 
below. The core idea is to adjust T, R, and Y after each pick, so that what remains is 
a sub-problem of the original problem.
{code:java}
ArrayList TableSampling(ArrayList scans, T, R) {  
    ArrayList pickedScans = new ArrayList();
    Y = T * R / 100.00;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) { 
                R = 100.00 * Y / T;
            }
        }
    }
    return pickedScans;
}
{code}

  was:
The current implementation of table sampling is based on the assumption "Every 
two consecutive guide posts contains the equal number of rows" which isn't 
accurate in practice, and once we collect multiple versions of cells and the 
deleted rows, the thing will become worse.

In details, the current implementation of table sampling is (see 
BaseResultIterators.getParallelScan() which calls sampleScans(...) at the end 
of function) as described below:
 # Iterate all parallel scans generated;
 # For each scan, if getHashHode(start row key of the scan) MOD 100 < 
tableSamplingRate (See TableSamplerPredicate.java) then pick this scan; 
otherwise discard this scan.

The problem can be formalized as: We have a group of scans and each scan is 
defined as . 
Now we want to randomly pick X groups so that the sum of count of rows in the 
selected groups is close to Y, where Y = the total count of rows of all scans 
denoted as T * table sampling rate denoted as R (0 <= R <= 100).

To resolve the above problem, one of algorithms that we can consider are 
described below. The core idea is to adjust T, R, Y after each pick, so the new 
problem is a child problem of the original problem.
{code:java}
ArrayList TableSampling(ArrayList scans, T, R) {  
    ArrayList pickedScans = new ArrayList();
    Y = T * R / 100.00;
    for (scan in scans) {
        if (Y <= 0) break;
        if (getHashCode(Ki) MOD 100 < R) {
            // then pick this scan, and adjust T, R, Y accordingly
            pickedScans.Add(scan);
            T -= Ci;
            Y -= Ci;
            if (T != 0 && Y > 0) { 
                R = 100.00 * Y / T;
            }
        }
    }
    return pickedScans;
}
{code}


> Make Table Sampling algorithm to accommodate to the imbalance row 
> distribution across guide posts
> -
>
> Key: PHOENIX-4912
> URL: https://issues.apache.org/jira/browse/PHOENIX-4912
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.0.0, 4.15.0
>Reporter: Bin Shi
>Assignee: Bin Shi
>Priority: Major
>
> The current implementation of table sampling is based on the assumption 
> that "every two consecutive guide posts contain an equal number of rows", which 
> isn't accurate in practice; once we collect multiple versions of cells 
> and deleted rows, things get even worse.
> In detail, the current implementation of table sampling (see 
> BaseResultIterators.getParallelScan(), which calls sampleScans(...) at the end 
> of the function) works as described below:
>  # Iterate over all parallel scans generated;
>  # For each scan, if getHashCode(start row key of the scan) MOD 100 < 
> tableSamplingRate (see TableSamplerPredicate.java), then pick this scan; 
> otherwise discard it.
> The problem can be formalized as: we have a group of scans, and each scan is 
> defined as <Ki, Ci>, where Ki is the start row key of the scan and Ci is the 
> count of rows in the scan. Now we want to randomly pick a subset of scans so 
> that the sum of the row counts