[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501638#comment-17501638
 ] 

ASF GitHub Bot commented on PHOENIX-6501:
-

kadirozde commented on pull request #1399:
URL: https://github.com/apache/phoenix/pull/1399#issuecomment-1059669953


   > Can we do this for local indexes as well? There is also a significant cost 
to seeking (even when done locally)
   > 
   > In fact, the only difference might be how we get a table reference.
   
   @lhofhansl, we can definitely do this for local indexes, but there are three 
differences. The first, as you pointed out, is table vs. region. For local 
indexes, both the data and the index are accessed via the local region, while 
for the global index the access is remote, so we need to get the table for the 
global index and then get connections to access it. The question of which 
connection pool to use does not apply to the local index. The second difference 
is that, for the local index, data rows are retrieved from a single table 
region, whereas for the global index there can be many. So, for global indexes, 
we discover the table region boundaries and access the regions in parallel 
using a thread pool, which is not necessary for the local index. The last 
difference is handling the row key offset for local indexes, which is not 
necessary for global indexes. So, rather than lumping local and global index 
batching together, I think we should handle them separately. I suggest having 
a separate Jira and PR for the local index.
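
The region-boundary discovery and per-region parallel access described above for global indexes can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `RegionBatchSketch`, `groupByRegion`, and `scanAllRegions` are hypothetical names, row keys are modeled as plain strings, and the per-region "scan" is a placeholder that only counts keys where a real implementation would issue an HBase scan.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names come from Phoenix or HBase.
final class RegionBatchSketch {

    // Groups buffered data row keys by the region that holds them.
    // regionStartKeys must include "" (the start key of the first region).
    static Map<String, List<String>> groupByRegion(List<String> regionStartKeys,
                                                   List<String> rowKeys) {
        TreeMap<String, List<String>> groups = new TreeMap<>();
        for (String startKey : regionStartKeys) {
            groups.put(startKey, new ArrayList<>());
        }
        for (String rowKey : rowKeys) {
            // floorKey returns the greatest region start key <= rowKey.
            groups.get(groups.floorKey(rowKey)).add(rowKey);
        }
        groups.values().removeIf(List::isEmpty); // skip regions with no keys
        return groups;
    }

    // Submits one task per region to a thread pool and sums the rows "read".
    static int scanAllRegions(Map<String, List<String>> groups, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> batch : groups.values()) {
                Callable<Integer> regionScan = batch::size; // placeholder for a real scan
                futures.add(pool.submit(regionScan));
            }
            int total = 0;
            for (Future<Integer> future : futures) {
                total += future.get(); // wait for each region's result
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> groups = groupByRegion(
                Arrays.asList("", "m"), Arrays.asList("apple", "zebra", "kiwi"));
        System.out.println(groups + " -> " + scanAllRegions(groups, 2) + " rows");
    }
}
```

For a local index, the grouping step and the thread pool would both drop out, since only one region is involved.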
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Use batching when joining data table rows with uncovered index rows
> ---
>
> Key: PHOENIX-6501
> URL: https://issues.apache.org/jira/browse/PHOENIX-6501
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.1.2
> Reporter: Kadir Ozdemir
> Assignee: Kadir OZDEMIR
> Priority: Major
>
> PHOENIX-6458 extends the existing uncovered local index support to global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer many data 
> row keys in memory, then use a skip scan filter, and even multiple threads, 
> to issue a separate scan for each data table region in parallel. This will 
> reduce the cost of the join and improve performance.
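
To make the RPC saving concrete, here is a rough sketch of buffering data row keys and issuing one batched lookup per full buffer instead of one get per row. `RowKeyBuffer`, its batch size, and the call counter are hypothetical and only model the idea; the skip scan filter and the actual HBase calls are not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: models buffering only; it makes no HBase or Phoenix calls.
final class RowKeyBuffer {
    private final int batchSize;
    private final Consumer<List<String>> batchLookup; // stands in for one batched scan
    private final List<String> buffer = new ArrayList<>();
    private int calls = 0;

    RowKeyBuffer(int batchSize, Consumer<List<String>> batchLookup) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.batchLookup = batchLookup;
    }

    // Buffers one data row key and flushes once the buffer is full.
    void add(String dataRowKey) {
        buffer.add(dataRowKey);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered as a single batched lookup.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        batchLookup.accept(new ArrayList<>(buffer)); // hand over a copy
        buffer.clear();
        calls++;
    }

    // Number of batched lookups issued so far.
    int calls() {
        return calls;
    }
}
```

With a batch size of 4, ten buffered row keys followed by a final `flush()` result in three lookups rather than ten.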



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501617#comment-17501617
 ] 

ASF GitHub Bot commented on PHOENIX-6501:
-

lhofhansl edited a comment on pull request #1399:
URL: https://github.com/apache/phoenix/pull/1399#issuecomment-1059635660


   Can we do this for local indexes as well? There is also a significant cost 
to seeking (even when done locally).
   
   In fact, the only difference might be how we get a table reference.




[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501615#comment-17501615
 ] 

ASF GitHub Bot commented on PHOENIX-6501:
-

lhofhansl commented on pull request #1399:
URL: https://github.com/apache/phoenix/pull/1399#issuecomment-1059635660


   Can we do this for local indexes as well? There is also a significant cost 
to seeking (even when done locally)




[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-04 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501614#comment-17501614
 ] 

Lars Hofhansl commented on PHOENIX-6458:


Hmm... Doesn't quite seem to work:
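{code:java}
> select /*+ NO_INDEX */ count(suppkey) from lineitem where tax = 0.08;
+----------------+
| COUNT(SUPPKEY) |
+----------------+
| 2000406        |
+----------------+
1 row selected (6.614 seconds)

> select /*+ INDEX(lineitem g_l_tax) */ count(suppkey) from lineitem where tax = 0.08;
+------------------+
| COUNT("SUPPKEY") |
+------------------+
| 0                |
+------------------+
1 row selected (7.422 seconds)

> explain select /*+ INDEX(lineitem g_l_tax) */ count(suppkey) from lineitem where tax = 0.08;
+-----------------------------------------------------------------------------------------+----------------+---------------+---------------+
|                                          PLAN                                           | EST_BYTES_READ | EST_ROWS_READ |  EST_INFO_TS  |
+-----------------------------------------------------------------------------------------+----------------+---------------+---------------+
| CLIENT 3-CHUNK 511502 ROWS 20971582 BYTES PARALLEL 1-WAY RANGE SCAN OVER G_L_TAX [0.08] | 20971582       | 511502        | 1646441656705 |
|     SERVER MERGE [0.SUPPKEY]                                                            | 20971582       | 511502        | 1646441656705 |
|     SERVER FILTER BY FIRST KEY ONLY                                                     | 20971582       | 511502        | 1646441656705 |
|     SERVER AGGREGATE INTO SINGLE ROW                                                    | 20971582       | 511502        | 1646441656705 |
+-----------------------------------------------------------------------------------------+----------------+---------------+---------------+
4 rows selected (0.03 seconds){code}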
 

[~kozdemir] 

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.1.0
> Reporter: Kadir Ozdemir
> Assignee: Kadir OZDEMIR
> Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch
>
>
> The Phoenix query optimizer does not use a global index for a query that 
> references columns not covered by the index unless the query has an index 
> hint for that index. With the index hint, the optimizer rewrites the query so 
> that the index is used within a subquery. The row keys of the index rows that 
> satisfy the subquery are retrieved by the Phoenix client and then pushed into 
> the Phoenix server caches of the data table regions. Finally, on the server 
> side, data table rows are scanned and joined with the index rows using 
> HashJoin. Depending on the selectivity of the original query, this join 
> operation may still scan a large number of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query, especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using the index 
> row key ranges implied by the query, and the index row keys are then mapped to 
> the data table row keys (note that an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs. PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.
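
The mapping from index row keys to data row keys mentioned above relies on the index row key containing every data row key column. A simplified sketch of that step follows, assuming a row key is modeled as a list of column values with the indexed columns first and the data row key columns after them. Real Phoenix row keys are byte-encoded, and `IndexKeyMapper` and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a row key is modeled as a list of column values.
final class IndexKeyMapper {

    // Returns the data row key columns, assuming they follow the indexed columns.
    static List<String> toDataRowKey(List<String> indexRowKey, int indexedColumnCount) {
        if (indexedColumnCount < 0 || indexedColumnCount > indexRowKey.size()) {
            throw new IllegalArgumentException("indexedColumnCount out of range");
        }
        // Copy the suffix so the result does not depend on the input list.
        return new ArrayList<>(indexRowKey.subList(indexedColumnCount, indexRowKey.size()));
    }

    public static void main(String[] args) {
        // One indexed column (a tax value) followed by two assumed data row key columns.
        List<String> indexRowKey = Arrays.asList("0.08", "order-1", "line-3");
        System.out.println(toDataRowKey(indexRowKey, 1));
    }
}
```

The resulting data row keys are what would then be fetched from the data table, one get per key in PHOENIX-6458 or in batched scans in PHOENIX-6501.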


