[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502624#comment-17502624
 ] 

Lars Hofhansl commented on PHOENIX-6501:


As discussed in PHOENIX-6458, there was an issue with synchronously creating 
the global index.
With that out of the way, this seems to work fine. In my test env I didn't see a 
perf improvement, but that's because everything is local, so the network cost 
is negligible.


> Use batching when joining data table rows with uncovered index rows
> ---
>
> Key: PHOENIX-6501
> URL: https://issues.apache.org/jira/browse/PHOENIX-6501
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 5.1.2
> Reporter: Kadir Ozdemir
> Assignee: Kadir OZDEMIR
> Priority: Major
> Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support to global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer many data 
> row keys in memory, use a skip scan filter, and even use multiple threads to 
> issue a separate scan for each data table region in parallel. This will 
> reduce the cost of the join and improve performance.
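
To make the RPC argument in the description concrete, here is a minimal sketch that only counts calls. It is a plain-Python simulation, not Phoenix or HBase code and not the attached patch; `region_index`, `count_rpcs_per_row`, `count_rpcs_batched`, and `batch_size` are hypothetical names chosen for illustration. It assumes regions are described by their sorted start keys, with an empty first start key.

```python
from bisect import bisect_right


def region_index(region_start_keys, row_key):
    """Return the index of the region whose key range contains row_key.

    region_start_keys must be sorted, and its first entry must be b""
    (the start key of the first region), so every key maps to a region.
    """
    return bisect_right(region_start_keys, row_key) - 1


def count_rpcs_per_row(row_keys):
    """Baseline: one get call per data table row key."""
    return len(row_keys)


def count_rpcs_batched(row_keys, region_start_keys, batch_size):
    """Batched: buffer up to batch_size keys, then issue one scan per
    region that the buffered keys touch."""
    rpcs = 0
    for start in range(0, len(row_keys), batch_size):
        batch = row_keys[start:start + batch_size]
        touched = {region_index(region_start_keys, key) for key in batch}
        rpcs += len(touched)
    return rpcs


if __name__ == "__main__":
    # Four hypothetical regions and 26 single-byte row keys.
    starts = [b"", b"g", b"n", b"t"]
    keys = [bytes([c]) for c in b"abcdefghijklmnopqrstuvwxyz"]
    print("per-row gets:", count_rpcs_per_row(keys))
    print("batched scans:", count_rpcs_batched(keys, starts, 13))
```

With larger buffers, the number of calls is bounded by the number of regions touched per batch rather than by the number of rows, which is where the expected saving would come from when the network is not negligible.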



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502556#comment-17502556
 ] 

Lars Hofhansl commented on PHOENIX-6501:


That might be a bit tricky. I loaded the TPC-H lineitem table (scale factor 3) 
into Phoenix via the Trino connector.

{code}
CREATE TABLE phoenix.default.lineitem (
    orderkey bigint NOT NULL,
    partkey bigint,
    suppkey bigint,
    linenumber integer NOT NULL,
    quantity double,
    extendedprice double,
    discount double,
    tax double,
    returnflag varchar(1),
    linestatus varchar(1),
    shipdate date,
    commitdate date,
    receiptdate date,
    shipinstruct varchar(25),
    shipmode varchar(10),
    comment varchar(44)
)
WITH (
    compression = 'ZSTD',
    data_block_encoding = 'ROW_INDEX_V1',
    disable_wal = true,
    immutable_rows = true,
    rowkeys = 'ORDERKEY,LINENUMBER'
)
{code}

(I disable the WAL everywhere because that's not what I am testing and it 
speeds up loading/creating.)

Then I created the global index on the tax column.
{{create index g_l_tax on lineitem(tax) DISABLE_WAL=true;}}

Then I ran {{select /*+ INDEX(lineitem g_l_tax) */ count(suppkey) from lineitem 
where tax = 0.08}}

Let me connect with you offline and see if I can send you a CSV with the 
lineitem data.




[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-07 Thread Kadir OZDEMIR (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502543#comment-17502543
 ] 

Kadir OZDEMIR commented on PHOENIX-6501:


[~larsh], thank you for checking it. Would you please post the steps to run the 
test? I want to run it and see if I can find the root cause. 



[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502541#comment-17502541
 ] 

Lars Hofhansl commented on PHOENIX-6501:


I'm testing the attached patch by running a query on a table with 18M rows 
that selects (counts) 2M of them.

The query on the uncovered global index *does not finish* (I stopped it after 
10 minutes). :(

With no index it takes about 7s, with an uncovered local index it takes about 
10s (due to the merging cost and low selectivity of the query).

So there's some bug somewhere.

 



[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501638#comment-17501638
 ] 

ASF GitHub Bot commented on PHOENIX-6501:
-

kadirozde commented on pull request #1399:
URL: https://github.com/apache/phoenix/pull/1399#issuecomment-1059669953


   > Can we do this for local indexes as well? There is also a significant cost 
to seeking (even when done locally)
   > 
   > In fact the only difference might be how we get a table reference.
   
   @lhofhansl, we can definitely do this for local indexes, but there are some 
differences. The first, as you pointed out, is table versus region. For local 
indexes, both data and index rows are accessed via the local region, while the 
global index is accessed remotely, so we need to get the table for the global 
index and then get connections to access the table. The concern of which 
connection pool to use does not apply to the local index. The second difference 
is that for the local index there is only one table region from which to 
retrieve data rows, whereas for the global index there can be many. So for 
global indexes we discover the table region boundaries and access the regions 
in parallel using a thread pool, which is not necessary for the local index. 
The last difference is handling the row key offset for local indexes, which is 
not necessary for global indexes. So, instead of lumping local and global index 
batching together, I think we should handle them separately. I suggest having a 
separate Jira and PR for the local index.
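
The global-index flow described above (discover region boundaries, group the buffered data row keys by region, scan the regions in parallel with a thread pool) can be sketched roughly as follows. This is a plain-Python simulation under stated assumptions, not the PR's code and not the HBase client API: each region is modeled as a sorted list of `(key, value)` pairs, and `group_by_region`, `scan_region`, and `fetch_data_rows` are hypothetical names. The seek-forward lookup in `scan_region` only imitates what a skip scan filter does.

```python
from bisect import bisect_left, bisect_right
from concurrent.futures import ThreadPoolExecutor


def group_by_region(region_start_keys, row_keys):
    """Group sorted row keys by the index of the region that contains them.

    region_start_keys is sorted and starts with b"" (first region).
    """
    groups = {}
    for key in sorted(row_keys):
        idx = bisect_right(region_start_keys, key) - 1
        groups.setdefault(idx, []).append(key)
    return groups


def scan_region(region_rows, wanted_keys):
    """Imitate a skip scan: seek forward to each wanted key in turn.

    region_rows is a sorted list of (key, value); wanted_keys is sorted.
    """
    keys_only = [key for key, _ in region_rows]
    found = []
    pos = 0
    for key in wanted_keys:
        # Seek forward from the current position, never backwards.
        pos = bisect_left(keys_only, key, pos)
        if pos < len(keys_only) and keys_only[pos] == key:
            found.append(region_rows[pos])
    return found


def fetch_data_rows(regions, region_start_keys, row_keys, max_workers=4):
    """Issue one simulated scan per touched region, in parallel."""
    groups = group_by_region(region_start_keys, row_keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            idx: pool.submit(scan_region, regions[idx], keys)
            for idx, keys in groups.items()
        }
        results = []
        for idx in sorted(futures):  # keep results in region (key) order
            results.extend(futures[idx].result())
    return results
```

For a local index, the same `scan_region` step would apply to the single local region, so the grouping and the thread pool would drop out; the row key offset handling mentioned above is not modeled here.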
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501617#comment-17501617
 ] 

ASF GitHub Bot commented on PHOENIX-6501:
-

lhofhansl edited a comment on pull request #1399:
URL: https://github.com/apache/phoenix/pull/1399#issuecomment-1059635660


   Can we do this for local indexes as well? There is also a significant cost 
to seeking (even when done locally)
   
   In fact the only difference might be how we get a table reference.




[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501615#comment-17501615
 ] 

ASF GitHub Bot commented on PHOENIX-6501:
-

lhofhansl commented on pull request #1399:
URL: https://github.com/apache/phoenix/pull/1399#issuecomment-1059635660


   Can we do this for local indexes as well? There is also a significant cost 
to seeking (even when done locally)




[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499817#comment-17499817
 ] 

ASF GitHub Bot commented on PHOENIX-6501:
-

kadirozde opened a new pull request #1399:
URL: https://github.com/apache/phoenix/pull/1399


   … index rows

