[jira] [Comment Edited] (PHOENIX-6983) Add hint to disable server merges for uncovered index queries

2023-06-26 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737302#comment-17737302
 ] 

Lars Hofhansl edited comment on PHOENIX-6983 at 6/26/23 5:50 PM:
-

Isn't this the same as hinting not to use the index in question, or hinting a 
full scan?

[~stoty] 


was (Author: lhofhansl):
Isn't this the same as hinting not to use the index in question, or hinting a 
full scan?

> Add hint to disable server merges for uncovered index queries
> -
>
> Key: PHOENIX-6983
> URL: https://issues.apache.org/jira/browse/PHOENIX-6983
> Project: Phoenix
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Fix For: 5.2.0, 5.1.4
>
>
> In certain cases, the new server merge code is far less efficient than the 
> old skip-scan-merge code path.
> Specifically, when a large number of rows is matched on the index table, then 
> each of those rows has to be resolved from the data table, and the filtering 
> must be done on the index RS.
> With the old code path, these filters were pushed to the data table, and 
> processed in parallel, with much better performance.
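
For illustration, a hedged sketch from JDBC of the two alternatives being discussed. NO_INDEX and INDEX(...) are existing Phoenix hints; the server-merge-disabling hint name NO_INDEX_SERVER_MERGE is an assumption based on this issue's title, and the table/index names are borrowed from the PHOENIX-6501 testing further down.

{code}
// Hedged sketch, not confirmed API: contrasts hinting away the index
// entirely with disabling only the server-side merge.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HintSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement()) {
            // Existing hint: skip the index and full-scan the data table.
            stmt.executeQuery(
                "SELECT /*+ NO_INDEX */ COUNT(suppkey) FROM lineitem WHERE tax = 0.08");
            // Assumed hint: keep the uncovered index but disable the server
            // merge, falling back to the older skip-scan-merge path.
            stmt.executeQuery(
                "SELECT /*+ INDEX(lineitem g_l_tax) NO_INDEX_SERVER_MERGE */"
                    + " COUNT(suppkey) FROM lineitem WHERE tax = 0.08");
        }
    }
}
{code}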





[jira] [Commented] (PHOENIX-6983) Add hint to disable server merges for uncovered index queries

2023-06-26 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737302#comment-17737302
 ] 

Lars Hofhansl commented on PHOENIX-6983:


Isn't this the same as hinting not to use the index in question, or hinting a 
full scan?

> Add hint to disable server merges for uncovered index queries
> -
>
> Key: PHOENIX-6983
> URL: https://issues.apache.org/jira/browse/PHOENIX-6983
> Project: Phoenix
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 5.2.0, 5.1.3
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Fix For: 5.2.0, 5.1.4
>
>
> In certain cases, the new server merge code is far less efficient than the 
> old skip-scan-merge code path.
> Specifically, when a large number of rows is matched on the index table, then 
> each of those rows has to be resolved from the data table, and the filtering 
> must be done on the index RS.
> With the old code path, these filters were pushed to the data table, and 
> processed in parallel, with much better performance.





[jira] (PHOENIX-792) Support UPSERT SET command

2022-10-26 Thread Lars Hofhansl (Jira)


[ https://issues.apache.org/jira/browse/PHOENIX-792 ]


Lars Hofhansl deleted comment on PHOENIX-792:
---

was (Author: JIRAUSER297473):
Informative, for more info visit this [Blog|https://techlarapoint.com/].

> Support UPSERT SET command
> --
>
> Key: PHOENIX-792
> URL: https://issues.apache.org/jira/browse/PHOENIX-792
> Project: Phoenix
>  Issue Type: Task
>Reporter: James R. Taylor
>Assignee: thrylokya
>
> Support setting values in a table through a new UPSERT SET command like this:
> UPSERT my_table SET title = 'CEO'
> WHERE name = 'John Doe'
> UPSERT my_table SET pay_by_quarter = ARRAY[25000,25000,27000,27000]
> WHERE name = 'Carol';
> UPSERT my_table SET pay_by_quarter[4] = 15000
> WHERE name = 'Carol';
> This would essentially be syntactic sugar and use the same UpsertCompiler, 
> mapping to an UPSERT SELECT command that simply fills in the primary key 
> columns like this:
> UPSERT INTO my_table(name,title) 
> SELECT name,'CEO' FROM my_table
> WHERE name = 'John Doe'
> UPSERT INTO my_table(name, pay_by_quarter[4]) 
> SELECT name,15000 FROM my_table
> WHERE name = 'Carol';





[jira] [Commented] (PHOENIX-6671) Avoid ShortCircuit Coprocessor Connection for HBase 2.x

2022-04-02 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516359#comment-17516359
 ] 

Lars Hofhansl commented on PHOENIX-6671:


Or just this (it checks the HBase version), assuming the fix lands in 2.4.12.
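
A hedged sketch of what such a version check could look like. VersionInfo.compareVersion and the two connection calls exist in HBase 2.x, but the surrounding logic is illustrative only and assumes the HBASE-26869 fix ships in 2.4.12.

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.VersionInfo;

public class ConnectionChooser {
    static Connection chooseConnection(RegionCoprocessorEnvironment env) throws IOException {
        if (VersionInfo.compareVersion(VersionInfo.getVersion(), "2.4.12") >= 0) {
            // Fixed HBase: the coprocessor connection (possibly
            // short-circuited) is safe to use.
            return env.createConnection(env.getConfiguration());
        }
        // Older HBase: avoid short-circuiting; go through the full RPC stack.
        return ConnectionFactory.createConnection(env.getConfiguration());
    }
}
{code}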

> Avoid ShortCircuit Coprocessor Connection for HBase 2.x
> ---
>
> Key: PHOENIX-6671
> URL: https://issues.apache.org/jira/browse/PHOENIX-6671
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 5.2.0, 5.1.3
>
> Attachments: 6671-5.1.txt, 6671-v2-5.1.txt
>
>
> See PHOENIX-6501, PHOENIX-6458, and HBASE-26812.
> HBase's ShortCircuit Connections are fundamentally broken in HBase 2. We might 
> be able to fix it there, but with all the work the RPC handlers perform now 
> (closing scanners, resolving the current user, etc.), I doubt we'll get that 
> 100% right. HBase 3 has removed this functionality.
> Even with HBase 2, which does not have the async protobuf code, I could 
> hardly see any performance improvement from circumventing the RPC stack in 
> case the target of a Get or Scan is local. Even in the most ideal conditions 
> where everything is local, there was no improvement outside of noise.
> I suggest we do not use ShortCircuited Connections in Phoenix 5+.





[jira] [Commented] (PHOENIX-6671) Avoid ShortCircuit Coprocessor Connection for HBase 2.x

2022-04-01 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516185#comment-17516185
 ] 

Lars Hofhansl commented on PHOENIX-6671:


With HBASE-26869 and HBASE-26812 committed, I think we won't need this, as long 
as we require HBase 2.4.12 (once released) as the minimum version.

 

> Avoid ShortCircuit Coprocessor Connection for HBase 2.x
> ---
>
> Key: PHOENIX-6671
> URL: https://issues.apache.org/jira/browse/PHOENIX-6671
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 5.2.0, 5.1.3
>
> Attachments: 6671-5.1.txt
>
>
> See PHOENIX-6501, PHOENIX-6458, and HBASE-26812.
> HBase's ShortCircuit Connections are fundamentally broken in HBase 2. We might 
> be able to fix it there, but with all the work the RPC handlers perform now 
> (closing scanners, resolving the current user, etc.), I doubt we'll get that 
> 100% right. HBase 3 has removed this functionality.
> Even with HBase 2, which does not have the async protobuf code, I could 
> hardly see any performance improvement from circumventing the RPC stack in 
> case the target of a Get or Scan is local. Even in the most ideal conditions 
> where everything is local, there was no improvement outside of noise.
> I suggest we do not use ShortCircuited Connections in Phoenix 5+.





[jira] [Commented] (PHOENIX-6678) Dbeaver jdbc connection for Phoenix

2022-03-31 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515672#comment-17515672
 ] 

Lars Hofhansl commented on PHOENIX-6678:


The mailing lists [d...@phoenix.apache.org|mailto:d...@phoenix.apache.org] and 
[u...@phoenix.apache.org|mailto:u...@phoenix.apache.org] are very much alive 
and valid.

> Dbeaver jdbc connection for Phoenix
> --
>
> Key: PHOENIX-6678
> URL: https://issues.apache.org/jira/browse/PHOENIX-6678
> Project: Phoenix
>  Issue Type: Bug
>  Components: connectors
>Affects Versions: 5.0.0
> Environment: Hbase 2.0.2
> Phoenix 5.0.0
>Reporter: Dmitry Kravchuk
>Priority: Major
> Attachments: image-2022-03-30-15-53-14-255.png, 
> image-2022-03-30-15-53-23-717.png, image-2022-03-30-15-55-19-706.png, 
> image-2022-03-30-15-56-42-691.png, image-2022-03-30-15-57-39-813.png, 
> image-2022-03-31-10-56-32-334.png, image-2022-03-31-10-57-11-856.png, 
> image-2022-03-31-10-57-26-287.png, image-2022-03-31-13-49-29-131.png, 
> image-2022-03-31-13-50-18-105.png, image-2022-03-31-13-50-45-594.png, 
> image-2022-03-31-13-51-07-925.png, image-2022-03-31-13-51-29-100.png
>
>
> I've been searching for a solution for a successful connection to Phoenix 
> using DBeaver for weeks and came here for help.
> Here is my HBase config:
> !image-2022-03-30-15-53-23-717.png!
> Here is my DBeaver connection properties:
> !image-2022-03-30-15-55-19-706.png!
> I've downloaded the phoenix and hbase jars from the hadoop cluster as 
> connection drivers:
> !image-2022-03-30-15-56-42-691.png!
> Error:
> !image-2022-03-30-15-57-39-813.png!
>  
> Phoenix works from the shell using the sqlline.py script.
>  
> Can anybody help?





[jira] [Commented] (PHOENIX-6678) Dbeaver jdbc connection for Phoenix

2022-03-30 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515014#comment-17515014
 ] 

Lars Hofhansl commented on PHOENIX-6678:


It is better to ask this on the mailing lists. 
[https://phoenix.apache.org/mailing_list.html]

 

> Dbeaver jdbc connection for Phoenix
> --
>
> Key: PHOENIX-6678
> URL: https://issues.apache.org/jira/browse/PHOENIX-6678
> Project: Phoenix
>  Issue Type: Bug
>  Components: connectors
>Affects Versions: 5.0.0
> Environment: Hbase 2.0.2
> Phoenix 5.0.0
>Reporter: Dmitry Kravchuk
>Priority: Major
> Attachments: image-2022-03-30-15-53-14-255.png, 
> image-2022-03-30-15-53-23-717.png, image-2022-03-30-15-55-19-706.png, 
> image-2022-03-30-15-56-42-691.png, image-2022-03-30-15-57-39-813.png
>
>
> I've been searching for a solution for a successful connection to Phoenix 
> using DBeaver for weeks and came here for help.
> Here is my HBase config:
> !image-2022-03-30-15-53-23-717.png!
> Here is my DBeaver connection properties:
> !image-2022-03-30-15-55-19-706.png!
> I've downloaded the phoenix and hbase jars from the hadoop cluster as 
> connection drivers:
> !image-2022-03-30-15-56-42-691.png!
> Error:
> !image-2022-03-30-15-57-39-813.png!
>  
> Phoenix works from the shell using the sqlline.py script.
>  
> Can anybody help?





[jira] [Commented] (PHOENIX-6677) Parallelism within a batch of mutations

2022-03-30 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17515013#comment-17515013
 ] 

Lars Hofhansl commented on PHOENIX-6677:


I never quite understood why Phoenix needs to break up a batch into chunks of 
100. It probably has some historical reasons. Hand the whole thing to HBase and 
let it do that; it will parallelize at least per region.

Of course we can do better, as [~kozdemir] suggests.
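
As a minimal sketch of "hand the whole thing to HBase": one Table.batch() call over the full batch lets the HBase client group and dispatch per region itself, instead of being fed fixed 100-row chunks. Names here are illustrative.

{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.client.Row;
import org.apache.hadoop.hbase.client.Table;

public class WholeBatchSketch {
    static void flush(Table table, List<Row> allMutations)
            throws IOException, InterruptedException {
        Object[] results = new Object[allMutations.size()];
        // HBase groups the actions by region/server internally.
        table.batch(allMutations, results);
    }
}
{code}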

 

> Parallelism within a batch of mutations 
> 
>
> Key: PHOENIX-6677
> URL: https://issues.apache.org/jira/browse/PHOENIX-6677
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0
>
>
> Currently, Phoenix client simply passes the batches of row mutations from the 
> application to HBase client without any parallelism or intelligent grouping 
> (except grouping mutations for the same row). 
> Assume that the application creates batches 1 row mutations for a given 
> table. Phoenix client divides these rows based on their arrival order into 
> HBase batches of n (e.g., 100) rows based on the configured batch size, i.e., 
> the number of rows and bytes. Then, Phoenix calls HBase batch API, one batch 
> at a time (i.e., serially).  HBase client further divides a given batch of 
> rows into smaller batches based on their regions. This means that a large 
> batch created by the application is divided into many tiny batches and 
> executed mostly serially. For salted tables, this will result in even smaller 
> batches. 
> We can improve the current implementation greatly if we group the rows of the 
> batch prepared by the application into sub batches based on table region 
> boundaries and then execute these batches in parallel. 
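
A hedged sketch of that proposal (not Phoenix's actual code): group the application's batch by region start key via RegionLocator, then submit one sub-batch per region to a thread pool. Error handling and result collection are omitted, and a thread-safe Table instance (or one per task) is assumed.

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RegionParallelSketch {
    static void submit(Table table, RegionLocator locator, List<Mutation> batch,
                       ExecutorService pool) throws IOException {
        // Build one sub-batch per region, keyed by the region's start key.
        Map<String, List<Mutation>> byRegion = new TreeMap<>();
        for (Mutation m : batch) {
            byte[] start = locator.getRegionLocation(m.getRow()).getRegion().getStartKey();
            byRegion.computeIfAbsent(Bytes.toStringBinary(start), k -> new ArrayList<>()).add(m);
        }
        // Execute the sub-batches in parallel.
        for (List<Mutation> subBatch : byRegion.values()) {
            pool.submit(() -> {
                table.batch(subBatch, new Object[subBatch.size()]);
                return null;
            });
        }
    }
}
{code}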





[jira] [Commented] (PHOENIX-6671) Avoid ShortCircuit Coprocessor Connection for HBase 2.x

2022-03-22 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510820#comment-17510820
 ] 

Lars Hofhansl commented on PHOENIX-6671:


Hi [~comnetwork], thanks for working out the patch. HBASE-26869 does indeed 
fix the slow scanning problem. I think the user problem is not actually a 
problem.

(Although I will say that generally it's weird that calls issued locally via a 
short-circuit connection are issued as the system user who started the HBase 
process, instead of on behalf of the user who issued the outer request. But 
that is a different problem.)

 

> Avoid ShortCircuit Coprocessor Connection for HBase 2.x
> ---
>
> Key: PHOENIX-6671
> URL: https://issues.apache.org/jira/browse/PHOENIX-6671
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 5.2.0, 5.1.3
>
> Attachments: 6671-5.1.txt
>
>
> See PHOENIX-6501, PHOENIX-6458, and HBASE-26812.
> HBase's ShortCircuit Connections are fundamentally broken in HBase 2. We might 
> be able to fix it there, but with all the work the RPC handlers perform now 
> (closing scanners, resolving the current user, etc.), I doubt we'll get that 
> 100% right. HBase 3 has removed this functionality.
> Even with HBase 2, which does not have the async protobuf code, I could 
> hardly see any performance improvement from circumventing the RPC stack in 
> case the target of a Get or Scan is local. Even in the most ideal conditions 
> where everything is local, there was no improvement outside of noise.
> I suggest we do not use ShortCircuited Connections in Phoenix 5+.





[jira] [Commented] (PHOENIX-6671) Avoid ShortCircuit Coprocessor Connection for HBase 2.x

2022-03-21 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17510025#comment-17510025
 ] 

Lars Hofhansl commented on PHOENIX-6671:


Note that HBASE-26869 *does* fix the problem.

So now we have a choice to make:
 # State that only HBase 2.4.12 or later is supported (assuming HBASE-26869 
lands in 2.4.12)
 # Put this patch in anyway - this might be the safest bet against other 
related issues.
 # Check the actual HBase version and create the right kind of connection 
depending on that.

I have no strong opinion one way or the other.

 

> Avoid ShortCircuit Coprocessor Connection for HBase 2.x
> ---
>
> Key: PHOENIX-6671
> URL: https://issues.apache.org/jira/browse/PHOENIX-6671
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 5.2.0, 5.1.3
>
> Attachments: 6671-5.1.txt
>
>
> See PHOENIX-6501, PHOENIX-6458, and HBASE-26812.
> HBase's ShortCircuit Connections are fundamentally broken in HBase 2. We might 
> be able to fix it there, but with all the work the RPC handlers perform now 
> (closing scanners, resolving the current user, etc.), I doubt we'll get that 
> 100% right. HBase 3 has removed this functionality.
> Even with HBase 2, which does not have the async protobuf code, I could 
> hardly see any performance improvement from circumventing the RPC stack in 
> case the target of a Get or Scan is local. Even in the most ideal conditions 
> where everything is local, there was no improvement outside of noise.
> I suggest we do not use ShortCircuited Connections in Phoenix 5+.





[jira] [Commented] (PHOENIX-6671) Avoid ShortCircuit Coprocessor Connection for HBase 2.x

2022-03-20 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509610#comment-17509610
 ] 

Lars Hofhansl commented on PHOENIX-6671:


Cool. Thanks for checking! I'd vote to get it in then. We can always revert to 
the old behavior if/when this is fixed in HBase (and we drop support for, or 
discourage, any version of HBase that has this bug).


> Avoid ShortCircuit Coprocessor Connection for HBase 2.x
> ---
>
> Key: PHOENIX-6671
> URL: https://issues.apache.org/jira/browse/PHOENIX-6671
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 5.2.0, 5.1.3
>
> Attachments: 6671-5.1.txt
>
>
> See PHOENIX-6501, PHOENIX-6458, and HBASE-26812.
> HBase's ShortCircuit Connections are fundamentally broken in HBase 2. We might 
> be able to fix it there, but with all the work the RPC handlers perform now 
> (closing scanners, resolving the current user, etc.), I doubt we'll get that 
> 100% right. HBase 3 has removed this functionality.
> Even with HBase 2, which does not have the async protobuf code, I could 
> hardly see any performance improvement from circumventing the RPC stack in 
> case the target of a Get or Scan is local. Even in the most ideal conditions 
> where everything is local, there was no improvement outside of noise.
> I suggest we do not use ShortCircuited Connections in Phoenix 5+.





[jira] [Commented] (PHOENIX-6671) Avoid ShortCircuit Coprocessor Connection for HBase 2.x

2022-03-19 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509306#comment-17509306
 ] 

Lars Hofhansl commented on PHOENIX-6671:


We need to be careful with this in production environments too. The remote call 
will now always tie up an extra handler thread even when the target regions are 
local.

On the other hand, the chance for that is p=1/number-of-regionservers, so 
unless there are special scenarios where we know the region is local I do not 
think there is any risk.


> Avoid ShortCircuit Coprocessor Connection for HBase 2.x
> ---
>
> Key: PHOENIX-6671
> URL: https://issues.apache.org/jira/browse/PHOENIX-6671
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 5.2.0, 5.1.3
>
> Attachments: 6671-5.1.txt
>
>
> See PHOENIX-6501, PHOENIX-6458, and HBASE-26812.
> HBase's ShortCircuit Connections are fundamentally broken in HBase 2. We might 
> be able to fix it there, but with all the work the RPC handlers perform now 
> (closing scanners, resolving the current user, etc.), I doubt we'll get that 
> 100% right. HBase 3 has removed this functionality.
> Even with HBase 2, which does not have the async protobuf code, I could 
> hardly see any performance improvement from circumventing the RPC stack in 
> case the target of a Get or Scan is local. Even in the most ideal conditions 
> where everything is local, there was no improvement outside of noise.
> I suggest we do not use ShortCircuited Connections in Phoenix 5+.





[jira] [Comment Edited] (PHOENIX-6671) Avoid ShortCircuit Coprocessor Connection for HBase 2.x

2022-03-19 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509306#comment-17509306
 ] 

Lars Hofhansl edited comment on PHOENIX-6671 at 3/19/22, 4:52 PM:
--

We need to be careful with this in production environments too. The remote call 
will now always tie up an extra handler thread even when the target regions are 
local.

On the other hand, the chance for that is p=1/number-of-regionservers, so 
unless there are special scenarios where we know the region is local I do not 
think there is any risk.

Before we merge, let's see where HBASE-26812 is going.


was (Author: lhofhansl):
We need to be careful with this in production environments too. The remote call 
will now always tie up an extra handler thread even when the target regions are 
local.

On the other hand, the chance for that is p=1/number-of-regionservers, so 
unless there are special scenarios where we know the region is local I do not 
think there is any risk.


> Avoid ShortCircuit Coprocessor Connection for HBase 2.x
> ---
>
> Key: PHOENIX-6671
> URL: https://issues.apache.org/jira/browse/PHOENIX-6671
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 5.2.0, 5.1.3
>
> Attachments: 6671-5.1.txt
>
>
> See PHOENIX-6501, PHOENIX-6458, and HBASE-26812.
> HBase's ShortCircuit Connections are fundamentally broken in HBase 2. We might 
> be able to fix it there, but with all the work the RPC handlers perform now 
> (closing scanners, resolving the current user, etc.), I doubt we'll get that 
> 100% right. HBase 3 has removed this functionality.
> Even with HBase 2, which does not have the async protobuf code, I could 
> hardly see any performance improvement from circumventing the RPC stack in 
> case the target of a Get or Scan is local. Even in the most ideal conditions 
> where everything is local, there was no improvement outside of noise.
> I suggest we do not use ShortCircuited Connections in Phoenix 5+.





[jira] [Commented] (PHOENIX-6671) Avoid ShortCircuit Coprocessor Connection for HBase 2.x

2022-03-18 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509105#comment-17509105
 ] 

Lars Hofhansl commented on PHOENIX-6671:


[~kozdemir], [~apurtell], FYI.

> Avoid ShortCircuit Coprocessor Connection for HBase 2.x
> ---
>
> Key: PHOENIX-6671
> URL: https://issues.apache.org/jira/browse/PHOENIX-6671
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Major
> Attachments: 6671-5.1.txt
>
>
> See PHOENIX-6501, PHOENIX-6458, and HBASE-26812.
> HBase's ShortCircuit Connections are fundamentally broken in HBase 2. We might 
> be able to fix it there, but with all the work the RPC handlers perform now 
> (closing scanners, resolving the current user, etc.), I doubt we'll get that 
> 100% right. HBase 3 has removed this functionality.
> Even with HBase 2, which does not have the async protobuf code, I could hardly 
> see any performance improvement from circumventing the RPC stack in case the 
> target of a Get or Scan is local. Even in the most ideal conditions where 
> everything is local, there was no improvement outside of noise.
> I suggest we do not use ShortCircuited Connections in Phoenix 5+.





[jira] [Commented] (PHOENIX-6671) Avoid ShortCircuit Coprocessor Connection for HBase 2.x

2022-03-18 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17509104#comment-17509104
 ] 

Lars Hofhansl commented on PHOENIX-6671:


A one-line change: just get a regular Connection.

> Avoid ShortCircuit Coprocessor Connection for HBase 2.x
> ---
>
> Key: PHOENIX-6671
> URL: https://issues.apache.org/jira/browse/PHOENIX-6671
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Major
> Attachments: 6671-5.1.txt
>
>
> See PHOENIX-6501, PHOENIX-6458, and HBASE-26812.
> HBase's ShortCircuit Connections are fundamentally broken in HBase 2. We might 
> be able to fix it there, but with all the work the RPC handlers perform now 
> (closing scanners, resolving the current user, etc.), I doubt we'll get that 
> 100% right. HBase 3 has removed this functionality.
> Even with HBase 2, which does not have the async protobuf code, I could hardly 
> see any performance improvement from circumventing the RPC stack in case the 
> target of a Get or Scan is local. Even in the most ideal conditions where 
> everything is local, there was no improvement outside of noise.
> I suggest we do not use ShortCircuited Connections in Phoenix 5+.





[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered global index rows

2022-03-11 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17505073#comment-17505073
 ] 

Lars Hofhansl commented on PHOENIX-6501:


With the latest version of this patch the query finishes, but it still takes a 
very long time.

Same scenario: 18m rows, count\(*) matching 2m rows. Without index the query 
takes about 7s on my system, with a local index it takes 10s, with the global 
index and this patch it takes about 4 minutes. A cursory look in the profiler 
reveals that most time is spent in ClientScanner.next()...

From the code it's not immediately clear what the problem is.

> Use batching when joining data table rows with uncovered global index rows
> --
>
> Key: PHOENIX-6501
> URL: https://issues.apache.org/jira/browse/PHOENIX-6501
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.2
>Reporter: Kadir Ozdemir
>Assignee: Lars Hofhansl
>Priority: Major
> Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support for global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer lots of 
> data row keys in memory,  use a skip scan filter and even multiple threads to 
> issue a separate scan for each data table region in parallel. This will 
> reduce the cost of join and also improve the performance.
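
For reference, a hedged sketch of the batching idea from the description: buffer many data row keys, then issue a single scan with a multi-range filter instead of one Get per row. HBase's MultiRowRangeFilter stands in here for Phoenix's skip scan filter, and the per-region parallel fan-out is omitted.

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter;
import org.apache.hadoop.hbase.filter.MultiRowRangeFilter.RowRange;

public class BatchedJoinSketch {
    static void scanBufferedKeys(Table dataTable, List<byte[]> bufferedRowKeys)
            throws IOException {
        List<RowRange> ranges = new ArrayList<>();
        for (byte[] key : bufferedRowKeys) {
            // One inclusive point range per buffered data row key.
            ranges.add(new RowRange(key, true, key, true));
        }
        Scan scan = new Scan().setFilter(new MultiRowRangeFilter(ranges));
        try (ResultScanner scanner = dataTable.getScanner(scan)) {
            for (Result r : scanner) {
                // ... merge uncovered columns into the buffered index rows ...
            }
        }
    }
}
{code}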





[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502747#comment-17502747
 ] 

Lars Hofhansl commented on PHOENIX-6458:


See: HBASE-26812

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.
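
To make the per-row join concrete, an illustrative sketch (not Phoenix's actual code) of the Get-per-index-row step that PHOENIX-6501 replaces with scans; the key-extraction helper is hypothetical and stands in for the mapping described above.

{code}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

public class UncoveredJoinSketch {
    /** Hypothetical helper: an index row key embeds all data row key columns. */
    static byte[] dataRowKeyFromIndexRowKey(byte[] indexRowKey) {
        throw new UnsupportedOperationException("illustrative only");
    }

    static void joinWithDataTable(List<byte[]> indexRowKeys, Table dataTable)
            throws IOException {
        for (byte[] indexRowKey : indexRowKeys) {
            // One server-to-server RPC per matched index row -- this is the
            // cost PHOENIX-6501 reduces by batching row keys into scans.
            Result dataRow = dataTable.get(new Get(dataRowKeyFromIndexRowKey(indexRowKey)));
            // ... merge the uncovered columns from dataRow into the index row ...
        }
    }
}
{code}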





[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502729#comment-17502729
 ] 

Lars Hofhansl commented on PHOENIX-6458:


I think I figured out the problem... It's an HBase 2.x problem:
In HBase 2.x the RPC handler is responsible for closing scanners. However, when 
you retrieve a Connection from a RegionCoprocessorEnvironment and the target 
happens to be local, then there is no RPC handler, and hence the RegionScanners 
will never get closed. This is a gaping HBase bug.

In Phoenix we can fix that by using 
{{org.apache.hadoop.hbase.client.ConnectionFactory#createConnection}}, but that 
is very expensive. I think we should give up here and focus on PHOENIX-6501 and 
file an HBase bug.

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.





[jira] [Comment Edited] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502660#comment-17502660
 ] 

Lars Hofhansl edited comment on PHOENIX-6458 at 3/8/22, 1:33 AM:
-

Sorry all... I'm an idiot. I ran the last queries with count\(*) instead of 
count(suppkey).

The issue is still there - creating a StoreScanner leak or delay for every 
merged row - hanging the RegionServer.

(The only excuse I have is multiplexing multiple work streams at the same time)


was (Author: lhofhansl):
Sorry all... I'm an idiot. I ran the last queries with count(*) instead of 
count(suppkey).

The issue is still there - creating a StoreScanner leak or delay for every 
merged row - hanging the RegionServer.

(The only excuse I have is multiplexing multiple work streams at the same time)

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.





[jira] [Comment Edited] (PHOENIX-6501) Use batching when joining data table rows with uncovered global index rows

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502661#comment-17502661
 ] 

Lars Hofhansl edited comment on PHOENIX-6501 at 3/8/22, 1:32 AM:
-

Sorry - doing too many things at the same time - the index was correct. I 
accidentally used count(*), which does not need to do the merge. So the query in 
question is still "hanging".


was (Author: lhofhansl):
Sorry - doing too many things at the same time - the index was correct. I 
accidentally used count(*), which does not need to do the merge. So the query in 
question is still "hanging".

> Use batching when joining data table rows with uncovered global index rows
> --
>
> Key: PHOENIX-6501
> URL: https://issues.apache.org/jira/browse/PHOENIX-6501
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.2
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support for global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer lots of 
> data row keys in memory,  use a skip scan filter and even multiple threads to 
> issue a separate scan for each data table region in parallel. This will 
> reduce the cost of join and also improve the performance.





[jira] [Comment Edited] (PHOENIX-6501) Use batching when joining data table rows with uncovered global index rows

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502661#comment-17502661
 ] 

Lars Hofhansl edited comment on PHOENIX-6501 at 3/8/22, 1:32 AM:
-

Sorry - doing too many things at the same time - the index was correct. I 
accidentally used count\(*), which does not need to do the merge. So the query 
in question is still "hanging".


was (Author: lhofhansl):
Sorry - doing too many things at the same time - the index was correct. I 
accidentally used count(*), which does not need to do the merge. So the query in 
question is still "hanging".

> Use batching when joining data table rows with uncovered global index rows
> --
>
> Key: PHOENIX-6501
> URL: https://issues.apache.org/jira/browse/PHOENIX-6501
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.2
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support for global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer lots of 
> data row keys in memory,  use a skip scan filter and even multiple threads to 
> issue a separate scan for each data table region in parallel. This will 
> reduce the cost of join and also improve the performance.





[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered global index rows

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502661#comment-17502661
 ] 

Lars Hofhansl commented on PHOENIX-6501:


Sorry - doing too many things at the same time - the index was correct. I 
accidentally used count(*), which does not need to do the merge. So the query in 
question is still "hanging".

> Use batching when joining data table rows with uncovered global index rows
> --
>
> Key: PHOENIX-6501
> URL: https://issues.apache.org/jira/browse/PHOENIX-6501
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.2
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support for global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer lots of 
> data row keys in memory,  use a skip scan filter and even multiple threads to 
> issue a separate scan for each data table region in parallel. This will 
> reduce the cost of join and also improve the performance.





[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502660#comment-17502660
 ] 

Lars Hofhansl commented on PHOENIX-6458:


Sorry all... I'm an idiot. I ran the last queries with count(*) instead of 
count(suppkey).

The issue is still there - creating a StoreScanner leak or delay for every 
merged row - hanging the RegionServer.

(The only excuse I have is multiplexing multiple work streams at the same time)

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.





[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502624#comment-17502624
 ] 

Lars Hofhansl commented on PHOENIX-6501:


As discussed in PHOENIX-6458, there was an issue with synchronously creating 
the global index.
With that out of the way this seems to work fine. In my test env I didn't see a 
perf improvement, but that's because everything is local, and so the network 
cost is negligible.


> Use batching when joining data table rows with uncovered index rows
> ---
>
> Key: PHOENIX-6501
> URL: https://issues.apache.org/jira/browse/PHOENIX-6501
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.2
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support for global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer lots of 
> data row keys in memory,  use a skip scan filter and even multiple threads to 
> issue a separate scan for each data table region in parallel. This will 
> reduce the cost of join and also improve the performance.





[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502599#comment-17502599
 ] 

Lars Hofhansl commented on PHOENIX-6458:


[~kozdemir] and I talked offline. Looks like creating a global index 
synchronously (through the command line) might not work (maybe it does not 
build a verified index).
Creating the index before loading the data does not have the problem...

So +1 on the addendum.

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.





[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502556#comment-17502556
 ] 

Lars Hofhansl commented on PHOENIX-6501:


That might be a bit tricky. I loaded the TPCH lineitem table (scale factor 3) 
into Phoenix via the Trino connector.

{code}
CREATE TABLE phoenix.default.lineitem (
    orderkey bigint NOT NULL,
    partkey bigint,
    suppkey bigint,
    linenumber integer NOT NULL,
    quantity double,
    extendedprice double,
    discount double,
    tax double,
    returnflag varchar(1),
    linestatus varchar(1),
    shipdate date,
    commitdate date,
    receiptdate date,
    shipinstruct varchar(25),
    shipmode varchar(10),
    comment varchar(44)
)
WITH (
    compression = 'ZSTD',
    data_block_encoding = 'ROW_INDEX_V1',
    disable_wal = true,
    immutable_rows = true,
    rowkeys = 'ORDERKEY,LINENUMBER'
)
{code}

(I do disable WAL everywhere, because that's not what I am testing and it 
speeds up loading/creating)

Then I created the global index on the tax column.
{{create index g_l_tax on lineitem(tax) DISABLE_WAL=true;}}

Then I ran {{select /*+ INDEX(lineitem g_l_tax) */ count(suppkey) from lineitem 
where tax = 0.08}}

Let me connect with you offline and see if I can send you a CSV with the 
lineitem data.


> Use batching when joining data table rows with uncovered index rows
> ---
>
> Key: PHOENIX-6501
> URL: https://issues.apache.org/jira/browse/PHOENIX-6501
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.2
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support for global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer lots of 
> data row keys in memory,  use a skip scan filter and even multiple threads to 
> issue a separate scan for each data table region in parallel. This will 
> reduce the cost of join and also improve the performance.





[jira] [Commented] (PHOENIX-6501) Use batching when joining data table rows with uncovered index rows

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502541#comment-17502541
 ] 

Lars Hofhansl commented on PHOENIX-6501:


Testing the attached patch. Running a query on a table with 18m rows that 
selects (counts) 2m of them.

The query on the uncovered global index *does not finish* (I stopped it after 
10 minutes). :(

With no index it takes about 7s, with an uncovered local index it takes about 
10s (due to the merging cost and low selectivity of the query).

So there's some bug somewhere.

 

> Use batching when joining data table rows with uncovered index rows
> ---
>
> Key: PHOENIX-6501
> URL: https://issues.apache.org/jira/browse/PHOENIX-6501
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.2
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Attachments: PHOENIX-6501.master.001.patch
>
>
> PHOENIX-6458 extends the existing uncovered local index support for global 
> indexes. The current solution uses HBase get operations to join data table 
> rows with uncovered index rows on the server side. Doing a separate RPC call 
> for every data table row can be expensive. Instead, we can buffer lots of 
> data row keys in memory,  use a skip scan filter and even multiple threads to 
> issue a separate scan for each data table region in parallel. This will 
> reduce the cost of join and also improve the performance.





[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502524#comment-17502524
 ] 

Lars Hofhansl commented on PHOENIX-6458:


I agree. I do not understand it either. It's as if the StoreScanners opened on 
behalf of the Table.get() calls for the merges are not closed.

 

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.





[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502488#comment-17502488
 ] 

Lars Hofhansl commented on PHOENIX-6458:


OK... I see 200410 open StoreScanners. So it looks like the StoreScanners used 
for the merge (the GET requests) are not closed.

I reopened the issue.

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.





[jira] [Comment Edited] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17502468#comment-17502468
 ] 

Lars Hofhansl edited comment on PHOENIX-6458 at 3/7/22, 6:09 PM:
-

Trying now. There's something pretty terrible going on:

First time I run the query above it returns the right value after 32s (a local 
uncovered index takes about 10s). Second time, there seems to be some memory 
leak. Getting lots of "responseTooSlow" and GC pauses in the logs.

The query essentially never finishes, only way out is to kill the region server.

Edit: This is repeatable, and does not happen with a local index or without an 
index. I'll see if I can look at a memory profiler.


was (Author: lhofhansl):
Trying now. There's something pretty terrible going on:

First time I run the query above it returns the right value after 32s. Second 
time, there seems to be some memory leak. Getting lots of "responseTooSlow" and 
GC pauses in the logs.

The query essentially never finishes, only way out is to kill the region server.

Edit: This is repeatable, and does not happen with a local index or without an 
index. I'll see if I can look at a memory profiler.

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502468#comment-17502468
 ] 

Lars Hofhansl edited comment on PHOENIX-6458 at 3/7/22, 6:08 PM:
-

Trying now. There's something pretty terrible going on:

First time I run the query above it returns the right value after 32s. Second 
time, there seems to be some memory leak. Getting lots of "responseTooSlow" and 
GC pauses in the logs.

The query essentially never finishes, only way out is to kill the region server.

Edit: This is repeatable, and does not happen with a local index or without an 
index. I'll see if I can look at a memory profiler.


was (Author: lhofhansl):
Trying now. There's something pretty terrible going on:

First time I run the query above it returns the right value after 32s. Second 
time, there seems to be some memory leak. Getting lots of "responseTooSlow" and 
GC pauses in the logs.

The query essentially never finishes, only way out is to kill the region server.

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502468#comment-17502468
 ] 

Lars Hofhansl edited comment on PHOENIX-6458 at 3/7/22, 5:59 PM:
-

Trying now. There's something pretty terrible going on:

First time I run the query above it returns the right value after 32s. Second 
time, there seems to be some memory leak. Getting lots of "responseTooSlow" and 
GC pauses in the logs.

The query essentially never finishes, only way out is to kill the region server.


was (Author: lhofhansl):
Trying now. There's something pretty terrible going on:

First time I run the query above it returns the right value.

Second time, there seems to be some memory leak. Getting lots of 
"responseTooSlow" and GC pauses in the logs.

 

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502468#comment-17502468
 ] 

Lars Hofhansl commented on PHOENIX-6458:


Trying now. There's something pretty terrible going on:

First time I run the query above it returns the right value.

Second time, there seems to be some memory leak. Getting lots of 
"responseTooSlow" and GC pauses in the logs.

 

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-07 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502458#comment-17502458
 ] 

Lars Hofhansl commented on PHOENIX-6458:


I'll verify today - and my apologies that I didn't have time to track it down 
myself.

 

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch, PHOENIX-6458.master.addendum.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-03-04 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501614#comment-17501614
 ] 

Lars Hofhansl commented on PHOENIX-6458:


Hmm... Doesn't quite seem to work:
{code:java}
> select /*+ NO_INDEX */ count(suppkey) from lineitem where tax = 0.08;
+----------------+
| COUNT(SUPPKEY) |
+----------------+
| 2000406        |
+----------------+
1 row selected (6.614 seconds)

> select /*+ INDEX(lineitem g_l_tax) */ count(suppkey) from lineitem where tax = 0.08;
+------------------+
| COUNT("SUPPKEY") |
+------------------+
| 0                |
+------------------+
1 row selected (7.422 seconds)

> explain select /*+ INDEX(lineitem g_l_tax) */ count(suppkey) from lineitem where tax = 0.08;
+------------------------------------------------------------------------------------------+----------------+---------------+---------------+
|                                           PLAN                                           | EST_BYTES_READ | EST_ROWS_READ |  EST_INFO_TS  |
+------------------------------------------------------------------------------------------+----------------+---------------+---------------+
| CLIENT 3-CHUNK 511502 ROWS 20971582 BYTES PARALLEL 1-WAY RANGE SCAN OVER G_L_TAX [0.08]  | 20971582       | 511502        | 1646441656705 |
|     SERVER MERGE [0.SUPPKEY]                                                             | 20971582       | 511502        | 1646441656705 |
|     SERVER FILTER BY FIRST KEY ONLY                                                      | 20971582       | 511502        | 1646441656705 |
|     SERVER AGGREGATE INTO SINGLE ROW                                                     | 20971582       | 511502        | 1646441656705 |
+------------------------------------------------------------------------------------------+----------------+---------------+---------------+
4 rows selected (0.03 seconds)
{code}
 

[~kozdemir] 

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Kadir OZDEMIR
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.3
>
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6636) Replace bundled log4j libraries with reload4j

2022-02-28 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499082#comment-17499082
 ] 

Lars Hofhansl commented on PHOENIX-6636:


Thanks. Works fine now.

> Replace bundled log4j libraries with reload4j
> -
>
> Key: PHOENIX-6636
> URL: https://issues.apache.org/jira/browse/PHOENIX-6636
> Project: Phoenix
>  Issue Type: Bug
>  Components: connectors, core, queryserver
>Affects Versions: 5.2.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 4.16.2, 5.1.3
>
>
> To reduce the number of dependencies with unresolved CVEs, replace the bundled 
> log4j libraries with reload4j ([https://reload4j.qos.ch/]).
> This will also require bumping the slf4j version.
> This is a quick fix, and does not preclude moving to some different backend 
> later (like log4j2 or logback)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6636) Replace bundled log4j libraries with reload4j

2022-02-25 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498354#comment-17498354
 ] 

Lars Hofhansl commented on PHOENIX-6636:


(y)

> Replace bundled log4j libraries with reload4j
> -
>
> Key: PHOENIX-6636
> URL: https://issues.apache.org/jira/browse/PHOENIX-6636
> Project: Phoenix
>  Issue Type: Bug
>  Components: connectors, core, queryserver
>Affects Versions: 5.2.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 4.16.2, 5.1.3
>
>
> To reduce the number of dependencies with unresolved CVEs, replace the bundled 
> log4j libraries with reload4j ([https://reload4j.qos.ch/]).
> This will also require bumping the slf4j version.
> This is a quick fix, and does not preclude moving to some different backend 
> later (like log4j2 or logback)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (PHOENIX-6636) Replace bundled log4j libraries with reload4j

2022-02-25 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498305#comment-17498305
 ] 

Lars Hofhansl edited comment on PHOENIX-6636 at 2/25/22, 9:26 PM:
--

Hmm... Getting this now when running Phoenix client (hbase 2.4 and hadoop 3.3.x 
profile).

This used to work before. Might be due to this, or the new 3rd party 
dependency.

Update: Yeah, it's this one, not the updated 3rd party dependency.

[~stoty] 
{code:java}
WARNING: Exception thrown by removal listener
java.lang.NoClassDefFoundError: Could not initialize class org.apache.phoenix.monitoring.GlobalClientMetrics
at org.apache.phoenix.query.ConnectionQueryServicesImpl.close(ConnectionQueryServicesImpl.java:569)
at org.apache.phoenix.jdbc.PhoenixDriver$2.onRemoval(PhoenixDriver.java:163)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1808)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3379)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3355)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$Segment.remove(LocalCache.java:2989)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache.remove(LocalCache.java:4104)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$LocalManualCache.invalidate(LocalCache.java:4739)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:270)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:144)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
at sqlline.DatabaseConnection.connect(DatabaseConnection.java:135)
at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:192)
at sqlline.Commands.connect(Commands.java:1364)
at sqlline.Commands.connect(Commands.java:1244)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
at sqlline.SqlLine.dispatch(SqlLine.java:730)
at sqlline.SqlLine.initArgs(SqlLine.java:410)
at sqlline.SqlLine.begin(SqlLine.java:515)
at sqlline.SqlLine.start(SqlLine.java:267)
at sqlline.SqlLine.main(SqlLine.java:206)

java.lang.NoClassDefFoundError: org/apache/log4j/AppenderSkeleton
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.hadoop.metrics2.source.JvmMetrics.getEventCounters(JvmMetrics.java:288)
at org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:157)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:329)
at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:315)
at java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:100)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:73)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:233)
at org.apache.hadoop.metrics2.source.JvmMetrics.create(JvmMetrics.java:123)
at 

[jira] [Comment Edited] (PHOENIX-6636) Replace bundled log4j libraries with reload4j

2022-02-25 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498305#comment-17498305
 ] 

Lars Hofhansl edited comment on PHOENIX-6636 at 2/25/22, 8:50 PM:
--

Hmm... Getting this now when running Phoenix client (hbase 2.4 and hadoop 3.3.x 
profile).

This used to work before. Might be due to this, or the new 3rd party 
dependency.

[~stoty] 
{code:java}
WARNING: Exception thrown by removal listener
java.lang.NoClassDefFoundError: Could not initialize class org.apache.phoenix.monitoring.GlobalClientMetrics
at org.apache.phoenix.query.ConnectionQueryServicesImpl.close(ConnectionQueryServicesImpl.java:569)
at org.apache.phoenix.jdbc.PhoenixDriver$2.onRemoval(PhoenixDriver.java:163)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1808)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3379)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3355)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$Segment.remove(LocalCache.java:2989)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache.remove(LocalCache.java:4104)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$LocalManualCache.invalidate(LocalCache.java:4739)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:270)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:144)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
at sqlline.DatabaseConnection.connect(DatabaseConnection.java:135)
at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:192)
at sqlline.Commands.connect(Commands.java:1364)
at sqlline.Commands.connect(Commands.java:1244)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
at sqlline.SqlLine.dispatch(SqlLine.java:730)
at sqlline.SqlLine.initArgs(SqlLine.java:410)
at sqlline.SqlLine.begin(SqlLine.java:515)
at sqlline.SqlLine.start(SqlLine.java:267)
at sqlline.SqlLine.main(SqlLine.java:206)

java.lang.NoClassDefFoundError: org/apache/log4j/AppenderSkeleton
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.hadoop.metrics2.source.JvmMetrics.getEventCounters(JvmMetrics.java:288)
at org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:157)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:329)
at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:315)
at java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:100)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:73)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:233)
at org.apache.hadoop.metrics2.source.JvmMetrics.create(JvmMetrics.java:123)
at 

[jira] [Commented] (PHOENIX-6636) Replace bundled log4j libraries with reload4j

2022-02-25 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498305#comment-17498305
 ] 

Lars Hofhansl commented on PHOENIX-6636:


Hmm... Getting this now when starting Phoenix (hbase 2.4 and hadoop 3.3.x 
profile).

This used to work before. Might be due to this, or the new 3rd party 
dependency.

[~stoty] 
{code:java}
WARNING: Exception thrown by removal listener
java.lang.NoClassDefFoundError: Could not initialize class org.apache.phoenix.monitoring.GlobalClientMetrics
at org.apache.phoenix.query.ConnectionQueryServicesImpl.close(ConnectionQueryServicesImpl.java:569)
at org.apache.phoenix.jdbc.PhoenixDriver$2.onRemoval(PhoenixDriver.java:163)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache.processPendingNotifications(LocalCache.java:1808)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$Segment.runUnlockedCleanup(LocalCache.java:3379)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$Segment.postWriteCleanup(LocalCache.java:3355)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$Segment.remove(LocalCache.java:2989)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache.remove(LocalCache.java:4104)
at org.apache.phoenix.thirdparty.com.google.common.cache.LocalCache$LocalManualCache.invalidate(LocalCache.java:4739)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:270)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:144)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
at sqlline.DatabaseConnection.connect(DatabaseConnection.java:135)
at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:192)
at sqlline.Commands.connect(Commands.java:1364)
at sqlline.Commands.connect(Commands.java:1244)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
at sqlline.SqlLine.dispatch(SqlLine.java:730)
at sqlline.SqlLine.initArgs(SqlLine.java:410)
at sqlline.SqlLine.begin(SqlLine.java:515)
at sqlline.SqlLine.start(SqlLine.java:267)
at sqlline.SqlLine.main(SqlLine.java:206)

java.lang.NoClassDefFoundError: org/apache/log4j/AppenderSkeleton
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1017)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:174)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:800)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:698)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:621)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:579)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.hadoop.metrics2.source.JvmMetrics.getEventCounters(JvmMetrics.java:288)
at org.apache.hadoop.metrics2.source.JvmMetrics.getMetrics(JvmMetrics.java:157)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMetrics(MetricsSourceAdapter.java:200)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.updateJmxCache(MetricsSourceAdapter.java:183)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.getMBeanInfo(MetricsSourceAdapter.java:156)
at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(DefaultMBeanServerInterceptor.java:329)
at java.management/com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:315)
at java.management/com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:100)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:73)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.startMBeans(MetricsSourceAdapter.java:222)
at org.apache.hadoop.metrics2.impl.MetricsSourceAdapter.start(MetricsSourceAdapter.java:101)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.registerSource(MetricsSystemImpl.java:268)
at org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:233)
at org.apache.hadoop.metrics2.source.JvmMetrics.create(JvmMetrics.java:123)
at org.apache.hadoop.metrics2.source.JvmMetrics$Singleton.init(JvmMetrics.java:63)
at 

[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-02-25 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498285#comment-17498285
 ] 

Lars Hofhansl commented on PHOENIX-6458:


(y)

Awesome

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Lars Hofhansl
>Priority: Major
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6458) Using global indexes for queries with uncovered columns

2022-02-24 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497945#comment-17497945
 ] 

Lars Hofhansl commented on PHOENIX-6458:


[~kozdemir] This should go into the 5.1 branch as well, right?

> Using global indexes for queries with uncovered columns
> ---
>
> Key: PHOENIX-6458
> URL: https://issues.apache.org/jira/browse/PHOENIX-6458
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0
>Reporter: Kadir Ozdemir
>Assignee: Lars Hofhansl
>Priority: Major
> Attachments: PHOENIX-6458.master.001.patch, 
> PHOENIX-6458.master.002.patch
>
>
> The Phoenix query optimizer does not use a global index for a query with the 
> columns that are not covered by the global index if the query does not have 
> the corresponding index hint for this index. With the index hint, the 
> optimizer rewrites the query where the index is used within a subquery. With 
> this subquery, the row keys of the index rows that satisfy the subquery are 
> retrieved by the Phoenix client and then pushed into the Phoenix server 
> caches of the data table regions. Finally, on the server side, data table 
> rows are scanned and joined with the index rows using HashJoin. Based on the 
> selectivity of the original query, this join operation may still result in 
> scanning a large amount of data table rows. 
> Eliminating these data table scans would be a significant improvement. To do 
> that, instead of rewriting the query, the Phoenix optimizer simply treats the 
> global index as a covered index for the given query. With this, the Phoenix 
> query optimizer chooses the index table for the query especially when the 
> index row key prefix length is greater than the data row key prefix length 
> for the query. On the server side, the index table is scanned using index row 
> key ranges implied by the query and the index row keys are then mapped to the 
> data table row keys (please note an index row key includes all the data row 
> key columns). Finally, the corresponding data table rows are scanned using 
> server-to-server RPCs.  PHOENIX-6458 (this Jira) retrieves the data table 
> rows one by one using the HBase get operation. PHOENIX-6501 replaces this get 
> operation with the scan operation to reduce the number of server-to-server 
> RPC calls.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6647) A local index should not be chosen for a full scan if that scan is not covered by the index.

2022-02-10 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490481#comment-17490481
 ] 

Lars Hofhansl commented on PHOENIX-6647:


Now that I've spent a while looking at this, the cost estimation is all weird anyway.

Why do we apply a parallelLevel to join and sort operations? We do not apply 
that to scans, which also run in parallel on the servers. If anything we might 
want to apply a multiplier to client sorts/joins instead.

That throws the calculations off. In addition, we're already mixing CPU, IO, and 
memory metrics in the code, but then claim we are not using CPU and memory yet.

I doubt anyone has the cost-based optimizer enabled since it is off by default. So 
I wonder if there's any value in spending a few more hours on this...
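
For context, the switch in question, as I remember it (verify the exact property name against QueryServices; it defaults to false):

{code}
<!-- hbase-site.xml -->
<property>
  <name>phoenix.costbased.optimization.enabled</name>
  <value>true</value>
</property>
{code}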

> A local index should not be chosen for a full scan if that scan is not 
> covered by the index.
> 
>
> Key: PHOENIX-6647
> URL: https://issues.apache.org/jira/browse/PHOENIX-6647
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Attachments: 6647-5.1.txt, 6647-v2-5.1.txt
>
>
> {code}
> > explain select * from lineitem;
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> |                                                    PLAN                                                    | EST_BYTES_READ | EST |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> | CLIENT 103-CHUNK 17711182 ROWS 1059064693 BYTES PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER LINEITEM [1]   | 1059064693     | 177 |
> | SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT, 0.TAX, 0.RETURNFLAG, 0.LINESTATUS, 0.COMMITDATE, 0.RECEIPTDATE, 0.SHIPINSTRUCT, 0.SHIPMODE, 0.COMMENT] | 1059064693 | 177 |
> | SERVER FILTER BY FIRST KEY ONLY                                                                            | 1059064693     | 177 |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> 3 rows selected (0.056 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6647) A local index should not be chosen for a full scan if that scan is not covered by the index.

2022-02-10 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490038#comment-17490038
 ] 

Lars Hofhansl commented on PHOENIX-6647:


But that doesn't work well for sorting... I'll leave it here for now.

> A local index should not be chosen for a full scan if that scan is not 
> covered by the index.
> 
>
> Key: PHOENIX-6647
> URL: https://issues.apache.org/jira/browse/PHOENIX-6647
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Attachments: 6647-5.1.txt, 6647-v2-5.1.txt
>
>
> {code}
> > explain select * from lineitem;
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> |                                                    PLAN                                                    | EST_BYTES_READ | EST |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> | CLIENT 103-CHUNK 17711182 ROWS 1059064693 BYTES PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER LINEITEM [1]   | 1059064693     | 177 |
> | SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT, 0.TAX, 0.RETURNFLAG, 0.LINESTATUS, 0.COMMITDATE, 0.RECEIPTDATE, 0.SHIPINSTRUCT, 0.SHIPMODE, 0.COMMENT] | 1059064693 | 177 |
> | SERVER FILTER BY FIRST KEY ONLY                                                                            | 1059064693     | 177 |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> 3 rows selected (0.056 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6647) A local index should not be chosen for a full scan if that scan is not covered by the index.

2022-02-09 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17490014#comment-17490014
 ] 

Lars Hofhansl commented on PHOENIX-6647:


-v2 is better.

Takes increased size and significant row cost into account. The reasoning goes as 
follows:
 * Cost is driven mostly by input bytes
 * Merge will add IO from the main table
 * But merge cost is mostly per row (a seek into the main table's region for 
every single matched row)
 * Hence: an uncovered index scan needs to save a lot of IO in order to make up 
for the merge cost (see also 
https://hadoop-hbase.blogspot.com/2018/10/apache-hbase-and-apache-phoenix-more-on.html)
 * A per-row cost of 1 is reasonable because it would force an index to be at 
least 1% selective for 100-byte rows, 10% for 1000-byte rows, etc., which 
experimentally turned out to be right.

Ideally the merge cost would count as CPU cost, not IO.
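
The shape of that comparison, as back-of-envelope arithmetic only (constants invented here, not taken from 6647-v2-5.1.txt):

{code:java}
// Back-of-envelope sketch; units and constants are illustrative only.
double indexScanBytes  = 10_000_000;    // bytes read from the uncovered index
double matchedRows     = 100_000;       // rows to be merged from the main table
double perRowMergeCost = 1.0;           // one seek into the main table per matched row
double fullScanBytes   = 1_000_000_000; // cost of simply scanning the main table

double indexPlanCost = indexScanBytes + perRowMergeCost * matchedRows;
double tablePlanCost = fullScanBytes;
System.out.println(indexPlanCost < tablePlanCost ? "use index" : "full scan");
{code}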

> A local index should not be chosen for a full scan if that scan is not 
> covered by the index.
> 
>
> Key: PHOENIX-6647
> URL: https://issues.apache.org/jira/browse/PHOENIX-6647
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Attachments: 6647-5.1.txt, 6647-v2-5.1.txt
>
>
> {code}
> > explain select * from lineitem;
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> |                                                    PLAN                                                    | EST_BYTES_READ | EST |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> | CLIENT 103-CHUNK 17711182 ROWS 1059064693 BYTES PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER LINEITEM [1]   | 1059064693     | 177 |
> | SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT, 0.TAX, 0.RETURNFLAG, 0.LINESTATUS, 0.COMMITDATE, 0.RECEIPTDATE, 0.SHIPINSTRUCT, 0.SHIPMODE, 0.COMMENT] | 1059064693 | 177 |
> | SERVER FILTER BY FIRST KEY ONLY                                                                            | 1059064693     | 177 |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> 3 rows selected (0.056 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (PHOENIX-6647) A local index should not be chosen for a full scan if that scan is not covered by the index.

2022-02-09 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489869#comment-17489869
 ] 

Lars Hofhansl edited comment on PHOENIX-6647 at 2/10/22, 5:03 AM:
--

Ah. Can't help it. Here's a change that fixes it for me.

Something like this. Other heuristics involving the server merge are also possible.


was (Author: lhofhansl):
Ah. Can't help it. Here's a change that fixes it for me.

> A local index should not be chosen for a full scan if that scan is not 
> covered by the index.
> 
>
> Key: PHOENIX-6647
> URL: https://issues.apache.org/jira/browse/PHOENIX-6647
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Attachments: 6647-5.1.txt
>
>
> {code}
> > explain select * from lineitem;
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> |                                                    PLAN                                                    | EST_BYTES_READ | EST |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> | CLIENT 103-CHUNK 17711182 ROWS 1059064693 BYTES PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER LINEITEM [1]   | 1059064693     | 177 |
> | SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT, 0.TAX, 0.RETURNFLAG, 0.LINESTATUS, 0.COMMITDATE, 0.RECEIPTDATE, 0.SHIPINSTRUCT, 0.SHIPMODE, 0.COMMENT] | 1059064693 | 177 |
> | SERVER FILTER BY FIRST KEY ONLY                                                                            | 1059064693     | 177 |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> 3 rows selected (0.056 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6647) A local index should not be chosen for a full scan if that scan is not covered by the index.

2022-02-09 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489869#comment-17489869
 ] 

Lars Hofhansl commented on PHOENIX-6647:


Ah. Can't help it. Here's a change that fixes it for me.

> A local index should not be chosen for a full scan if that scan is not 
> covered by the index.
> 
>
> Key: PHOENIX-6647
> URL: https://issues.apache.org/jira/browse/PHOENIX-6647
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Attachments: 6647-5.1.txt
>
>
> {code}
> > explain select * from lineitem;
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> |                                                    PLAN                                                    | EST_BYTES_READ | EST |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> | CLIENT 103-CHUNK 17711182 ROWS 1059064693 BYTES PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER LINEITEM [1]   | 1059064693     | 177 |
> | SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT, 0.TAX, 0.RETURNFLAG, 0.LINESTATUS, 0.COMMITDATE, 0.RECEIPTDATE, 0.SHIPINSTRUCT, 0.SHIPMODE, 0.COMMENT] | 1059064693 | 177 |
> | SERVER FILTER BY FIRST KEY ONLY                                                                            | 1059064693     | 177 |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> 3 rows selected (0.056 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (PHOENIX-6647) A local index should not be chosen for a full scan if that scan is not covered by the index.

2022-02-09 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489844#comment-17489844
 ] 

Lars Hofhansl edited comment on PHOENIX-6647 at 2/9/22, 10:32 PM:
--

I do have the cost based optimizer enabled, which I think we haven't maintained 
a lot.

So the fix is to take the cost of merging the main table into the cost 
calculation.


was (Author: lhofhansl):
I do have the cost based optimizer enabled, which I think we haven't maintained 
a lot.

> A local index should not be chosen for a full scan if that scan is not 
> covered by the index.
> 
>
> Key: PHOENIX-6647
> URL: https://issues.apache.org/jira/browse/PHOENIX-6647
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
>
> {code}
> > explain select * from lineitem;
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> |                                                    PLAN                                                    | EST_BYTES_READ | EST |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> | CLIENT 103-CHUNK 17711182 ROWS 1059064693 BYTES PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER LINEITEM [1]   | 1059064693     | 177 |
> | SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT, 0.TAX, 0.RETURNFLAG, 0.LINESTATUS, 0.COMMITDATE, 0.RECEIPTDATE, 0.SHIPINSTRUCT, 0.SHIPMODE, 0.COMMENT] | 1059064693 | 177 |
> | SERVER FILTER BY FIRST KEY ONLY                                                                            | 1059064693     | 177 |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> 3 rows selected (0.056 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6647) A local index should not be chosen for a full scan if that scan is not covered by the index.

2022-02-09 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489844#comment-17489844
 ] 

Lars Hofhansl commented on PHOENIX-6647:


I do have the cost based optimizer enabled, which I think we haven't maintained 
a lot.

> A local index should not be chosen for a full scan if that scan is not 
> covered by the index.
> 
>
> Key: PHOENIX-6647
> URL: https://issues.apache.org/jira/browse/PHOENIX-6647
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
>
> {code}
> > explain select * from lineitem;
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> |                                                    PLAN                                                    | EST_BYTES_READ | EST |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> | CLIENT 103-CHUNK 17711182 ROWS 1059064693 BYTES PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER LINEITEM [1]   | 1059064693     | 177 |
> | SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT, 0.TAX, 0.RETURNFLAG, 0.LINESTATUS, 0.COMMITDATE, 0.RECEIPTDATE, 0.SHIPINSTRUCT, 0.SHIPMODE, 0.COMMENT] | 1059064693 | 177 |
> | SERVER FILTER BY FIRST KEY ONLY                                                                            | 1059064693     | 177 |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> 3 rows selected (0.056 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6647) A local index should not be chosen for a full scan if that scan is not covered by the index.

2022-02-09 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17489841#comment-17489841
 ] 

Lars Hofhansl commented on PHOENIX-6647:


The planner determines that there are fewer bytes to scan through the index, 
but with the merge, this is 10-100x more expensive to execute.

 

(Just wanna park it here, won't have time to look at it.)

> A local index should not be chosen for a full scan if that scan is not 
> covered by the index.
> 
>
> Key: PHOENIX-6647
> URL: https://issues.apache.org/jira/browse/PHOENIX-6647
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
>
> {code}
> > explain select * from lineitem;
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> |                                                    PLAN                                                    | EST_BYTES_READ | EST |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> | CLIENT 103-CHUNK 17711182 ROWS 1059064693 BYTES PARALLEL 2-WAY ROUND ROBIN RANGE SCAN OVER LINEITEM [1]   | 1059064693     | 177 |
> | SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT, 0.TAX, 0.RETURNFLAG, 0.LINESTATUS, 0.COMMITDATE, 0.RECEIPTDATE, 0.SHIPINSTRUCT, 0.SHIPMODE, 0.COMMENT] | 1059064693 | 177 |
> | SERVER FILTER BY FIRST KEY ONLY                                                                            | 1059064693     | 177 |
> +----------------------------------------------------------------------------------------------------------+----------------+-----+
> 3 rows selected (0.056 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6615) The Tephra transaction processor cannot be loaded anymore.

2022-01-06 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470170#comment-17470170
 ] 

Lars Hofhansl commented on PHOENIX-6615:


Thanks! Just approved the PR. Happy to merge unless you want to do that.

> The Tephra transaction processor cannot be loaded anymore.
> --
>
> Key: PHOENIX-6615
> URL: https://issues.apache.org/jira/browse/PHOENIX-6615
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Assignee: Istvan Toth
>Priority: Major
> Attachments: 6615.txt
>
>
> See
>  # TransactionFactory
>  # TephraTransactionProvider
> Can you spot the problem? :)  (Hint: The constructor is private.)
> Broken since PHOENIX-6064. [~stoty] .
> Can I just say... Unless I am missing something... How could we not have 
> noticed that one of the transaction processors has not been working since 
> August (in 5.x at least)? Is really nobody using the transaction engines?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6615) The Tephra transaction processor cannot be loaded anymore.

2021-12-21 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463377#comment-17463377
 ] 

Lars Hofhansl commented on PHOENIX-6615:


I guess the breakage was hard to notice, since the tests are (necessarily) 
disabled for the NotAvailableTransactionProvider.

> The Tephra transaction processor cannot be loaded anymore.
> --
>
> Key: PHOENIX-6615
> URL: https://issues.apache.org/jira/browse/PHOENIX-6615
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Attachments: 6615.txt
>
>
> See
>  # TransactionFactory
>  # TephraTransactionProvider
> Can you spot the problem? :)  (Hint: The constructor is private.)
> Broken since PHOENIX-6064. [~stoty] .
> Can I just say... Unless I am missing something... How could we not have 
> noticed that one of the transaction processors has not been working since 
> August (in 5.x at least)? Is really nobody using the transaction engines?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6615) The Tephra transaction processor cannot be loaded anymore.

2021-12-21 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463376#comment-17463376
 ] 

Lars Hofhansl commented on PHOENIX-6615:


Thanks [~stoty] .

I agree the bitrot is around Tephra, not Phoenix. And the job you do 
maintaining Tephra is appreciated!

Omid and Tephra have different pros and cons. Tephra has no (or negligible) per-row 
cost, but a higher per-transaction cost (since all failed or rolled-back 
transactions have to be sent along to all future transactions until they are 
collected). Omid, on the other hand, has a fairly high per-row cost (as the 
shadow columns need to be updated) but no lasting per-transaction cost (once 
the shadow columns are updated no further information is needed).

So I'd use Omid for many small transactions, and Tephra for few but large 
transactions... And maybe it's really not worth it. :)
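
To put rough numbers on that intuition (my own back-of-the-envelope framing, 
not from either project's documentation; the constants and workload sizes are 
made up for illustration):
{code:java}
// Sketch of the tradeoff, not a benchmark.
public class TxCostModel {
    // Tephra: lasting per-transaction cost; the invalid-transaction list
    // is shipped with every future transaction until it is pruned.
    static double tephraCost(long txCount, long invalidListSize, double cTx) {
        return txCount * invalidListSize * cTx;
    }

    // Omid: one-time per-row cost; a shadow-cell update per committed row.
    static double omidCost(long txCount, long rowsPerTx, double cRow) {
        return txCount * rowsPerTx * cRow;
    }

    public static void main(String[] args) {
        // Many small transactions favor Omid...
        System.out.println(tephraCost(1_000_000, 100, 1.0)); // 1.0E8
        System.out.println(omidCost(1_000_000, 2, 1.0));     // 2000000.0
        // ...few large transactions favor Tephra.
        System.out.println(tephraCost(100, 100, 1.0));       // 10000.0
        System.out.println(omidCost(100, 1_000_000, 1.0));   // 1.0E8
    }
}
{code}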

I agree with you that keeping up the mere perception that a project is 
maintained is worse than ripping it out.

As for transactions in general... Phoenix is the only engine I know of that 
supports OLTP-like interactions (JDBC et al.), high-volume interactions (via 
regions, guideposts, and M/R, Spark, and Trino integrations), and transactions. 
It's something that really sets Phoenix apart. It would be a shame to let 
that rot.

Yes, let's take it up on the dev list.

My priorities have since shifted toward the bigger data ecosystem (Trino, 
Iceberg, Spark, Federation, Real-time Ingestion, M/L, Governance, etc.). That 
said, Phoenix still has a place in that ecosystem, as I mentioned before, and so 
I am still interested at that level.

 

> The Tephra transaction processor cannot be loaded anymore.
> --
>
> Key: PHOENIX-6615
> URL: https://issues.apache.org/jira/browse/PHOENIX-6615
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Attachments: 6615.txt
>
>
> See
>  # TransactionFactory
>  # TephraTransactionProvider
> Can you spot the problem? :)  (Hint: The constructor is private.)
> Broken since PHOENIX-6064. [~stoty] .
> Can I just say... Unless I am missing something... How could we not have 
> noticed that one of the transaction processors has not been working since 
> August (in 5.x at least)? Is really nobody using the transaction engines?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6615) The Tephra transaction processor cannot be loaded anymore.

2021-12-20 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463016#comment-17463016
 ] 

Lars Hofhansl commented on PHOENIX-6615:


I guess something like this is what you wanted... Attempt to load the singleton 
via reflection.
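
For reference, a minimal sketch of that approach (my sketch, not necessarily 
the attached patch; the class name and the static getInstance() accessor are 
assumptions):
{code:java}
import java.lang.reflect.Method;

// Sketch: load a provider whose constructor is private by invoking its
// static getInstance() accessor reflectively, so the Tephra classes are
// only touched when that provider is actually requested.
public class ProviderLoader {
    static Object loadProvider(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            Method getInstance = clazz.getDeclaredMethod("getInstance");
            return getInstance.invoke(null); // static method, no receiver
        } catch (ReflectiveOperationException e) {
            return null; // caller falls back to a "not available" provider
        }
    }
}
{code}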

> The Tephra transaction processor cannot be loaded anymore.
> --
>
> Key: PHOENIX-6615
> URL: https://issues.apache.org/jira/browse/PHOENIX-6615
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Attachments: 6615.txt
>
>
> See
>  # TransactionFactory
>  # TephraTransactionProvider
> Can you spot the problem? :)  (Hint: The constructor is private.)
> Broken since PHOENIX-6064. [~stoty] .
> Can I just say... Unless I am missing something... How could we not have 
> noticed that one of the transaction processors has not worked since August? Is 
> really nobody using the transaction engines?
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6064) Make Tephra support optional

2021-12-20 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463007#comment-17463007
 ] 

Lars Hofhansl commented on PHOENIX-6064:


Turns out this actually breaks Tephra completely, and nobody even noticed.

Not blaming this change, but Phoenix and associated projects have started to 
bitrot, it seems.

Anyway... I won't work on this any further; I was just trying something 
unrelated. That turned into a few hours of getting Tephra to build without 
no-longer-supported JDKs, followed by the realization that Phoenix cannot even 
load the TephraTransactionProvider anymore.

Not sure why I even care - and if this sounds frustrated, it's because I am.

 

> Make Tephra support optional
> 
>
> Key: PHOENIX-6064
> URL: https://issues.apache.org/jira/browse/PHOENIX-6064
> Project: Phoenix
>  Issue Type: Improvement
>  Components: core, tephra
>Affects Versions: 5.1.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Fix For: 5.1.0
>
> Attachments: PHOENIX-6064.master.v1.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Tephra has and old Guava dependency, that cannot be removed due to Twill 
> depending on it. Removing the Twill dependency from Tephra is possible, but 
> not trivial. 
> This Guava has CVEs, which will show up in static analysis tools, which will 
> cause some potential users not to adopt Phoenix.
> Provide an option to build Phoenix without Tephra, and its problematic 
> dependencies.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6608) DISCUSS: Rethink MapReduce split generation

2021-12-10 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17457296#comment-17457296
 ] 

Lars Hofhansl commented on PHOENIX-6608:


Yeah, if each worker uses a JDBC client and replans the query, that's a 
problem. Maybe not the right place here... But how does each worker know which 
part of the overall query it should execute?

Could look at how exactly the guidepost table is queried. Does it always scan 
along the PK? If not, we're doing full scans.

There is clearly some work to do in Phoenix on this in general.

The notion that scans start as soon as they are placed into the task queue is 
insane. It might buy a little latency for small queries, since we start 
scanning before the iterators are scheduled in Phoenix, but for large queries 
it's a disaster... Imagine 5000 scans each prefetching up to 2 MB (~4 MB on the 
heap): that's 20 GB of fairly useless prefetched results for Phoenix 
iterators that are not even scheduled to run.

Anyway. Once the Trino patch is committed, this prefetching is completely 
disabled there.
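
For illustration, the shape of that fix (a sketch under my reading of the PR 
linked below, not the actual code; the group size of 20 is the number from the 
discussion):
{code:java}
import java.util.ArrayList;
import java.util.List;

// Sketch: instead of one split whose thousands of scans all prefetch
// concurrently, hand out one split per group of 20 scans, so intra-split
// parallelism (and therefore prefetching) is bounded per worker.
public class SplitGrouping {
    static final int SCANS_PER_SPLIT = 20;

    static <S> List<List<S>> toSplits(List<S> scans) {
        List<List<S>> splits = new ArrayList<>();
        for (int i = 0; i < scans.size(); i += SCANS_PER_SPLIT) {
            splits.add(new ArrayList<>(
                    scans.subList(i, Math.min(i + SCANS_PER_SPLIT, scans.size()))));
        }
        return splits;
    }
}
{code}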

> DISCUSS: Rethink MapReduce split generation
> ---
>
> Key: PHOENIX-6608
> URL: https://issues.apache.org/jira/browse/PHOENIX-6608
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Lars Hofhansl
>Priority: Major
>
> I just ran into an issue with Trino, which uses Phoenix' M/R integration to 
> generate splits for its worker nodes.
> See: [https://github.com/trinodb/trino/issues/10143]
> And a fix: [https://github.com/trinodb/trino/pull/10153]
> In short the issue is that with large data size and guideposts enabled 
> (default) Phoenix' RoundRobinResultIterator starts scanning when tasks are 
> submitted to the queue. For large datasets (per client) this fills the heap 
> with pre-fetched HBase result objects.
> MapReduce (and Spark) integrations have presumably the same issue.
> My proposed solution is instead of allowing Phoenix to do intra-split 
> parallelism we create more splits (the fix above groups 20 scans into a split 
> - 20 turned out to be a good number).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] (PHOENIX-6604) Allow using indexes for wildcard topN queries on salted tables

2021-12-08 Thread Lars Hofhansl (Jira)


[ https://issues.apache.org/jira/browse/PHOENIX-6604 ]


Lars Hofhansl deleted comment on PHOENIX-6604:


was (Author: githubbot):
lhofhansl edited a comment on pull request #1362:
URL: https://github.com/apache/phoenix/pull/1362#issuecomment-989012434


   Sorry, I've been away for a while. What do I need to do to trigger the test 
run and link this PR to the Jira issue?
   
   Edit: looks like I forgot the - the jira number.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@phoenix.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Allow using indexes for wildcard topN queries on salted tables
> --
>
> Key: PHOENIX-6604
> URL: https://issues.apache.org/jira/browse/PHOENIX-6604
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Fix For: 5.1.3
>
> Attachments: 6604-1.5.1.3, 6604.5.1.3
>
>
> Just randomly came across this, playing with TPCH data.
> {code:java}
> CREATE TABLE lineitem (
>  orderkey bigint not null,
>  partkey bigint,
>  suppkey bigint,
>  linenumber integer not null,
>  quantity double,
>  extendedprice double,
>  discount double,
>  tax double,
>  returnflag varchar(1),
>  linestatus varchar(1),
>  shipdate date,
>  commitdate date,
>  receiptdate date,
>  shipinstruct varchar(25),
>  shipmode varchar(10),
>  comment varchar(44)
>  constraint pk primary key(orderkey, linenumber)) 
> IMMUTABLE_ROWS=true,SALT_BUCKETS=4;
> CREATE LOCAL INDEX l_shipdate ON lineitem(shipdate);{code}
> Now:
> {code:java}
>  > explain select * from lineitem order by shipdate limit 1;
> +---+
> | PLAN |
> +---+
> | CLIENT 199-CHUNK 8859938 ROWS 2044738843 BYTES PARALLEL 199-WAY FULL SCAN OVER LI |
> |     SERVER TOP 1 ROW SORTED BY [SHIPDATE] |
> | CLIENT MERGE SORT |
> | CLIENT LIMIT 1 |
> +---+
> 4 rows selected (6.525 seconds)
> -- SAME COLUMNS!
> > explain select ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT from lineitem order by shipdate limit 1;
> +---+
> | CLIENT 4-CHUNK 4 ROWS 204 BYTES PARALLEL 4-WAY RANGE SCAN OVER LINEITEM [1] |
> |     SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT,  |
> |     SERVER FILTER BY FIRST KEY ONLY |
> |     SERVER 1 ROW LIMIT |
> | CLIENT MERGE SORT |
> | CLIENT 1 ROW LIMIT |
> +---+
> 6 rows selected (2.736 seconds){code}
>  
> The same happens with a covered global index.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6608) DISCUSS: Rethink MapReduce split generation

2021-12-08 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17455885#comment-17455885
 ] 

Lars Hofhansl commented on PHOENIX-6608:


What kind of worker? Is that some custom worker with a JDBC client?

An M/R job or Trino job is only planned once, right? So that should not be a 
problem there...?

Hopefully the workers do not need to re-load the stats. That would be another 
bug.

 

> DISCUSS: Rethink MapReduce split generation
> ---
>
> Key: PHOENIX-6608
> URL: https://issues.apache.org/jira/browse/PHOENIX-6608
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Lars Hofhansl
>Priority: Major
>
> I just ran into an issue with Trino, which uses Phoenix' M/R integration to 
> generate splits for its worker nodes.
> See: [https://github.com/trinodb/trino/issues/10143]
> And a fix: [https://github.com/trinodb/trino/pull/10153]
> In short the issue is that with large data size and guideposts enabled 
> (default) Phoenix' RoundRobinResultIterator starts scanning when tasks are 
> submitted to the queue. For large datasets (per client) this fills the heap 
> with pre-fetched HBase result objects.
> MapReduce (and Spark) integrations have presumably the same issue.
> My proposed solution is instead of allowing Phoenix to do intra-split 
> parallelism we create more splits (the fix above groups 20 scans into a split 
> - 20 turned out to be a good number).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6604) Index not used for wildcard topN query on salted table

2021-12-04 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17453452#comment-17453452
 ] 

Lars Hofhansl commented on PHOENIX-6604:


Is anyone looking at Jira anymore?

> Index not used for wildcard topN query on salted table
> --
>
> Key: PHOENIX-6604
> URL: https://issues.apache.org/jira/browse/PHOENIX-6604
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Fix For: 5.1.3
>
> Attachments: 6604-1.5.1.3, 6604.5.1.3
>
>
> Just randomly came across this, playing with TPCH data.
> {code:java}
> CREATE TABLE lineitem (
>  orderkey bigint not null,
>  partkey bigint,
>  suppkey bigint,
>  linenumber integer not null,
>  quantity double,
>  extendedprice double,
>  discount double,
>  tax double,
>  returnflag varchar(1),
>  linestatus varchar(1),
>  shipdate date,
>  commitdate date,
>  receiptdate date,
>  shipinstruct varchar(25),
>  shipmode varchar(10),
>  comment varchar(44)
>  constraint pk primary key(orderkey, linenumber)) 
> IMMUTABLE_ROWS=true,SALT_BUCKETS=4;
> CREATE LOCAL INDEX l_shipdate ON lineitem(shipdate);{code}
> Now:
> {code:java}
>  > explain select * from lineitem order by shipdate limit 1;
> +---+
> | PLAN |
> +---+
> | CLIENT 199-CHUNK 8859938 ROWS 2044738843 BYTES PARALLEL 199-WAY FULL SCAN OVER LI |
> |     SERVER TOP 1 ROW SORTED BY [SHIPDATE] |
> | CLIENT MERGE SORT |
> | CLIENT LIMIT 1 |
> +---+
> 4 rows selected (6.525 seconds)
> -- SAME COLUMNS!
> > explain select ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT from lineitem order by shipdate limit 1;
> +---+
> | CLIENT 4-CHUNK 4 ROWS 204 BYTES PARALLEL 4-WAY RANGE SCAN OVER LINEITEM [1] |
> |     SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT,  |
> |     SERVER FILTER BY FIRST KEY ONLY |
> |     SERVER 1 ROW LIMIT |
> | CLIENT MERGE SORT |
> | CLIENT 1 ROW LIMIT |
> +---+
> 6 rows selected (2.736 seconds){code}
>  
> The same happens with a covered global index.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6604) Index not used for wildcard topN query on salted table

2021-11-25 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17449277#comment-17449277
 ] 

Lars Hofhansl commented on PHOENIX-6604:


With a test. (Can't run tests locally, but I think this is the right test).

 

> Index not used for wildcard topN query on salted table
> --
>
> Key: PHOENIX-6604
> URL: https://issues.apache.org/jira/browse/PHOENIX-6604
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Fix For: 5.1.3
>
> Attachments: 6604-1.5.1.3, 6604.5.1.3
>
>
> Just randomly came across this, playing with TPCH data.
> {code:java}
> CREATE TABLE lineitem (
>  orderkey bigint not null,
>  partkey bigint,
>  suppkey bigint,
>  linenumber integer not null,
>  quantity double,
>  extendedprice double,
>  discount double,
>  tax double,
>  returnflag varchar(1),
>  linestatus varchar(1),
>  shipdate date,
>  commitdate date,
>  receiptdate date,
>  shipinstruct varchar(25),
>  shipmode varchar(10),
>  comment varchar(44)
>  constraint pk primary key(orderkey, linenumber)) 
> IMMUTABLE_ROWS=true,SALT_BUCKETS=4;{code}
> Now:
> {code:java}
>  > explain select * from lineitem order by shipdate limit 1;
> +---+
> | PLAN |
> +---+
> | CLIENT 199-CHUNK 8859938 ROWS 2044738843 BYTES PARALLEL 199-WAY FULL SCAN OVER LI |
> |     SERVER TOP 1 ROW SORTED BY [SHIPDATE] |
> | CLIENT MERGE SORT |
> | CLIENT LIMIT 1 |
> +---+
> 4 rows selected (6.525 seconds)
> -- SAME COLUMNS!
> > explain select ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT from lineitem order by shipdate limit 1;
> +---+
> | CLIENT 4-CHUNK 4 ROWS 204 BYTES PARALLEL 4-WAY RANGE SCAN OVER LINEITEM [1] |
> |     SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT,  |
> |     SERVER FILTER BY FIRST KEY ONLY |
> |     SERVER 1 ROW LIMIT |
> | CLIENT MERGE SORT |
> | CLIENT 1 ROW LIMIT |
> +---+
> 6 rows selected (2.736 seconds){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6604) Index not used for wildcard topN query on salted table

2021-11-24 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448966#comment-17448966
 ] 

Lars Hofhansl commented on PHOENIX-6604:


Actually, the attached patch fixes it for me. It should be safe, but I'm not 
100% sure about side effects.

> Index not used for wildcard topN query on salted table
> --
>
> Key: PHOENIX-6604
> URL: https://issues.apache.org/jira/browse/PHOENIX-6604
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Fix For: 5.1.3
>
> Attachments: 6604.5.1.3
>
>
> Just randomly came across this, playing with TPCH data.
> {code:java}
> CREATE TABLE lineitem (
>  orderkey bigint not null,
>  partkey bigint,
>  suppkey bigint,
>  linenumber integer not null,
>  quantity double,
>  extendedprice double,
>  discount double,
>  tax double,
>  returnflag varchar(1),
>  linestatus varchar(1),
>  shipdate date,
>  commitdate date,
>  receiptdate date,
>  shipinstruct varchar(25),
>  shipmode varchar(10),
>  comment varchar(44)
>  constraint pk primary key(orderkey, linenumber)) 
> IMMUTABLE_ROWS=true,SALT_BUCKETS=4;{code}
> Now:
> {code:java}
>  > explain select * from lineitem order by shipdate limit 1;
> +---+
> | PLAN |
> +---+
> | CLIENT 199-CHUNK 8859938 ROWS 2044738843 BYTES PARALLEL 199-WAY FULL SCAN OVER LI |
> |     SERVER TOP 1 ROW SORTED BY [SHIPDATE] |
> | CLIENT MERGE SORT |
> | CLIENT LIMIT 1 |
> +---+
> 4 rows selected (6.525 seconds)
> -- SAME COLUMNS!
> > explain select ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT from lineitem order by shipdate limit 1;
> +---+
> | CLIENT 4-CHUNK 4 ROWS 204 BYTES PARALLEL 4-WAY RANGE SCAN OVER LINEITEM [1] |
> |     SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT,  |
> |     SERVER FILTER BY FIRST KEY ONLY |
> |     SERVER 1 ROW LIMIT |
> | CLIENT MERGE SORT |
> | CLIENT 1 ROW LIMIT |
> +---+
> 6 rows selected (2.736 seconds){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6604) Index not used for wildcard topN query on salted table

2021-11-24 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448958#comment-17448958
 ] 

Lars Hofhansl commented on PHOENIX-6604:


The same happens with a fully covered global index.

[~kozdemir] , FYI.

> Index not used for wildcard topN query on salted table
> --
>
> Key: PHOENIX-6604
> URL: https://issues.apache.org/jira/browse/PHOENIX-6604
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.1.2
>Reporter: Lars Hofhansl
>Priority: Major
> Fix For: 5.1.3
>
>
> Just randomly came across this, playing with TPCH data.
> {code:java}
> CREATE TABLE lineitem (
>  orderkey bigint not null,
>  partkey bigint,
>  suppkey bigint,
>  linenumber integer not null,
>  quantity double,
>  extendedprice double,
>  discount double,
>  tax double,
>  returnflag varchar(1),
>  linestatus varchar(1),
>  shipdate date,
>  commitdate date,
>  receiptdate date,
>  shipinstruct varchar(25),
>  shipmode varchar(10),
>  comment varchar(44)
>  constraint pk primary key(orderkey, linenumber)) 
> IMMUTABLE_ROWS=true,SALT_BUCKETS=4;{code}
> Now:
> {code:java}
>  > explain select * from lineitem order by shipdate limit 1;
> +---+
> |                                          PLAN                               
>       |
> +---+
> | CLIENT 199-CHUNK 8859938 ROWS 2044738843 BYTES PARALLEL 199-WAY FULL SCAN 
> OVER LI |
> |     SERVER TOP 1 ROW SORTED BY [SHIPDATE]                                   
>       |
> | CLIENT MERGE SORT                                                           
>       |
> | CLIENT LIMIT 1                                                              
>       |
> +---+
> 4 rows selected (6.525 seconds)
> -- SAME COLUMNS!
> > explain select ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, 
> > EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, 
> > RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT from lineitem order by 
> > shipdate limit 1;
> +---+
> |                                                                             
>       |
> +---+
> | CLIENT 4-CHUNK 4 ROWS 204 BYTES PARALLEL 4-WAY RANGE SCAN OVER LINEITEM [1] 
>       |
> |     SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 
> 0.DISCOUNT,  |
> |     SERVER FILTER BY FIRST KEY ONLY                                         
>       |
> |     SERVER 1 ROW LIMIT                                                      
>       |
> | CLIENT MERGE SORT                                                           
>       |
> | CLIENT 1 ROW LIMIT                                                          
>       |
> +---+
> 6 rows selected (2.736 seconds){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6604) Local index not used for wildcard topN query on salted table

2021-11-24 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448953#comment-17448953
 ] 

Lars Hofhansl commented on PHOENIX-6604:


I likely won't have time to track it down. Just leaving it here.

Local indexes exist especially to allow cheap sorting with an uncovered (i.e. 
small) index.

{{SELECT *}} fails when it attempts to rewrite the query, with an unknown 
column called {{L_SHIPDATE.:_SALT}}

> Local index not used for wildcard topN query on salted table
> 
>
> Key: PHOENIX-6604
> URL: https://issues.apache.org/jira/browse/PHOENIX-6604
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Major
> Fix For: 5.1.3
>
>
> Just randomly came across this, playing with TPCH data.
> {code:java}
> CREATE TABLE lineitem (
>  orderkey bigint not null,
>  partkey bigint,
>  suppkey bigint,
>  linenumber integer not null,
>  quantity double,
>  extendedprice double,
>  discount double,
>  tax double,
>  returnflag varchar(1),
>  linestatus varchar(1),
>  shipdate date,
>  commitdate date,
>  receiptdate date,
>  shipinstruct varchar(25),
>  shipmode varchar(10),
>  comment varchar(44)
>  constraint pk primary key(orderkey, linenumber)) 
> DATA_BLOCK_ENCODING='ROW_INDEX_V1', COMPRESSION='ZSTD', DISABLE_WAL=true, 
> IMMUTABLE_ROWS=true,SALT_BUCKETS=4;{code}
> Now:
> {code:java}
>  > explain select * from lineitem order by shipdate limit 1;
> +---+
> | PLAN |
> +---+
> | CLIENT 199-CHUNK 8859938 ROWS 2044738843 BYTES PARALLEL 199-WAY FULL SCAN OVER LI |
> |     SERVER TOP 1 ROW SORTED BY [SHIPDATE] |
> | CLIENT MERGE SORT |
> | CLIENT LIMIT 1 |
> +---+
> 4 rows selected (6.525 seconds)
> -- SAME COLUMNS!
> > explain select ORDERKEY, PARTKEY, SUPPKEY, LINENUMBER, QUANTITY, EXTENDEDPRICE, DISCOUNT, TAX, RETURNFLAG, LINESTATUS, SHIPDATE, COMMITDATE, RECEIPTDATE, SHIPINSTRUCT, SHIPMODE, COMMENT from lineitem order by shipdate limit 1;
> +---+
> | CLIENT 4-CHUNK 4 ROWS 204 BYTES PARALLEL 4-WAY RANGE SCAN OVER LINEITEM [1] |
> |     SERVER MERGE [0.PARTKEY, 0.SUPPKEY, 0.QUANTITY, 0.EXTENDEDPRICE, 0.DISCOUNT,  |
> |     SERVER FILTER BY FIRST KEY ONLY |
> |     SERVER 1 ROW LIMIT |
> | CLIENT MERGE SORT |
> | CLIENT 1 ROW LIMIT |
> +---+
> 6 rows selected (2.736 seconds){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (PHOENIX-6444) Extend Cell Tags to Delete object for Indexer coproc

2021-05-27 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352930#comment-17352930
 ] 

Lars Hofhansl commented on PHOENIX-6444:


If the master change applies (mostly) cleanly, we can just cherry-pick the 
change into 5.1 without a separate PR.

> Extend Cell Tags to Delete object for Indexer coproc
> 
>
> Key: PHOENIX-6444
> URL: https://issues.apache.org/jira/browse/PHOENIX-6444
> Project: Phoenix
>  Issue Type: Improvement
>  Components: core
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 4.17.0, 5.2.0
>
>
> In PHOENIX-6213 we added support for adding source of operation cell tag to 
> Delete Markers. But we added the logic to create TagRewriteCell and add it to 
> DeleteMarker only in IndexRegionObserver coproc. I missed adding the same 
> logic to Indexer coproc. Thank you [~tkhurana] for finding this bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-5639) Exception during DROP TABLE CASCADE with VIEW and INDEX in phoenix_sandbox

2021-05-26 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-5639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352006#comment-17352006
 ] 

Lars Hofhansl commented on PHOENIX-5639:


Removing release targets of an 18-month-old issue.

> Exception during DROP TABLE CASCADE with VIEW and INDEX in phoenix_sandbox
> --
>
> Key: PHOENIX-5639
> URL: https://issues.apache.org/jira/browse/PHOENIX-5639
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Priority: Trivial
>
> {code:java}
> > CREATE TABLE TEST_5 (ID INTEGER NOT NULL PRIMARY KEY, HOST VARCHAR(10), 
> > FLAG BOOLEAN);
> > CREATE VIEW TEST_5_VIEW (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 
> > INTEGER, col5 INTEGER) AS SELECT * FROM TEST_5 WHERE ID>10;
> > CREATE INDEX TEST_5_INDEX ON TEST_5_VIEW(COL4);
> > DROP TABLE test_5 CASCADE;{code}
> Table, view, and index are dropped, but in the sandbox's log I see:
> {code:java}
> 19/12/18 07:17:41 WARN iterate.BaseResultIterators: Unable to find parent 
> table "TEST_5_VIEW" of table "TEST_5_INDEX" to determine 
> USE_STATS_FOR_PARALLELIZATION
> org.apache.phoenix.schema.TableNotFoundException: ERROR 1012 (42M03): Table 
> undefined. tableName=TEST_5_VIEW
> at 
> org.apache.phoenix.schema.PMetaDataImpl.getTableRef(PMetaDataImpl.java:73)
> at 
> org.apache.phoenix.jdbc.PhoenixConnection.getTable(PhoenixConnection.java:584)
> at 
> org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getStatsForParallelizationProp(PhoenixConfigurationUtil.java:712)
> at 
> org.apache.phoenix.iterate.BaseResultIterators.<init>(BaseResultIterators.java:513)
> at 
> org.apache.phoenix.iterate.ParallelIterators.<init>(ParallelIterators.java:62)
> at 
> org.apache.phoenix.iterate.ParallelIterators.<init>(ParallelIterators.java:69)
> at 
> org.apache.phoenix.execute.AggregatePlan.newIterator(AggregatePlan.java:287)
> at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:365)
> at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:217)
> at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:212)
> at 
> org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:207)
> at 
> org.apache.phoenix.compile.PostDDLCompiler$PostDDLMutationPlan.execute(PostDDLCompiler.java:273)
> at 
> org.apache.phoenix.query.ConnectionQueryServicesImpl.updateData(ConnectionQueryServicesImpl.java:4426)
> at 
> org.apache.phoenix.query.DelegateConnectionQueryServices.updateData(DelegateConnectionQueryServices.java:166)
> at 
> org.apache.phoenix.schema.MetaDataClient.dropTable(MetaDataClient.java:3238)
> at 
> org.apache.phoenix.schema.MetaDataClient.dropTable(MetaDataClient.java:3055)
> at org.apache.phoenix.util.ViewUtil.dropChildViews(ViewUtil.java:218)
> at 
> org.apache.phoenix.coprocessor.tasks.DropChildViewsTask.run(DropChildViewsTask.java:63)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.phoenix.coprocessor.TaskRegionObserver$SelfHealingTask.run(TaskRegionObserver.java:203)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {code}
> Note that this only happens in the sandbox, so it's not important to fix.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6437) Delete marker for parent-child rows does not get replicated via SystemCatalogWALEntryFilter

2021-05-26 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352003#comment-17352003
 ] 

Lars Hofhansl commented on PHOENIX-6437:


I cherry-picked it into branch-5.1.

> Delete marker for parent-child rows does not get replicated via 
> SystemCatalogWALEntryFilter
> ---
>
> Key: PHOENIX-6437
> URL: https://issues.apache.org/jira/browse/PHOENIX-6437
> Project: Phoenix
>  Issue Type: Bug
>  Components: core
>Reporter: Ankit Jain
>Assignee: Ankit Jain
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.2, 4.16.2
>
>
> As part of PHOENIX-3639  SystemCatalogWALEntryFilter was introduced to 
> replicate tenant-owned rows from system.catalog and ignore the non-tenant 
> rows. During recent testing it was realized that delete markers for 
> parent-child rows do not get replicated. As part of this Jira we want to 
> fix that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6435) Fix ViewTTLIT test flapper

2021-05-26 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351996#comment-17351996
 ] 

Lars Hofhansl commented on PHOENIX-6435:


Done.

> Fix ViewTTLIT test flapper
> --
>
> Key: PHOENIX-6435
> URL: https://issues.apache.org/jira/browse/PHOENIX-6435
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Xinyi Yan
>Assignee: Xinyi Yan
>Priority: Blocker
> Fix For: 4.16.1, 4.17.0, 5.2.0, 5.1.2
>
>
> [ERROR] Errors:
> [ERROR]   
> PermissionNSDisabledWithCustomAccessControllerIT>BasePermissionsIT.testAutomaticGrantWithIndexAndView:1278->BasePermissionsIT.verifyAllowed:769->BasePermissionsIT.verifyAllowed:776
>  ? UndeclaredThrowable
> [ERROR]   
> PermissionNSEnabledWithCustomAccessControllerIT>BasePermissionsIT.testAutomaticGrantWithIndexAndView:1279->BasePermissionsIT.verifyAllowed:769->BasePermissionsIT.verifyAllowed:776
>  ? UndeclaredThrowable
> [ERROR]   ImmutableIndexIT.testGlobalImmutableIndexDelete:407 ? StackOverflow
>  
> mvn verify failed with the above error. We have to address and fix test 
> flappers before releasing the next 4.16.1RC.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6435) Fix ViewTTLIT test flapper

2021-05-26 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351995#comment-17351995
 ] 

Lars Hofhansl commented on PHOENIX-6435:


I'm going to cherry-pick this into 5.1.

> Fix ViewTTLIT test flapper
> --
>
> Key: PHOENIX-6435
> URL: https://issues.apache.org/jira/browse/PHOENIX-6435
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Xinyi Yan
>Assignee: Xinyi Yan
>Priority: Blocker
> Fix For: 4.16.1, 4.17.0, 5.2.0
>
>
> [ERROR] Errors:
> [ERROR]   
> PermissionNSDisabledWithCustomAccessControllerIT>BasePermissionsIT.testAutomaticGrantWithIndexAndView:1278->BasePermissionsIT.verifyAllowed:769->BasePermissionsIT.verifyAllowed:776
>  ? UndeclaredThrowable
> [ERROR]   
> PermissionNSEnabledWithCustomAccessControllerIT>BasePermissionsIT.testAutomaticGrantWithIndexAndView:1279->BasePermissionsIT.verifyAllowed:769->BasePermissionsIT.verifyAllowed:776
>  ? UndeclaredThrowable
> [ERROR]   ImmutableIndexIT.testGlobalImmutableIndexDelete:407 ? StackOverflow
>  
> mvn verify failed with the above error. We have to address and fix test 
> flappers before releasing the next 4.16.1RC.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6437) Delete marker for parent-child rows does not get replicated via SystemCatalogWALEntryFilter

2021-05-26 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351994#comment-17351994
 ] 

Lars Hofhansl commented on PHOENIX-6437:


What about 5.1?

Since it's in 4.16.2 it should be in 5.1.2 as well, right?

> Delete marker for parent-child rows does not get replicated via 
> SystemCatalogWALEntryFilter
> ---
>
> Key: PHOENIX-6437
> URL: https://issues.apache.org/jira/browse/PHOENIX-6437
> Project: Phoenix
>  Issue Type: Bug
>  Components: core
>Reporter: Ankit Jain
>Assignee: Ankit Jain
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 4.16.2
>
>
> As part of PHOENIX-3639  SystemCatalogWALEntryFilter was introduced to 
> replicate tenant-owned rows from system.catalog and ignore the non-tenant 
> rows. During recent testing it was realized that delete markers for 
> parent-child rows do not get replicated. As part of this Jira we want to 
> fix that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6447) Add support for SYSTEM.CHILD_LINK table in systemcatalogwalentryfilter

2021-05-26 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351991#comment-17351991
 ] 

Lars Hofhansl commented on PHOENIX-6447:


I cherry-picked the change in. All done. All good :)

> Add support for SYSTEM.CHILD_LINK table in systemcatalogwalentryfilter
> --
>
> Key: PHOENIX-6447
> URL: https://issues.apache.org/jira/browse/PHOENIX-6447
> Project: Phoenix
>  Issue Type: Bug
>  Components: core
>Reporter: Sandeep Pal
>Assignee: Sandeep Pal
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 5.1.2, 4.16.2
>
>
> In order to replicate system tables, we have a special filter for the system 
> catalog table to replicate only tenant-owned data, so as NOT to mess up the 
> system catalog at the sink cluster. In 4.16, there is a new table being 
> added (SYSTEM.CHILD_LINK) which will not be replicated completely by our 
> existing filter. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6447) Add support for SYSTEM.CHILD_LINK table in systemcatalogwalentryfilter

2021-05-26 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351988#comment-17351988
 ] 

Lars Hofhansl commented on PHOENIX-6447:


[~sandeep.pal] did you get a chance yet? If the change applies you can also 
just cherry-pick from master into the 5.1 branch.

 

> Add support for SYSTEM.CHILD_LINK table in systemcatalogwalentryfilter
> --
>
> Key: PHOENIX-6447
> URL: https://issues.apache.org/jira/browse/PHOENIX-6447
> Project: Phoenix
>  Issue Type: Bug
>  Components: core
>Reporter: Sandeep Pal
>Assignee: Sandeep Pal
>Priority: Major
> Fix For: 4.17.0, 5.2.0, 4.16.2
>
>
> In order to replicate system tables, we have a special filter for the system 
> catalog table to replicate only tenant-owned data, so as NOT to mess up the 
> system catalog at the sink cluster. In 4.16, there is a new table being 
> added (SYSTEM.CHILD_LINK) which will not be replicated completely by our 
> existing filter. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6435) Fix ViewTTLIT test flapper

2021-05-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344834#comment-17344834
 ] 

Lars Hofhansl commented on PHOENIX-6435:


5.1.2? (branch 5.1)

> Fix ViewTTLIT test flapper
> --
>
> Key: PHOENIX-6435
> URL: https://issues.apache.org/jira/browse/PHOENIX-6435
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Xinyi Yan
>Assignee: Xinyi Yan
>Priority: Blocker
> Fix For: 4.16.1, 4.17.0, 5.2.0
>
>
> [ERROR] Errors:
> [ERROR]   
> PermissionNSDisabledWithCustomAccessControllerIT>BasePermissionsIT.testAutomaticGrantWithIndexAndView:1278->BasePermissionsIT.verifyAllowed:769->BasePermissionsIT.verifyAllowed:776
>  ? UndeclaredThrowable
> [ERROR]   
> PermissionNSEnabledWithCustomAccessControllerIT>BasePermissionsIT.testAutomaticGrantWithIndexAndView:1279->BasePermissionsIT.verifyAllowed:769->BasePermissionsIT.verifyAllowed:776
>  ? UndeclaredThrowable
> [ERROR]   ImmutableIndexIT.testGlobalImmutableIndexDelete:407 ? StackOverflow
>  
> mvn verify failed with the above error. We have to address and fix test 
> flappers before releasing the next 4.16.1RC.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6420) Wrong result when conditional and regular upserts are passed in the same commit batch

2021-05-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344833#comment-17344833
 ] 

Lars Hofhansl commented on PHOENIX-6420:


Looks like this should be merged into branch 5.1 (for 5.1.2) as well, right?

> Wrong result when conditional and regular upserts are passed in the same 
> commit batch
> -
>
> Key: PHOENIX-6420
> URL: https://issues.apache.org/jira/browse/PHOENIX-6420
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 5.0.0, 4.16.0
>Reporter: Tanuj Khurana
>Assignee: Tanuj Khurana
>Priority: Major
> Fix For: 4.17.0, 5.2.0
>
> Attachments: PHOENIX-6420.patch
>
>
> Consider this example:
> {code:java}
> CREATE TABLE T1 (k integer not null primary key, v1 bigint, v2 bigint);
> {code}
> Now consider this batch:
> {code:java}
> UPSERT INTO T1 VALUES(0,0,1);
> UPSERT INTO T1 VALUES(0,1,1) ON DUPLICATE KEY UPDATE v1 = v1 + 2;
> commit();
> {code}
> Expected row state: 0, 2, 1
> Actual: 0, 2, 0
> The value of the column (v2) not updated in the conditional expression 
> remains at its default. Its value should have been the one set in the regular 
> upsert in the batch.
>  Now, the row exists. Consider another batch of updates
> {code:java}
> UPSERT INTO T1 VALUES(0, 7, 4);
> UPSERT INTO T1 VALUES(0,1,1) ON DUPLICATE KEY UPDATE v1 = v1 + 2;
> commit();
> {code}
> Expected row state: 0,2,1  -> 0, 9, 4
> Actual: 0,2,0 -> 0, 4, 0
> The conditional update expression is evaluated and applied on the row state 
> already committed instead of on the regular update in the same batch. Also, 
> v2 still remains 0 (the default value).
>  Now consider the case of a partial regular update following a conditional 
> update:
> {code:java}
> UPSERT INTO T1 (k, v2) VALUES(0,100) ON DUPLICATE KEY UPDATE v1 = v1 + 2;
> UPSERT INTO T1 (k, v2) VALUES (0,125);
> commit();
> {code}
> Expected row state: 0, 9, 4 -> 0, 11, 125
> Actual: 0, 4, 0 -> 0, 4, 125
> Only the regular update is applied and the conditional update is completely 
> ignored.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6395) Reusing Connection instance object instead of creating everytime in PhoenixAccessController class.

2021-05-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344831#comment-17344831
 ] 

Lars Hofhansl commented on PHOENIX-6395:


What about branch 5.1 (5.1.2)?

> Reusing Connection instance object instead of creating everytime in 
> PhoenixAccessController class. 
> ---
>
> Key: PHOENIX-6395
> URL: https://issues.apache.org/jira/browse/PHOENIX-6395
> Project: Phoenix
>  Issue Type: Bug
>Reporter: vikas meka
>Assignee: vikas meka
>Priority: Major
> Fix For: 4.17.0, 5.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6271) Effective DDL generated by SchemaExtractionTool should maintain the order of PK and other columns

2021-05-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344830#comment-17344830
 ] 

Lars Hofhansl commented on PHOENIX-6271:


Looks like this should be merged into 5.1.2 (branch 5.1) as well, no?

> Effective DDL generated by SchemaExtractionTool should maintain the order of 
> PK and other columns
> -
>
> Key: PHOENIX-6271
> URL: https://issues.apache.org/jira/browse/PHOENIX-6271
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Swaroopa Kadam
>Assignee: Swaroopa Kadam
>Priority: Minor
> Fix For: 4.17.0, 5.2.0
>
>
> SchemaExtractionTool is used to generate effective DDL which can be then 
> compared with the DDL on the cluster to perform schema monitoring. 
> This won't affect the monitoring part, but it would be good to have the PK order 
> in place so that the effective DDL can be used for creating the entity for the 
> first time in a new environment.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6227) Option for DDL changes to export to external schema repository

2021-05-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344826#comment-17344826
 ] 

Lars Hofhansl commented on PHOENIX-6227:


Should this be in 5.1.2 as well?

> Option for DDL changes to export to external schema repository
> --
>
> Key: PHOENIX-6227
> URL: https://issues.apache.org/jira/browse/PHOENIX-6227
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>Priority: Major
> Fix For: 4.17.0, 5.2.0
>
>
> When a user creates or drops a table or view, or adds/removes a column from 
> one, there should be the option for Phoenix to notify an external schema 
> repository. This should be a configurable plugin so that core Phoenix is not 
> coupled to any particular repository implementation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6085) Remove duplicate calls to getSysMutexPhysicalTableNameBytes() during the upgrade path

2021-05-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344825#comment-17344825
 ] 

Lars Hofhansl commented on PHOENIX-6085:


And what about Phoenix 5.1.2? (branch 5.1)

> Remove duplicate calls to getSysMutexPhysicalTableNameBytes() during the 
> upgrade path
> -
>
> Key: PHOENIX-6085
> URL: https://issues.apache.org/jira/browse/PHOENIX-6085
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.0.0, 4.15.0
>Reporter: Chinmay Kulkarni
>Assignee: Richárd Antal
>Priority: Minor
>  Labels: phoenix-hardening, quality-improvement
> Fix For: 4.17.0, 5.2.0
>
> Attachments: PHOENIX-6085.4.x.v1.patch, PHOENIX-6085.master.v1.patch
>
>
> We already make this call inside 
> [CQSI.acquireUpgradeMutex()|https://github.com/apache/phoenix/blob/1922895dfe5960dc025709b04acfaf974d3959dc/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L4220]
>  and then call writeMutexCell() which calls this again 
> [here|https://github.com/apache/phoenix/blob/1922895dfe5960dc025709b04acfaf974d3959dc/phoenix-core/src/main/java/org/apache/phoenix/query/ConnectionQueryServicesImpl.java#L4244].
>  
> We should move this to inside writeMutexCell() itself and throw 
> UpgradeInProgressException if required there to avoid unnecessary expensive 
> HBase admin API calls.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6447) Add support for SYSTEM.CHILD_LINK table in systemcatalogwalentryfilter

2021-05-13 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344242#comment-17344242
 ] 

Lars Hofhansl commented on PHOENIX-6447:


Should this one be in branch-5.1 (i.e. Phoenix 5.1.2)?

> Add support for SYSTEM.CHILD_LINK table in systemcatalogwalentryfilter
> --
>
> Key: PHOENIX-6447
> URL: https://issues.apache.org/jira/browse/PHOENIX-6447
> Project: Phoenix
>  Issue Type: Bug
>  Components: core
>Reporter: Sandeep Pal
>Assignee: Sandeep Pal
>Priority: Major
> Fix For: 4.16.1, 4.17.0, 5.2.0
>
>
> In order to replicate system tables, we have a special filter for the system 
> catalog table to replicate only tenant-owned data, so as NOT to mess up the 
> system catalog at the sink cluster. In 4.16, there is a new table being 
> added (SYSTEM.CHILD_LINK) which will not be replicated completely by our 
> existing filter. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6444) Extend Cell Tags to Delete object for Indexer coproc

2021-05-13 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344244#comment-17344244
 ] 

Lars Hofhansl commented on PHOENIX-6444:


What about branch-5.1 (Phoenix 5.1.2)?

> Extend Cell Tags to Delete object for Indexer coproc
> 
>
> Key: PHOENIX-6444
> URL: https://issues.apache.org/jira/browse/PHOENIX-6444
> Project: Phoenix
>  Issue Type: Improvement
>  Components: core
>Reporter: Rushabh Shah
>Assignee: Rushabh Shah
>Priority: Major
> Fix For: 4.17.0, 5.2.0
>
>
> In PHOENIX-6213 we added support for adding source of operation cell tag to 
> Delete Markers. But we added the logic to create TagRewriteCell and add it to 
> DeleteMarker only in IndexRegionObserver coproc. I missed adding the same 
> logic to Indexer coproc. Thank you [~tkhurana] for finding this bug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6454) Add feature to SchemaTool to get the DDL in specification mode

2021-05-13 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344239#comment-17344239
 ] 

Lars Hofhansl commented on PHOENIX-6454:


Should this one be in branch-5.1 (i.e. Phoenix 5.1.2) too?

> Add feature to SchemaTool to get the DDL in specification mode
> --
>
> Key: PHOENIX-6454
> URL: https://issues.apache.org/jira/browse/PHOENIX-6454
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Swaroopa Kadam
>Assignee: Swaroopa Kadam
>Priority: Major
> Fix For: 4.17.0, 5.2.0
>
>
> Currently, SchemaExtractionTool uses the PTable representation to get the 
> effective DDL on the cluster. 
> Rename SchemaExtractionTool to SchemaTool, and add a feature that accepts CREATE 
> DDL and ALTER DDL to give the effective DDL without using the PTable implementation. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6457) Optionally store schema version string in SYSTEM.CATALOG

2021-05-13 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17344237#comment-17344237
 ] 

Lars Hofhansl commented on PHOENIX-6457:


I think so. Thanks [~gjacoby]

> Optionally store schema version string in SYSTEM.CATALOG
> 
>
> Key: PHOENIX-6457
> URL: https://issues.apache.org/jira/browse/PHOENIX-6457
> Project: Phoenix
>  Issue Type: New Feature
>Reporter: Geoffrey Jacoby
>Assignee: Geoffrey Jacoby
>Priority: Major
> Fix For: 4.17.0, 5.2.0
>
>
> In many environments, schema changes to Phoenix tables are applied in batches 
> associated with a version of an application. (For example, v1.0 of an app may 
> start with one set of CREATE statements, v1.1 then adds some ALTER 
> statements, etc.) 
> It can be useful to be able to look up the latest app version in which a 
> table or view was changed; this could potentially be added as a feature of 
> the Schema Tool. 
> This change would add an optional property to CREATE and ALTER statements, 
> SCHEMA_VERSION, which would take a user-supplied string. 
> This is also a pre-req for PHOENIX-6227, because we would want to pass the 
> schema version string, if any, to an external schema repository in 
> environments where we're integrating with one. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-22 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329519#comment-17329519
 ] 

Lars Hofhansl commented on PHOENIX-6434:


It's resolved, but I do not see it in the 5.1 branch yet.

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch, PHOENIX-6434.4.x.004.patch, 
> PHOENIX-6434.master.001.patch, PHOENIX-6434.master.002.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred to in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6449) Cannot start minicluster in github CI check for connectors

2021-04-22 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17329305#comment-17329305
 ] 

Lars Hofhansl commented on PHOENIX-6449:


(y)

> Cannot start minicluster in github CI check for connectors
> --
>
> Key: PHOENIX-6449
> URL: https://issues.apache.org/jira/browse/PHOENIX-6449
> Project: Phoenix
>  Issue Type: Bug
>  Components: connectors
>Affects Versions: connectors-6.0.0
>Reporter: Istvan Toth
>Assignee: Istvan Toth
>Priority: Major
> Fix For: connectors-6.0.0
>
>
> mvn verify cannot start minicluster when run by github actions.
> It works locally.
> Most likely a problem with name resolution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6449) Cannot start minicluster in github CI check for connectors

2021-04-19 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325190#comment-17325190
 ] 

Lars Hofhansl commented on PHOENIX-6449:


We just ran into something similar with Trino when testing the Phoenix 
connector, which also has to start a minicluster.

This might be helpful: [https://github.com/trinodb/trino/pull/7588]

> Cannot start minicluster in github CI check for connectors
> --
>
> Key: PHOENIX-6449
> URL: https://issues.apache.org/jira/browse/PHOENIX-6449
> Project: Phoenix
>  Issue Type: Bug
>  Components: connectors
>Affects Versions: connectors-6.0.0
>Reporter: Istvan Toth
>Priority: Major
>
> mvn verify cannot start minicluster when run by github actions.
> It works locally.
> Most likely a problem with name resolution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6448) ConnectionQueryServicesImpl init failure may cause Full GC.

2021-04-19 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17325189#comment-17325189
 ] 

Lars Hofhansl commented on PHOENIX-6448:


With [~kadir]'s client-paced iteration, do we even still need the 
lease renewal, or can we just get rid of it?
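
Looking at the failure chain quoted below: one minimal mitigation, independent 
of the lease-renewal question, would be to bound the queue (my sketch only, 
with made-up names and cap, not a proposed patch):
{code:java}
import java.lang.ref.WeakReference;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: if the renewal task never started, registrations should not be
// able to grow the heap without bound. A bounded queue that drops on
// overflow turns the Full GC into, at worst, some missed lease renewals.
public class BoundedRenewalQueue<T> {
    private final BlockingQueue<WeakReference<T>> queue =
            new LinkedBlockingQueue<>(10_000); // illustrative cap

    public void register(T iterator) {
        // offer() is non-blocking; a full queue means renewal is not
        // keeping up (or never started), so we simply stop tracking.
        queue.offer(new WeakReference<>(iterator));
    }
}
{code}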

> ConnectionQueryServicesImpl init failure may cause Full GC.
> ---
>
> Key: PHOENIX-6448
> URL: https://issues.apache.org/jira/browse/PHOENIX-6448
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Chen Feng
>Priority: Major
>
> in ConnectionQueryServicesImpl.init()
> In some cases (e.g. the user does not have permission to create SYSTEM.CATALOG), 
> there is only a LOGGER.warn and a direct return of null.
> {code:java}
> {
>   ...
>   if (inspectIfAnyExceptionInChain(e,
>       Collections.<Class<? extends Exception>>singletonList(AccessDeniedException.class))) {
>     // Pass
>     LOGGER.warn("Could not check for Phoenix SYSTEM tables," +
>         " assuming they exist and are properly configured");
>     checkClientServerCompatibility(SchemaUtil.getPhysicalName(SYSTEM_CATALOG_NAME_BYTES,
>         getProps()).getName());
>     success = true;
>   }
>   ...
>   return null;
> }
> ...
> scheduleRenewLeaseTasks();
> {code}
> Therefore, the following scheduleRenewLeaseTasks() will be skipped and no 
> exception is thrown.
>  
> 1. scheduleRenewLeaseTasks not called
> 2. no renew task started
> 3. queries will call PhoenixConection.addIteratorForLeaseRenewal() as usual
> 4. the scannerQueue is unlimited therefore it will always adding new items.
> 5. Full GC.
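
For illustration, a minimal sketch of the kind of guard the report suggests: 
bound the queue and skip enqueueing when no renew task is running. All names 
here are hypothetical, not the actual ConnectionQueryServicesImpl fields:
{code:java}
import java.lang.ref.WeakReference;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

class LeaseRenewalGuard {
    // Hypothetical capacity; a real fix would pick a configurable bound.
    private static final int MAX_PENDING_LEASE_RENEWALS = 10_000;
    // Bounded queue instead of an unlimited one, so connections whose init
    // failed cannot grow the heap until Full GC.
    private final Queue<WeakReference<Object>> scannerQueue =
            new LinkedBlockingQueue<>(MAX_PENDING_LEASE_RENEWALS);
    // Set only after scheduleRenewLeaseTasks() has actually run.
    private volatile boolean renewLeaseTaskScheduled = false;

    void addIteratorForLeaseRenewal(Object iterator) {
        // If init() bailed out (e.g. AccessDeniedException on SYSTEM.CATALOG),
        // nothing will ever drain this queue, so do not enqueue at all.
        if (!renewLeaseTaskScheduled) {
            return;
        }
        // offer() silently drops the reference once the bound is reached.
        scannerQueue.offer(new WeakReference<>(iterator));
    }
}
{code}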



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6445) Wrong query plans with functions

2021-04-15 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17322460#comment-17322460
 ] 

Lars Hofhansl commented on PHOENIX-6445:


Four. This should be a valid query, but it is rejected:
{code}
> explain select count(*), phoenix_row_timestamp() from test group by 
> phoenix_row_timestamp();
Error: ERROR 1018 (42Y27): Aggregate may not contain columns not in GROUP BY.  
PHOENIX_ROW_TIMESTAMP() (state=42Y27,code=1018)
java.sql.SQLException: ERROR 1018 (42Y27): Aggregate may not contain columns 
not in GROUP BY.  PHOENIX_ROW_TIMESTAMP()
at 
org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:606)
at 
org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:217)
at 
org.apache.phoenix.compile.ExpressionCompiler.throwNonAggExpressionInAggException(ExpressionCompiler.java:1090)
at 
org.apache.phoenix.compile.ProjectionCompiler.compile(ProjectionCompiler.java:434)
at 
org.apache.phoenix.compile.QueryCompiler.compileSingleFlatQuery(QueryCompiler.java:755)
{code}

> Wrong query plans with functions
> 
>
> Key: PHOENIX-6445
> URL: https://issues.apache.org/jira/browse/PHOENIX-6445
> Project: Phoenix
>  Issue Type: Wish
>Reporter: Lars Hofhansl
>Priority: Major
>
> Phoenix seems to sometimes create incorrect query plans when functions are 
> used.
> I'll post these in the comments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321841#comment-17321841
 ] 

Lars Hofhansl edited comment on PHOENIX-6434 at 4/15/21, 1:48 AM:
--

See PHOENIX-6445

From my testing, +1 on this one.
(and... back to vacation now)


was (Author: lhofhansl):
See PHOENIX-6445 (back to vacation now)

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch, PHOENIX-6434.4.x.004.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred to in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321841#comment-17321841
 ] 

Lars Hofhansl commented on PHOENIX-6434:


See PHOENIX-6445 (back to vacation now)

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch, PHOENIX-6434.4.x.004.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred to in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6445) Wrong query plans with functions

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321840#comment-17321840
 ] 

Lars Hofhansl commented on PHOENIX-6445:


Three. The final count is missing from the plan:
{code}
> explain select count(*) from (select * from test group by rand());
+----------------------------------------------------------------------------------+
| PLAN                                                                             |
+----------------------------------------------------------------------------------+
| CLIENT 10-CHUNK 2578465 ROWS 314572800 BYTES PARALLEL 1-WAY FULL SCAN OVER TEST  |
| SERVER FILTER BY FIRST KEY ONLY                                                  |
+----------------------------------------------------------------------------------+
2 rows selected (0.03 seconds)
{code}

> Wrong query plans with functions
> 
>
> Key: PHOENIX-6445
> URL: https://issues.apache.org/jira/browse/PHOENIX-6445
> Project: Phoenix
>  Issue Type: Wish
>Reporter: Lars Hofhansl
>Priority: Major
>
> Phoenix seems to sometimes create incorrect query plans when functions are 
> used.
> I'll post these in the comments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6445) Wrong query plans with functions

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321839#comment-17321839
 ] 

Lars Hofhansl commented on PHOENIX-6445:


Two. This should be an invalid query, yet it compiles and produces a plan:
{code}
> explain select v1 from test group by rand();
+-----------------------------------------------------------------------------------+
| PLAN                                                                              |
+-----------------------------------------------------------------------------------+
| CLIENT 10-CHUNK 7315646 ROWS 314572800 BYTES PARALLEL 10-WAY ROUND ROBIN RANGE S  |
| SERVER FILTER BY FIRST KEY ONLY                                                   |
+-----------------------------------------------------------------------------------+
2 rows selected (0.151 seconds)
{code}


> Wrong query plans with functions
> 
>
> Key: PHOENIX-6445
> URL: https://issues.apache.org/jira/browse/PHOENIX-6445
> Project: Phoenix
>  Issue Type: Wish
>Reporter: Lars Hofhansl
>Priority: Major
>
> Phoenix seems to sometimes create incorrect query plans when functions are 
> used.
> I'll post these in the comments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6445) Wrong query plans with functions

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321838#comment-17321838
 ] 

Lars Hofhansl commented on PHOENIX-6445:


Example one. The group-by step is missing from the plan:

{code}
> explain select /*+ NO_INDEX */ count(distinct(phoenix_row_timestamp())) from 
> test1;
+-----------------------------------------------------------------------------------+
| PLAN                                                                              |
+-----------------------------------------------------------------------------------+
| CLIENT 10-CHUNK 2578465 ROWS 314572800 BYTES PARALLEL 10-WAY FULL SCAN OVER TEST  |
| SERVER FILTER BY FIRST KEY ONLY                                                   |
| SERVER AGGREGATE INTO SINGLE ROW                                                  |
+-----------------------------------------------------------------------------------+
3 rows selected (0.03 seconds)
{code}


> Wrong query plans with functions
> 
>
> Key: PHOENIX-6445
> URL: https://issues.apache.org/jira/browse/PHOENIX-6445
> Project: Phoenix
>  Issue Type: Wish
>Reporter: Lars Hofhansl
>Priority: Major
>
> Phoenix seems to sometimes create incorrect query plans when functions are 
> used.
> I'll post these in the comments.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321836#comment-17321836
 ] 

Lars Hofhansl edited comment on PHOENIX-6434 at 4/15/21, 1:33 AM:
--

Thanks [~kadir]

More problems:
{code:java}
> select count(distinct(phoenix_row_timestamp())) from test;
+--------------------------------------------+
| DISTINCT_COUNT(" PHOENIX_ROW_TIMESTAMP()") |
+--------------------------------------------+
| 234466                                     |
+--------------------------------------------+
1 row selected (3.374 seconds)

> select /*+ NO_INDEX */ count(distinct(phoenix_row_timestamp())) from test;
+-------------------------------------------+
| DISTINCT_COUNT(PHOENIX_ROW_TIMESTAMP(X.)) |
+-------------------------------------------+
| 26638668                                  |
+-------------------------------------------+
1 row selected (11.65 seconds)
{code}
 

Looks like the value from the index is actually the correct one. Phoenix 
generates an incorrect plan without the index. I found other weird things. For 
example, Phoenix will create an incorrect plan for the following (the final 
count is completely missing from the plan):
{code:java}
select count(*) from (select 1 from test group by phoenix_row_timestamp()){code}
(I tried this to check whether it would produce a different number from the 
count(distinct).)

And the following should not be a valid query, but it actually returns v1 from 
each row:
{code:java}
select v1 from test group by phoenix_row_timestamp(){code}
Anyway... These are all _*unrelated*_ to this PR, and some not even related to 
phoenix_row_timestamp(). I'm on vacation, which is why I did not look at the 
first issue and won't look at this for a bit.

 


was (Author: lhofhansl):
Thanks [~kadir]

More problems:
{code:java}
> select count(distinct(phoenix_row_timestamp())) from test;
+--------------------------------------------+
| DISTINCT_COUNT(" PHOENIX_ROW_TIMESTAMP()") |
+--------------------------------------------+
| 234466                                     |
+--------------------------------------------+
1 row selected (3.374 seconds)

> select /*+ NO_INDEX */ count(distinct(phoenix_row_timestamp())) from test;
+-------------------------------------------+
| DISTINCT_COUNT(PHOENIX_ROW_TIMESTAMP(X.)) |
+-------------------------------------------+
| 26638668                                  |
+-------------------------------------------+
1 row selected (11.65 seconds)
{code}
 

Looks like the value from the index is actually the correct one. Phoenix 
generates an incorrect plan without the index. I found other weird things. For 
example, Phoenix will create an incorrect plan for the following (the final 
count is completely missing from the plan):
{code:java}
select count(*) from (select 1 from test group by phoenix_row_timestamp()){code}
(I tried this to check whether it would produce a different number from the 
count(distinct).)

And the following should not be a valid query, but it actually returns v1 from 
each row:
{code:java}
select v1 from test group by phoenix_row_timestamp(){code}
Anyway... These are all unrelated to this PR, and some not even related to 
phoenix_row_timestamp(). I'm on vacation, which is why I did not look at the 
first issue and won't look at this for a bit.

 

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch, PHOENIX-6434.4.x.004.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred to in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321836#comment-17321836
 ] 

Lars Hofhansl edited comment on PHOENIX-6434 at 4/15/21, 1:33 AM:
--

Thanks [~kadir]

More problems:
{code:java}
> select count(distinct(phoenix_row_timestamp())) from test;
+--------------------------------------------+
| DISTINCT_COUNT(" PHOENIX_ROW_TIMESTAMP()") |
+--------------------------------------------+
| 234466                                     |
+--------------------------------------------+
1 row selected (3.374 seconds)

> select /*+ NO_INDEX */ count(distinct(phoenix_row_timestamp())) from test;
+------------------------------------------+
| DISTINCT_COUNT(PHOENIX_ROW_TIMESTAMP())  |
+------------------------------------------+
| 26638668                                 |
+------------------------------------------+
1 row selected (11.65 seconds)
{code}
 

Looks like the value from the index is actually the correct one. Phoenix 
generates an incorrect plan without the index. I found other weird things. For 
example, Phoenix will create an incorrect plan for the following (the final 
count is completely missing from the plan):
{code:java}
select count(*) from (select 1 from test group by phoenix_row_timestamp()){code}
(I tried this to check whether it would produce a different number from the 
count(distinct).)

And the following should not be a valid query, but it actually returns v1 from 
each row:
{code:java}
select v1 from test group by phoenix_row_timestamp(){code}
Anyway... These are all _*unrelated*_ to this PR, and some not even related to 
phoenix_row_timestamp(). I'm on vacation, which is why I did not look at the 
first issue and won't look at this for a bit.

 


was (Author: lhofhansl):
Thanks [~kadir]

More problems:
{code:java}
> select count(distinct(phoenix_row_timestamp())) from test;
+--------------------------------------------+
| DISTINCT_COUNT(" PHOENIX_ROW_TIMESTAMP()") |
+--------------------------------------------+
| 234466                                     |
+--------------------------------------------+
1 row selected (3.374 seconds)

> select /*+ NO_INDEX */ count(distinct(phoenix_row_timestamp())) from test;
+-------------------------------------------+
| DISTINCT_COUNT(PHOENIX_ROW_TIMESTAMP(X.)) |
+-------------------------------------------+
| 26638668                                  |
+-------------------------------------------+
1 row selected (11.65 seconds)
{code}
 

Looks like the value from the index is actually the correct one. Phoenix 
generates an incorrect plan without the index. I found other weird things. For 
example, Phoenix will create an incorrect plan for the following (the final 
count is completely missing from the plan):
{code:java}
select count(*) from (select 1 from test group by phoenix_row_timestamp()){code}
(I tried this to check whether it would produce a different number from the 
count(distinct).)

And the following should not be a valid query, but it actually returns v1 from 
each row:
{code:java}
select v1 from test group by phoenix_row_timestamp(){code}
Anyway... These are all _*unrelated*_ to this PR, and some not even related to 
phoenix_row_timestamp(). I'm on vacation, which is why I did not look at the 
first issue and won't look at this for a bit.

 

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch, PHOENIX-6434.4.x.004.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred to in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321836#comment-17321836
 ] 

Lars Hofhansl commented on PHOENIX-6434:


Thanks [~kadir]

More problems:
{code:java}
> select count(distinct(phoenix_row_timestamp())) from test;
+--------------------------------------------+
| DISTINCT_COUNT(" PHOENIX_ROW_TIMESTAMP()") |
+--------------------------------------------+
| 234466                                     |
+--------------------------------------------+
1 row selected (3.374 seconds)

> select /*+ NO_INDEX */ count(distinct(phoenix_row_timestamp())) from test;
+-------------------------------------------+
| DISTINCT_COUNT(PHOENIX_ROW_TIMESTAMP(X.)) |
+-------------------------------------------+
| 26638668                                  |
+-------------------------------------------+
1 row selected (11.65 seconds)
{code}
 

Looks like the value from the index is actually the correct one. Phoenix 
generates an incorrect plan without the index. I found other weird things. For 
example, Phoenix will create an incorrect plan for the following (the final 
count is completely missing from the plan):
{code:java}
select count(*) from (select 1 from test group by phoenix_row_timestamp()){code}
(I tried this to check whether it would produce a different number from the 
count(distinct).)

And the following should not be a valid query, but it actually returns v1 from 
each row:
{code:java}
select v1 from test group by phoenix_row_timestamp(){code}
Anyway... These are all unrelated to this PR, and some not even related to 
phoenix_row_timestamp(). I'm on vacation, which is why I did not look at the 
first issue and won't look at this for a bit.

 

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch, PHOENIX-6434.4.x.004.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred to in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-12 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319684#comment-17319684
 ] 

Lars Hofhansl commented on PHOENIX-6434:


So before we can use PHOENIX_ROW_TIMESTAMP() for timed retrieval, it seems we 
have some bugs to fix (this is with Phoenix 5.1).
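
For context, the kind of timed-retrieval query such an index is meant to 
serve, as a minimal JDBC sketch (the connection URL, table, and cutoff are 
illustrative):
{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class TimedRetrieval {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             PreparedStatement ps = conn.prepareStatement(
                     // PHOENIX_ROW_TIMESTAMP() in the WHERE clause selects rows
                     // by their last-modified time; an index on it should turn
                     // this into a fast range scan instead of a full scan.
                     "SELECT pk1, v1 FROM test WHERE PHOENIX_ROW_TIMESTAMP() > ?")) {
            ps.setTimestamp(1, Timestamp.valueOf("2021-04-01 00:00:00"));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + " " + rs.getFloat(2));
                }
            }
        }
    }
}
{code}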

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred to in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-12 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319683#comment-17319683
 ] 

Lars Hofhansl commented on PHOENIX-6434:


And even worse (again, with or without this patch):
{code:java}
> create table test2(pk1 integer not null primary key, v1 float, v2 float, v3 
> float);
No rows affected (1.418 seconds)
> upsert into test2 values(rand() * 1, rand(), rand(), rand());
1 row affected (0.185 seconds)
> select * from test2 order by phoenix_row_timestamp();

org.apache.hadoop.hbase.DoNotRetryIOException: 
org.apache.hadoop.hbase.DoNotRetryIOException: 
TEST2,,1618257122882.f93ab0ca1bf32530d879c55908b6326c.: Qualifier 11 is out of 
the valid range - (0, 0)

at org.apache.phoenix.util.ServerUtil.createIOException(ServerUtil.java:114)
at org.apache.phoenix.util.ServerUtil.throwIOException(ServerUtil.java:80)
at 
org.apache.phoenix.iterate.RegionScannerFactory$1.nextRaw(RegionScannerFactory.java:271)
at 
org.apache.phoenix.iterate.RegionScannerResultIterator.next(RegionScannerResultIterator.java:65)
at 
org.apache.phoenix.iterate.OrderedResultIterator.getResultIterator(OrderedResultIterator.java:284)
at 
org.apache.phoenix.iterate.OrderedResultIterator.next(OrderedResultIterator.java:260)
at 
org.apache.phoenix.iterate.NonAggregateRegionScannerFactory.getTopNScanner(NonAggregateRegionScannerFactory.java:356)
at 
org.apache.phoenix.iterate.NonAggregateRegionScannerFactory.getRegionScanner(NonAggregateRegionScannerFactory.java:186)
at 
org.apache.phoenix.coprocessor.ScanRegionObserver.doPostScannerOpen(ScanRegionObserver.java:194)
at 
org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.overrideDelegate(BaseScannerRegionObserver.java:273)
at 
org.apache.phoenix.coprocessor.BaseScannerRegionObserver$RegionScannerHolder.nextRaw(BaseScannerRegionObserver.java:321)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3307)
at 
org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:3557)
at 
org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45253)
{code}
 

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred to in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-12 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319680#comment-17319680
 ] 

Lars Hofhansl commented on PHOENIX-6434:


Tried it, and it works.

There's a funny thing I noticed (which happens *with or without* this issue):
{code:java}
> create table test2(pk1 integer not null primary key, x.v1 float, y.v2 float, 
> z.v3 float);
No rows affected (1.418 seconds)
> upsert into test2 values(rand() * 1, rand(), rand(), rand());
1 row affected (0.185 seconds)
> select * from test2 order by phoenix_row_timestamp();
+----------+------+------------+-----------+
|   PK1    |  V1  |     V2     |    V3     |
+----------+------+------------+-----------+
| 48717214 | null | 0.54710484 | 0.8657283 |
+----------+------+------------+-----------+
1 row selected (0.06 seconds)
-- Note how v1 is NULL!
> select v1 from test2 order by phoenix_row_timestamp();
+-----------+
|    V1     |
+-----------+
| 0.3282114 |
+-----------+
1 row selected (0.023 seconds)
{code}

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6436) OrderedResultIterator does bad size estimation

2021-04-04 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314633#comment-17314633
 ] 

Lars Hofhansl commented on PHOENIX-6436:


Actually, even when spooling is disabled, thresholdBytes is the maximum number 
of bytes allowed to be allocated, so that should be the limit. I'll create a 
patch.

> OrderedResultIterator does bad size estimation
> --
>
> Key: PHOENIX-6436
> URL: https://issues.apache.org/jira/browse/PHOENIX-6436
> Project: Phoenix
>  Issue Type: Wish
>Reporter: Lars Hofhansl
>Priority: Major
>
> Just came across this.
> The size estimation is: {{(limit + offset) * estimatedEntrySize}}
> with just the passed limit and offset, and this estimate is applied for each 
> single scan.
> This is way too pessimistic when a large limit is passed as just a safety 
> measure.
> Assuming you pass 10,000,000. That is the overall limit, but Phoenix will 
> apply it to every scan (at least one per involved region) and take that much 
> memory from the pool.
> Not sure what a better estimate would be. Ideally we'd divide by the number 
> of involved regions with some fudge factor, or use a size estimate of the region. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6436) OrderedResultIterator does bad size estimation

2021-04-04 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314632#comment-17314632
 ] 

Lars Hofhansl commented on PHOENIX-6436:


Yet another option is to limit the maximum amount of RAM estimated per scan to 
some fraction of the region size (it can't use more memory than the region 
size).
Something like {{min((limit + offset) * estimatedEntrySize, 
MAX_REGION_SIZE/10)}}; this does not have to be perfect, just mostly right. 
Perhaps one could even determine the actual region size and use that as the limit.
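
A sketch of that capped estimate; the names are illustrative, not the actual 
OrderedResultIterator code:
{code:java}
final class OrderByMemoryEstimate {
    /**
     * Estimate the bytes a single per-region scan may buffer for ORDER BY.
     * Caps the naive (limit + offset) * entrySize figure, which is far too
     * pessimistic when a huge LIMIT is passed only as a safety measure.
     */
    static long estimateBytes(long limit, long offset, long estimatedEntrySize,
                              long maxRegionSizeBytes) {
        long raw = (limit + offset) * estimatedEntrySize;
        // A scan cannot buffer more than (a fraction of) its region holds;
        // MAX_REGION_SIZE/10 is the rough bound proposed above. The
        // thresholdBytes spooling limit mentioned earlier would be another
        // natural cap here.
        return Math.min(raw, maxRegionSizeBytes / 10);
    }
}
{code}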


> OrderedResultIterator does bad size estimation
> --
>
> Key: PHOENIX-6436
> URL: https://issues.apache.org/jira/browse/PHOENIX-6436
> Project: Phoenix
>  Issue Type: Wish
>Reporter: Lars Hofhansl
>Priority: Major
>
> Just came across this.
> The size estimation is: {{(limit + offset) * estimatedEntrySize}}
> with just the passed limit and offset, and this estimate is applied for each 
> single scan.
> This is way too pessimistic when a large limit is passed as just a safety 
> measure.
> Assuming you pass 10.000.000. That is the overall limit, but Phoenix will 
> apply it to every scan (at least one per involved region) and take that much 
> memory of the pool.
> Not sure what a better estimate would be. Ideally we'd divide by the number 
> of involved regions with some fuss, or use a size estimate of the region. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PHOENIX-6436) OrderedResultIterator does bad size estimation

2021-04-02 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314125#comment-17314125
 ] 

Lars Hofhansl commented on PHOENIX-6436:


When spooling is enabled (the default), the estimated memory size should just 
be the threshold number of bytes.


> OrderedResultIterator does bad size estimation
> --
>
> Key: PHOENIX-6436
> URL: https://issues.apache.org/jira/browse/PHOENIX-6436
> Project: Phoenix
>  Issue Type: Wish
>Reporter: Lars Hofhansl
>Priority: Major
>
> Just came across this.
> The size estimation is: {{(limit + offset) * estimatedEntrySize}}
> with just the passed limit and offset, and this estimate is applied for each 
> single scan.
> This is way too pessimistic when a large limit is passed as just a safety 
> measure.
> Assuming you pass 10,000,000. That is the overall limit, but Phoenix will 
> apply it to every scan (at least one per involved region) and take that much 
> memory from the pool.
> Not sure what a better estimate would be. Ideally we'd divide by the number 
> of involved regions with some fudge factor, or use a size estimate of the region. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

