[jira] [Comment Edited] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321841#comment-17321841
 ] 

Lars Hofhansl edited comment on PHOENIX-6434 at 4/15/21, 1:48 AM:
--

See PHOENIX-6445

>From my testing. +1 on this one.
(and.. back to vacation now)


was (Author: lhofhansl):
See PHOENIX-6445 (back to vacation now)

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch, PHOENIX-6434.4.x.004.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321836#comment-17321836
 ] 

Lars Hofhansl edited comment on PHOENIX-6434 at 4/15/21, 1:33 AM:
--

Thanks [~kadir]

More problems:
{code:java}
 > select count(distinct(phoenix_row_timestamp())) from test; 
++
| DISTINCT_COUNT(" PHOENIX_ROW_TIMESTAMP()") |
++
| 234466 |
++
1 row selected (3.374 seconds)

> select /*+ NO_INDEX */ count(distinct(phoenix_row_timestamp())) from test;
+---+
| DISTINCT_COUNT(PHOENIX_ROW_TIMESTAMP(X.)) |
+---+
| 26638668  |
+---+
1 row selected (11.65 seconds)

{code}
 

Looks like the value with from the index is actually the correct one. Phoenix 
generates an incorrect plan without the index. I found other weird things. For 
example Phoenix will create an incorrect plan for (the final count is 
completely missing from the plan)
{code:java}
select count(*) from (select 1 from test group by phoenix_row_timestamp()){code}
(I tried this to check whether it would produce a different number from the 
count(distinct).

And the following should not be a valid query, but it actually return v1 from 
each row
{code:java}
select v1 from test group by phoenix_row_timestamp(){code}
Anyway... These are all _*unrelated*_ to this PR, and some not even related to 
phoenix_row_timestamp(). I'm on vacation that's why I did not look at the first 
issue and won't look at this for a bit.

 


was (Author: lhofhansl):
Thanks [~kadir]

More problems:
{code:java}
 > select count(distinct(phoenix_row_timestamp())) from test; 
++
| DISTINCT_COUNT(" PHOENIX_ROW_TIMESTAMP()") |
++
| 234466 |
++
1 row selected (3.374 seconds)

> select /*+ NO_INDEX */ count(distinct(phoenix_row_timestamp())) from test;
+---+
| DISTINCT_COUNT(PHOENIX_ROW_TIMESTAMP(X.)) |
+---+
| 26638668  |
+---+
1 row selected (11.65 seconds)

{code}
 

Looks like the value with from the index is actually the correct one. Phoenix 
generates an incorrect plan without the index. I found other weird things. For 
example Phoenix will create an incorrect plan for (the final count is 
completely missing from the plan)
{code:java}
select count(*) from (select 1 from test group by phoenix_row_timestamp()){code}
(I tried this to check whether it would produce a different number from the 
count(distinct).

And the following should not be a valid query, but it actually return v1 from 
each row
{code:java}
select v1 from test group by phoenix_row_timestamp(){code}
Anyway... These are all unrelated to this PR, and some not even related to 
phoenix_row_timestamp(). I'm on vacation that's why I did not look at the first 
issue and won't look at this for a bit.

 

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch, PHOENIX-6434.4.x.004.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (PHOENIX-6434) Secondary Indexes on PHOENIX_ROW_TIMESTAMP()

2021-04-14 Thread Lars Hofhansl (Jira)


[ 
https://issues.apache.org/jira/browse/PHOENIX-6434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17321836#comment-17321836
 ] 

Lars Hofhansl edited comment on PHOENIX-6434 at 4/15/21, 1:33 AM:
--

Thanks [~kadir]

More problems:
{code:java}
 > select count(distinct(phoenix_row_timestamp())) from test; 
++
| DISTINCT_COUNT(" PHOENIX_ROW_TIMESTAMP()") |
++
| 234466 |
++
1 row selected (3.374 seconds)

> select /*+ NO_INDEX */ count(distinct(phoenix_row_timestamp())) from test;
+---+
| DISTINCT_COUNT(PHOENIX_ROW_TIMESTAMP()) |
+---+
| 26638668  |
+---+
1 row selected (11.65 seconds)

{code}
 

Looks like the value with from the index is actually the correct one. Phoenix 
generates an incorrect plan without the index. I found other weird things. For 
example Phoenix will create an incorrect plan for (the final count is 
completely missing from the plan)
{code:java}
select count(*) from (select 1 from test group by phoenix_row_timestamp()){code}
(I tried this to check whether it would produce a different number from the 
count(distinct).

And the following should not be a valid query, but it actually return v1 from 
each row
{code:java}
select v1 from test group by phoenix_row_timestamp(){code}
Anyway... These are all _*unrelated*_ to this PR, and some not even related to 
phoenix_row_timestamp(). I'm on vacation that's why I did not look at the first 
issue and won't look at this for a bit.

 


was (Author: lhofhansl):
Thanks [~kadir]

More problems:
{code:java}
 > select count(distinct(phoenix_row_timestamp())) from test; 
++
| DISTINCT_COUNT(" PHOENIX_ROW_TIMESTAMP()") |
++
| 234466 |
++
1 row selected (3.374 seconds)

> select /*+ NO_INDEX */ count(distinct(phoenix_row_timestamp())) from test;
+---+
| DISTINCT_COUNT(PHOENIX_ROW_TIMESTAMP(X.)) |
+---+
| 26638668  |
+---+
1 row selected (11.65 seconds)

{code}
 

Looks like the value with from the index is actually the correct one. Phoenix 
generates an incorrect plan without the index. I found other weird things. For 
example Phoenix will create an incorrect plan for (the final count is 
completely missing from the plan)
{code:java}
select count(*) from (select 1 from test group by phoenix_row_timestamp()){code}
(I tried this to check whether it would produce a different number from the 
count(distinct).

And the following should not be a valid query, but it actually return v1 from 
each row
{code:java}
select v1 from test group by phoenix_row_timestamp(){code}
Anyway... These are all _*unrelated*_ to this PR, and some not even related to 
phoenix_row_timestamp(). I'm on vacation that's why I did not look at the first 
issue and won't look at this for a bit.

 

> Secondary Indexes on PHOENIX_ROW_TIMESTAMP()
> 
>
> Key: PHOENIX-6434
> URL: https://issues.apache.org/jira/browse/PHOENIX-6434
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 5.1.0, 4.16.0
>Reporter: Kadir Ozdemir
>Priority: Major
> Attachments: PHOENIX-6434.4.x.001.patch, PHOENIX-6434.4.x.002.patch, 
> PHOENIX-6434.4.x.003.patch, PHOENIX-6434.4.x.004.patch
>
>
> PHOENIX-5629 introduced the function PHOENIX_ROW_TIMESTAMP() that returns the 
> last modified time of a row. PHOENIX_ROW_TIMESTAMP() can be used as a 
> projection column and referred in a WHERE clause. It is desirable to have 
> indexes on row timestamps. This will result in fast time range queries. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)