[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT

2017-08-03 Thread Ethan Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113913#comment-16113913
 ] 

Ethan Wang commented on PHOENIX-418:


Thanks [~jamestaylor] [~sergey.soldatov]. My bad. I'll go prepare another one.

> Support approximate COUNT DISTINCT
> --
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: Ethan Wang
>  Labels: gsoc2016
> Attachments: PHOENIX-418-v1.patch, PHOENIX-418-v2.patch
>
>
> Support an "approximation" of count distinct to prevent having to hold on to 
> all distinct values (since this will not scale well when the number of 
> distinct values is huge). The Apache Drill folks have had some interesting 
> discussions on this 
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
>  They recommend using  [Welford's 
> method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
>  I'm open to having a config option that chooses exact versus approximate. I 
> don't have experience with approximate implementations, so I'm not sure how 
> much state is required to keep on the server and return to the client (other 
> than realizing it'd be much less than returning all distinct values and their 
> counts).
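
As an aside for readers new to the idea, here is a minimal, self-contained linear-counting sketch in Java. It is not what the attached patches implement; it only illustrates why an approximate COUNT DISTINCT needs a small, fixed amount of server-side state rather than the full set of distinct values. The class name and constants are made up:

{code:java}
import java.util.BitSet;

// Illustrative only, not Phoenix code: a tiny linear-counting estimator whose state
// is a fixed-size bitmap instead of every distinct value seen.
public class LinearCountingSketch {
    private final int m;          // number of buckets (bits) in the bitmap
    private final BitSet bitmap;

    public LinearCountingSketch(int m) {
        this.m = m;
        this.bitmap = new BitSet(m);
    }

    public void add(Object value) {
        // Mark the bucket this value hashes to as "seen".
        int bucket = Math.floorMod(value.hashCode() * 0x9E3779B9, m);
        bitmap.set(bucket);
    }

    public long estimate() {
        int zeroBuckets = m - bitmap.cardinality();
        if (zeroBuckets == 0) {
            return m; // bitmap saturated; a real implementation would grow or switch sketches
        }
        // Linear counting: n is approximately -m * ln(fraction of buckets still empty).
        return Math.round(-m * Math.log((double) zeroBuckets / m));
    }

    public static void main(String[] args) {
        LinearCountingSketch sketch = new LinearCountingSketch(1 << 16);
        for (int i = 0; i < 50_000; i++) {
            sketch.add("user-" + i);
        }
        System.out.println("approximate distinct count: " + sketch.estimate());
    }
}
{code}

Here an 8 KB bitmap stands in for 50,000 distinct strings, which is the scaling argument made in the description above.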



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-418) Support approximate COUNT DISTINCT

2017-08-03 Thread Ethan Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ethan Wang updated PHOENIX-418:
---
Attachment: PHOENIX-418-v2.patch

> Support approximate COUNT DISTINCT
> --
>
> Key: PHOENIX-418
> URL: https://issues.apache.org/jira/browse/PHOENIX-418
> Project: Phoenix
>  Issue Type: Task
>Reporter: James Taylor
>Assignee: Ethan Wang
>  Labels: gsoc2016
> Attachments: PHOENIX-418-v1.patch, PHOENIX-418-v2.patch
>
>
> Support an "approximation" of count distinct to prevent having to hold on to 
> all distinct values (since this will not scale well when the number of 
> distinct values is huge). The Apache Drill folks have had some interesting 
> discussions on this 
> [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E).
>  They recommend using  [Welford's 
> method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm).
>  I'm open to having a config option that chooses exact versus approximate. I 
> don't have experience with approximate implementations, so I'm not sure how 
> much state is required to keep on the server and return to the client (other 
> than realizing it'd be much less than returning all distinct values and their 
> counts).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4064) Why the group by the index field will execute scan over table.

2017-08-03 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113848#comment-16113848
 ] 

James Taylor commented on PHOENIX-4064:
---

Need to see your table & index DDL.

>  Why the group by the index field will execute scan over table.
> ---
>
> Key: PHOENIX-4064
> URL: https://issues.apache.org/jira/browse/PHOENIX-4064
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.11.0
>Reporter: jifei_yang
>  Labels: features
> Fix For: 4.11.0
>
>
> Why does a GROUP BY on indexed fields execute a scan over the data table?
> I created a Phoenix table and then created an index that covers all of the 
> fields. However, when a GROUP BY on the indexed fields is executed, the 
> execution plan shows that the index is not used. Please help me understand 
> this, thank you.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4064) Why the group by the index field will execute scan over table.

2017-08-03 Thread jifei_yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113830#comment-16113830
 ] 

jifei_yang commented on PHOENIX-4064:
-

The SQL is:

{code:sql}
explain select COMMANDID, CDNNETID, DOMAIN, SERVICECONTENT, ILLEGALTYPE
from PH_ILLEGALWEB_APP
where CREATETIME = '2017-08-03'
  and instr(CDNNETID, 'a1a') > 0
  and DOMAIN in ('mvvideo4.meitudata.com')
group by DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE;
{code}

and the explain plan is:

{code:java}
 CLIENT 3-CHUNK 0 ROWS 0 BYTES PARALLEL 3-WAY RANGE SCAN OVER PH_ILLEGALWEB_APP 
[0,'2017-08-03'] - [2,'2017-08-03']  |
   SERVER FILTER BY FIRST KEY ONLY AND (INSTR(CDNNETID, 'a1a') > 0 AND DOMAIN = 
'mvvideo4.meitudata.com')  |
SERVER DISTINCT PREFIX FILTER OVER [DOMAIN, CDNNETID, COMMANDID, 
SERVICECONTENT, ILLEGALTYPE]   |
 SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DOMAIN, CDNNETID, 
COMMANDID, SERVICECONTENT, ILLEGALTYPE]   |
 CLIENT MERGE SORT 
{code}
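
A quick way to narrow this down, independent of why the optimizer skips the index on its own, is to force the index with a hint and compare the two plans. The index name IDX_ILLEGALWEB and the connection URL below are hypothetical, since the actual DDL has not been posted yet; the query is the one above, issued through the Phoenix JDBC driver:

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Hypothetical check: IDX_ILLEGALWEB and the ZooKeeper quorum are placeholders.
public class ExplainWithIndexHint {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                 "EXPLAIN SELECT /*+ INDEX(PH_ILLEGALWEB_APP IDX_ILLEGALWEB) */ "
                 + "COMMANDID, CDNNETID, DOMAIN, SERVICECONTENT, ILLEGALTYPE "
                 + "FROM PH_ILLEGALWEB_APP "
                 + "WHERE CREATETIME = '2017-08-03' AND INSTR(CDNNETID, 'a1a') > 0 "
                 + "AND DOMAIN IN ('mvvideo4.meitudata.com') "
                 + "GROUP BY DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // each row is one line of the plan
            }
        }
    }
}
{code}

If the hinted plan still range-scans PH_ILLEGALWEB_APP, the table and index DDL James asked for will be needed to go further.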


>  Why the group by the index field will execute scan over table.
> ---
>
> Key: PHOENIX-4064
> URL: https://issues.apache.org/jira/browse/PHOENIX-4064
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.11.0
>Reporter: jifei_yang
>  Labels: features
> Fix For: 4.11.0
>
>
> Why does a GROUP BY on indexed fields execute a scan over the data table?
> I created a Phoenix table and then created an index that covers all of the 
> fields. However, when a GROUP BY on the indexed fields is executed, the 
> execution plan shows that the index is not used. Please help me understand 
> this, thank you.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (PHOENIX-4064) Why the group by the index field will execute scan over table.

2017-08-03 Thread jifei_yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113830#comment-16113830
 ] 

jifei_yang edited comment on PHOENIX-4064 at 8/4/17 2:33 AM:
-

The SQL is:

{code:sql}
explain select COMMANDID, CDNNETID, DOMAIN, SERVICECONTENT, ILLEGALTYPE
from PH_ILLEGALWEB_APP
where CREATETIME = '2017-08-03'
  and instr(CDNNETID, 'a1a') > 0
  and DOMAIN in ('mvvideo4.meitudata.com')
group by DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE;
{code}

and the explain plan is:

{code:java}
 CLIENT 3-CHUNK 0 ROWS 0 BYTES PARALLEL 3-WAY RANGE SCAN OVER PH_ILLEGALWEB_APP 
[0,'2017-08-03'] - [2,'2017-08-03']  
   SERVER FILTER BY FIRST KEY ONLY AND (INSTR(CDNNETID, 'a1a') > 0 AND DOMAIN = 
'mvvideo4.meitudata.com')  
SERVER DISTINCT PREFIX FILTER OVER [DOMAIN, CDNNETID, COMMANDID, 
SERVICECONTENT, ILLEGALTYPE]   
 SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DOMAIN, CDNNETID, 
COMMANDID, SERVICECONTENT, ILLEGALTYPE]   
 CLIENT MERGE SORT 
{code}



was (Author: highfei2...@126.com):
The sql is :

{code:java}
explain select COMMANDID,CDNNETID,DOMAIN,SERVICECONTENT,ILLEGALTYPE  from  
PH_ILLEGALWEB_APP  where   CREATETIME= '2017-08-03'  and instr(CDNNETID,'a1a') 
>0 and  DOMAIN in('mvvideo4.meitudata.com')  group by 
DOMAIN,CDNNETID,COMMANDID,SERVICECONTENT,ILLEGALTYPE  ;
{code}

and explain is :

{code:java}
 CLIENT 3-CHUNK 0 ROWS 0 BYTES PARALLEL 3-WAY RANGE SCAN OVER PH_ILLEGALWEB_APP 
[0,'2017-08-03'] - [2,'2017-08-03']  |
   SERVER FILTER BY FIRST KEY ONLY AND (INSTR(CDNNETID, 'a1a') > 0 AND DOMAIN = 
'mvvideo4.meitudata.com')  
SERVER DISTINCT PREFIX FILTER OVER [DOMAIN, CDNNETID, COMMANDID, 
SERVICECONTENT, ILLEGALTYPE]   
 SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DOMAIN, CDNNETID, 
COMMANDID, SERVICECONTENT, ILLEGALTYPE]   
 CLIENT MERGE SORT 
{code}


>  Why the group by the index field will execute scan over table.
> ---
>
> Key: PHOENIX-4064
> URL: https://issues.apache.org/jira/browse/PHOENIX-4064
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.11.0
>Reporter: jifei_yang
>  Labels: features
> Fix For: 4.11.0
>
>
> Why does a GROUP BY on indexed fields execute a scan over the data table?
> I created a Phoenix table and then created an index that covers all of the 
> fields. However, when a GROUP BY on the indexed fields is executed, the 
> execution plan shows that the index is not used. Please help me understand 
> this, thank you.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (PHOENIX-4064) Why the group by the index field will execute scan over table.

2017-08-03 Thread jifei_yang (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113830#comment-16113830
 ] 

jifei_yang edited comment on PHOENIX-4064 at 8/4/17 2:33 AM:
-

The SQL is:

{code:sql}
explain select COMMANDID, CDNNETID, DOMAIN, SERVICECONTENT, ILLEGALTYPE
from PH_ILLEGALWEB_APP
where CREATETIME = '2017-08-03'
  and instr(CDNNETID, 'a1a') > 0
  and DOMAIN in ('mvvideo4.meitudata.com')
group by DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE;
{code}

and the explain plan is:

{code:java}
 CLIENT 3-CHUNK 0 ROWS 0 BYTES PARALLEL 3-WAY RANGE SCAN OVER PH_ILLEGALWEB_APP 
[0,'2017-08-03'] - [2,'2017-08-03']  |
   SERVER FILTER BY FIRST KEY ONLY AND (INSTR(CDNNETID, 'a1a') > 0 AND DOMAIN = 
'mvvideo4.meitudata.com')  
SERVER DISTINCT PREFIX FILTER OVER [DOMAIN, CDNNETID, COMMANDID, 
SERVICECONTENT, ILLEGALTYPE]   
 SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DOMAIN, CDNNETID, 
COMMANDID, SERVICECONTENT, ILLEGALTYPE]   
 CLIENT MERGE SORT 
{code}



was (Author: highfei2...@126.com):
The sql is :

{code:java}
explain select COMMANDID,CDNNETID,DOMAIN,SERVICECONTENT,ILLEGALTYPE  from  
PH_ILLEGALWEB_APP  where   CREATETIME= '2017-08-03'  and instr(CDNNETID,'a1a') 
>0 and  DOMAIN in('mvvideo4.meitudata.com')  group by 
DOMAIN,CDNNETID,COMMANDID,SERVICECONTENT,ILLEGALTYPE  ;
{code}

and explain is :

{code:java}
 CLIENT 3-CHUNK 0 ROWS 0 BYTES PARALLEL 3-WAY RANGE SCAN OVER PH_ILLEGALWEB_APP 
[0,'2017-08-03'] - [2,'2017-08-03']  |
   SERVER FILTER BY FIRST KEY ONLY AND (INSTR(CDNNETID, 'a1a') > 0 AND DOMAIN = 
'mvvideo4.meitudata.com')  |
SERVER DISTINCT PREFIX FILTER OVER [DOMAIN, CDNNETID, COMMANDID, 
SERVICECONTENT, ILLEGALTYPE]   |
 SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DOMAIN, CDNNETID, 
COMMANDID, SERVICECONTENT, ILLEGALTYPE]   |
 CLIENT MERGE SORT 
{code}


>  Why the group by the index field will execute scan over table.
> ---
>
> Key: PHOENIX-4064
> URL: https://issues.apache.org/jira/browse/PHOENIX-4064
> Project: Phoenix
>  Issue Type: Improvement
>Affects Versions: 4.11.0
>Reporter: jifei_yang
>  Labels: features
> Fix For: 4.11.0
>
>
> Why does a GROUP BY on indexed fields execute a scan over the data table?
> I created a Phoenix table and then created an index that covers all of the 
> fields. However, when a GROUP BY on the indexed fields is executed, the 
> execution plan shows that the index is not used. Please help me understand 
> this, thank you.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (PHOENIX-4064) Why the group by the index field will execute scan over table.

2017-08-03 Thread jifei_yang (JIRA)
jifei_yang created PHOENIX-4064:
---

 Summary:  Why the group by the index field will execute scan over 
table.
 Key: PHOENIX-4064
 URL: https://issues.apache.org/jira/browse/PHOENIX-4064
 Project: Phoenix
  Issue Type: Improvement
Affects Versions: 4.11.0
Reporter: jifei_yang
 Fix For: 4.11.0


Why does a GROUP BY on indexed fields execute a scan over the data table?
I created a Phoenix table and then created an index that covers all of the 
fields. However, when a GROUP BY on the indexed fields is executed, the 
execution plan shows that the index is not used. Please help me understand 
this, thank you.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (PHOENIX-3525) Cap automatic index rebuilding to inactive timestamp.

2017-08-03 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113128#comment-16113128
 ] 

James Taylor edited comment on PHOENIX-3525 at 8/4/17 2:25 AM:
---

Current plan is to eliminate simultaneous writes from the rebuilder and clients 
to prevent any race conditions by:
* introducing a PENDING_ACTIVE index state. While in the PENDING_ACTIVE state, an 
index will not be used by queries until the server-based timestamp >= 
INDEX_ACTIVATE_TIMESTAMP.
* introducing an INDEX_ACTIVATE_TIMESTAMP column that determines when an index 
will be reactivated. This timestamp will be set by the rebuilder to a time in 
the future (by a configurable amount of time) after all index regions are 
online. The index will then either be left in the ACTIVE state (depending on 
config) or moved to the PENDING_ACTIVE state.
* preventing index maintenance by not sending the IndexMaintainer until the 
server-based timestamp >= INDEX_ACTIVATE_TIMESTAMP.
* including INDEX_ACTIVATE_TIMESTAMP in PTable so that clients can use it to 
control whether index maintenance is performed (a rough sketch of this gating 
is below).
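
A rough, self-contained illustration of that gating logic follows; the types, field names, and the assumption that a fully active index carries an INDEX_ACTIVATE_TIMESTAMP of 0 are all hypothetical, not actual Phoenix code:

{code:java}
// Hypothetical sketch of the gating described above; not Phoenix internals.
enum SketchIndexState { ACTIVE, PENDING_ACTIVE, DISABLED }

class IndexGate {
    /** May queries use this index? PENDING_ACTIVE indexes wait for the activate timestamp. */
    static boolean usableForQueries(SketchIndexState state, long indexActivateTs, long serverTs) {
        return state == SketchIndexState.ACTIVE
            || (state == SketchIndexState.PENDING_ACTIVE && serverTs >= indexActivateTs);
    }

    /** Should the client send the IndexMaintainer (i.e., perform incremental maintenance)? */
    static boolean sendIndexMaintainer(long indexActivateTs, long serverTs) {
        // Assumes an index with no pending activation stores 0, so maintenance is always on
        // for it; otherwise the rebuilder is the sole writer until the timestamp is reached.
        return serverTs >= indexActivateTs;
    }
}
{code}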



was (Author: jamestaylor):
Current plan is to eliminate simultaneous writes from the rebuilder and clients 
to prevent any race conditions by:
* introducing a PENDING_ACTIVE index state. When in PENDING_ACTIVE state, an 
index will not be used by queries until the server-based timestamp >= 
INDEX_ACTIVATE_TIMESTAMP.
* introducing an INDEX_ACTIVATE_TIMESTAMP column that determines when an index 
will be reactivated. This timestamp will be set by the rebuilder to a time in 
the future (by a configurable amount of time) after all index regions are 
online. The index will be put either left in an ACTIVE state (depending on 
config) or moved to a PENDING_ACTIVE state.
* prevent index maintenance by not sending IndexMaintainer until index is 
ACTIVE or PENDING_ACTIVE with server-based timestamp >= 
INDEX_ACTIVATE_TIMESTAMP.
* include INDEX_ACTIVATE_TIMESTAMP in PTable

> Cap automatic index rebuilding to inactive timestamp.
> -
>
> Key: PHOENIX-3525
> URL: https://issues.apache.org/jira/browse/PHOENIX-3525
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Ankit Singhal
>Assignee: James Taylor
> Attachments: PHOENIX-3525_wip2.patch, PHOENIX-3525_wip.patch
>
>
> From [~chrajeshbab...@gmail.com] review comment on 
> https://github.com/apache/phoenix/pull/210
> For automatic rebuilding, DISABLED_TIMESTAMP is the lower bound, but there is 
> no upper bound, so we are going to rebuild all new writes written after 
> DISABLED_TIMESTAMP even though the indexes were updated properly. So we can 
> introduce an upper bound on the time at which we are going to start a rebuild 
> thread, so we can limit the data to rebuild. If there are frequent writes, we 
> can increase the rebuild period exponentially.
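
The "increase the rebuild period exponentially" idea in the last sentence of the quoted description is plain exponential backoff; a minimal illustration, with arbitrary constants that are not Phoenix configuration:

{code:java}
import java.util.concurrent.TimeUnit;

// Illustration only: back off the rebuild interval while new writes keep arriving.
class RebuildBackoff {
    private static final long INITIAL_INTERVAL_MS = TimeUnit.SECONDS.toMillis(10);
    private static final long MAX_INTERVAL_MS = TimeUnit.MINUTES.toMillis(30);

    private long currentIntervalMs = INITIAL_INTERVAL_MS;

    /** Called when a rebuild pass finds the table is still being written to heavily. */
    long nextIntervalOnContention() {
        currentIntervalMs = Math.min(currentIntervalMs * 2, MAX_INTERVAL_MS);
        return currentIntervalMs;
    }

    /** Called when a rebuild pass catches up cleanly. */
    void reset() {
        currentIntervalMs = INITIAL_INTERVAL_MS;
    }
}
{code}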



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4063) Partial Index Rebuilder must replay only latest cell at same timestamp

2017-08-03 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4063:
--
Summary: Partial Index Rebuilder must replay only latest cell at same 
timestamp  (was: Partial Index Rebuilder must replay from earliest to latest)

> Partial Index Rebuilder must replay only latest cell at same timestamp
> --
>
> Key: PHOENIX-4063
> URL: https://issues.apache.org/jira/browse/PHOENIX-4063
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>
> Since there may be multiple versions of a cell written while the index is 
> disabled, we need to replay all of them to ensure consistency. In theory, we 
> could ignore prior versions older than the index disable timestamp, but this 
> would be difficult to implement.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4063) Partial Index Rebuilder must replay from earliest to latest

2017-08-03 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4063:
--
Description: Since there may be multiple versions of a cell written while 
the index is disabled, we need to replay all of them to ensure consistency. In 
theory, we could ignore prior versions older than the index disable timestamp, but 
this would be difficult to implement.  (was: Given that PHOENIX-4057 ignores 
updates for earlier versions, we should not replay all versions of a cell, but 
only the latest (current code replays all from latest to earliest). Also, we 
should include all versions of a cell instead of setting an upper bound on the 
timestamp as otherwise we'll put the index out-of-sync.)

> Partial Index Rebuilder must replay from earliest to latest
> ---
>
> Key: PHOENIX-4063
> URL: https://issues.apache.org/jira/browse/PHOENIX-4063
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>
> Since there may be multiple versions of a cell written while the index is 
> disabled, we need to replay all of them to ensure consistency. In theory, we 
> could ignore prior versions older than the index disable timestamp, but this 
> would be difficult to implement.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-4063) Partial Index Rebuilder must replay from earliest to latest

2017-08-03 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-4063:
--
Summary: Partial Index Rebuilder must replay from earliest to latest  (was: 
Partial Index Rebuilder should not cause upper bound to be set on index 
maintenance scan)

> Partial Index Rebuilder must replay from earliest to latest
> ---
>
> Key: PHOENIX-4063
> URL: https://issues.apache.org/jira/browse/PHOENIX-4063
> Project: Phoenix
>  Issue Type: Bug
>Reporter: James Taylor
>
> Given that PHOENIX-4057 ignores updates for earlier versions, we should not 
> replay all versions of a cell, but only the latest (current code replays all 
> from latest to earliest). Also, we should include all versions of a cell 
> instead of setting an upper bound on the timestamp as otherwise we'll put the 
> index out-of-sync.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (PHOENIX-4063) Partial Index Rebuilder should not cause upper bound to be set on index maintenance scan

2017-08-03 Thread James Taylor (JIRA)
James Taylor created PHOENIX-4063:
-

 Summary: Partial Index Rebuilder should not cause upper bound to 
be set on index maintenance scan
 Key: PHOENIX-4063
 URL: https://issues.apache.org/jira/browse/PHOENIX-4063
 Project: Phoenix
  Issue Type: Bug
Reporter: James Taylor


Given that PHOENIX-4057 ignores updates for earlier versions, we should not 
replay all versions of a cell, but only the latest (current code replays all 
from latest to earliest). Also, we should include all versions of a cell 
instead of setting an upper bound on the timestamp as otherwise we'll put the 
index out-of-sync.
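
Schematically, the scan setup being argued about looks like the snippet below. This is not the rebuilder's actual code; it only shows an HBase scan that includes every cell version from the disable timestamp onward, with no upper bound on the time range:

{code:java}
import java.io.IOException;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.Scan;

// Schematic only, not the rebuilder's actual code.
public class RebuildScanSketch {
    static Scan rebuildScan(long indexDisableTimestamp) throws IOException {
        Scan scan = new Scan();
        scan.setMaxVersions(); // include all versions of each cell
        // Lower bound: when the index was disabled. Upper bound: effectively none.
        scan.setTimeRange(indexDisableTimestamp, HConstants.LATEST_TIMESTAMP);
        return scan;
    }
}
{code}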



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (PHOENIX-3341) Schema update is not visible to following statements of the same connection due to CalciteSchema caching.

2017-08-03 Thread Maryann Xue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue resolved PHOENIX-3341.
--
Resolution: Fixed

> Schema update is not visible to following statements of the same connection 
> due to CalciteSchema caching.
> -
>
> Key: PHOENIX-3341
> URL: https://issues.apache.org/jira/browse/PHOENIX-3341
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>  Labels: calcite
>
> The TableRef object contains a timestamp which will be used for the TableScan. 
> The timestamp should be set at the time the statement is compiled. Now that 
> table resolution goes through Calcite, and CalciteSchema caches the TableEntry 
> for the whole connection, the table will not be re-resolved if any previous 
> statement has already resolved it. If a previous statement did an update, the 
> next statement cannot see the update, since it is holding a TableRef object 
> containing the old timestamp.
> The CalciteSchema caching would also be a problem if a table, a view, or a 
> function is modified or dropped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (PHOENIX-868) Make Time, Date, and Timestamp handling JDBC-compliant

2017-08-03 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reassigned PHOENIX-868:


Assignee: Rajeshbabu Chintaguntla

> Make Time, Date, and Timestamp handling JDBC-compliant
> --
>
> Key: PHOENIX-868
> URL: https://issues.apache.org/jira/browse/PHOENIX-868
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Gabriel Reid
>Assignee: Rajeshbabu Chintaguntla
> Fix For: 5.0
>
>
> From what I understand from the JDBC documentation, the way that a 
> java.sql.Date should be handled via JDBC is simply as a day, month, and year, 
> despite the fact that it is internally represented as a timestamp (the same 
> kind of thing applies to Time objects, which are a triple of hours, minutes, 
> and seconds).
> Further, my understanding is that it is the responsibility of a JDBC driver 
> to do normalization of incoming Date and Time (and maybe Timestamp) objects 
> to interpret them as being in the current time zone, and remove the extra 
> components (i.e. time components for a Date, and date components for a Time) 
> before storing the value.
> This means that today, if I insert a column value consisting of 'new 
> Date(System.currentTimeMillis())', then I should be able to retrieve that 
> same value with a filter on 'Date.valueOf("2014-03-18")'. Additionally, that 
> filter should work regardless of my own local timezone.
> It also means that if I store 'Time.valueOf("07:00:00")' in a TIME field in a 
> database in my current timezone, someone should get "07:00:00" if they run 
> 'ResultSet#getTime(1).toString()' on that value, even if they're in a 
> different timezone than me.
> From what I can see right now, Phoenix doesn’t currently exhibit this 
> behavior. Instead, the full long representation of Date, Time, and Timestamps 
> is stored directly in HBase, without dropping the extra date fields or doing 
> timezone conversion.
> From the current analysis, what is required for Phoenix to be JDBC-compliant 
> in terms of time/date/timestamp handling is:
> * All incoming time-style values should be interpreted in the local timezone 
> of the driver, then be normalized and converted to UTC before serialization 
> (unless a Calendar is supplied) in PreparedStatement calls
> * All outgoing time-style values should be converted from UTC into the local 
> timezone (unless a Calendar is supplied) in ResultSet calls
> * Supplying a Calendar to PreparedStatement methods should cause the time 
> value to be converted from the local timezone to the timezone of the calendar 
> (instead of UTC) before being serialized
> * Supplying a Calendar to ResultSet methods should cause the time value from 
> the database to be interpreted as if it was serialized in the timezone of the 
> Calendar, instead of UTC.
> Making the above changes would mean breaking backwards compatibility with 
> existing Phoenix installs (unless some kind of backwards-compatibility mode 
> is introduced or something similar). 
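
For reference, the Calendar-driven behavior in the last two bullets maps onto the standard JDBC overloads shown below. The table and column names are made up, and the comments describe the JDBC-compliant semantics being asked for here, not Phoenix's current behavior:

{code:java}
import java.sql.Connection;
import java.sql.Date;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Time;
import java.util.Calendar;
import java.util.TimeZone;

// EVENTS, EVENT_DATE, EVENT_TIME, and ID are hypothetical; the point is the
// Calendar overloads that a JDBC-compliant driver is expected to honor.
public class JdbcTimeZoneDemo {
    public static void main(String[] args) throws Exception {
        Calendar utc = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost")) {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPSERT INTO EVENTS (ID, EVENT_DATE, EVENT_TIME) VALUES (?, ?, ?)")) {
                ps.setInt(1, 1);
                // No Calendar: the driver should interpret the value in the local timezone,
                // normalize it (drop the extra components), and store it in UTC.
                ps.setDate(2, Date.valueOf("2014-03-18"));
                // With a Calendar: interpret the value in that Calendar's timezone instead.
                ps.setTime(3, Time.valueOf("07:00:00"), utc);
                ps.executeUpdate();
            }
            conn.commit();
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT EVENT_DATE, EVENT_TIME FROM EVENTS WHERE ID = 1");
                 ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.println(rs.getDate(1));      // rendered in the local timezone
                    System.out.println(rs.getTime(2, utc)); // interpreted in UTC, per the Calendar
                }
            }
        }
    }
}
{code}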



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (PHOENIX-868) Make Time, Date, and Timestamp handling JDBC-compliant

2017-08-03 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated PHOENIX-868:
-
Fix Version/s: 5.0

> Make Time, Date, and Timestamp handling JDBC-compliant
> --
>
> Key: PHOENIX-868
> URL: https://issues.apache.org/jira/browse/PHOENIX-868
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Gabriel Reid
>Assignee: Rajeshbabu Chintaguntla
> Fix For: 5.0
>
>
> From what I understand from the JDBC documentation, the way that a 
> java.sql.Date should be handled via JDBC is simply as a day, month, and year, 
> despite the fact that it is internally represented as a timestamp (the same 
> kind of thing applies to Time objects, which are a triple of hours, minutes, 
> and seconds).
> Further, my understanding is that it is the responsibility of a JDBC driver 
> to do normalization of incoming Date and Time (and maybe Timestamp) objects 
> to interpret them as being in the current time zone, and remove the extra 
> components (i.e. time components for a Date, and date components for a Time) 
> before storing the value.
> This means that today, if I insert a column value consisting of 'new 
> Date(System.currentTimeMillis())', then I should be able to retrieve that 
> same value with a filter on 'Date.valueOf("2014-03-18")'. Additionally, that 
> filter should work regardless of my own local timezone.
> It also means that if I store 'Time.valueOf("07:00:00")' in a TIME field in a 
> database in my current timezone, someone should get "07:00:00" if they run 
> 'ResultSet#getTime(1).toString()' on that value, even if they're in a 
> different timezone than me.
> From what I can see right now, Phoenix doesn’t currently exhibit this 
> behavior. Instead, the full long representation of Date, Time, and Timestamps 
> is stored directly in HBase, without dropping the extra date fields or doing 
> timezone conversion.
> From the current analysis, what is required for Phoenix to be JDBC-compliant 
> in terms of time/date/timestamp handling is:
> * All incoming time-style values should be interpreted in the local timezone 
> of the driver, then be normalized and converted to UTC before serialization 
> (unless a Calendar is supplied) in PreparedStatement calls
> * All outgoing time-style values should be converted from UTC into the local 
> timezone (unless a Calendar is supplied) in ResultSet calls
> * Supplying a Calendar to PreparedStatement methods should cause the time 
> value to be converted from the local timezone to the timezone of the calendar 
> (instead of UTC) before being serialized
> * Supplying a Calendar to ResultSet methods should cause the time value from 
> the database to be interpreted as if it was serialized in the timezone of the 
> Calendar, instead of UTC.
> Making the above changes would mean breaking backwards compatibility with 
> existing Phoenix installs (unless some kind of backwards-compatibility mode 
> is introduced or something similar). 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (PHOENIX-2511) Allow precision to be specified for TIMESTAMP

2017-08-03 Thread Rajeshbabu Chintaguntla (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajeshbabu Chintaguntla reassigned PHOENIX-2511:


Assignee: Rajeshbabu Chintaguntla  (was: Maryann Xue)

> Allow precision to be specified for TIMESTAMP
> -
>
> Key: PHOENIX-2511
> URL: https://issues.apache.org/jira/browse/PHOENIX-2511
> Project: Phoenix
>  Issue Type: Sub-task
>Reporter: James Taylor
>Assignee: Rajeshbabu Chintaguntla
>
> We should allow a precision to be specified for our TIMESTAMP type 
> declaration. For legacy usage of TIMESTAMP, we can either upgrade existing 
> types to a precision of 9 or use that as the default value. Going forward, we 
> should have a default value of 3 (for millisecond resolution) which is more 
> standard.
> For query compilation, we can likely use the Date/Time expression instead of 
> the Timestamp ones (i.e. use DateAddExpression instead of 
> TimestampAddExpression), but we'd need to take care with the return type. Might 
> need to have an intermediate class.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (PHOENIX-3525) Cap automatic index rebuilding to inactive timestamp.

2017-08-03 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/PHOENIX-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor reassigned PHOENIX-3525:
-

Assignee: James Taylor

> Cap automatic index rebuilding to inactive timestamp.
> -
>
> Key: PHOENIX-3525
> URL: https://issues.apache.org/jira/browse/PHOENIX-3525
> Project: Phoenix
>  Issue Type: Improvement
>Reporter: Ankit Singhal
>Assignee: James Taylor
> Attachments: PHOENIX-3525_wip2.patch, PHOENIX-3525_wip.patch
>
>
> From [~chrajeshbab...@gmail.com] review comment on 
> https://github.com/apache/phoenix/pull/210
> For automatic rebuilding, DISABLED_TIMESTAMP is the lower bound, but there is 
> no upper bound, so we are going to rebuild all new writes written after 
> DISABLED_TIMESTAMP even though the indexes were updated properly. So we can 
> introduce an upper bound on the time at which we are going to start a rebuild 
> thread, so we can limit the data to rebuild. If there are frequent writes, we 
> can increase the rebuild period exponentially.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-3769) OnDuplicateKeyIT#testNewAndMultiDifferentUpdateOnSingleColumn fails on ppc64le

2017-08-03 Thread Sneha Kanekar (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112645#comment-16112645
 ] 

Sneha Kanekar commented on PHOENIX-3769:


In the function combineOnDupKey mentioned above, there is an if condition which 
checks whether the old and new ON DUPLICATE KEY UPDATE clauses match. This is 
done with the byte-comparison function Bytes.compareTo:
{code:borderStyle=solid}
if (Bytes.compareTo(
        oldOnDupKeyBytes, ON_DUP_KEY_HEADER_BYTE_SIZE,
        oldOnDupKeyBytes.length - ON_DUP_KEY_HEADER_BYTE_SIZE,
        newOnDupKeyBytes, Bytes.SIZEOF_SHORT + Bytes.SIZEOF_BOOLEAN,
        oldOnDupKeyBytes.length - ON_DUP_KEY_HEADER_BYTE_SIZE) == 0) {
    // If both old and new ON DUPLICATE KEY UPDATE clauses match,
    // reduce the size of data we're sending over the wire.
    // TODO: optimization size of RPC more.
{code}
There are two implementations of compareTo: on x86 the one implemented by the 
UnsafeComparer enum is executed, whereas on ppc64le the one implemented by the 
PureJavaComparer enum is executed. [~elserj], it looks like this is the root 
cause of the failure.
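
For anyone unfamiliar with this overload, the six-argument Bytes.compareTo compares a sub-range of each array (an offset and a length per array). A standalone example, assuming hbase-common is on the classpath:

{code:java}
import org.apache.hadoop.hbase.util.Bytes;

// Standalone illustration of the ranged Bytes.compareTo overload used above:
// each array is compared only over (offset, length), so differing headers are skipped.
public class RangedCompareDemo {
    public static void main(String[] args) {
        byte[] oldBytes = Bytes.toBytes("AAAApayload"); // 4-byte header + payload
        byte[] newBytes = Bytes.toBytes("BBBBpayload"); // different header, same payload
        int headerSize = 4;
        int cmp = Bytes.compareTo(
                oldBytes, headerSize, oldBytes.length - headerSize,
                newBytes, headerSize, newBytes.length - headerSize);
        System.out.println(cmp); // 0: the payload ranges are byte-for-byte equal
    }
}
{code}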

> OnDuplicateKeyIT#testNewAndMultiDifferentUpdateOnSingleColumn fails on ppc64le
> --
>
> Key: PHOENIX-3769
> URL: https://issues.apache.org/jira/browse/PHOENIX-3769
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.11.0
> Environment: $ uname -a
> Linux 6945c232192e 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 
> 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux
> $ java -version
> openjdk version "1.8.0_111"
> OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14)
> OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
>Reporter: Sneha Kanekar
>  Labels: ppc64le
> Attachments: OnDuplicateKeyIT_Standard_output.txt, 
> PHOENIX-3769.patch, TEST-org.apache.phoenix.end2end.OnDuplicateKeyIT.xml
>
>
> The testcase 
> org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn
>  fails consistently on ppc64le architecture. The error message is as follows:
> {code: borderStyle=solid}
> java.lang.ArrayIndexOutOfBoundsException: 179
>   at 
> org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn(OnDuplicateKeyIT.java:392)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)