[jira] [Commented] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113913#comment-16113913 ] Ethan Wang commented on PHOENIX-418: Thanks [~jamestaylor] [~sergey.soldatov]. My bad. I'll go prepare another one. > Support approximate COUNT DISTINCT > -- > > Key: PHOENIX-418 > URL: https://issues.apache.org/jira/browse/PHOENIX-418 > Project: Phoenix > Issue Type: Task >Reporter: James Taylor >Assignee: Ethan Wang > Labels: gsoc2016 > Attachments: PHOENIX-418-v1.patch, PHOENIX-418-v2.patch > > > Support an "approximation" of count distinct to prevent having to hold on to > all distinct values (since this will not scale well when the number of > distinct values is huge). The Apache Drill folks have had some interesting > discussions on this > [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E). > They recommend using [Welford's > method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm). > I'm open to having a config option that uses exact versus approximate. I > don't have experience implementing an approximate implementation, so I'm not > sure how much state is required to keep on the server and return to the > client (other than realizing it'd be much less than returning all distinct > values and their counts). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-418) Support approximate COUNT DISTINCT
[ https://issues.apache.org/jira/browse/PHOENIX-418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Wang updated PHOENIX-418: --- Attachment: PHOENIX-418-v2.patch > Support approximate COUNT DISTINCT > -- > > Key: PHOENIX-418 > URL: https://issues.apache.org/jira/browse/PHOENIX-418 > Project: Phoenix > Issue Type: Task >Reporter: James Taylor >Assignee: Ethan Wang > Labels: gsoc2016 > Attachments: PHOENIX-418-v1.patch, PHOENIX-418-v2.patch > > > Support an "approximation" of count distinct to prevent having to hold on to > all distinct values (since this will not scale well when the number of > distinct values is huge). The Apache Drill folks have had some interesting > discussions on this > [here](http://mail-archives.apache.org/mod_mbox/incubator-drill-dev/201306.mbox/%3CJIRA.12650169.1369931282407.88049.1370645900553%40arcas%3E). > They recommend using [Welford's > method](http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance_Online_algorithm). > I'm open to having a config option that uses exact versus approximate. I > don't have experience implementing an approximate implementation, so I'm not > sure how much state is required to keep on the server and return to the > client (other than realizing it'd be much less than returning all distinct > values and their counts). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
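A note on the state question raised in the description: the linked Welford's method is an online variance algorithm, so it does not by itself count distinct values. A common alternative for approximate distinct counts is a fixed-size sketch such as HyperLogLog. What follows is a minimal illustration under that assumption, not the approach taken in the attached patches (which are not shown here). The class name HllSketch, the mix64 hash, and the choice of register count are all hypothetical.

```java
/** Minimal HyperLogLog sketch over long values (illustrative only). */
final class HllSketch {
    private final int p;             // number of bits used to pick a register
    private final int m;             // number of registers, 2^p
    private final byte[] registers;  // the entire state: m bytes

    HllSketch(int p) {
        if (p < 7 || p > 16) {
            throw new IllegalArgumentException("p must be in [7, 16]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    /** 64-bit mixing function (splitmix64 finalizer), used here as the hash. */
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    void add(long value) {
        long h = mix64(value + 0x9e3779b97f4a7c15L);
        int idx = (int) (h >>> (64 - p));   // top p bits select the register
        long rest = h << p;                 // remaining 64 - p bits, left-aligned
        // Rank = position of the first 1 bit in the remaining bits, capped.
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    /** Combines another partial sketch into this one (register-wise max). */
    void merge(HllSketch other) {
        if (other.p != p) {
            throw new IllegalArgumentException("sketches must use the same p");
        }
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    double estimate() {
        double sum = 0.0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2.0, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double alpha = 0.7213 / (1.0 + 1.079 / m);  // constant valid for m >= 128
        double raw = alpha * m * (double) m / sum;
        // Small-range correction: fall back to linear counting.
        if (raw <= 2.5 * m && zeros > 0) {
            return m * Math.log((double) m / zeros);
        }
        return raw;
    }
}
```

With p = 12 the state is 4096 bytes per sketch regardless of how many distinct values are seen, and partial sketches computed independently (for example, one per region) can be combined with merge before calling estimate.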
[jira] [Commented] (PHOENIX-4064) Why the group by the index field will execute scan over table.
[ https://issues.apache.org/jira/browse/PHOENIX-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113848#comment-16113848 ] James Taylor commented on PHOENIX-4064: --- Need to see your table & index DDL. > Why the group by the index field will execute scan over table. > --- > > Key: PHOENIX-4064 > URL: https://issues.apache.org/jira/browse/PHOENIX-4064 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.11.0 >Reporter: jifei_yang > Labels: features > Fix For: 4.11.0 > > > Why the group by the index field will execute scan over table. > I created a Phoenix table, and then created the index, all the fields are > marked as index. However, when the group by index field is executed, the > execution plan is viewed and the index is not used. Please help solve the > next, thank you. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-4064) Why the group by the index field will execute scan over table.
[ https://issues.apache.org/jira/browse/PHOENIX-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113830#comment-16113830 ] jifei_yang commented on PHOENIX-4064: - The sql is : {code:java} explain select COMMANDID,CDNNETID,DOMAIN,SERVICECONTENT,ILLEGALTYPE from PH_ILLEGALWEB_APP where CREATETIME= '2017-08-03' and instr(CDNNETID,'a1a') >0 and DOMAIN in('mvvideo4.meitudata.com') group by DOMAIN,CDNNETID,COMMANDID,SERVICECONTENT,ILLEGALTYPE ; {code} and explain is : {code:java} CLIENT 3-CHUNK 0 ROWS 0 BYTES PARALLEL 3-WAY RANGE SCAN OVER PH_ILLEGALWEB_APP [0,'2017-08-03'] - [2,'2017-08-03'] | SERVER FILTER BY FIRST KEY ONLY AND (INSTR(CDNNETID, 'a1a') > 0 AND DOMAIN = 'mvvideo4.meitudata.com') | SERVER DISTINCT PREFIX FILTER OVER [DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE] | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE] | CLIENT MERGE SORT {code} > Why the group by the index field will execute scan over table. > --- > > Key: PHOENIX-4064 > URL: https://issues.apache.org/jira/browse/PHOENIX-4064 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.11.0 >Reporter: jifei_yang > Labels: features > Fix For: 4.11.0 > > > Why the group by the index field will execute scan over table. > I created a Phoenix table, and then created the index, all the fields are > marked as index. However, when the group by index field is executed, the > execution plan is viewed and the index is not used. Please help solve the > next, thank you. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (PHOENIX-4064) Why the group by the index field will execute scan over table.
[ https://issues.apache.org/jira/browse/PHOENIX-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113830#comment-16113830 ] jifei_yang edited comment on PHOENIX-4064 at 8/4/17 2:33 AM: - The sql is : {code:java} explain select COMMANDID,CDNNETID,DOMAIN,SERVICECONTENT,ILLEGALTYPE from PH_ILLEGALWEB_APP where CREATETIME= '2017-08-03' and instr(CDNNETID,'a1a') >0 and DOMAIN in('mvvideo4.meitudata.com') group by DOMAIN,CDNNETID,COMMANDID,SERVICECONTENT,ILLEGALTYPE ; {code} and explain is : {code:java} CLIENT 3-CHUNK 0 ROWS 0 BYTES PARALLEL 3-WAY RANGE SCAN OVER PH_ILLEGALWEB_APP [0,'2017-08-03'] - [2,'2017-08-03'] SERVER FILTER BY FIRST KEY ONLY AND (INSTR(CDNNETID, 'a1a') > 0 AND DOMAIN = 'mvvideo4.meitudata.com') SERVER DISTINCT PREFIX FILTER OVER [DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE] SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE] CLIENT MERGE SORT {code} was (Author: highfei2...@126.com): The sql is : {code:java} explain select COMMANDID,CDNNETID,DOMAIN,SERVICECONTENT,ILLEGALTYPE from PH_ILLEGALWEB_APP where CREATETIME= '2017-08-03' and instr(CDNNETID,'a1a') >0 and DOMAIN in('mvvideo4.meitudata.com') group by DOMAIN,CDNNETID,COMMANDID,SERVICECONTENT,ILLEGALTYPE ; {code} and explain is : {code:java} CLIENT 3-CHUNK 0 ROWS 0 BYTES PARALLEL 3-WAY RANGE SCAN OVER PH_ILLEGALWEB_APP [0,'2017-08-03'] - [2,'2017-08-03'] | SERVER FILTER BY FIRST KEY ONLY AND (INSTR(CDNNETID, 'a1a') > 0 AND DOMAIN = 'mvvideo4.meitudata.com') SERVER DISTINCT PREFIX FILTER OVER [DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE] SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE] CLIENT MERGE SORT {code} > Why the group by the index field will execute scan over table. 
> --- > > Key: PHOENIX-4064 > URL: https://issues.apache.org/jira/browse/PHOENIX-4064 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.11.0 >Reporter: jifei_yang > Labels: features > Fix For: 4.11.0 > > > Why the group by the index field will execute scan over table. > I created a Phoenix table, and then created the index, all the fields are > marked as index. However, when the group by index field is executed, the > execution plan is viewed and the index is not used. Please help solve the > next, thank you. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (PHOENIX-4064) Why the group by the index field will execute scan over table.
[ https://issues.apache.org/jira/browse/PHOENIX-4064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113830#comment-16113830 ] jifei_yang edited comment on PHOENIX-4064 at 8/4/17 2:33 AM: - The sql is : {code:java} explain select COMMANDID,CDNNETID,DOMAIN,SERVICECONTENT,ILLEGALTYPE from PH_ILLEGALWEB_APP where CREATETIME= '2017-08-03' and instr(CDNNETID,'a1a') >0 and DOMAIN in('mvvideo4.meitudata.com') group by DOMAIN,CDNNETID,COMMANDID,SERVICECONTENT,ILLEGALTYPE ; {code} and explain is : {code:java} CLIENT 3-CHUNK 0 ROWS 0 BYTES PARALLEL 3-WAY RANGE SCAN OVER PH_ILLEGALWEB_APP [0,'2017-08-03'] - [2,'2017-08-03'] | SERVER FILTER BY FIRST KEY ONLY AND (INSTR(CDNNETID, 'a1a') > 0 AND DOMAIN = 'mvvideo4.meitudata.com') SERVER DISTINCT PREFIX FILTER OVER [DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE] SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE] CLIENT MERGE SORT {code} was (Author: highfei2...@126.com): The sql is : {code:java} explain select COMMANDID,CDNNETID,DOMAIN,SERVICECONTENT,ILLEGALTYPE from PH_ILLEGALWEB_APP where CREATETIME= '2017-08-03' and instr(CDNNETID,'a1a') >0 and DOMAIN in('mvvideo4.meitudata.com') group by DOMAIN,CDNNETID,COMMANDID,SERVICECONTENT,ILLEGALTYPE ; {code} and explain is : {code:java} CLIENT 3-CHUNK 0 ROWS 0 BYTES PARALLEL 3-WAY RANGE SCAN OVER PH_ILLEGALWEB_APP [0,'2017-08-03'] - [2,'2017-08-03'] | SERVER FILTER BY FIRST KEY ONLY AND (INSTR(CDNNETID, 'a1a') > 0 AND DOMAIN = 'mvvideo4.meitudata.com') | SERVER DISTINCT PREFIX FILTER OVER [DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE] | SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY [DOMAIN, CDNNETID, COMMANDID, SERVICECONTENT, ILLEGALTYPE] | CLIENT MERGE SORT {code} > Why the group by the index field will execute scan over table. 
> --- > > Key: PHOENIX-4064 > URL: https://issues.apache.org/jira/browse/PHOENIX-4064 > Project: Phoenix > Issue Type: Improvement >Affects Versions: 4.11.0 >Reporter: jifei_yang > Labels: features > Fix For: 4.11.0 > > > Why the group by the index field will execute scan over table. > I created a Phoenix table, and then created the index, all the fields are > marked as index. However, when the group by index field is executed, the > execution plan is viewed and the index is not used. Please help solve the > next, thank you. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (PHOENIX-4064) Why the group by the index field will execute scan over table.
jifei_yang created PHOENIX-4064: --- Summary: Why the group by the index field will execute scan over table. Key: PHOENIX-4064 URL: https://issues.apache.org/jira/browse/PHOENIX-4064 Project: Phoenix Issue Type: Improvement Affects Versions: 4.11.0 Reporter: jifei_yang Fix For: 4.11.0 Why does a GROUP BY on the indexed fields execute a scan over the table? I created a Phoenix table and then created an index; all of the fields are indexed. However, when I run a GROUP BY on the indexed fields and view the execution plan, the index is not used. Please help me solve this, thank you. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (PHOENIX-3525) Cap automatic index rebuilding to inactive timestamp.
[ https://issues.apache.org/jira/browse/PHOENIX-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113128#comment-16113128 ] James Taylor edited comment on PHOENIX-3525 at 8/4/17 2:25 AM: --- Current plan is to eliminate simultaneous writes from the rebuilder and clients to prevent any race conditions by: * introducing a PENDING_ACTIVE index state. When in PENDING_ACTIVE state, an index will not be used by queries until the server-based timestamp >= INDEX_ACTIVATE_TIMESTAMP. * introducing an INDEX_ACTIVATE_TIMESTAMP column that determines when an index will be reactivated. This timestamp will be set by the rebuilder to a time in the future (by a configurable amount of time) after all index regions are online. The index will either be left in an ACTIVE state (depending on config) or moved to a PENDING_ACTIVE state. * preventing index maintenance by not sending IndexMaintainer until server-based timestamp >= INDEX_ACTIVATE_TIMESTAMP. * including INDEX_ACTIVATE_TIMESTAMP in PTable so that clients can use it to control whether index maintenance is performed. was (Author: jamestaylor): Current plan is to eliminate simultaneous writes from the rebuilder and clients to prevent any race conditions by: * introducing a PENDING_ACTIVE index state. When in PENDING_ACTIVE state, an index will not be used by queries until the server-based timestamp >= INDEX_ACTIVATE_TIMESTAMP. * introducing an INDEX_ACTIVATE_TIMESTAMP column that determines when an index will be reactivated. This timestamp will be set by the rebuilder to a time in the future (by a configurable amount of time) after all index regions are online. The index will either be left in an ACTIVE state (depending on config) or moved to a PENDING_ACTIVE state. * preventing index maintenance by not sending IndexMaintainer until index is ACTIVE or PENDING_ACTIVE with server-based timestamp >= INDEX_ACTIVATE_TIMESTAMP. 
* including INDEX_ACTIVATE_TIMESTAMP in PTable > Cap automatic index rebuilding to inactive timestamp. > - > > Key: PHOENIX-3525 > URL: https://issues.apache.org/jira/browse/PHOENIX-3525 > Project: Phoenix > Issue Type: Improvement >Reporter: Ankit Singhal >Assignee: James Taylor > Attachments: PHOENIX-3525_wip2.patch, PHOENIX-3525_wip.patch > > > From [~chrajeshbab...@gmail.com] review comment on > https://github.com/apache/phoenix/pull/210 > For automatic rebuilding, DISABLED_TIMESTAMP is the lower bound but there is no > upper bound, so we are going to rebuild all the new writes written after > DISABLED_TIMESTAMP even though indexes were updated properly. So we can introduce > an upper bound of time where we are going to start a rebuild thread so we can > limit the data to rebuild. If there are frequent writes, then we can > increment the rebuild period exponentially -- This message was sent by Atlassian JIRA (v6.4.14#64029)
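The timestamp gating described in the PHOENIX-3525 comment can be sketched as follows. This is only one reading of the plan, not code from the work-in-progress patches: the class IndexActivation, its method names, the DISABLED enum constant, and the assumption that an ACTIVE index is always usable by queries are all hypothetical. Only PENDING_ACTIVE, ACTIVE, and INDEX_ACTIVATE_TIMESTAMP come from the comment.

```java
/** Sketch of the activation-timestamp checks from the plan (hypothetical names). */
final class IndexActivation {
    enum IndexState { ACTIVE, PENDING_ACTIVE, DISABLED }

    /** Rebuilder side: pick an activation time a configurable delay in the future. */
    static long computeActivateTimestamp(long serverNowMillis, long configuredDelayMillis) {
        return serverNowMillis + configuredDelayMillis;
    }

    /**
     * Query side: a PENDING_ACTIVE index is not used until the server-based
     * timestamp reaches INDEX_ACTIVATE_TIMESTAMP. Treating ACTIVE as always
     * usable is an assumption of this sketch.
     */
    static boolean usableByQueries(IndexState state, long activateTs, long serverTs) {
        switch (state) {
            case ACTIVE:
                return true;
            case PENDING_ACTIVE:
                return serverTs >= activateTs;
            default:
                return false;
        }
    }

    /** Write side: hold back the IndexMaintainer until the activation time has passed. */
    static boolean shouldSendIndexMaintainer(long activateTs, long serverTs) {
        return serverTs >= activateTs;
    }
}
```

Because both checks compare against the same server-based timestamp, the rebuilder has the whole delay window to itself: until the activation time passes, clients send no index updates and queries do not read a PENDING_ACTIVE index.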
[jira] [Updated] (PHOENIX-4063) Partial Index Rebuilder must replay only latest cell at same timestamp
[ https://issues.apache.org/jira/browse/PHOENIX-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4063: -- Summary: Partial Index Rebuilder must replay only latest cell at same timestamp (was: Partial Index Rebuilder must replay from earliest to latest) > Partial Index Rebuilder must replay only latest cell at same timestamp > -- > > Key: PHOENIX-4063 > URL: https://issues.apache.org/jira/browse/PHOENIX-4063 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > > Since there may be multiple versions of a cell written while the index is > disabled, we need to replay all of them to ensure consistency. In theory, we > could ignore prior versions older than the index disable timestamp, but this > would be difficult to implement. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4063) Partial Index Rebuilder must replay from earliest to latest
[ https://issues.apache.org/jira/browse/PHOENIX-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4063: -- Description: Since there may be multiple versions of a cell written while the index is disabled, we need to replay all of them to ensure consistency. In theory, we could ignore prior versions older than the index disable timestamp, but this would be difficult to implement. (was: Given that PHOENIX-4057 ignores updates for earlier versions, we should not replay all versions of a cell, but only the latest (current code replays all from latest to earliest). Also, we should include all versions of a cell instead of setting an upper bound on the timestamp as otherwise we'll put the index out-of-sync.) > Partial Index Rebuilder must replay from earliest to latest > --- > > Key: PHOENIX-4063 > URL: https://issues.apache.org/jira/browse/PHOENIX-4063 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > > Since there may be multiple versions of a cell written while the index is > disabled, we need to replay all of them to ensure consistency. In theory, we > could ignore prior versions older than the index disable timestamp, but this > would be difficult to implement. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PHOENIX-4063) Partial Index Rebuilder must replay from earliest to latest
[ https://issues.apache.org/jira/browse/PHOENIX-4063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-4063: -- Summary: Partial Index Rebuilder must replay from earliest to latest (was: Partial Index Rebuilder should not cause upper bound to be set on index maintenance scan) > Partial Index Rebuilder must replay from earliest to latest > --- > > Key: PHOENIX-4063 > URL: https://issues.apache.org/jira/browse/PHOENIX-4063 > Project: Phoenix > Issue Type: Bug >Reporter: James Taylor > > Given that PHOENIX-4057 ignores updates for earlier versions, we should not > replay all versions of a cell, but only the latest (current code replays all > from latest to earliest). Also, we should include all versions of a cell > instead of setting an upper bound on the timestamp as otherwise we'll put the > index out-of-sync. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (PHOENIX-4063) Partial Index Rebuilder should not cause upper bound to be set on index maintenance scan
James Taylor created PHOENIX-4063: - Summary: Partial Index Rebuilder should not cause upper bound to be set on index maintenance scan Key: PHOENIX-4063 URL: https://issues.apache.org/jira/browse/PHOENIX-4063 Project: Phoenix Issue Type: Bug Reporter: James Taylor Given that PHOENIX-4057 ignores updates for earlier versions, we should not replay all versions of a cell, but only the latest (current code replays all from latest to earliest). Also, we should include all versions of a cell instead of setting an upper bound on the timestamp as otherwise we'll put the index out-of-sync. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (PHOENIX-3341) Schema update is not visible to following statements of the same connection due to CalciteSchema caching.
[ https://issues.apache.org/jira/browse/PHOENIX-3341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maryann Xue resolved PHOENIX-3341. -- Resolution: Fixed > Schema update is not visible to following statements of the same connection > due to CalciteSchema caching. > - > > Key: PHOENIX-3341 > URL: https://issues.apache.org/jira/browse/PHOENIX-3341 > Project: Phoenix > Issue Type: Sub-task >Reporter: Maryann Xue >Assignee: Maryann Xue > Labels: calcite > > The TableRef object contains a timestamp which will be used for TableScan. > The timestamp should be set at the time of the statement being compiled. Now > that the table resolving goes from Calcite and CalciteSchema caches > TableEntry through the whole connection, the table will not be re-resolved if > any previous statement has already resolved it. If a previous statement did > an update, the next statement cannot see the update since it's holding a > TableRef object containing the old timestamp. > The CalciteSchema caching would also be a problem if a table, a view, or a > function is modified or dropped. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (PHOENIX-868) Make Time, Date, and Timestamp handling JDBC-compliant
[ https://issues.apache.org/jira/browse/PHOENIX-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor reassigned PHOENIX-868: Assignee: Rajeshbabu Chintaguntla > Make Time, Date, and Timestamp handling JDBC-compliant > -- > > Key: PHOENIX-868 > URL: https://issues.apache.org/jira/browse/PHOENIX-868 > Project: Phoenix > Issue Type: Bug >Reporter: Gabriel Reid >Assignee: Rajeshbabu Chintaguntla > Fix For: 5.0 > > > From what I understand from the JDBC documentation, the way that a > java.sql.Date should be handled via JDBC is simply as a day, month, and year, > despite the fact that it is internally represented as a timestamp (the same > kind of thing applies to Time objects, which are a triple of hours, minutes, > and seconds). > Further, my understanding is that it is the responsibility of a JDBC driver > to do normalization of incoming Date and Time (and maybe Timestamp) objects > to interpret them as being in the current time zone, and remove the extra > components (i.e. time components for a Date, and date components for a Time) > before storing the value. > This means that today, if I insert a column value consisting of 'new > Date(System.currentTimeMillis())', then I should be able to retrieve that > same value with a filter on 'Date.valueOf(“2014-03-18”)’. Additionally, that > filter should work regardless of my own local timezone. > It also means that if I store ‘Time.valueOf("07:00:00”)’ in a TIME field in a > database in my current timezone, someone should get “07:00:00” if they run > 'ResultSet#getTime(1).toString()’ on that value, even if they’re in a > different timezone than me. > From what I can see right now, Phoenix doesn’t currently exhibit this > behavior. Instead, the full long representation of Date, Time, and Timestamps > is stored directly in HBase, without dropping the extra date fields or doing > timezone conversion. 
> From the current analysis, what is required for Phoenix to be JDBC-compliant > in terms of time/date/timestamp handling is: > * All incoming time-style values should be interpreted in the local timezone > of the driver, then be normalized and converted to UTC before serialization > (unless a Calendar is supplied) in PreparedStatement calls > * All outgoing time-style values should be converted from UTC into the local > timezone (unless a Calendar is supplied) in ResultSet calls > * Supplying a Calendar to PreparedStatement methods should cause the time > value to be converted from the local timezone to the timezone of the calendar > (instead of UTC) before being serialized > * Supplying a Calendar to ResultSet methods should cause the time value from > the database to be interpreted as if it was serialized in the timezone of the > Calendar, instead of UTC. > Making the above changes would mean breaking backwards compatibility with > existing Phoenix installs (unless some kind of backwards-compatibility mode > is introduced or something similar). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
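The first two bullets above (normalize incoming values in the driver's local timezone, convert to UTC for storage, and reverse that on the way out) can be illustrated for DATE values with a short sketch. This is a sketch under stated assumptions, not Phoenix code: the class DateNormalization and its method names are hypothetical, and it covers only the no-Calendar case by taking the driver's zone as an explicit parameter.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Sketch of DATE normalization as described in the issue (hypothetical names). */
final class DateNormalization {
    /**
     * Incoming value: interpret the epoch millis in the driver's zone, drop the
     * time-of-day component, and store that calendar day as midnight UTC.
     */
    static long normalizeDateToUtc(long epochMillis, ZoneId driverZone) {
        LocalDate day = Instant.ofEpochMilli(epochMillis).atZone(driverZone).toLocalDate();
        return day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
    }

    /**
     * Outgoing value: read the stored calendar day in UTC and return midnight of
     * that same day in the reader's zone, so the day/month/year is preserved.
     */
    static long denormalizeDateFromUtc(long storedMillis, ZoneId readerZone) {
        LocalDate day = Instant.ofEpochMilli(storedMillis).atZone(ZoneOffset.UTC).toLocalDate();
        return day.atStartOfDay(readerZone).toInstant().toEpochMilli();
    }
}
```

For example, 23:30 on 2014-03-18 in America/Los_Angeles is already 2014-03-19 in UTC, so storing the raw long would shift the date by a day for a UTC reader. With the normalization above the stored value is midnight UTC of 2014-03-18, and a reader in any zone gets back a value that falls on 2014-03-18 in their own zone.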
[jira] [Updated] (PHOENIX-868) Make Time, Date, and Timestamp handling JDBC-compliant
[ https://issues.apache.org/jira/browse/PHOENIX-868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor updated PHOENIX-868: - Fix Version/s: 5.0 > Make Time, Date, and Timestamp handling JDBC-compliant > -- > > Key: PHOENIX-868 > URL: https://issues.apache.org/jira/browse/PHOENIX-868 > Project: Phoenix > Issue Type: Bug >Reporter: Gabriel Reid >Assignee: Rajeshbabu Chintaguntla > Fix For: 5.0 > > > From what I understand from the JDBC documentation, the way that a > java.sql.Date should be handled via JDBC is simply as a day, month, and year, > despite the fact that it is internally represented as a timestamp (the same > kind of thing applies to Time objects, which are a triple of hours, minutes, > and seconds). > Further, my understanding is that it is the responsibility of a JDBC driver > to do normalization of incoming Date and Time (and maybe Timestamp) objects > to interpret them as being in the current time zone, and remove the extra > components (i.e. time components for a Date, and date components for a Time) > before storing the value. > This means that today, if I insert a column value consisting of 'new > Date(System.currentTimeMillis())', then I should be able to retrieve that > same value with a filter on 'Date.valueOf(“2014-03-18”)’. Additionally, that > filter should work regardless of my own local timezone. > It also means that if I store ‘Time.valueOf("07:00:00”)’ in a TIME field in a > database in my current timezone, someone should get “07:00:00” if they run > 'ResultSet#getTime(1).toString()’ on that value, even if they’re in a > different timezone than me. > From what I can see right now, Phoenix doesn’t currently exhibit this > behavior. Instead, the full long representation of Date, Time, and Timestamps > is stored directly in HBase, without dropping the extra date fields or doing > timezone conversion. 
> From the current analysis, what is required for Phoenix to be JDBC-compliant > in terms of time/date/timestamp handling is: > * All incoming time-style values should be interpreted in the local timezone > of the driver, then be normalized and converted to UTC before serialization > (unless a Calendar is supplied) in PreparedStatement calls > * All outgoing time-style values should be converted from UTC into the local > timezone (unless a Calendar is supplied) in ResultSet calls > * Supplying a Calendar to PreparedStatement methods should cause the time > value to be converted from the local timezone to the timezone of the calendar > (instead of UTC) before being serialized > * Supplying a Calendar to ResultSet methods should cause the time value from > the database to be interpreted as if it was serialized in the timezone of the > Calendar, instead of UTC. > Making the above changes would mean breaking backwards compatibility with > existing Phoenix installs (unless some kind of backwards-compatibility mode > is introduced or something similar). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (PHOENIX-2511) Allow precision to be specified for TIMESTAMP
[ https://issues.apache.org/jira/browse/PHOENIX-2511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajeshbabu Chintaguntla reassigned PHOENIX-2511: Assignee: Rajeshbabu Chintaguntla (was: Maryann Xue) > Allow precision to be specified for TIMESTAMP > - > > Key: PHOENIX-2511 > URL: https://issues.apache.org/jira/browse/PHOENIX-2511 > Project: Phoenix > Issue Type: Sub-task >Reporter: James Taylor >Assignee: Rajeshbabu Chintaguntla > > We should allow a precision to be specified for our TIMESTAMP type > declaration. For legacy usage of TIMESTAMP, we can either upgrade existing > types to a precision of 9 or use that as the default value. Going forward, we > should have a default value of 3 (for millisecond resolution) which is more > standard. > For query compilation, we can likely use the Date/Time expression instead of > the Timestamp ones (i.e. use DateAddExpression instead of > TimestampAddExpression), but we'd need to take care on the return type. Might > need to have an intermediate class. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
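To make the precision values in PHOENIX-2511 concrete: a precision of 9 keeps full nanosecond resolution, while a precision of 3 keeps milliseconds. The sketch below shows what reducing the fractional-seconds field to a declared precision could look like. The class and method names are hypothetical, and the choice to truncate rather than round is an assumption of this sketch; the issue does not say which one Phoenix would use.

```java
/** Sketch: reduce a fractional-seconds value to a declared precision (hypothetical). */
final class TimestampPrecision {
    /**
     * @param nanos     fractional seconds in nanoseconds, 0..999,999,999
     * @param precision number of fractional digits to keep, 0..9
     * @return nanos with digits beyond the precision set to zero (truncation)
     */
    static int truncateNanos(int nanos, int precision) {
        if (precision < 0 || precision > 9) {
            throw new IllegalArgumentException("precision must be in [0, 9]");
        }
        if (nanos < 0 || nanos > 999_999_999) {
            throw new IllegalArgumentException("nanos must be in [0, 999999999]");
        }
        int unit = 1;
        for (int i = precision; i < 9; i++) {
            unit *= 10;  // 10^(9 - precision) nanoseconds per kept digit
        }
        return nanos - (nanos % unit);
    }
}
```

Under this scheme a value with 123456789 nanoseconds is unchanged at precision 9 and becomes 123000000 at precision 3, which matches the millisecond resolution the issue proposes as the future default.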
[jira] [Assigned] (PHOENIX-3525) Cap automatic index rebuilding to inactive timestamp.
[ https://issues.apache.org/jira/browse/PHOENIX-3525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Taylor reassigned PHOENIX-3525: - Assignee: James Taylor > Cap automatic index rebuilding to inactive timestamp. > - > > Key: PHOENIX-3525 > URL: https://issues.apache.org/jira/browse/PHOENIX-3525 > Project: Phoenix > Issue Type: Improvement >Reporter: Ankit Singhal >Assignee: James Taylor > Attachments: PHOENIX-3525_wip2.patch, PHOENIX-3525_wip.patch > > > From [~chrajeshbab...@gmail.com] review comment on > https://github.com/apache/phoenix/pull/210 > For automatic rebuilding, DISABLED_TIMESTAMP is the lower bound but there is no > upper bound, so we are going to rebuild all the new writes written after > DISABLED_TIMESTAMP even though indexes were updated properly. So we can introduce > an upper bound of time where we are going to start a rebuild thread so we can > limit the data to rebuild. If there are frequent writes, then we can > increment the rebuild period exponentially -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PHOENIX-3769) OnDuplicateKeyIT#testNewAndMultiDifferentUpdateOnSingleColumn fails on ppc64le
[ https://issues.apache.org/jira/browse/PHOENIX-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112645#comment-16112645 ] Sneha Kanekar commented on PHOENIX-3769: In the function combineOnDupKey mentioned above, there is an if condition which checks if both old and new ON DUPLICATE KEY UPDATE clauses match. This is done by byte comparison function Bytes.compareTo: {code:borderStyle=solid} if (Bytes.compareTo( oldOnDupKeyBytes, ON_DUP_KEY_HEADER_BYTE_SIZE, oldOnDupKeyBytes.length - ON_DUP_KEY_HEADER_BYTE_SIZE, newOnDupKeyBytes, Bytes.SIZEOF_SHORT + Bytes.SIZEOF_BOOLEAN, oldOnDupKeyBytes.length - ON_DUP_KEY_HEADER_BYTE_SIZE) == 0) { // If both old and new ON DUPLICATE KEY UPDATE clauses match, // reduce the size of data we're sending over the wire. // TODO: optimization size of RPC more. {code} There are two implementations of function compareTo. In case of x86, compareTo implemented by enum UnsafeComparer is executed whereas in case of ppc64le, compareTo implemented by enum PureJavaComparer is executed. [~elserj] looks like this is the root cause of this failure. > OnDuplicateKeyIT#testNewAndMultiDifferentUpdateOnSingleColumn fails on ppc64le > -- > > Key: PHOENIX-3769 > URL: https://issues.apache.org/jira/browse/PHOENIX-3769 > Project: Phoenix > Issue Type: Bug >Affects Versions: 4.11.0 > Environment: $ uname -a > Linux 6945c232192e 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 > 17:42:36 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux > $ java -version > openjdk version "1.8.0_111" > OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-3~14.04.1-b14) > OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode) >Reporter: Sneha Kanekar > Labels: ppc64le > Attachments: OnDuplicateKeyIT_Standard_output.txt, > PHOENIX-3769.patch, TEST-org.apache.phoenix.end2end.OnDuplicateKeyIT.xml > > > The testcase > org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn > fails consistently on ppc64le architecture. 
The error message is as follows: > {code: borderStyle=solid} > java.lang.ArrayIndexOutOfBoundsException: 179 > at > org.apache.phoenix.end2end.OnDuplicateKeyIT.testNewAndMultiDifferentUpdateOnSingleColumn(OnDuplicateKeyIT.java:392) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
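One detail of the snippet quoted in the comment is that the length passed for newOnDupKeyBytes is derived from oldOnDupKeyBytes.length. The sketch below illustrates why that pattern is sensitive to which comparer runs: an array-indexed comparison that trusts the caller's length will throw when the second array is shorter than claimed. It does not claim to reproduce the reported failure or to be the fix in PHOENIX-3769.patch. compareUnchecked is a simplified stand-in written for this example, not HBase's PureJavaComparer, and sameBody with its single headerSize parameter is hypothetical.

```java
/** Sketch: why a borrowed length is risky in a bounds-checked byte compare. */
final class OnDupKeyCompare {
    /**
     * Lexicographic compare that trusts the offsets and lengths it is given.
     * Simplified stand-in for an array-indexed comparer; throws
     * ArrayIndexOutOfBoundsException if a length runs past the end of an array.
     */
    static int compareUnchecked(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int aEnd = aOff + aLen;
        int bEnd = bOff + bLen;
        for (int i = aOff, j = bOff; i < aEnd && j < bEnd; i++, j++) {
            int x = a[i] & 0xff;
            int y = b[j] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return aLen - bLen;
    }

    /**
     * Defensive variant: arrays of different total length cannot have equal
     * bodies, so check that first and only then compare within bounds.
     * headerSize is a hypothetical parameter standing in for the header offset.
     */
    static boolean sameBody(byte[] oldBytes, byte[] newBytes, int headerSize) {
        if (oldBytes.length != newBytes.length || oldBytes.length < headerSize) {
            return false;
        }
        int len = oldBytes.length - headerSize;
        return compareUnchecked(oldBytes, headerSize, len, newBytes, headerSize, len) == 0;
    }
}
```

Calling compareUnchecked with the old array's length for both arguments fails as soon as the new array is shorter and the shared prefix matches, while sameBody returns false for the same inputs without reading out of range.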