[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347010#comment-16347010 ]

ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

Github user vdiravka commented on the issue:

    https://github.com/apache/drill/pull/916

    @arina-ielchiieva You are right. Since [CALCITE-2055](https://issues.apache.org/jira/browse/CALCITE-2055) was resolved and Drill was upgraded to the new Calcite, neither Drill nor Calcite supports five-digit years, in accordance with the SQL spec. Please find more details in the Jira description.

> Five-digit year dates are displayed incorrectly via jdbc
> --------------------------------------------------------
>
>                 Key: DRILL-5377
>                 URL: https://issues.apache.org/jira/browse/DRILL-5377
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.10.0
>            Reporter: Rahul Challapalli
>            Assignee: Vitalii Diravka
>            Priority: Minor
>             Fix For: 1.13.0
>
> git.commit.id.abbrev=38ef562
> The issue is connected to displaying five-digit year dates via jdbc
> Below is the output I get from the test framework when I disable auto correction
> for date fields
> {code}
> select l_shipdate from table(cp.`tpch/lineitem.parquet` (type => 'parquet',
> autoCorrectCorruptDates => false)) order by l_shipdate limit 10;
> ^@356-03-19
> ^@356-03-21
> ^@356-03-21
> ^@356-03-23
> ^@356-03-24
> ^@356-03-24
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> ^@356-03-26
> {code}
> Or a simpler case:
> {code}
> 0: jdbc:drill:> select cast('11356-02-16' as date) as FUTURE_DATE from
> (VALUES(1));
> +-------------+
> | FUTURE_DATE |
> +-------------+
> | 356-02-16   |
> +-------------+
> 1 row selected (0.293 seconds)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347011#comment-16347011 ]

ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

Github user vdiravka closed the pull request at:

    https://github.com/apache/drill/pull/916
[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346944#comment-16346944 ]

Volodymyr Vysotskyi commented on DRILL-5377:
--------------------------------------------

After the changes made in CALCITE-1690, a date string must strictly match the pattern
{noformat}
[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]
{noformat}
CALCITE-2055 added a check for the ranges of the date elements.
More details on the SQL spec may be found in {{6.1 }}
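As a rough illustration of the two checks described in the comment above (strict shape first, then field ranges), here is a minimal Java sketch. It is not Calcite's code: the class name `StrictDateLiteral` and the method `isValid` are hypothetical, `java.time.LocalDate` stands in for the range check, and the lower bound of year 0001 is an assumption about the SQL spec's range.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Calcite or Drill.
public class StrictDateLiteral {
    // The four-digit-year pattern quoted in the comment, with capture groups added.
    private static final Pattern DATE_PATTERN =
        Pattern.compile("([0-9][0-9][0-9][0-9])-([0-9][0-9])-([0-9][0-9])");

    /** Returns true if the string has the strict shape and its fields are in range. */
    public static boolean isValid(String literal) {
        Matcher m = DATE_PATTERN.matcher(literal);
        if (!m.matches()) {
            return false; // wrong shape, for example a five-digit year
        }
        int year = Integer.parseInt(m.group(1));
        int month = Integer.parseInt(m.group(2));
        int day = Integer.parseInt(m.group(3));
        if (year < 1) {
            return false; // assumption: valid years run from 0001 to 9999
        }
        try {
            LocalDate.of(year, month, day); // throws if month or day is out of range
            return true;
        } catch (DateTimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("2017-08-15"));  // true
        System.out.println(isValid("11356-02-16")); // false: five-digit year
        System.out.println(isValid("2017-02-30"));  // false: day out of range
    }
}
```

Under these assumptions, the literal from the issue description, `'11356-02-16'`, is rejected at the shape check before any range check runs.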
[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346911#comment-16346911 ]

Arina Ielchiieva commented on DRILL-5377:
-----------------------------------------

[~vitalii] after the upgrade to Calcite 1.15, a year with more than 4 digits is disallowed, according to the SQL standard. [~vvysotskyi] please confirm.
[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346914#comment-16346914 ]

ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/916

    It seems that this PR is not relevant after the Calcite upgrade. @vdiravka please confirm and close the PR.
[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165056#comment-16165056 ]

ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/916

    Other than dealing with corrupted dates from Parquet, what other uses of five-digit years are expected?

    Few business processes project 8,000 years into the future. Scientific projects are likely to project far more than just 10,000 years into the future; they would need to handle billions of years (death of the sun) to trillions (heat death of the universe).

    By contrast, 8,000 years ago there were civilizations that we now know only from a few archeological remains. Few business records extend that far back.

    So, if not for corrupt Parquet dates, what is the use case for 5-digit years? Why is Drill the only tool needing such dates? If they were common, wouldn't `java.sql.Date`, SQL, the ISO standard and other mechanisms define the rules?
[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161978#comment-16161978 ]

ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

Github user vdiravka commented on the issue:

    https://github.com/apache/drill/pull/916

    @paul-rogers There is no bug with corrupt Parquet dates; that was fixed in the context of DRILL-4203.
    This commit fixes the representation of five-digit year dates and doesn't change the logic for 4 (3, 2...) digit year dates. It is done in a similar manner to TimePrintMillis.

    But the best solution is to use the necessary formatting. I am working on this, so this PR can be closed.
[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160099#comment-16160099 ]

ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/916

    Back to my original question. The premise of this bug seems to be that we corrupt Parquet dates and convert perfectly valid 4-digit years into invalid 5-digit years. That is clearly a data corruption bug that should never occur. Why don't we fix that?

    Given that we've accepted the data corruption, we need to display five-digit years which the Java classes for date and time don't support in `toString()`. The code uses `toString()` because it does not do correct formatting using the classes provided. That's the second bug. Date display should make use of format preferences provided by the user, not the default ones provided by `toString()`. So, that's bug number 2.

    Now given the above two bugs, we introduce a third by creating ad-hoc, Drill-specific date/time classes, violating the JDBC standard, to display the corrupt five-digit years. So, no longer will Drill return the java.sql.Date class as specified by the standard, but rather our own subclass. How will this affect client code that relies on standard behavior?

    I feel we are compounding error upon error. Can we go back and fix the original problem: that users might prefer that we don't corrupt dates in their data? That is, the problem is not so much that we don't format corrupt data correctly, but rather that we do, in fact, corrupt data.
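To make the second point in the comment above concrete (formatting a date with an explicit pattern instead of relying on the default `toString()`), here is a minimal Java sketch. It is not Drill's or sqlline's code: the class `FiveDigitYearFormat` and its two methods are hypothetical names, and the patterns `yyyy-MM-dd` and `y-MM-dd` are just one possible choice of format.

```java
import java.sql.Date;
import java.text.SimpleDateFormat;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Drill or sqlline.
public class FiveDigitYearFormat {
    /** Formats a java.sql.Date with an explicit pattern rather than toString(). */
    public static String formatSqlDate(Date date) {
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        return new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT).format(date);
    }

    /** Formats a java.time.LocalDate; a single 'y' prints the year without a sign prefix. */
    public static String formatLocalDate(LocalDate date) {
        return DateTimeFormatter.ofPattern("y-MM-dd", Locale.ROOT).format(date);
    }

    public static void main(String[] args) {
        LocalDate local = LocalDate.of(11356, 2, 16);
        Date sqlDate = Date.valueOf(local);

        // Default conversion; the result is deliberately not asserted here,
        // since it is the behavior under discussion in this thread.
        System.out.println(sqlDate);

        // Explicit formatting keeps all five digits of the year.
        System.out.println(formatSqlDate(sqlDate));
        System.out.println(formatLocalDate(local));
    }
}
```

The client still receives a plain `java.sql.Date`; only the display step changes, which is the direction the comment argues for.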
[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152751#comment-16152751 ]

ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

Github user vdiravka commented on the issue:

    https://github.com/apache/drill/pull/916

    I've added a TODO with a reference to the [SQLLine dateFormat, timeFormat, timestampFormat](https://github.com/julianhyde/sqlline/issues/66) issue.
    The branch is rebased onto the master version.

    @paul-rogers Please take a look at these minor updates.
[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16135302#comment-16135302 ]

ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

Github user vdiravka commented on the issue:

    https://github.com/apache/drill/pull/916

    @paul-rogers A similar approach is used for displaying time with milliseconds in Drill ([TimePrintMillis](https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/vector/accessor/sql/TimePrintMillis.java)).

    But you are right, using a custom format for date-to-string conversion is a better decision. Not only the test framework converts `Date` to `String`, but [sqlline](https://github.com/julianhyde/sqlline/blob/master/src/main/java/sqlline/Rows.java#L183) does as well. So I am going to create an issue ticket for sqlline.
[jira] [Commented] (DRILL-5377) Five-digit year dates are displayed incorrectly via jdbc
[ https://issues.apache.org/jira/browse/DRILL-5377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133099#comment-16133099 ]

ASF GitHub Bot commented on DRILL-5377:
---------------------------------------

GitHub user vdiravka opened a pull request:

    https://github.com/apache/drill/pull/916

    DRILL-5377: Five-digit year dates are displayed incorrectly via jdbc

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vdiravka/drill DRILL-5377

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/916.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #916

commit 02b533dde137b85fd500cf1290bfa8a1a3f2f4e6
Author: Vitalii Diravka
Date:   2017-08-15T17:51:10Z

    DRILL-5377: Five-digit year dates are displayed incorrectly via jdbc